[RFC]: Speed up e2e tests #4146

@Potabk

Motivation.

The existing end-to-end (E2E) and nightly tests take extremely long: the E2E tests take almost 139 min and the nightly tests take 4 h 51 min. We must find ways to reduce test duration to improve the developer experience.

Proposed Change.

  1. Speed up weight loading:
    As [Core] feat: Add --safetensors-load-strategy flag for faster safetensors loading from Lustre vllm#24469 shows, loading models with --safetensors-load-strategy eager can greatly improve loading speed when the model is stored on network storage such as NFS.
    I also ran a local experiment, loading the Qwen3-32B model from sfs_turbo. The results are as follows.
# without eager mode (default: lazy)
Loading safetensors checkpoint shards:   0% Completed | 0/17 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:   6% Completed | 1/17 [00:03<00:50,  3.13s/it]
Loading safetensors checkpoint shards:  12% Completed | 2/17 [00:08<01:09,  4.63s/it]
Loading safetensors checkpoint shards:  18% Completed | 3/17 [00:14<01:11,  5.09s/it]
Loading safetensors checkpoint shards:  24% Completed | 4/17 [00:20<01:09,  5.38s/it]
Loading safetensors checkpoint shards:  29% Completed | 5/17 [00:24<00:58,  4.87s/it]
Loading safetensors checkpoint shards:  35% Completed | 6/17 [00:30<00:57,  5.21s/it]
Loading safetensors checkpoint shards:  41% Completed | 7/17 [00:36<00:55,  5.58s/it]
Loading safetensors checkpoint shards:  47% Completed | 8/17 [00:42<00:51,  5.67s/it]
Loading safetensors checkpoint shards:  53% Completed | 9/17 [00:48<00:46,  5.77s/it]
Loading safetensors checkpoint shards:  59% Completed | 10/17 [00:54<00:40,  5.82s/it]
Loading safetensors checkpoint shards:  65% Completed | 11/17 [01:00<00:34,  5.82s/it]
Loading safetensors checkpoint shards:  71% Completed | 12/17 [01:05<00:29,  5.82s/it]
Loading safetensors checkpoint shards:  76% Completed | 13/17 [01:10<00:21,  5.49s/it]
Loading safetensors checkpoint shards:  82% Completed | 14/17 [01:16<00:16,  5.60s/it]
Loading safetensors checkpoint shards:  88% Completed | 15/17 [01:21<00:11,  5.54s/it]
Loading safetensors checkpoint shards:  94% Completed | 16/17 [01:27<00:05,  5.60s/it]
Loading safetensors checkpoint shards: 100% Completed | 17/17 [01:33<00:00,  5.72s/it]
Loading safetensors checkpoint shards: 100% Completed | 17/17 [01:33<00:00,  5.51s/it]
(Worker_TP0 pid=1199)
(Worker_TP0 pid=1199) INFO 11-12 06:57:28 [default_loader.py:267] Loading weights took 93.78 seconds
(Worker_TP3 pid=1202) INFO 11-12 06:57:28 [default_loader.py:267] Loading weights took 95.21 seconds
(Worker_TP2 pid=1201) INFO 11-12 06:57:28 [default_loader.py:267] Loading weights took 97.60 seconds
(Worker_TP1 pid=1200) INFO 11-12 06:57:28 [default_loader.py:267] Loading weights took 96.35 seconds

# using eager mode

Loading safetensors checkpoint shards (eager):   0% Completed | 0/17 [00:00<?, ?it/s]
(Worker_TP1 pid=2940) Downloading Model from https://www.modelscope.cn to directory: /root/.cache/modelscope/hub/models/Qwen/Qwen3-32B
Loading safetensors checkpoint shards (eager):   6% Completed | 1/17 [00:01<00:17,  1.11s/it]
Loading safetensors checkpoint shards (eager):  12% Completed | 2/17 [00:02<00:17,  1.17s/it]
Loading safetensors checkpoint shards (eager):  18% Completed | 3/17 [00:03<00:16,  1.17s/it]
Loading safetensors checkpoint shards (eager):  24% Completed | 4/17 [00:04<00:15,  1.16s/it]
Loading safetensors checkpoint shards (eager):  29% Completed | 5/17 [00:05<00:12,  1.07s/it]
Loading safetensors checkpoint shards (eager):  35% Completed | 6/17 [00:06<00:12,  1.11s/it]
Loading safetensors checkpoint shards (eager):  41% Completed | 7/17 [00:07<00:11,  1.15s/it]
Loading safetensors checkpoint shards (eager):  47% Completed | 8/17 [00:09<00:10,  1.16s/it]
Loading safetensors checkpoint shards (eager):  53% Completed | 9/17 [00:10<00:09,  1.18s/it]
Loading safetensors checkpoint shards (eager):  59% Completed | 10/17 [00:11<00:08,  1.21s/it]
Loading safetensors checkpoint shards (eager):  65% Completed | 11/17 [00:12<00:07,  1.21s/it]
Loading safetensors checkpoint shards (eager):  71% Completed | 12/17 [00:14<00:06,  1.21s/it]
Loading safetensors checkpoint shards (eager):  76% Completed | 13/17 [00:15<00:04,  1.20s/it]
Loading safetensors checkpoint shards (eager):  82% Completed | 14/17 [00:16<00:03,  1.21s/it]
(Worker_TP2 pid=2941) INFO 11-12 06:59:25 [default_loader.py:267] Loading weights took 19.97 seconds
Loading safetensors checkpoint shards (eager):  88% Completed | 15/17 [00:17<00:02,  1.21s/it]
(Worker_TP3 pid=2942) INFO 11-12 06:59:26 [default_loader.py:267] Loading weights took 19.76 seconds
Loading safetensors checkpoint shards (eager):  94% Completed | 16/17 [00:18<00:01,  1.16s/it]
Loading safetensors checkpoint shards (eager): 100% Completed | 17/17 [00:19<00:00,  1.12s/it]
Loading safetensors checkpoint shards (eager): 100% Completed | 17/17 [00:19<00:00,  1.16s/it]
(Worker_TP0 pid=2939)
(Worker_TP0 pid=2939) INFO 11-12 06:59:27 [default_loader.py:267] Loading weights took 19.78 seconds
(Worker_TP1 pid=2940) INFO 11-12 06:59:28 [default_loader.py:267] Loading weights took 19.29 seconds
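
The per-worker timings in the two logs above can be compared with a short script. This is only a sketch: the regular expression assumes the "Loading weights took ... seconds" line format shown in these logs, and `load_times` is a hypothetical helper name, not part of vLLM.

```python
import re
from statistics import mean

# Matches the per-worker summary lines as they appear in the logs above.
LINE_RE = re.compile(
    r"\(Worker_TP(\d+) pid=\d+\).*Loading weights took ([\d.]+) seconds"
)


def load_times(log: str) -> dict[int, float]:
    """Map tensor-parallel rank -> weight loading time in seconds."""
    return {int(m.group(1)): float(m.group(2)) for m in LINE_RE.finditer(log)}


# Summary lines copied from the run without eager mode (default lazy).
LAZY_LOG = """\
(Worker_TP0 pid=1199) INFO 11-12 06:57:28 [default_loader.py:267] Loading weights took 93.78 seconds
(Worker_TP3 pid=1202) INFO 11-12 06:57:28 [default_loader.py:267] Loading weights took 95.21 seconds
(Worker_TP2 pid=1201) INFO 11-12 06:57:28 [default_loader.py:267] Loading weights took 97.60 seconds
(Worker_TP1 pid=1200) INFO 11-12 06:57:28 [default_loader.py:267] Loading weights took 96.35 seconds
"""

# Summary lines copied from the run with eager mode.
EAGER_LOG = """\
(Worker_TP2 pid=2941) INFO 11-12 06:59:25 [default_loader.py:267] Loading weights took 19.97 seconds
(Worker_TP3 pid=2942) INFO 11-12 06:59:26 [default_loader.py:267] Loading weights took 19.76 seconds
(Worker_TP0 pid=2939) INFO 11-12 06:59:27 [default_loader.py:267] Loading weights took 19.78 seconds
(Worker_TP1 pid=2940) INFO 11-12 06:59:28 [default_loader.py:267] Loading weights took 19.29 seconds
"""

lazy = load_times(LAZY_LOG)
eager = load_times(EAGER_LOG)

# Per-rank speedup: lazy time divided by eager time.
speedup = {rank: lazy[rank] / eager[rank] for rank in sorted(lazy)}

for rank, ratio in speedup.items():
    print(f"TP{rank}: {lazy[rank]:.2f}s -> {eager[rank]:.2f}s ({ratio:.2f}x)")
print(f"mean: {mean(lazy.values()):.2f}s -> {mean(eager.values()):.2f}s")
```

The ranks are paired across two separate runs, so the ratios describe this one experiment on sfs_turbo and may differ on other storage backends.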

As the results show, weight loading time drops from about 94-98 s to about 19-20 s per worker, a speedup of roughly 4.7x to 5x.
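
One way an E2E test could opt in is to append the flag to the server command. The sketch below is an assumption about how such a helper might look: `build_serve_cmd` is a hypothetical name, the tensor-parallel size of 4 is inferred from the four workers in the logs, and the way vllm-ascend's tests actually launch the server may differ.

```python
import shlex


def build_serve_cmd(model: str, tp_size: int, eager_load: bool = True) -> list[str]:
    """Build a `vllm serve` command line (hypothetical test helper)."""
    cmd = ["vllm", "serve", model, "--tensor-parallel-size", str(tp_size)]
    if eager_load:
        # Flag introduced by vllm#24469; the default strategy is lazy.
        cmd += ["--safetensors-load-strategy", "eager"]
    return cmd


cmd = build_serve_cmd("Qwen/Qwen3-32B", tp_size=4)
print(shlex.join(cmd))
# prints: vllm serve Qwen/Qwen3-32B --tensor-parallel-size 4 --safetensors-load-strategy eager
```

Keeping the strategy behind a parameter makes it easy to fall back to the default lazy loading on runners that read weights from local disk.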

  2. Use quay.io/ascend/vllm-ascend:main for nightly tests:
    Since nightly tests are less sensitive to real-time performance, we can consider using the latest image from the main branch directly. See [CI] Add daily images build for nightly ci #3989 and [CI] Integrate mooncake to vllm-ascend base image #4062.

Feedback Period.

No response

CC List.

@Yikun @zhangxinyuehfad @leo-pony @wangxiyuan

Any Other Things.

No response
