
[Performance]: Throughput drops significantly after enabling UCM #369

@shulingWarm

Description

Proposal to improve performance

No response

Report of performance regression

I compared the throughput of UCM against the vLLM baseline using the benchmark provided in the repo and found that it dropped by roughly an order of magnitude:
Baseline: 18246.54 tokens/s
UCM: 1745.75 tokens/s
Hardware: 2 × H100 GPUs
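To put the gap in exact terms, the two throughput figures above can be turned into a slowdown factor and a percentage drop. This is only arithmetic on the reported numbers; the function name `slowdown` is a hypothetical helper, not part of UCM or vLLM.

```python
# Throughput figures reported in this issue (tokens/s).
BASELINE_TPS = 18246.54  # vLLM baseline
UCM_TPS = 1745.75        # vLLM + UCM


def slowdown(baseline: float, candidate: float) -> tuple[float, float]:
    """Hypothetical helper: return (baseline/candidate ratio, % drop vs. baseline)."""
    ratio = baseline / candidate
    drop_pct = (1.0 - candidate / baseline) * 100.0
    return ratio, drop_pct


if __name__ == "__main__":
    ratio, drop = slowdown(BASELINE_TPS, UCM_TPS)
    print(f"UCM is {ratio:.2f}x slower ({drop:.1f}% lower throughput)")
```

With the numbers above this prints a factor of about 10.45x, i.e. roughly a 90.4% drop in throughput.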

Is my configuration correct?

Commit: 28f6f35

Prefill KV transfer config:
{
  "kv_connector": "UnifiedCacheConnectorV1",
  "kv_connector_module_path": "ucm.integration.vllm.uc_connector",
  "kv_role": "kv_producer",
  "kv_connector_extra_config": {
    "ucm_connector_name": "UcmDramStore",
    "ucm_connector_config": {
      "max_cache_size": 5368709120,
      "kv_block_size": 262144
    }
  }
}

Decode KV transfer config:
{
  "kv_connector": "UnifiedCacheConnectorV1",
  "kv_connector_module_path": "ucm.integration.vllm.uc_connector",
  "kv_role": "kv_consumer",
  "kv_connector_extra_config": {
    "ucm_connector_name": "UcmDramStore",
    "ucm_connector_config": {
      "max_cache_size": 5368709120,
      "kv_block_size": 262144
    }
  },
  "ucm_sparse_config": {
    "GSA": {}
  }
}
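Because the prefill and decode configs are nearly identical, a small recursive diff makes the asymmetry between them explicit. The two JSON documents below are copied from the configs above; the `diff` function is a hypothetical helper written for this sketch, and a key that is absent on one side is reported as `None`.

```python
import json
from typing import Any, Iterator

PREFILL = json.loads("""{
  "kv_connector": "UnifiedCacheConnectorV1",
  "kv_connector_module_path": "ucm.integration.vllm.uc_connector",
  "kv_role": "kv_producer",
  "kv_connector_extra_config": {
    "ucm_connector_name": "UcmDramStore",
    "ucm_connector_config": {"max_cache_size": 5368709120, "kv_block_size": 262144}
  }
}""")

DECODE = json.loads("""{
  "kv_connector": "UnifiedCacheConnectorV1",
  "kv_connector_module_path": "ucm.integration.vllm.uc_connector",
  "kv_role": "kv_consumer",
  "kv_connector_extra_config": {
    "ucm_connector_name": "UcmDramStore",
    "ucm_connector_config": {"max_cache_size": 5368709120, "kv_block_size": 262144}
  },
  "ucm_sparse_config": {"GSA": {}}
}""")


def diff(a: Any, b: Any, path: str = "") -> Iterator[tuple[str, Any, Any]]:
    """Hypothetical helper: yield (dotted_path, a_value, b_value) for each differing leaf.

    Missing keys show up as None, so a key present on only one side is reported.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        for key in sorted(set(a) | set(b)):
            child = f"{path}.{key}" if path else key
            yield from diff(a.get(key), b.get(key), child)
    elif a != b:
        yield path, a, b


if __name__ == "__main__":
    for path, p, d in diff(PREFILL, DECODE):
        print(f"{path}: prefill={p!r} decode={d!r}")
```

Run against the two configs, it reports only `kv_role` (`kv_producer` vs. `kv_consumer`) and `ucm_sparse_config`, which is set (`{"GSA": {}}`) only on the decode side; the UCM store settings are identical.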

vLLM server args (the prefill and decode instances use the same args):
--model /models/Qwen3-30B-A3B-Instruct-2507 --max-model-len 80000 --trust-remote-code --gpu_memory_utilization 0.9 --enforce-eager --no-enable-prefix-caching --block-size 128 --dtype bfloat16 --tensor-parallel-size 1
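The `kv_block_size` of 262144 bytes appears to line up with one layer's K and V for a single 128-token vLLM block (`--block-size 128`, `--dtype bfloat16`). This sketch rests on two assumptions not confirmed here: that Qwen3-30B-A3B has 48 layers, 4 KV heads and a head dimension of 128 (check the checkpoint's `config.json`), and that UCM sizes `kv_block_size` per layer. Under those assumptions it also estimates how many tokens of full, all-layer KV fit in the 5 GiB `max_cache_size`. The function names are hypothetical.

```python
# ASSUMED model shape for Qwen3-30B-A3B; verify against the local config.json.
NUM_LAYERS = 48
NUM_KV_HEADS = 4
HEAD_DIM = 128
DTYPE_BYTES = 2       # bfloat16 (--dtype bfloat16)
BLOCK_TOKENS = 128    # --block-size 128

# Values taken from the kv_transfer_config in this issue.
KV_BLOCK_SIZE = 262144        # "kv_block_size" (256 KiB)
MAX_CACHE_SIZE = 5368709120   # "max_cache_size" (5 GiB)


def kv_bytes_per_layer_block() -> int:
    """K and V (factor 2) for one layer over one block of BLOCK_TOKENS tokens."""
    return 2 * NUM_KV_HEADS * HEAD_DIM * DTYPE_BYTES * BLOCK_TOKENS


def cache_capacity_tokens() -> int:
    """Tokens whose all-layer KV fits in MAX_CACHE_SIZE, counted in whole blocks.

    ASSUMPTION: every layer of a block is stored, each taking kv_block_size bytes.
    """
    full_block_bytes = kv_bytes_per_layer_block() * NUM_LAYERS
    return (MAX_CACHE_SIZE // full_block_bytes) * BLOCK_TOKENS


if __name__ == "__main__":
    per_layer = kv_bytes_per_layer_block()
    print("per-layer block bytes:", per_layer, "== kv_block_size:", per_layer == KV_BLOCK_SIZE)
    print("tokens that fit in max_cache_size:", cache_capacity_tokens())
```

Under these assumptions the per-layer block comes out to exactly 262144 bytes, and the 5 GiB store holds 426 full blocks, i.e. 54528 tokens, which is less than the 80000-token `--max-model-len`.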

Proxy server command:
python3 toy_proxy_server.py --host localhost --port 43215 --prefiller-host localhost --prefiller-port 43210 --pd-disaggregation --decoder-host localhost --decoder-port 43211

Benchmark launch command:
python3 trace_replay.py \
    --model "/models/Qwen3-30B-A3B-Instruct-2507" \
    --backend vllm \
    --trace-path FAST25-release/traces/conversation_trace.jsonl \
    --trace-mode trace \
    --host 127.0.0.1 \
    --port 43215 \
    --save-result \
    --save-prompts

Misc discussion on performance

No response

Your current environment (if you think it is necessary)

The output of `python collect_env.py`
