Description
Proposal to improve performance
No response
Report of performance regression
I compared the performance of UCM against the vLLM baseline using the benchmark provided in the repo, and found that throughput dropped significantly (roughly a 10× drop):
Baseline: 18246.54 tokens/s
UCM: 1745.75 tokens/s
Hardware: 2× H100 GPUs
Is my configuration correct?
Commit: 28f6f35
Prefill kv transfer config:
{
  "kv_connector": "UnifiedCacheConnectorV1",
  "kv_connector_module_path": "ucm.integration.vllm.uc_connector",
  "kv_role": "kv_producer",
  "kv_connector_extra_config": {
    "ucm_connector_name": "UcmDramStore",
    "ucm_connector_config": {
      "max_cache_size": 5368709120,
      "kv_block_size": 262144
    }
  }
}
Decode kv transfer config:
{
  "kv_connector": "UnifiedCacheConnectorV1",
  "kv_connector_module_path": "ucm.integration.vllm.uc_connector",
  "kv_role": "kv_consumer",
  "kv_connector_extra_config": {
    "ucm_connector_name": "UcmDramStore",
    "ucm_connector_config": {
      "max_cache_size": 5368709120,
      "kv_block_size": 262144
    }
  },
  "ucm_sparse_config": {
    "GSA": {}
  }
}
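For readability, the cache sizes used in both configs above in human units (assuming both fields are byte counts, which I am inferring rather than certain of):
# Assumption: max_cache_size and kv_block_size are byte counts.
echo $((5368709120 / 1024 / 1024 / 1024))  # 5   -> ~5 GiB DRAM cache (max_cache_size)
echo $((262144 / 1024))                    # 256 -> 256 KiB per KV block (kv_block_size)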
vLLM server args (prefill and decode use the same server args):
--model /models/Qwen3-30B-A3B-Instruct-2507 --max-model-len 80000 --trust-remote-code --gpu_memory_utilization 0.9 --enforce-eager --no-enable-prefix-caching --block-size 128 --dtype bfloat16 --tensor-parallel-size 1
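For reference, a sketch of how I combine the server args and the configs above into the two launch commands, assuming the JSON is saved to prefill_kv_config.json / decode_kv_config.json (hypothetical filenames) and passed inline via vLLM's --kv-transfer-config flag, with each instance pinned to one H100:
COMMON_ARGS="--model /models/Qwen3-30B-A3B-Instruct-2507 --max-model-len 80000 \
  --trust-remote-code --gpu_memory_utilization 0.9 --enforce-eager \
  --no-enable-prefix-caching --block-size 128 --dtype bfloat16 --tensor-parallel-size 1"

# Prefill instance (kv_producer); port matches --prefiller-port below
CUDA_VISIBLE_DEVICES=0 python3 -m vllm.entrypoints.openai.api_server $COMMON_ARGS \
  --port 43210 --kv-transfer-config "$(cat prefill_kv_config.json)" &

# Decode instance (kv_consumer); port matches --decoder-port below
CUDA_VISIBLE_DEVICES=1 python3 -m vllm.entrypoints.openai.api_server $COMMON_ARGS \
  --port 43211 --kv-transfer-config "$(cat decode_kv_config.json)" &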
proxy server command:
python3 toy_proxy_server.py --host localhost --port 43215 --prefiller-host localhost --prefiller-port 43210 --pd-disaggregation --decoder-host localhost --decoder-port 43211
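Before replaying the full trace, I verify that a single request flows through the proxy (a sketch, assuming the toy proxy forwards the OpenAI-compatible /v1/completions endpoint):
curl -s http://localhost:43215/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "/models/Qwen3-30B-A3B-Instruct-2507", "prompt": "Hello", "max_tokens": 16}'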
Benchmark launching command:
python3 trace_replay.py \
  --model "/models/Qwen3-30B-A3B-Instruct-2507" \
  --backend vllm \
  --trace-path FAST25-release/traces/conversation_trace.jsonl \
  --trace-mode trace \
  --host 127.0.0.1 \
  --port 43215 \
  --save-result \
  --save-prompts
Misc discussion on performance
No response
Your current environment (if you think it is necessary)
The output of `python collect_env.py`