[Feature]: Change KV block dumps to request level and support multi-connector #331

@harrisonyhq

🚀 The feature, motivation and pitch

Motivation

In the previous v0.9.2 version, our UC connector dispatched dump tasks and commits at the block level. The advantage of this approach is finer-grained control over which KV blocks are successfully offloaded. However, it introduced several issues:

  • Coupling to vLLM internal fields: Tracking block-level dump success required patching vLLM source code to access and record per-block status. Without such patches, the UC Connector would alter the semantics of internal request fields (e.g., succeed_dumped_blocks), which is risky and potentially incompatible with upstream.
  • Performance overhead: Block-level granularity led to high task count and degraded connector throughput due to overhead in managing individual block tasks.
  • Incorrect commits in multi-connector setups: In setups using MultiKVConnector, repeated callbacks for already-finished request IDs across different connectors caused UC to incorrectly commit dumped blocks as failed (false).

To address these concerns, starting from vLLM version 0.11.0, I proposed and implemented switching to request-level dispatch of dump and commit tasks, while ensuring compatibility with multi-connector topologies.


What might be changed

1. Request-Level Dump & Commit Dispatch

  • Instead of looping over each block and issuing dump calls per block, the connector now batches all blocks associated with a request and dispatches one task per request to the underlying connector.
  • The dump_tasks structure changes from {req_id: {block_id: [task]}} to {req_id: [task]}.
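
The difference between the two layouts can be sketched as follows. This is a minimal illustration, not the connector's actual code: `FakeStore`, its `dump` method, the `Task` alias, and both `dispatch_*` helpers are hypothetical names introduced here, and the real store API may differ.

```python
from typing import Dict, List

Task = int  # hypothetical task handle; the real connector's task type may differ


class FakeStore:
    """Hypothetical stand-in for the underlying store (not the real UC API)."""

    def __init__(self) -> None:
        self.calls: List[List[str]] = []

    def dump(self, block_ids: List[str]) -> Task:
        # Record each dispatch so the number of issued tasks can be compared.
        self.calls.append(list(block_ids))
        return len(self.calls)


def dispatch_per_block(
    store: FakeStore, req_blocks: Dict[str, List[str]]
) -> Dict[str, Dict[str, List[Task]]]:
    """Old layout: {req_id: {block_id: [task]}}, one dump call per block."""
    dump_tasks: Dict[str, Dict[str, List[Task]]] = {}
    for req_id, block_ids in req_blocks.items():
        per_block = dump_tasks.setdefault(req_id, {})
        for block_id in block_ids:
            per_block[block_id] = [store.dump([block_id])]
    return dump_tasks


def dispatch_per_request(
    store: FakeStore, req_blocks: Dict[str, List[str]]
) -> Dict[str, List[Task]]:
    """New layout: {req_id: [task]}, all blocks of a request batched together."""
    dump_tasks: Dict[str, List[Task]] = {}
    for req_id, block_ids in req_blocks.items():
        if block_ids:  # nothing to dump for a request without blocks
            dump_tasks[req_id] = [store.dump(block_ids)]
    return dump_tasks


req_blocks = {"req-1": ["b1", "b2", "b3"], "req-2": ["b4"]}
old_store, new_store = FakeStore(), FakeStore()
old_tasks = dispatch_per_block(old_store, req_blocks)
new_tasks = dispatch_per_request(new_store, req_blocks)
```

In this toy input the per-block path issues four dump calls and the per-request path issues two, which is the task-count reduction the proposal is after.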

2. Request-Level Commit Handling

  • The UC connector now commits all blocks in a request at once, based on whether the entire request's tasks completed successfully.
  • self.success_reqs and self.failed_reqs sets track the request-level status.
  • When a request finishes, its block-wise commit calls are aggregated from the BlockInfo structure.
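
A rough sketch of the request-level commit flow described above, under stated assumptions: `wait` and `commit` are hypothetical callables standing in for the store's wait and commit operations, and `req_blocks` stands in for whatever the BlockInfo structure provides. Only the `success_reqs` and `failed_reqs` names come from the proposal.

```python
from typing import Callable, Dict, List, Set, Tuple


def wait_and_commit(
    dump_tasks: Dict[str, List[int]],
    req_blocks: Dict[str, List[str]],
    wait: Callable[[int], bool],
    commit: Callable[[List[str], bool], None],
) -> Tuple[Set[str], Set[str]]:
    """Wait on every task of each request, then commit that request's blocks once."""
    success_reqs: Set[str] = set()
    failed_reqs: Set[str] = set()
    for req_id, tasks in dump_tasks.items():
        # Wait on all tasks first (no short-circuit), then decide the outcome.
        results = [wait(task) for task in tasks]
        ok = all(results)
        (success_reqs if ok else failed_reqs).add(req_id)
        # A single commit covers every block of the request.
        commit(req_blocks[req_id], ok)
    return success_reqs, failed_reqs


# Toy run: task 3 fails, so every block of "req-2" is committed as failed.
commits: List[Tuple[List[str], bool]] = []
success_reqs, failed_reqs = wait_and_commit(
    dump_tasks={"req-1": [1, 2], "req-2": [3]},
    req_blocks={"req-1": ["b1", "b2"], "req-2": ["b3"]},
    wait=lambda task: task != 3,
    commit=lambda blocks, ok: commits.append((blocks, ok)),
)
```

The trade-off is visible here: one failed task marks the whole request as failed, which gives up the per-block precision of the old design in exchange for simpler bookkeeping.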

3. MultiConnector Compatibility

  • Introduced self.current_req and self.last_req to track the most recent and previous request during scheduling.
  • These fields help the UC connector avoid false negatives when other connectors redundantly report already-finished request IDs, which would otherwise cause UC to prematurely commit(..., False) on valid blocks.

Files To Be Modified

  • ucm/integration/vllm/uc_connector.py — Core logic changed to enable request-level granularity for dump, wait, and commit.

Internal Logic Highlights

| Area | Before | After |
| --- | --- | --- |
| dump_tasks structure | Nested per block | Flat per request: Dict[str, List[Task]] |
| Dump dispatch granularity | Per block | Per request |
| Commit point | Based on succeed_dumped_blocks field | Based on success_reqs & finished_reqs tracking |
| wait_for_save() logic | Waits per block, tracks block success | Waits per request, tracks full success/failure |
| Multi connector support | None (false commits possible) | Uses last_req + current_req to disambiguate |

Compatibility Notes

  • This change does not alter the KVConnectorBase_V1 interface.
  • Fully backward-compatible with single-connector use cases.
  • Adds necessary safeguards to avoid regression in MultiConnector environments.

Future Improvements

  • Remove current_req and last_req so that the connector is fully compatible with MultiConnector.
  • Improve performance when using MultiConnector in 1p1d scenario.

