🚀 The feature, motivation and pitch
Motivation
In the previous v0.9.2 version, our UC connector dispatched dump tasks and commits at the block level. The advantage of this approach is finer-grained control over which KV blocks are successfully offloaded. However, it introduced several issues:
- Coupling to vLLM internal fields: Tracking block-level dump success required patching vLLM source code to access and record per-block status. Without such patches, the UC Connector would alter the semantics of internal request fields (e.g., `succeed_dumped_blocks`), which is risky and potentially incompatible with upstream.
- Performance overhead: Block-level granularity led to a high task count and degraded connector throughput, because every block task had to be managed individually.
- Incorrect commits in multi-connector setups: In setups using `MultiKVConnector`, repeated callbacks for already-finished request IDs across different connectors caused UC to incorrectly commit dumped blocks as failed (`False`).
To address these concerns, starting from vLLM version 0.11.0, I have proposed and implemented a switch to request-level dispatch of dump and commit tasks, while ensuring compatibility with multi-connector topologies.
What might be changed
1. Request-Level Dump & Commit Dispatch
- Instead of looping over each block and issuing dump calls per block, the connector now batches all blocks associated with a request and dispatches one task per request to the underlying connector.
- The `dump_tasks` structure changes from `{req_id: {block_id: [task]}}` to `{req_id: [task]}`.
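The dispatch change above can be sketched roughly as follows. This is a minimal illustration, not the code in `uc_connector.py`: `Task`, `FakeStore`, `FakeStore.dump`, and `dispatch_per_request` are hypothetical names introduced here, and only the `{req_id: [task]}` shape of `dump_tasks` comes from the proposal.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Task:
    """Hypothetical handle for one dump submitted to the store."""
    req_id: str
    block_ids: List[int]


class FakeStore:
    """Stand-in for the underlying store (assumption); records each dump call."""

    def __init__(self) -> None:
        self.calls: List[List[int]] = []

    def dump(self, req_id: str, block_ids: List[int]) -> Task:
        self.calls.append(list(block_ids))
        return Task(req_id, list(block_ids))


def dispatch_per_request(
    store: FakeStore, blocks_by_req: Dict[str, List[int]]
) -> Dict[str, List[Task]]:
    """Issue one dump per request instead of one dump per block."""
    dump_tasks: Dict[str, List[Task]] = defaultdict(list)
    for req_id, block_ids in blocks_by_req.items():
        if not block_ids:
            continue  # nothing to offload for this request
        # All blocks of the request travel in a single task.
        dump_tasks[req_id].append(store.dump(req_id, block_ids))
    return dict(dump_tasks)  # flat shape: {req_id: [task]}
```

With this shape, the number of tasks scales with the number of requests rather than the number of blocks, which is where the throughput gain described in the motivation would come from.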
2. Request-Level Commit Handling
- The UC connector now commits all blocks in a request at once, based on whether the entire request's tasks completed successfully.
- `self.success_reqs` and `self.failed_reqs` sets track the request-level status.
- Block-wise commit calls are aggregated from the `BlockInfo` structure when a request finishes.
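A rough sketch of the request-level commit decision, under stated assumptions: `commit_finished_request`, its parameters, and the `commit(block_ids, ok)` callback signature are hypothetical, and per-task results are modeled as plain booleans. Only the idea of `success_reqs` / `failed_reqs` sets and an all-or-nothing commit per request is taken from the proposal.

```python
from typing import Callable, Dict, List, Set


def commit_finished_request(
    req_id: str,
    dump_results: Dict[str, List[bool]],
    block_ids_by_req: Dict[str, List[int]],
    commit: Callable[[List[int], bool], None],
    success_reqs: Set[str],
    failed_reqs: Set[str],
) -> bool:
    """Commit every block of a finished request with one shared status."""
    results = dump_results.pop(req_id, [])
    # The request succeeds only if all of its dump tasks succeeded.
    # Assumption: a request with no recorded tasks counts as successful.
    ok = all(results)
    (success_reqs if ok else failed_reqs).add(req_id)

    block_ids = block_ids_by_req.pop(req_id, [])
    if block_ids:
        # One aggregated commit call instead of one call per block.
        commit(block_ids, ok)
    return ok
```

Because the entries for `req_id` are popped, calling the function a second time for the same request issues no further commit in this sketch.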
3. MultiConnector Compatibility
- Introduced `self.current_req` and `self.last_req` to track the most recent and previous request during scheduling.
- These fields help the UC connector avoid false negatives when other connectors redundantly report already-finished request IDs, which would otherwise cause UC to prematurely `commit(..., False)` on valid blocks.
Files To Be Modified
- `ucm/integration/vllm/uc_connector.py`: Core logic changed to enable request-level granularity for `dump`, `wait`, and `commit`.
Internal Logic Highlights
| Area | Before | After |
|---|---|---|
| `dump_tasks` structure | Nested per block | Flat per request: `Dict[str, List[Task]]` |
| Dump dispatch granularity | Per block | Per request |
| Commit point | Based on `succeed_dumped_blocks` field | Based on `success_reqs` & `finished_reqs` tracking |
| `wait_for_save()` logic | Waits per block, tracks block success | Waits per request, tracks full success/failure |
| Multi connector support | None (false commits possible) | Uses `last_req` + `current_req` to disambiguate |
Compatibility Notes
- This change does not alter the `KVConnectorBase_V1` interface.
- Fully backward-compatible with single-connector use cases.
- Adds necessary safeguards to avoid regression in `MultiConnector` environments.
Future Improvements
- Remove `current_req` and `last_req` so that the connector is fully compatible with `MultiConnector`.
- Improve performance when using MultiConnector in 1p1d scenario.
Alternatives
No response
Additional context
No response