[Feature]: Change KV block dumping to request level and support multi-connector #331

### 🚀 The feature, motivation and pitch

# Motivation
In the previous v0.9.2 version, our UC connector dispatched dump tasks and commits at the block level. The advantage of this approach is finer-grained control over which KV blocks are successfully offloaded. However, it introduced several issues:

- **Coupling to vLLM internal fields:** Tracking block-level dump success required patching vLLM source code to access and record per-block status. Without such patches, the UC Connector would alter the semantics of internal request fields (e.g., `succeed_dumped_blocks`), which is risky and potentially incompatible with upstream.
- **Performance overhead:** Block-level granularity led to a high task count and degraded connector throughput due to the overhead of managing individual block tasks.
- **Incorrect commits in multi-connector setups:** In setups using `MultiConnector`, repeated callbacks for already-finished request IDs across different connectors caused UC to incorrectly commit successfully dumped blocks as failed (`commit(..., False)`).


To address these concerns, starting from vLLM version 0.11.0, I proposed and implemented switching to **request-level dispatch** of dump and commit tasks, while ensuring compatibility with multi-connector topologies.

---
# What might be changed

## 1. **Request-Level Dump & Commit Dispatch**
- Instead of looping over each block and issuing dump calls per block, the connector now batches all blocks associated with a request and dispatches **one task per request** to the underlying connector.
- The `dump_tasks` structure changes from `{req_id: {block_id: [task]}}` to `{req_id: [task]}`.
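The change in task count and in the shape of `dump_tasks` can be illustrated with a minimal, self-contained sketch. It assumes a hypothetical storage backend (`FakeStore`) whose `dump()` method accepts a list of blocks and returns a task handle (`Task`); none of these names are the real UC store API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Task:
    """Hypothetical handle for one asynchronous dump submitted to the store."""
    req_id: str
    block_ids: List[str]


class FakeStore:
    """Hypothetical stand-in for the storage backend; records every dump call."""

    def __init__(self) -> None:
        self.submitted: List[Task] = []

    def dump(self, req_id: str, block_ids: List[str]) -> Task:
        task = Task(req_id, list(block_ids))
        self.submitted.append(task)
        return task


def dispatch_per_block(store: FakeStore, req_blocks: Dict[str, List[str]]) -> Dict[str, Dict[str, List[Task]]]:
    # Old layout {req_id: {block_id: [task]}}: one dump task per block.
    dump_tasks: Dict[str, Dict[str, List[Task]]] = {}
    for req_id, block_ids in req_blocks.items():
        per_block = dump_tasks.setdefault(req_id, {})
        for block_id in block_ids:
            per_block.setdefault(block_id, []).append(store.dump(req_id, [block_id]))
    return dump_tasks


def dispatch_per_request(store: FakeStore, req_blocks: Dict[str, List[str]]) -> Dict[str, List[Task]]:
    # New layout {req_id: [task]}: all of a request's blocks go into one task.
    dump_tasks: Dict[str, List[Task]] = {}
    for req_id, block_ids in req_blocks.items():
        if block_ids:
            dump_tasks.setdefault(req_id, []).append(store.dump(req_id, block_ids))
    return dump_tasks


req_blocks = {"req-0": ["b0", "b1", "b2"], "req-1": ["b3"]}
old_store, new_store = FakeStore(), FakeStore()
old_tasks = dispatch_per_block(old_store, req_blocks)
new_tasks = dispatch_per_request(new_store, req_blocks)
print(len(old_store.submitted), len(new_store.submitted))  # 4 2
```

With per-request dispatch, the number of submitted tasks scales with the number of requests rather than the number of blocks.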

## 2. **Request-Level Commit Handling**
- The UC connector now commits **all blocks in a request at once**, based on whether the entire request's tasks completed successfully.
- `self.success_reqs` and `self.failed_reqs` sets track the request-level status.
- When a request finishes, the block-wise commit calls are aggregated from its `BlockInfo` structure.
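A rough sketch of request-level commit handling follows. It assumes that `BlockInfo` holds the list of block hashes dumped for a request and that the store exposes a `commit(block_hashes, success)` call; both shapes are hypothetical here and only approximate the described behavior.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class BlockInfo:
    """Hypothetical per-request record of the block hashes that were dumped."""
    block_hashes: List[str]


def commit_finished(
    finished_req_ids: Set[str],
    success_reqs: Set[str],
    failed_reqs: Set[str],
    block_infos: Dict[str, BlockInfo],
    commit: Callable[[List[str], bool], None],
) -> None:
    """Commit every block of each finished request with a single verdict."""
    for req_id in finished_req_ids:
        info = block_infos.pop(req_id, None)
        if info is None:
            continue  # nothing was dumped for this request
        # One success flag for the whole request: no per-block bookkeeping.
        ok = req_id in success_reqs and req_id not in failed_reqs
        commit(info.block_hashes, ok)
        success_reqs.discard(req_id)
        failed_reqs.discard(req_id)


calls: List[Tuple[Tuple[str, ...], bool]] = []
infos = {"req-0": BlockInfo(["h0", "h1"]), "req-1": BlockInfo(["h2"])}
commit_finished(
    {"req-0", "req-1"}, {"req-0"}, {"req-1"}, infos,
    lambda hashes, ok: calls.append((tuple(hashes), ok)),
)
print(sorted(calls))  # [(('h0', 'h1'), True), (('h2',), False)]
```

Because the verdict is per request, the connector no longer needs a per-block success field such as `succeed_dumped_blocks`.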

## 3. **MultiConnector Compatibility**
- Introduced `self.current_req` and `self.last_req` to track the most recent and previous request during scheduling.
- These fields help the UC connector **avoid false negatives** when other connectors redundantly report already-finished request IDs, which would otherwise cause UC to prematurely `commit(..., False)` on valid blocks.
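The failure mode can be reproduced with a small, hypothetical committer: after the first finish report clears a request's success state, a second (redundant) report would compute `ok = False` and commit valid blocks as failed. The sketch below guards against this with a simple de-duplication set (`committed_reqs`). This is a swapped-in simplification for illustration, not the proposed `current_req`/`last_req` mechanism, and every name in it is hypothetical.

```python
from typing import Callable, List, Set, Tuple


class RequestCommitter:
    """Hypothetical, simplified committer that tolerates repeated finish reports."""

    def __init__(self, commit: Callable[[str, bool], None]) -> None:
        self.commit = commit
        self.success_reqs: Set[str] = set()
        self.committed_reqs: Set[str] = set()  # hypothetical de-duplication set

    def on_finished(self, req_ids: List[str]) -> None:
        for req_id in req_ids:
            if req_id in self.committed_reqs:
                # Repeated report (e.g. forwarded by another connector): the
                # request was already committed, so skip it instead of
                # committing its blocks again as failed.
                continue
            ok = req_id in self.success_reqs
            self.success_reqs.discard(req_id)
            self.committed_reqs.add(req_id)
            self.commit(req_id, ok)


calls: List[Tuple[str, bool]] = []
committer = RequestCommitter(lambda req_id, ok: calls.append((req_id, ok)))
committer.success_reqs.add("req-0")
committer.on_finished(["req-0"])  # first report
committer.on_finished(["req-0"])  # redundant report of the same request
print(calls)  # [('req-0', True)]
```

An unbounded set like `committed_reqs` grows with every request; tracking only `current_req` and `last_req`, as proposed, keeps this state constant-size.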

---

# Files To Be Modified

- `ucm/integration/vllm/uc_connector.py` — Core logic changed to enable request-level granularity for `dump`, `wait`, and `commit`.

---

# Internal Logic Highlights

| Area                     | Before                                     | After                                     |
|--------------------------|--------------------------------------------|--------------------------------------------|
| `dump_tasks` structure   | Nested per block                           | Flat per request: `Dict[str, List[Task]]`  |
| Dump dispatch granularity| Per block                                  | Per request                                |
| Commit point             | Based on `succeed_dumped_blocks` field     | Based on `success_reqs` & `finished_reqs` tracking |
| `wait_for_save()` logic  | Waits per block, tracks block success      | Waits per request, tracks full success/failure |
| Multi connector support  | None (false commits possible)              | Uses `last_req` + `current_req` to disambiguate |
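The `wait_for_save()` row can be sketched as follows, assuming a hypothetical `wait(task) -> bool` callback that blocks until a dump task completes and reports its success. It also assumes a request may be dumped over several scheduler steps, so a failure in any step keeps the whole request failed.

```python
from typing import Callable, Dict, List, Set


def wait_for_save(
    dump_tasks: Dict[str, List[object]],
    wait: Callable[[object], bool],
    success_reqs: Set[str],
    failed_reqs: Set[str],
) -> None:
    """Wait on each request's dump tasks and record one verdict per request."""
    for req_id, tasks in dump_tasks.items():
        # Wait on every task (a list, not a short-circuiting all()) so none is left pending.
        results = [wait(task) for task in tasks]
        if all(results) and req_id not in failed_reqs:
            success_reqs.add(req_id)
        else:
            # Any failed task, now or in an earlier step, fails the whole request.
            failed_reqs.add(req_id)
            success_reqs.discard(req_id)
    dump_tasks.clear()


success: Set[str] = set()
failed: Set[str] = set()
tasks = {"req-0": ["t0"], "req-1": ["t1", "bad"]}
wait_for_save(tasks, lambda task: task != "bad", success, failed)
print(sorted(success), sorted(failed), tasks)  # ['req-0'] ['req-1'] {}
```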

---
# Compatibility Notes

- This change does not alter the `KVConnectorBase_V1` interface.
- Fully backward-compatible with single-connector use cases.
- Adds necessary safeguards to avoid regression in `MultiConnector` environments.

---

# Future Improvements

- Remove `current_req` and `last_req` to make the connector fully compatible with MultiConnector.
- Improve performance when using MultiConnector in the 1P1D (one prefill, one decode instance) scenario.

### Alternatives

_No response_

### Additional context

_No response_

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Feature]: Change dump kv blocks to request level and suppot multi connector #331

🚀 The feature, motivation and pitch

Motivation

What might be changed

1. Request-Level Dump & Commit Dispatch

2. Request-Level Commit Handling

3. MultiConnector Compatibility

Files To Be Modified

Internal Logic Highlights

Compatibility Notes

Future Improvements

Alternatives

Additional context

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Area	Before	After
`dump_tasks` structure	Nested per block	Flat per request: `Dict[str, List[Task]]`
Dump dispatch granularity	Per block	Per request
Commit point	Based on `succeed_dumped_blocks` field	Based on `success_reqs` & `finished_reqs` tracking
`wait_for_save()` logic	Waits per block, tracks block success	Waits per request, tracks full success/failure
Multi connector support	None (false commits possible)	Uses `last_req` + `current_req` to disambiguate

[Feature]: Change dump kv blocks to request level and suppot multi connector #331

Description

🚀 The feature, motivation and pitch

Motivation

What might be changed

1. Request-Level Dump & Commit Dispatch

2. Request-Level Commit Handling

3. MultiConnector Compatibility

Files To Be Modified

Internal Logic Highlights

Compatibility Notes

Future Improvements

Alternatives

Additional context

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions