[PD Disaggregation][XPU] Add XPU support for PD disaggregation #5113
Conversation
Thanks for your contribution!

ddchenhao66 does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. Already signed the CLA but the status is still pending? Let us recheck it.
Pull Request Overview
This PR adds XPU hardware support for the PD (Prefill-Decode) disaggregation feature, which splits the prefill and decode phases across different compute nodes. The implementation mirrors the existing CUDA/GPU support by adding XPU-specific code paths throughout the attention, cache-management, and worker-execution layers.
- Extends attention backends to support XPU with PD disaggregation modes ("per_chunk" and "per_query")
- Adds XPU signal handling and inter-process communication for KV cache coordination
- Refactors platform-specific operation imports to support both CUDA and XPU through a unified interface
- Updates cache manager to handle XPU-specific memory addressing and device visibility
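The third bullet describes routing operation imports through a unified interface instead of importing CUDA ops directly. A minimal sketch of that dispatch pattern is below; the registry, module names, and `init_kv_signal` function are illustrative placeholders, not the actual FastDeploy API.

```python
# Hypothetical platform-dispatch sketch: signal ops are looked up by
# platform name rather than hard-coding a CUDA import. The lambdas stand
# in for the real per-platform kernels.
_PLATFORM_OPS = {
    "cuda": {"init_kv_signal": lambda rank: f"cuda signal ready on rank {rank}"},
    "xpu":  {"init_kv_signal": lambda rank: f"xpu signal ready on rank {rank}"},
}

def init_kv_signal(platform: str, rank: int) -> str:
    """Dispatch the KV-signal init op to the matching platform backend."""
    try:
        ops = _PLATFORM_OPS[platform]
    except KeyError:
        raise ValueError(f"Unsupported platform for PD disaggregation: {platform}")
    return ops["init_kv_signal"](rank)
```

With this shape, adding a new accelerator means registering one more entry rather than branching at every call site.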
Reviewed Changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| fastdeploy/worker/xpu_model_runner.py | Adds PD disaggregation mode support in XPU worker, including decode node handling and KV signal sender lifecycle management |
| fastdeploy/model_executor/layers/attention/xpu_attn_backend.py | Implements XPU attention backend with PD disaggregation initialization and signal handling |
| fastdeploy/model_executor/layers/attention/utils.py | Adds XPU device visibility environment variable support (XPU_VISIBLE_DEVICES) |
| fastdeploy/model_executor/forward_meta.py | Adds kv_signal_sender field to XPUForwardMeta for PD disaggregation |
| fastdeploy/model_executor/layers/attention/ops/*.py | Adds XPU platform branches to signal operation wrappers |
| fastdeploy/cache_manager/*.py | Refactors ops imports to platform-agnostic interface and adds XPU memory address handling |
| custom_ops/xpu_ops/src/ops/*.cc | Implements XPU-specific C++ operations for signal handling and shared memory |
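The `utils.py` row mentions support for the `XPU_VISIBLE_DEVICES` environment variable. A small sketch of how such a variable is typically parsed is shown below; the helper name and fallback behavior are assumptions for illustration, not the code in this PR.

```python
# Hedged sketch: parse XPU_VISIBLE_DEVICES the same way tools commonly
# parse CUDA_VISIBLE_DEVICES (comma-separated device indices).
import os

def visible_xpu_devices() -> list:
    """Return the list of visible XPU device indices, or [] if unset."""
    raw = os.environ.get("XPU_VISIBLE_DEVICES", "")
    return [int(tok) for tok in raw.split(",") if tok.strip()]

os.environ["XPU_VISIBLE_DEVICES"] = "0,2"
print(visible_xpu_devices())  # → [0, 2]
```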
hong19860320
left a comment
LGTM
5b52a51
Motivation
Add support for PD disaggregation on XPU.
Modifications
Usage or Command
Accuracy Tests
Checklist
- Select one or more PR tags: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run pre-commit before commit.
- For a release branch, make sure the PR has first been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.