[XPU] [Optimization] [EP] EP communication optimization. #5145
base: develop
Thanks for your contribution!
```python
if_only_decode = self.only_decode()
if (
    self.fd_config.scheduler_config.splitwise_role == "mixed"
):  # Centralized scenario: phase defaults to prefill; at inference runtime, batches of different types can switch phase here
    self.fd_config.model_config.moe_phase.phase = "decode" if if_only_decode else "prefill"
```
```python
only_decoder = self.forward_meta.len_info_cpu[0] <= 0
```
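As a minimal sketch of the phase-switching logic discussed in this review, the snippet below combines the original branch with the reviewer's suggestion. It assumes, as the suggestion implies, that `len_info_cpu[0]` holds the number of prefill sequences in the current batch. `ForwardMeta`, `MoEPhase`, and `update_moe_phase` are hypothetical stand-ins for illustration, not FastDeploy's actual classes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ForwardMeta:
    # Hypothetical stand-in: assumes len_info_cpu[0] is the number of
    # prefill sequences in the current batch.
    len_info_cpu: List[int] = field(default_factory=lambda: [0])


@dataclass
class MoEPhase:
    # In the centralized ("mixed") scenario the phase starts as prefill.
    phase: str = "prefill"


def update_moe_phase(splitwise_role: str, meta: ForwardMeta, moe_phase: MoEPhase) -> None:
    """Switch the MoE phase per batch in the centralized (mixed) scenario."""
    if splitwise_role != "mixed":
        # Disaggregated P/D instances keep their fixed phase.
        return
    # Reviewer's suggestion: a batch is decode-only when it holds no prefill sequences.
    only_decode = meta.len_info_cpu[0] <= 0
    moe_phase.phase = "decode" if only_decode else "prefill"


phase = MoEPhase()
update_moe_phase("mixed", ForwardMeta(len_info_cpu=[0]), phase)
print(phase.phase)  # decode
```

Reading the count from a CPU-side tensor, as suggested, avoids a device-to-host synchronization just to decide which branch to take.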
Codecov Report
✅ All modified and coverable lines are covered by tests.

```diff
@@            Coverage Diff             @@
##             develop    #5145   +/-   ##
==========================================
  Coverage           ?   59.73%
==========================================
  Files              ?      317
  Lines              ?    38682
  Branches           ?     5813
==========================================
  Hits               ?    23105
  Misses             ?    13746
  Partials           ?     1831
```

Flags with carried forward coverage won't be shown.
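The headline coverage figure can be reproduced from the line counts in the report. Codecov computes the coverage ratio as hits / (hits + misses + partials), so partially covered lines count against coverage. The helper name `codecov_coverage` is illustrative.

```python
def codecov_coverage(hits: int, misses: int, partials: int) -> float:
    """Coverage in percent, using Codecov's ratio hits / (hits + misses + partials)."""
    lines = hits + misses + partials
    return 100.0 * hits / lines


# Line counts from the Codecov report for #5145.
print(f"{codecov_coverage(23105, 13746, 1831):.2f}%")  # 59.73%
```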
Branch updated from c113124 to 5d4a7f5.
Motivation
Implement low-latency communication operators for pure decode (D) requests and high-throughput communication operators for prefill (P) requests in centralized inference scenarios.
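To make the split concrete, here is a minimal sketch of per-batch operator selection. It assumes the batch phase is already known (for example, from the `moe_phase.phase` switch in the review snippet). It also uses a hypothetical `max_low_latency_tokens` cap, reflecting that in EP libraries such as DeepEP the low-latency kernels pre-allocate buffers for a bounded number of dispatched tokens per rank. `CommMode` and `select_ep_comm_mode` are illustrative names, not FastDeploy APIs.

```python
from enum import Enum


class CommMode(Enum):
    LOW_LATENCY = "low_latency"          # small, latency-bound decode batches
    HIGH_THROUGHPUT = "high_throughput"  # large, bandwidth-bound prefill batches


def select_ep_comm_mode(phase: str, num_tokens: int, max_low_latency_tokens: int = 128) -> CommMode:
    """Pick the EP dispatch/combine variant for one batch (illustrative only).

    max_low_latency_tokens is a hypothetical capacity limit standing in for the
    pre-allocated buffer size of the low-latency kernels.
    """
    if phase == "decode" and num_tokens <= max_low_latency_tokens:
        return CommMode.LOW_LATENCY
    # Prefill batches, or decode batches too large for the low-latency buffers,
    # take the high-throughput path.
    return CommMode.HIGH_THROUGHPUT


print(select_ep_comm_mode("decode", 32))   # CommMode.LOW_LATENCY
print(select_ep_comm_mode("prefill", 32))  # CommMode.HIGH_THROUGHPUT
```

Decode steps move only a few tokens per rank, so per-call latency dominates. Prefill batches move many tokens, so bandwidth matters more, which is why the two request types typically benefit from different kernels.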
Modifications
Usage or Command
```shell
export MOE_FFN_USE_DENSE_INPUT=1
```
Accuracy Tests
Checklist
- Tag list: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run `pre-commit` before commit.
- If submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.