[refactor] support gatingtopk operator generalization #4050
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request refactors the MoE expert selection logic to leverage a more generic npu_moe_gating_top_k operator, which simplifies the code by removing special-cased logic for specific models. The goal is to support all group_count sizes and consolidate functionality. While the refactoring is a good step towards cleaner code, I've found a critical issue where the grouped top-k functionality is unintentionally disabled for softmax-based scoring. My review includes a specific comment and code suggestion to fix this regression.
Has this change been merged into the main branch?
Force-pushed from ecf5a30 to da5af20
Signed-off-by: 1092626063 <1092626063@qq.com>
Force-pushed from da5af20 to 33832fd
Here is the PR for the main branch: #2958
…tion (vllm-project#4050)" This reverts commit c87a77e. Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
…tion (vllm-project#4050)" This reverts commit c87a77e.
…tion (vllm-project#4050)" This reverts commit c87a77e. Signed-off-by: 1092626063 <1092626063@qq.com>
…lm-project#4050) ### What this PR does / why we need it? Picked from: vllm-project#2958. Past: npu_moe_gating_top_k could only support the 'group_count=256' pattern. Now: 1. npu_moe_gating_top_k supports all sizes of group_count; 2. the functionality of `torch_npu.npu_moe_gating_top_k_softmax` is included in `torch_npu.npu_moe_gating_top_k`. CANN: depends on 8.3.RC1. Performance: 1. GLM4.5-w8a8, TPS improves by 6%; 2. Qwen3, the same as before. Signed-off-by: 1092626063 <1092626063@qq.com>
…tion (vllm-project#4050)" (vllm-project#4352) This reverts commit c87a77e. it breaks ops e2e test Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: 刘哲续 <liuzhexu1@huawei.com>
What this PR does / why we need it?
Past:
npu_moe_gating_top_k could only support the 'group_count=256' pattern.
Now:
1. npu_moe_gating_top_k supports all sizes of group_count.
2. The functionality of `torch_npu.npu_moe_gating_top_k_softmax` is included in `torch_npu.npu_moe_gating_top_k` (a hedged call-path sketch is shown below).
CANN: depends on 8.3.RC1.
Performance:
1. GLM4.5-w8a8: TPS improves by 6%.
2. Qwen3: the same as before.
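A minimal sketch of the call-site dispatch this description implies is shown below. The operator names come from this PR, but the keyword arguments (`group_count`, `k_group`) and the helper itself are assumptions for illustration; check the CANN 8.3.RC1 torch_npu documentation for the authoritative signature rather than treating this as the PR's actual code.

```python
import torch
import torch_npu  # requires an Ascend NPU build of PyTorch


def select_experts(router_logits: torch.Tensor, top_k: int,
                   num_groups: int, topk_groups: int):
    """Pick top_k experts per token, preferring the unified fused kernel."""
    if hasattr(torch_npu, "npu_moe_gating_top_k"):
        # Unified path (CANN >= 8.3.RC1): one fused kernel covers scoring,
        # grouping, and top-k for any group_count. The keyword names here
        # are assumptions, not a verified signature.
        return torch_npu.npu_moe_gating_top_k(
            router_logits, top_k,
            group_count=num_groups, k_group=topk_groups)
    # Legacy path: the softmax-only fused gating operator.
    return torch_npu.npu_moe_gating_top_k_softmax(router_logits, k=top_k)
```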
Does this PR introduce any user-facing change?
How was this patch tested?