
Conversation

@1092626063
Contributor

What this PR does / why we need it?

Past:
npu_moe_gating_top_k only supported the group_count=256 pattern.

Now:
1. npu_moe_gating_top_k supports all group_count sizes.
2. The functionality of torch_npu.npu_moe_gating_top_k_softmax is now covered by torch_npu.npu_moe_gating_top_k (see the call sketch below).
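
For illustration, here is a minimal before/after sketch of the call-site change. It assumes an Ascend NPU environment with torch_npu and CANN 8.3.RC1; the keyword names for npu_moe_gating_top_k (k_group, group_count, group_select_mode, renorm, norm_type) follow my reading of the CANN docs and should be treated as assumptions, and the concrete values are model-dependent rather than copied from the actual call site.

```python
import torch
import torch_npu  # requires an Ascend NPU environment with CANN 8.3.RC1

# Router logits for 8 tokens over 256 experts (shapes are illustrative only).
router_logits = torch.randn(8, 256, dtype=torch.float16).npu()

# Before: softmax scoring went through the dedicated softmax op.
topk_weights, topk_ids, row_idx = torch_npu.npu_moe_gating_top_k_softmax(
    router_logits, finished=None, k=8)

# After: the generalized op covers the softmax path as well.
# norm_type=0 selects softmax scoring; grouping is effectively disabled here
# by using a single group. Keyword names and values are assumptions based on
# the CANN 8.3.RC1 documentation, not a verbatim copy of the vllm-ascend code.
topk_weights, topk_ids, _ = torch_npu.npu_moe_gating_top_k(
    router_logits, k=8,
    k_group=1, group_count=1,
    group_select_mode=0, renorm=0,
    norm_type=0)
```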

CANN: requires 8.3.RC1

Performance:

  1. GLM4.5-w8a8: TPS improves by 6%.
  2. Qwen3: unchanged from before.

Does this PR introduce any user-facing change?

How was this patch tested?

@github-actions

github-actions bot commented Nov 7, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the MoE expert selection logic to leverage a more generic npu_moe_gating_top_k operator, which simplifies the code by removing special-cased logic for specific models. The goal is to support all group_count sizes and consolidate functionality. While the refactoring is a good step towards cleaner code, I've found a critical issue where the grouped top-k functionality is unintentionally disabled for softmax-based scoring. My review includes a specific comment and code suggestion to fix this regression.
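
For context, group-limited top-k routing with softmax scoring conceptually works as sketched below. This is a plain-PyTorch reference to illustrate the behavior the reviewer says was lost, not the vllm-ascend implementation; the function and parameter names here are hypothetical.

```python
import torch


def grouped_topk_softmax_reference(router_logits: torch.Tensor, top_k: int,
                                   num_groups: int, topk_groups: int):
    """Plain-PyTorch reference for group-limited top-k routing with softmax
    scoring. Hypothetical illustration only, not the vllm-ascend code."""
    num_tokens, num_experts = router_logits.shape
    experts_per_group = num_experts // num_groups
    scores = torch.softmax(router_logits, dim=-1)

    # Score each expert group by its best expert.
    group_scores = scores.view(
        num_tokens, num_groups, experts_per_group).amax(dim=-1)

    # Keep the top `topk_groups` groups per token and mask out the rest.
    group_idx = group_scores.topk(topk_groups, dim=-1).indices
    group_mask = torch.zeros_like(group_scores)
    group_mask.scatter_(1, group_idx, 1.0)
    expert_mask = group_mask.unsqueeze(-1).expand(
        num_tokens, num_groups, experts_per_group).reshape(num_tokens, num_experts)

    # Final top-k over the experts that survive the group filter.
    masked_scores = scores.masked_fill(expert_mask == 0, 0.0)
    topk_weights, topk_ids = masked_scores.topk(top_k, dim=-1)
    return topk_weights, topk_ids
```

With the group filter disabled (num_groups=1, topk_groups=1) this reduces to plain softmax top-k, which is why silently falling back to that path would be a functional regression for models that rely on grouped routing.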

1092626063 changed the title from "refactor gatingtopk" to "[refactor]support gatingtopk operator generalization" on Nov 7, 2025
@wangxiyuan
Collaborator

Has this change been merged to the main branch?

Signed-off-by: 1092626063 <1092626063@qq.com>
@1092626063
Contributor Author

Has this change been merged to the main branch?

Here is the PR for the main branch: #2958

@1092626063
Contributor Author

Has this change been merged to the main branch?

This PR is cherry-picked from #2958.

@wangxiyuan wangxiyuan merged commit c87a77e into vllm-project:v0.11.0-dev Nov 19, 2025
16 checks passed
wangxiyuan added a commit to wangxiyuan/vllm-ascend that referenced this pull request Nov 21, 2025
…tion (vllm-project#4050)"

This reverts commit c87a77e.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
1092626063 added a commit to 1092626063/vllm-ascend that referenced this pull request Nov 21, 2025
…tion (vllm-project#4050)"

This reverts commit c87a77e.

Signed-off-by: 1092626063 <1092626063@qq.com>
wangxiyuan added a commit that referenced this pull request Nov 21, 2025
…tion (#4050)" (#4352)

This reverts commit c87a77e.

it breaks ops e2e test

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
1092626063 added a commit to 1092626063/vllm-ascend that referenced this pull request Nov 22, 2025
…lm-project#4050)

### What this PR does / why we need it?
Picked from: vllm-project#2958
Past:
npu_moe_gating_top_k only supported the group_count=256 pattern.

Now:
1. npu_moe_gating_top_k supports all group_count sizes.
2. The functionality of `torch_npu.npu_moe_gating_top_k_softmax` is now
covered by `torch_npu.npu_moe_gating_top_k`.

CANN: requires 8.3.RC1

Performance:
1. GLM4.5-w8a8: TPS improves by 6%.
2. Qwen3: unchanged from before.


Signed-off-by: 1092626063 <1092626063@qq.com>
henryxuxu0716 pushed a commit to henryxuxu0716/vllm-ascend that referenced this pull request Nov 27, 2025
…tion (vllm-project#4050)" (vllm-project#4352)

This reverts commit c87a77e.

it breaks ops e2e test

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: 刘哲续 <liuzhexu1@huawei.com>