
Conversation

@1092626063
Contributor

What this PR does / why we need it?

Past:
npu_moe_gating_top_k only supported the group_count=256 pattern.

Now:
1. npu_moe_gating_top_k supports all group_count sizes.
2. The functionality of torch_npu.npu_moe_gating_top_k_softmax is now covered by torch_npu.npu_moe_gating_top_k (see the call sketch below).
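
For illustration, here is a minimal before/after sketch of the call-site change. It assumes an Ascend NPU environment with torch_npu and CANN 8.3.RC1; the keyword names for npu_moe_gating_top_k (k_group, group_count, group_select_mode, renorm, norm_type) follow my reading of the CANN docs and should be treated as assumptions, and the concrete values are model-dependent rather than copied from the actual call site.

```python
import torch
import torch_npu  # requires an Ascend NPU environment with CANN 8.3.RC1

# Router logits for 8 tokens over 256 experts (shapes are illustrative only).
router_logits = torch.randn(8, 256, dtype=torch.float16).npu()

# Before: softmax scoring went through the dedicated softmax op.
topk_weights, topk_ids, row_idx = torch_npu.npu_moe_gating_top_k_softmax(
    router_logits, finished=None, k=8)

# After: the generalized op covers the softmax path as well.
# norm_type=0 selects softmax scoring; grouping is effectively disabled here
# by using a single group. Keyword names and values are assumptions based on
# the CANN 8.3.RC1 documentation, not a verbatim copy of the vllm-ascend code.
topk_weights, topk_ids, _ = torch_npu.npu_moe_gating_top_k(
    router_logits, k=8,
    k_group=1, group_count=1,
    group_select_mode=0, renorm=0,
    norm_type=0)
```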

CANN: requires 8.3.RC1

Performance:

  1. GLM4.5-w8a8: TPS improves by 6%.
  2. Qwen3: unchanged from before.

Does this PR introduce any user-facing change?

How was this patch tested?

@github-actions

github-actions bot commented Nov 7, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the MoE expert selection logic to leverage a more generic npu_moe_gating_top_k operator, which simplifies the code by removing special-cased logic for specific models. The goal is to support all group_count sizes and consolidate functionality. While the refactoring is a good step towards cleaner code, I've found a critical issue where the grouped top-k functionality is unintentionally disabled for softmax-based scoring. My review includes a specific comment and code suggestion to fix this regression.
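
For context, group-limited top-k routing with softmax scoring conceptually works as sketched below. This is a plain-PyTorch reference to illustrate the behavior the reviewer says was lost, not the vllm-ascend implementation; the function and parameter names here are hypothetical.

```python
import torch


def grouped_topk_softmax_reference(router_logits: torch.Tensor, top_k: int,
                                   num_groups: int, topk_groups: int):
    """Plain-PyTorch reference for group-limited top-k routing with softmax
    scoring. Hypothetical illustration only, not the vllm-ascend code."""
    num_tokens, num_experts = router_logits.shape
    experts_per_group = num_experts // num_groups
    scores = torch.softmax(router_logits, dim=-1)

    # Score each expert group by its best expert.
    group_scores = scores.view(
        num_tokens, num_groups, experts_per_group).amax(dim=-1)

    # Keep the top `topk_groups` groups per token and mask out the rest.
    group_idx = group_scores.topk(topk_groups, dim=-1).indices
    group_mask = torch.zeros_like(group_scores)
    group_mask.scatter_(1, group_idx, 1.0)
    expert_mask = group_mask.unsqueeze(-1).expand(
        num_tokens, num_groups, experts_per_group).reshape(num_tokens, num_experts)

    # Final top-k over the experts that survive the group filter.
    masked_scores = scores.masked_fill(expert_mask == 0, 0.0)
    topk_weights, topk_ids = masked_scores.topk(top_k, dim=-1)
    return topk_weights, topk_ids
```

With the group filter disabled (num_groups=1, topk_groups=1) this reduces to plain softmax top-k, which is why silently falling back to that path would be a functional regression for models that rely on grouped routing.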

1092626063 changed the title from "refactor gatingtopk" to "[refactor]support gatingtopk operator generalization" on Nov 7, 2025
@wangxiyuan
Collaborator

Has this change been merged to the main branch?

Signed-off-by: 1092626063 <1092626063@qq.com>
@1092626063
Contributor Author

Has this change been merged to the main branch?

Here is the PR for the main branch: #2958

@1092626063
Contributor Author

Has this change been merged to the main branch?

This PR is cherry-picked from #2958.

@wangxiyuan wangxiyuan merged commit c87a77e into vllm-project:v0.11.0-dev Nov 19, 2025
16 checks passed
wangxiyuan added a commit to wangxiyuan/vllm-ascend that referenced this pull request Nov 21, 2025
…tion (vllm-project#4050)"

This reverts commit c87a77e.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
1092626063 added a commit to 1092626063/vllm-ascend that referenced this pull request Nov 21, 2025
…tion (vllm-project#4050)"

This reverts commit c87a77e.

Signed-off-by: 1092626063 <1092626063@qq.com>
wangxiyuan added a commit that referenced this pull request Nov 21, 2025
…tion (#4050)" (#4352)

This reverts commit c87a77e.

it breaks ops e2e test

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
1092626063 added a commit to 1092626063/vllm-ascend that referenced this pull request Nov 22, 2025
…lm-project#4050)

### What this PR does / why we need it?
Picked from: vllm-project#2958
Past:
npu_moe_gating_top_k only supported the group_count=256 pattern.

Now:
1. npu_moe_gating_top_k supports all group_count sizes.
2. The functionality of `torch_npu.npu_moe_gating_top_k_softmax` is now
covered by `torch_npu.npu_moe_gating_top_k`.

CANN: requires 8.3.RC1

Performance:
1. GLM4.5-w8a8: TPS improves by 6%.
2. Qwen3: unchanged from before.


Signed-off-by: 1092626063 <1092626063@qq.com>
henryxuxu0716 pushed a commit to henryxuxu0716/vllm-ascend that referenced this pull request Nov 27, 2025
…tion (vllm-project#4050)" (vllm-project#4352)

This reverts commit c87a77e.

it breaks ops e2e test

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: 刘哲续 <liuzhexu1@huawei.com>