
Conversation

@wxsIcey (Collaborator) commented Nov 7, 2025

What this PR does / why we need it?

  • Fixes an accuracy problem with Qwen3-Next when the NZ weight format is enabled.

Does this PR introduce any user-facing change?

N/A

How was this patch tested?

from vllm import LLM, SamplingParams


def main():
    prompts = [
        "窗前明月光,",  # "Moonlight before my window," (from a Li Bai poem)
        "The president of the United States is Mr.",
        "The capital of France is",
        "The future of AI is",
        "感时花溅泪,",  # "Moved by the times, flowers shed tears," (from a Du Fu poem)
        "家书抵万金啥意思?",  # "What does 'a letter from home is worth ten thousand gold' mean?"
        "plz tell me a story: ",
    ]

    # Create a sampling params object.
    sampling_params = SamplingParams(max_tokens=100, temperature=0.6, top_k=40, top_p=0.95)
    # Create an LLM.
    llm = LLM(
        # model="/root/.cache/modelscope/hub/models/Qwen/Qwen3-30B-A3B",
        model="Qwen/Qwen3-Next-80B-A3B-Instruct",
        tensor_parallel_size=4,
        enforce_eager=True,
        trust_remote_code=True,
        max_model_len=256,
        # max_num_seqs=2,
        gpu_memory_utilization=0.7,
        block_size=64,
    )

    # Generate texts from the prompts.
    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")


if __name__ == "__main__":
    main()

Signed-off-by: Icey <1790571317@qq.com>
github-actions bot commented Nov 7, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@wxsIcey added the ready (read for review) and ready-for-test (start test by label for PR) labels Nov 7, 2025
@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request addresses an accuracy issue with the Qwen3-Next model when the NZ format is enabled. The fix involves explicitly casting the conv_weights tensor to the ND format to disable the NZ format optimization. The change is straightforward and seems to be an effective workaround. My review includes a suggestion to improve the code's readability and maintainability by replacing a magic number with a named constant and clarifying the associated comment.

Comment on lines 282 to 283
# qwen3-next enable nz has accuray problems, so disable it here
torch_npu.npu_format_cast(conv_weights, 2),

Severity: high

The comment has a typo ("accuray") and could be more descriptive. Also, the magic number 2 is used, which refers to ACL_FORMAT_FRACTAL_ND. It would be best to import and use the constant from vllm_ascend.utils for better readability and maintainability.

Since I cannot add the import myself, I've included the constant name in the comment within the suggestion. Please consider importing and using the constant.

Suggested change
- # qwen3-next enable nz has accuray problems, so disable it here
- torch_npu.npu_format_cast(conv_weights, 2),
+ # Workaround for Qwen3-Next accuracy issue with NZ format by casting to ND (ACL_FORMAT_FRACTAL_ND).
+ torch_npu.npu_format_cast(conv_weights, 2),
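For illustration, a minimal sketch of the reviewer's full proposal with the constant actually imported. This assumes, as the comment above states, that ACL_FORMAT_FRACTAL_ND is exported from vllm_ascend.utils; conv_weights and the surrounding weight-loading code from the PR diff are elided:

import torch_npu

# Assumption per the review above: ACL_FORMAT_FRACTAL_ND (value 2) is defined in vllm_ascend.utils.
from vllm_ascend.utils import ACL_FORMAT_FRACTAL_ND

# conv_weights: the conv weight tensor from the model's weight loading (elided here).
# Workaround for the Qwen3-Next accuracy issue with the NZ format:
# explicitly cast the conv weights to ND instead of passing the magic number 2.
conv_weights = torch_npu.npu_format_cast(conv_weights, ACL_FORMAT_FRACTAL_ND)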

Signed-off-by: Icey <1790571317@qq.com>