Conversation

@wxsIcey
Collaborator

@wxsIcey wxsIcey commented Nov 7, 2025

What this PR does / why we need it?

  • Fixes the accuracy problem for Qwen3-Next when the NZ weight format is enabled

Does this PR introduce any user-facing change?

N/A

How was this patch tested?

from vllm import LLM, SamplingParams


def main():
    prompts = [
        "窗前明月光,",  # "Moonlight shines before my bed," (classical Chinese poem)
        "The president of the United States is Mr.",
        "The capital of France is",
        "The future of AI is",
        "感时花溅泪,",  # "Moved by the times, flowers shed tears," (classical Chinese poem)
        "家书抵万金啥意思?",  # "What does 'a letter from home is worth ten thousand gold' mean?"
        "plz tell me a story: ",
    ]

    # Create a sampling params object.
    sampling_params = SamplingParams(max_tokens=100, temperature=0.6, top_k=40, top_p=0.95)

    # Create an LLM.
    llm = LLM(
        model="/root/.cache/modelscope/hub/models/Qwen/Qwen3-Next-80B-A3B-Instruct",
        tensor_parallel_size=4,
        enforce_eager=True,
        trust_remote_code=True,
        max_model_len=256,
        gpu_memory_utilization=0.7,
        block_size=64,
    )

    # Generate texts from the prompts.
    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")


if __name__ == "__main__":
    main()

Signed-off-by: wxsIcey <1790571317@qq.com>
@wxsIcey wxsIcey changed the title [0.11.0] Fixex Qwen3-Next enable nz accuracy problem [0.11.0] Fixes Qwen3-Next enable nz accuracy problem Nov 7, 2025
@github-actions

github-actions bot commented Nov 7, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses an accuracy issue in Qwen3-Next models when the 'nz' tensor format is enabled. The fix involves casting the convolution weights to a supported format before passing them to the causal_conv1d_update function. While the change is functionally correct, it introduces a magic number for the format cast, which harms code readability and maintainability. My review suggests replacing this magic number with a named constant, which should be imported from vllm_ascend.utils, and improving the related code comment for better clarity.

Comment on lines +270 to +271
# qwen3-next not support nz
torch_npu.npu_format_cast(conv_weights, 2),
Contributor


Severity: high

The magic number 2 is used here for torch_npu.npu_format_cast. This makes the code hard to read and maintain. It's better to use a named constant. Based on other files in this repository, it seems 2 corresponds to ACL_FORMAT_FRACTAL_ND.

Please add from vllm_ascend.utils import ACL_FORMAT_FRACTAL_ND at the top of the file and use the constant here.

Also, the comment could be more descriptive about why this cast is necessary. I've included an improved comment in the suggestion.

Suggested change

- # qwen3-next not support nz
- torch_npu.npu_format_cast(conv_weights, 2),
+ # The causal_conv1d_update kernel does not support the NZ format for conv_weights
+ # with Qwen3-Next models. Casting to the ND format to ensure correctness.
+ torch_npu.npu_format_cast(conv_weights, ACL_FORMAT_FRACTAL_ND),
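
For context, a minimal sketch of the cast with the named constant; the tensor below is illustrative, and only the import and the npu_format_cast call itself come from the suggestion above:

import torch
import torch_npu
from vllm_ascend.utils import ACL_FORMAT_FRACTAL_ND  # maps to 2, the plain ND layout

# Illustrative stand-in for the conv weights of Qwen3-Next's causal conv layer.
conv_weights = torch.randn(4096, 4, device="npu")

# causal_conv1d_update cannot consume NZ-formatted conv_weights for Qwen3-Next,
# so cast back to the ND layout before the weights reach the kernel.
conv_weights = torch_npu.npu_format_cast(conv_weights, ACL_FORMAT_FRACTAL_ND)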

@wxsIcey wxsIcey added ready read for review ready-for-test start test by label for PR labels Nov 7, 2025
