Conversation

@wxsIcey
Collaborator

@wxsIcey wxsIcey commented Nov 7, 2025

What this PR does / why we need it?

  • Fixes the accuracy problem for Qwen3-Next when the NZ weight format is enabled

Does this PR introduce any user-facing change?

N/A

How was this patch tested?

from vllm import LLM, SamplingParams


def main():
    prompts = [
        "窗前明月光,",  # "Moonlight shines before my bed," (classical Chinese poem)
        "The president of the United States is Mr.",
        "The capital of France is",
        "The future of AI is",
        "感时花溅泪,",  # "Moved by the times, flowers shed tears," (classical Chinese poem)
        "家书抵万金啥意思?",  # "What does 'a letter from home is worth ten thousand gold' mean?"
        "plz tell me a story: ",
    ]

    # Create a sampling params object.
    sampling_params = SamplingParams(max_tokens=100, temperature=0.6, top_k=40, top_p=0.95)

    # Create an LLM.
    llm = LLM(
        model="/root/.cache/modelscope/hub/models/Qwen/Qwen3-Next-80B-A3B-Instruct",
        tensor_parallel_size=4,
        enforce_eager=True,
        trust_remote_code=True,
        max_model_len=256,
        gpu_memory_utilization=0.7,
        block_size=64,
    )

    # Generate texts from the prompts.
    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")


if __name__ == "__main__":
    main()

Signed-off-by: wxsIcey <1790571317@qq.com>
@wxsIcey wxsIcey changed the title [0.11.0] Fixex Qwen3-Next enable nz accuracy problem [0.11.0] Fixes Qwen3-Next enable nz accuracy problem Nov 7, 2025
@github-actions

github-actions bot commented Nov 7, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses an accuracy issue in Qwen3-Next models when the 'nz' tensor format is enabled. The fix involves casting the convolution weights to a supported format before passing them to the causal_conv1d_update function. While the change is functionally correct, it introduces a magic number for the format cast, which harms code readability and maintainability. My review suggests replacing this magic number with a named constant, which should be imported from vllm_ascend.utils, and improving the related code comment for better clarity.

Comment on lines +270 to +271
# qwen3-next not support nz
torch_npu.npu_format_cast(conv_weights, 2),
Contributor


Severity: high

The magic number 2 is used here for torch_npu.npu_format_cast. This makes the code hard to read and maintain. It's better to use a named constant. Based on other files in this repository, it seems 2 corresponds to ACL_FORMAT_FRACTAL_ND.

Please add from vllm_ascend.utils import ACL_FORMAT_FRACTAL_ND at the top of the file and use the constant here.

Also, the comment could be more descriptive about why this cast is necessary. I've included an improved comment in the suggestion.

Suggested change

- # qwen3-next not support nz
- torch_npu.npu_format_cast(conv_weights, 2),
+ # The causal_conv1d_update kernel does not support the NZ format for conv_weights
+ # with Qwen3-Next models. Casting to the ND format to ensure correctness.
+ torch_npu.npu_format_cast(conv_weights, ACL_FORMAT_FRACTAL_ND),
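
For context, a minimal sketch of the cast with the named constant; the tensor below is illustrative, and only the import and the npu_format_cast call itself come from the suggestion above:

import torch
import torch_npu
from vllm_ascend.utils import ACL_FORMAT_FRACTAL_ND  # maps to 2, the plain ND layout

# Illustrative stand-in for the conv weights of Qwen3-Next's causal conv layer.
conv_weights = torch.randn(4096, 4, device="npu")

# causal_conv1d_update cannot consume NZ-formatted conv_weights for Qwen3-Next,
# so cast back to the ND layout before the weights reach the kernel.
conv_weights = torch_npu.npu_format_cast(conv_weights, ACL_FORMAT_FRACTAL_ND)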

@wxsIcey wxsIcey added ready read for review ready-for-test start test by label for PR labels Nov 7, 2025
