
Conversation

@wxsIcey (Collaborator) commented Nov 7, 2025

What this PR does / why we need it?

  • Fixes an accuracy problem with Qwen3-Next when the NZ weight format is enabled.

Does this PR introduce any user-facing change?

N/A

How was this patch tested?

from vllm import LLM, SamplingParams


def main():
    prompts = [
        "窗前明月光,",  # "Moonlight before my window," (from a Li Bai poem)
        "The president of the United States is Mr.",
        "The capital of France is",
        "The future of AI is",
        "感时花溅泪,",  # "Moved by the times, flowers shed tears," (from a Du Fu poem)
        "家书抵万金啥意思?",  # "What does 'a letter from home is worth ten thousand gold' mean?"
        "plz tell me a story: ",
    ]

    # Create a sampling params object.
    sampling_params = SamplingParams(max_tokens=100, temperature=0.6, top_k=40, top_p=0.95)
    # Create an LLM.
    llm = LLM(
        # model="/root/.cache/modelscope/hub/models/Qwen/Qwen3-30B-A3B",
        model="Qwen/Qwen3-Next-80B-A3B-Instruct",
        tensor_parallel_size=4,
        enforce_eager=True,
        trust_remote_code=True,
        max_model_len=256,
        # max_num_seqs=2,
        gpu_memory_utilization=0.7,
        block_size=64,
    )

    # Generate texts from the prompts.
    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")


if __name__ == "__main__":
    main()

Signed-off-by: Icey <1790571317@qq.com>
github-actions bot commented Nov 7, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@wxsIcey added the ready (read for review) and ready-for-test (start test by label for PR) labels Nov 7, 2025
@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request addresses an accuracy issue with the Qwen3-Next model when the NZ format is enabled. The fix involves explicitly casting the conv_weights tensor to the ND format to disable the NZ format optimization. The change is straightforward and seems to be an effective workaround. My review includes a suggestion to improve the code's readability and maintainability by replacing a magic number with a named constant and clarifying the associated comment.

Comment on lines 282 to 283
# qwen3-next enable nz has accuray problems, so disable it here
torch_npu.npu_format_cast(conv_weights, 2),

Severity: high

The comment has a typo ("accuray") and could be more descriptive. Also, the magic number 2 is used, which refers to ACL_FORMAT_FRACTAL_ND. It would be best to import and use the constant from vllm_ascend.utils for better readability and maintainability.

Since I cannot add the import myself, I've included the constant name in the comment within the suggestion. Please consider importing and using the constant.

Suggested change
- # qwen3-next enable nz has accuray problems, so disable it here
- torch_npu.npu_format_cast(conv_weights, 2),
+ # Workaround for Qwen3-Next accuracy issue with NZ format by casting to ND (ACL_FORMAT_FRACTAL_ND).
+ torch_npu.npu_format_cast(conv_weights, 2),
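For illustration, a minimal sketch of the reviewer's full proposal with the constant actually imported. This assumes, as the comment above states, that ACL_FORMAT_FRACTAL_ND is exported from vllm_ascend.utils; conv_weights and the surrounding weight-loading code from the PR diff are elided:

import torch_npu

# Assumption per the review above: ACL_FORMAT_FRACTAL_ND (value 2) is defined in vllm_ascend.utils.
from vllm_ascend.utils import ACL_FORMAT_FRACTAL_ND

# conv_weights: the conv weight tensor from the model's weight loading (elided here).
# Workaround for the Qwen3-Next accuracy issue with the NZ format:
# explicitly cast the conv weights to ND instead of passing the magic number 2.
conv_weights = torch_npu.npu_format_cast(conv_weights, ACL_FORMAT_FRACTAL_ND)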

Signed-off-by: Icey <1790571317@qq.com>