
Conversation

@yangqinghao-cmss
Contributor

What this PR does / why we need it?

When retrieving the quantization method for MoE layers whose quantization file does not match the model's naming convention in eager mode (e.g., DeepSeek v3.2 exp), a KeyError is raised: "model.layers.3.mlp.experts.weight not in self.quant_description". The quantization file actually contains per-expert entries such as:

  "model.layers.3.mlp.experts.255.gate_proj.weight": "W8A8_DYNAMIC",
  "model.layers.3.mlp.experts.255.gate_proj.weight_scale": "W8A8_DYNAMIC",
  "model.layers.3.mlp.experts.255.gate_proj.weight_offset": "W8A8_DYNAMIC",
  "model.layers.3.mlp.experts.255.down_proj.weight": "W8A8_DYNAMIC",
  "model.layers.3.mlp.experts.255.down_proj.weight_scale": "W8A8_DYNAMIC",
  "model.layers.3.mlp.experts.255.down_proj.weight_offset": "W8A8_DYNAMIC",
  "model.layers.3.mlp.experts.255.up_proj.weight": "W8A8_DYNAMIC",
  "model.layers.3.mlp.experts.255.up_proj.weight_scale": "W8A8_DYNAMIC",
  "model.layers.3.mlp.experts.255.up_proj.weight_offset": "W8A8_DYNAMIC",

Does this PR introduce any user-facing change?

How was this patch tested?

vllm-ascend: 0.11.0rc0

…experts (e.g., DeepSeek_v3.2_exp w8a8)

Signed-off-by: yangqinghao-cmss <yangqinghao_yewu@cmss.chinamobile.com>
@github-actions

github-actions bot commented Nov 6, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request addresses a KeyError that occurs when retrieving quantization methods for Mixture-of-Experts (MoE) layers, where quantization configurations are defined on a per-expert basis. The changes correctly handle this by iterating through expert-specific keys. My review includes two high-severity comments to enhance the correctness and robustness of the implementation. I've suggested using startswith for more precise prefix matching and adding checks for cases where no expert layers are found to prevent potential runtime errors.

Comment on lines +163 to +170
experts_quant_description = [
    self.quant_description[layer]
    for layer in self.quant_description if prefix in layer
]
is_skipped = any(
    quantization == "FLOAT"
    for quantization in experts_quant_description
)

high

This logic can be improved for correctness and robustness:

  1. Correctness: Using prefix in layer is less precise than layer.startswith(prefix + '.') for hierarchical module names. It could lead to incorrect matches if another module name contains the expert prefix as a substring.
  2. Robustness: If no expert layers are found for the given prefix, experts_quant_description will be an empty list. any() on an empty list returns False, causing is_skipped to be False. This will likely lead to a RuntimeError later in get_linear_quant_type. It's safer to treat this case as skipped (unquantized).

Here is a suggested improvement that addresses these points.

Suggested change

# Before
experts_quant_description = [
    self.quant_description[layer]
    for layer in self.quant_description if prefix in layer
]
is_skipped = any(
    quantization == "FLOAT"
    for quantization in experts_quant_description
)

# After
experts_quant_description = [
    self.quant_description[layer]
    for layer in self.quant_description if layer.startswith(prefix + '.')
]
if not experts_quant_description:
    is_skipped = True
else:
    is_skipped = any(
        quantization == "FLOAT"
        for quantization in experts_quant_description)
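
To make both points concrete (the module names here are hypothetical, chosen only to illustrate): a bare substring test can match an unrelated key, and any() over an empty sequence silently returns False.

prefix = "model.layers.3.mlp.experts"

# Hypothetical key that merely contains the prefix as a substring.
other = "draft_model.layers.3.mlp.experts.0.gate_proj.weight"
print(prefix in other)                 # True  -- substring match leaks in
print(other.startswith(prefix + '.'))  # False -- anchored match excludes it

# With no matching expert layers, any() returns False, so is_skipped
# would wrongly report the layer as quantized.
print(any(q == "FLOAT" for q in []))   # False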

Comment on lines +58 to +61
experts_quant_description = set(
    quant_description[layer]
    for layer in quant_description if prefix in layer
)

high

Using prefix in layer is less precise than layer.startswith(prefix + '.') for hierarchical module names. It could lead to incorrect matches if another module name contains the expert prefix as a substring. Using startswith is safer and more correct. Additionally, using a set comprehension {...} is more idiomatic than set(...).

Suggested change

# Before
experts_quant_description = set(
    quant_description[layer]
    for layer in quant_description if prefix in layer
)

# After
experts_quant_description = {
    quant_description[layer]
    for layer in quant_description if layer.startswith(prefix + '.')
}
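
For example, with the per-expert entries from the PR description, the comprehension collapses them into a single quantization type (illustrative usage; the surrounding function is not shown here):

quant_description = {
    "model.layers.3.mlp.experts.255.gate_proj.weight": "W8A8_DYNAMIC",
    "model.layers.3.mlp.experts.255.up_proj.weight": "W8A8_DYNAMIC",
    "model.layers.3.mlp.experts.255.down_proj.weight": "W8A8_DYNAMIC",
}
prefix = "model.layers.3.mlp.experts"

experts_quant_description = {
    quant_description[layer]
    for layer in quant_description if layer.startswith(prefix + '.')
}
print(experts_quant_description)  # {'W8A8_DYNAMIC'}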
