11 changes: 11 additions & 0 deletions vllm_ascend/quantization/quant_config.py
@@ -157,6 +157,17 @@ def is_layer_skipped_ascend(
f"Detected some but not all shards of {prefix} "
"are quantized. All shards of fused layers "
"to have the same precision.")
elif "experts" in prefix:
# For the experts' prefix (e.g., "model.layers.3.mlp.experts")
# Assume all experts within the same MLP use the same quantization method
experts_quant_description = [
self.quant_description[layer]
for layer in self.quant_description if prefix in layer
]
is_skipped = any(
quantization == "FLOAT"
for quantization in experts_quant_description
)
Comment on lines +163 to +170
Contributor
Severity: high

This logic can be improved for correctness and robustness:

  1. Correctness: Using `prefix in layer` is less precise than `layer.startswith(prefix + '.')` for hierarchical module names. It could lead to incorrect matches if another module name contains the expert prefix as a substring.
  2. Robustness: If no expert layers are found for the given prefix, `experts_quant_description` will be an empty list. `any()` on an empty list returns `False`, causing `is_skipped` to be `False`. This will likely lead to a `RuntimeError` later in `get_linear_quant_type`. It's safer to treat this case as skipped (unquantized).

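A quick illustration of both pitfalls (the module names here are hypothetical, not taken from this PR):

```python
# Hypothetical sibling module whose name contains the experts prefix
# as a substring.
layer = "model.layers.3.mlp.experts_alt.weight"
prefix = "model.layers.3.mlp.experts"

print(prefix in layer)                 # True  -- false positive
print(layer.startswith(prefix + '.'))  # False -- correctly rejected

# With no matching expert layers, any() over an empty list is False,
# so the layer would be reported as quantized.
print(any(q == "FLOAT" for q in []))   # False
```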
Here is a suggested improvement that addresses these points.

Suggested change

-            experts_quant_description = [
-                self.quant_description[layer]
-                for layer in self.quant_description if prefix in layer
-            ]
-            is_skipped = any(
-                quantization == "FLOAT"
-                for quantization in experts_quant_description
-            )
+            experts_quant_description = [
+                self.quant_description[layer]
+                for layer in self.quant_description
+                if layer.startswith(prefix + '.')
+            ]
+            if not experts_quant_description:
+                is_skipped = True
+            else:
+                is_skipped = any(
+                    quantization == "FLOAT"
+                    for quantization in experts_quant_description)

        else:
            is_skipped = self.quant_description[prefix + '.weight'] == "FLOAT"

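To see how the new branch behaves end to end, here is a minimal sketch using a made-up `quant_description` (the key layout and quantization labels are assumptions for illustration only):

```python
# Minimal sketch of the experts branch added above; the contents of
# quant_description are made up for illustration.
quant_description = {
    "model.layers.3.mlp.experts.0.gate_proj.weight": "FLOAT",
    "model.layers.3.mlp.experts.0.up_proj.weight": "FLOAT",
}
prefix = "model.layers.3.mlp.experts"

experts_quant_description = [
    quant_description[layer]
    for layer in quant_description if prefix in layer
]
# Every matched expert weight is FLOAT, so the experts layer is treated
# as skipped (left unquantized).
is_skipped = any(q == "FLOAT" for q in experts_quant_description)
print(is_skipped)  # True
```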
10 changes: 10 additions & 0 deletions vllm_ascend/quantization/utils.py
@@ -52,6 +52,16 @@ def get_linear_quant_type(quant_description: Dict[str, Any], prefix: str,
f"Not all shards of {prefix} are quantized with same quant type."
f"Shard {proj_name} uses {shard_quant_type}, but another shard"
f"use {quant_type}. Please check quantization config.")
elif "experts" in prefix:
# For the experts' prefix (e.g., "model.layers.3.mlp.experts")
# Assume all experts within the same MLP use the same quantization method
experts_quant_description = set(
quant_description[layer]
for layer in quant_description if prefix in layer
)
Comment on lines +58 to +61
Contributor
Severity: high

Using `prefix in layer` is less precise than `layer.startswith(prefix + '.')` for hierarchical module names. It could lead to incorrect matches if another module name contains the expert prefix as a substring. Using `startswith` is safer and more correct. Additionally, a set comprehension (`{...}`) is more idiomatic than wrapping a generator expression in `set(...)`.

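For example (the entries below are hypothetical), identical per-expert types collapse to a single-element set, which is exactly what the length check in the diff relies on:

```python
# Hypothetical per-expert entries; all experts share one quant type.
quant_description = {
    "model.layers.3.mlp.experts.0.down_proj.weight": "W8A8",
    "model.layers.3.mlp.experts.1.down_proj.weight": "W8A8",
}
prefix = "model.layers.3.mlp.experts"

experts_quant_description = {
    quant_description[layer]
    for layer in quant_description if layer.startswith(prefix + '.')
}
print(experts_quant_description)  # {'W8A8'} -> quant_type == 'W8A8'
```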
Suggested change

-        experts_quant_description = set(
-            quant_description[layer]
-            for layer in quant_description if prefix in layer
-        )
+        experts_quant_description = {
+            quant_description[layer]
+            for layer in quant_description if layer.startswith(prefix + '.')
+        }

        if len(experts_quant_description) != 1:
            raise RuntimeError(
                f"{prefix} has different quantization types: "
                f"{experts_quant_description}.")
        quant_type = experts_quant_description.pop()
    else:
        quant_type = quant_description[prefix + '.weight']
    return quant_type
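The length check above is what surfaces inconsistent expert configs. A hypothetical mixed `quant_description` (again, made-up entries) would fail like this:

```python
# Hypothetical mixed config: expert 0 quantized, expert 1 kept in float.
quant_description = {
    "model.layers.3.mlp.experts.0.down_proj.weight": "W8A8",
    "model.layers.3.mlp.experts.1.down_proj.weight": "FLOAT",
}
prefix = "model.layers.3.mlp.experts"

experts_quant_description = {
    quant_description[layer]
    for layer in quant_description if layer.startswith(prefix + '.')
}
# len(...) == 2 here, so get_linear_quant_type would raise, e.g.:
# RuntimeError: model.layers.3.mlp.experts has different quantization
# types: {'W8A8', 'FLOAT'}.
print(len(experts_quant_description))  # 2
```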