
Commit 05f8e89

zRzRzRzRzRzRzR authored and wenbinc-Bin committed
self.gate dtype update for GLM-4.5 (vllm-project#22203)
Cherry-pick: vllm-project@6fa41e0
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
1 parent 445ac29 commit 05f8e89

File tree: 3 files changed, +4 −3 lines

docs/models/supported_models.md (1 addition, 1 deletion)

```diff
@@ -524,7 +524,7 @@ Specified using `--task generate`.
 | `GLM4VForCausalLM`<sup>^</sup> | GLM-4V | T + I | `zai-org/glm-4v-9b`, `zai-org/cogagent-9b-20241220` etc. | ✅︎ | ✅︎ | ✅︎ |
 | `Glm4vForConditionalGeneration` | GLM-4.1V-Thinking | T + I<sup>E+</sup> + V<sup>E+</sup> | `zai-org/GLM-4.1V-9B-Thinkg`, etc. | ✅︎ | ✅︎ | ✅︎ |
 | `Glm4MoeForCausalLM` | GLM-4.5 | T + I<sup>E+</sup> + V<sup>E+</sup> | `zai-org/GLM-4.5`, etc. | ✅︎ | ✅︎ | ✅︎ |
-| `Glm4v_moeForConditionalGeneration` | GLM-4.5V | T + I<sup>E+</sup> + V<sup>E+</sup> | `zai-org/GLM-4.5V-Air`, etc. | ✅︎ | ✅︎ | ✅︎ |
+| `Glm4v_moeForConditionalGeneration` | GLM-4.5V | T + I<sup>E+</sup> + V<sup>E+</sup> | `zai-org/GLM-4.5V`, etc. | ✅︎ | ✅︎ | ✅︎ |
 | `GraniteSpeechForConditionalGeneration` | Granite Speech | T + A | `ibm-granite/granite-speech-3.3-8b` | ✅︎ | ✅︎ | ✅︎ |
 | `H2OVLChatModel` | H2OVL | T + I<sup>E+</sup> | `h2oai/h2ovl-mississippi-800m`, `h2oai/h2ovl-mississippi-2b`, etc. | | ✅︎ | ✅︎\* |
 | `Idefics3ForConditionalGeneration` | Idefics3 | T + I | `HuggingFaceM4/Idefics3-8B-Llama3` etc. | ✅︎ | | ✅︎ |
```

tests/models/registry.py (1 addition, 1 deletion)

```diff
@@ -325,7 +325,7 @@ def check_available_online(
                                     trust_remote_code=True,
                                     hf_overrides={"architectures": ["GLM4VForCausalLM"]}),  # noqa: E501
     "Glm4vForConditionalGeneration": _HfExamplesInfo("zai-org/GLM-4.1V-9B-Thinking"),  # noqa: E501
-    "Glm4v_moeForConditionalGeneration": _HfExamplesInfo("zai-org/GLM-4.5V-Air",
+    "Glm4v_moeForConditionalGeneration": _HfExamplesInfo("zai-org/GLM-4.5V",
                                                          is_available_online=False),  # noqa: E501
     "H2OVLChatModel": _HfExamplesInfo("h2oai/h2ovl-mississippi-800m",
                                       extras={"2b": "h2oai/h2ovl-mississippi-2b"},  # noqa: E501
```

vllm/model_executor/models/glm4_moe.py (2 additions, 1 deletion)

```diff
@@ -123,6 +123,7 @@ def __init__(
             config.n_routed_experts,
             bias=False,
             quant_config=None,
+            params_dtype=torch.float32,
             prefix=f"{prefix}.gate")

         self.gate.e_score_correction_bias = nn.Parameter(
@@ -180,7 +181,7 @@ def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:

         if self.n_shared_experts is not None:
             shared_output = self.shared_experts(hidden_states)
-        router_logits, _ = self.gate(hidden_states)
+        router_logits, _ = self.gate(hidden_states.to(dtype=torch.float32))
         final_hidden_states = self.experts(
             hidden_states=hidden_states,
             router_logits=router_logits) * self.routed_scaling_factor
```
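The pattern in this diff, keeping a MoE router gate in float32 while the surrounding model runs in a lower precision, can be sketched in standalone PyTorch. This is a minimal illustration, not vLLM's actual `ReplicatedLinear` gate; the `ToyRouter` class and its dimensions are made up for the example. The point is the two pieces the commit adds together: gate weights created with `params_dtype=torch.float32`, and activations cast up with `.to(dtype=torch.float32)` before routing, so expert scores are computed at full precision rather than in bfloat16.

```python
import torch
import torch.nn as nn


class ToyRouter(nn.Module):
    """Illustrative MoE router gate held in float32 (not vLLM's class)."""

    def __init__(self, hidden_size: int, n_experts: int):
        super().__init__()
        # Mirrors params_dtype=torch.float32 in the diff: the gate's
        # weights are float32 even if the rest of the model is bf16.
        self.gate = nn.Linear(hidden_size, n_experts, bias=False,
                              dtype=torch.float32)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Mirrors hidden_states.to(dtype=torch.float32) in the diff:
        # cast activations up before computing router logits.
        return self.gate(hidden_states.to(dtype=torch.float32))


router = ToyRouter(hidden_size=8, n_experts=4)
x = torch.randn(2, 8, dtype=torch.bfloat16)  # model activations in bf16
logits = router(x)                           # routing done in float32
assert logits.dtype == torch.float32
```

Without both steps the matmul would run in bfloat16, whose coarse mantissa can round nearly-tied expert scores onto the same value and flip top-k expert selection; computing the logits in float32 avoids that while leaving the expert MLPs themselves in the cheaper dtype.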
