
Commit 6e982f4

sywangyi and Narsil authored
fix the crash of meta-llama/Llama-3.2-1B (#2918)
* fix the crash of meta-llama/Llama-3.2-1B

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* Apply suggestions from code review

Simpler fix (which doesn't break vlms).

---------

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
1 parent c20025d · commit 6e982f4

File tree

1 file changed: +1 −3 lines


server/text_generation_server/models/custom_modeling/flash_llama_modeling.py

Lines changed: 1 addition & 3 deletions
@@ -642,9 +642,7 @@ def __init__(self, prefix: str, config, weights, name=None):
         embedding_multiplier = getattr(config, "embedding_multiplier", None)
         if embedding_multiplier is not None:
             self.embed_tokens.weight.data *= embedding_multiplier
-
-        prefix = "lm_head" if not prefix or name != "model" else f"{prefix}.{suffix}"
-
+        prefix = suffix if not prefix or name != "model" else f"{prefix}.{suffix}"
         with no_fp8(weights):
             self.lm_head = SpeculativeHead.load(
                 config,
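For context, the changed line picks the weight prefix that SpeculativeHead.load uses for the output head. Below is a minimal sketch of that resolution logic; the helper name resolve_lm_head_prefix and the suffix computation from config.tie_word_embeddings are assumptions reconstructed from the surrounding file, not part of this diff. The likely failure mode: checkpoints with tied word embeddings (such as meta-llama/Llama-3.2-1B) ship no separate lm_head.weight, so hardcoding the "lm_head" prefix made the load crash.

# Sketch only: in the real file this logic lives inline in the model's
# __init__; the `suffix` computation below is an assumption based on the
# surrounding code, and resolve_lm_head_prefix is a hypothetical helper.


def resolve_lm_head_prefix(prefix: str, name: str, tie_word_embeddings: bool) -> str:
    # Tied checkpoints (e.g. meta-llama/Llama-3.2-1B) have no lm_head.weight,
    # so the head must be loaded from the embedding table instead.
    suffix = "model.embed_tokens" if tie_word_embeddings else "lm_head"

    # Before this commit the first branch returned the hardcoded string
    # "lm_head", which crashed on tied checkpoints. Returning `suffix`
    # preserves the tied/untied distinction in both branches.
    return suffix if not prefix or name != "model" else f"{prefix}.{suffix}"


# Standalone tied model: load the head from the embedding weights.
assert resolve_lm_head_prefix("", "model", True) == "model.embed_tokens"
# Standalone untied model: the dedicated lm_head weights still resolve.
assert resolve_lm_head_prefix("", "model", False) == "lm_head"
# Language model nested under a wrapper prefix keeps its full path.
assert resolve_lm_head_prefix("text_model", "model", False) == "text_model.lm_head"

This also illustrates the review note that the simpler fix doesn't break VLMs: models loaded under a wrapper prefix fall through to the else branch unchanged.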

0 commit comments