-
Notifications
You must be signed in to change notification settings - Fork 14k
mtmd: add Eagle2-VL vision and projector support #17360
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
mtmd: add Eagle2-VL vision and projector support #17360
Conversation
ngxson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explicitly confirm if part of the PR is generated by AI? I feel very suspicious about some redundant code
While you said in the PR description that you tested it, you haven't even mentioned the link to the model, as well as how you tested it.
tools/mtmd/clip.cpp
Outdated
| learned_pos_embd, | ||
| nullptr); | ||
|
|
||
| // keep runtime quiet in normal runs; shapes are correct by construction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some indentations seem off here
tools/mtmd/clip.cpp
Outdated
| if (model.mm_0_b) { | ||
| embeddings = ggml_add(ctx0, embeddings, model.mm_0_b); | ||
| } | ||
|
|
||
| embeddings = ggml_gelu(ctx0, embeddings); | ||
|
|
||
| GGML_ASSERT(model.mm_2_w != nullptr); | ||
| // keep [n_in, n_tokens] layout for the second matmul as well | ||
| embeddings = ggml_reshape_2d(ctx0, embeddings, embeddings->ne[0], embeddings->ne[1]); | ||
| embeddings = ggml_cont_2d(ctx0, embeddings, embeddings->ne[0], embeddings->ne[1]); | ||
| // Weights are canonicalized at conversion time to [n_in, n_out]; multiply directly. | ||
| embeddings = ggml_mul_mat(ctx0, model.mm_2_w, embeddings); | ||
| if (model.mm_2_b) { | ||
| embeddings = ggml_add(ctx0, embeddings, model.mm_2_b); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better replacing this whole block with build_ffn
| mlp_pos = name.find("mlp1.") | ||
| if mlp_pos != -1: | ||
| mlp_suffix = name[mlp_pos + len("mlp1."):] | ||
| # Skip LayerNorm (mlp1.0.*) | ||
| if mlp_suffix.startswith("0."): | ||
| return [] | ||
| # Map first Linear (mlp1.1.*) -> mm.0.* | ||
| if mlp_suffix.startswith("1."): | ||
| new_name = "mm.0." + mlp_suffix[2:] | ||
| if new_name.endswith(".weight"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think all of these code are redundant. This model: https://huggingface.co/nvidia/Eagle2-1B has simple .mlp.fc1 and .mlp.fc2 MLP, there is no nesting mlp1.1.* as you described
| ] | ||
|
|
||
| # 5) Conv3D patch embed -> two Conv2D kernels | ||
| if name.endswith("patch_embed.proj.weight") and data_torch.ndim == 5: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you sure about this? seems like bad copy-paste code from QwenVL
This PR adds initial support for the Eagle2-VL multimodal models (1B / 2B) in the MTMD pipeline.
The update introduces a dedicated converter path and runtime builder for the Eagle2-VL vision tower and its 2-layer projector.
All changes are fully self-contained and do not affect any existing model architectures.
Converter (convert_hf_to_gguf.py)
Eagle2VLVisionModel.VisionProjectorType=EAGLE2VLinto GGUF metadata.spatial_merge_size, default: 2×2).mm.0,mm.2) to[n_in, n_out]; supports optional biases.GGUF (gguf-py/gguf/constants.py)
EAGLE2VL.Runtime (tools/mtmd/clip.cpp)
build_eagle2vl()vision path:mm.0 → GELU → mm.2) using canonical[n_in, n_out]weights.PROJECTOR_TYPE_EAGLE2VLto the new builder.mm_2_w->ne[1].Integration & Compatibility
Validation
Tested locally on Eagle2-VL 1B and 2B checkpoints:
Scope
This PR focuses on Eagle2-VL (1B / 2B).
Support for additional Eagle2 variants (e.g., 9B) will be handled in a follow-up.
Closes #16704