model: LFM2-VL fixes #17577
base: master
Conversation
ngxson left a comment:
I think we probably need to update `ggml_backend_*_supports_op` across backends, to avoid some backends falling back to the non-antialias kernel, which would produce wrong results.
Backends that do not support this mode should fall back to CPU.
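A minimal sketch of the gating ngxson describes: a backend that lacks an antialiased upscale kernel reports the op as unsupported, so the scheduler falls back to the CPU backend instead of silently running the non-antialias kernel. The flag value, op id, and `op_params` layout below are assumptions for illustration, not the actual ggml definitions:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative placeholders; the real values live in the ggml headers. */
#define GGML_OP_UPSCALE 1
#define GGML_SCALE_FLAG_ANTIALIAS (1 << 8)   /* assumed flag bit */

struct ggml_tensor {
    int     op;
    int32_t op_params[16];
};

/* Hypothetical supports_op hook for a backend with no antialiased kernel:
 * reject GGML_OP_UPSCALE whenever the antialias flag is set in the mode. */
static bool example_backend_supports_op(const struct ggml_tensor * op) {
    switch (op->op) {
        case GGML_OP_UPSCALE: {
            const int32_t mode = op->op_params[0];
            return (mode & GGML_SCALE_FLAG_ANTIALIAS) == 0;
        }
        default:
            return true;
    }
}
```

Returning `false` here is what triggers the scheduler's CPU fallback mentioned in the comment.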
On my end (CPU), the outputs of the fp32 and bf16 450M models looked good, tested on a variety of small images (< 16/32px on one side). I also checked a few personally tuned 1.6B Q3 quants (which should be more sensitive) and the outputs were great - the model didn't go into a repetitive "breaking" state like before! It couldn't have been easy to figure this issue out... Thank you for looking into it!
Force-pushed from 50ba22e to 2385ecf
@SmartestWashingMachine, great to see it working for you! @ngxson, I rolled back the changes related to default marker placement for now, and added a CPU fallback for backends as well.
Debugging of #17290 revealed multiple issues with LFM2-VL. This PR fixes the following issues and makes the output of `llama.cpp` equivalent to PyTorch:

- Use `round_by_factor` to calculate the target `width` and `height` in "smart resize".
- The central issue was the resizing of positional embeddings. The Siglip2 implementation in PyTorch uses `F.interpolate(..., mode="bilinear", align_corners=False, antialias=True)`. Antialiasing only contributes during downscaling, so when the image width or height is less than 256, the scaling of positional embeddings in `llama.cpp` produced numerically different results from PyTorch. A new flag, `GGML_SCALE_FLAG_ANTIALIAS`, has been added for the upscale function, with implementations for CPU and CUDA.

Now the outputs match:

- PyTorch (fp32)
- this PR (`bin/llama-mtmd-cli -m $CKPT/LFM2-VL-1.6B-F32.gguf --mmproj $CKPT/mmproj-LFM2-VL-1.6B-F32.gguf -n 64 -t 4 --image /data/playground/issue_17290/siglip_1024.png -p "OCR." --temp 0.0 --top-k 1`)