model: LFM2-VL fixes #17577
base: master
Conversation
ngxson left a comment:
I think we probably need to update `ggml_backend_*_supports_op` across backends, to avoid some backends falling back to the non-antialias kernel, which would produce wrong results.
Backends that do not support this mode should fall back to CPU.
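A minimal sketch of the gating ngxson describes: a backend that lacks an antialiased upscale kernel reports the op as unsupported, so the scheduler falls back to the CPU backend instead of silently running the non-antialias kernel. The flag value, op id, and `op_params` layout below are assumptions for illustration, not the actual ggml definitions:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative placeholders; the real values live in the ggml headers. */
#define GGML_OP_UPSCALE 1
#define GGML_SCALE_FLAG_ANTIALIAS (1 << 8)   /* assumed flag bit */

struct ggml_tensor {
    int     op;
    int32_t op_params[16];
};

/* Hypothetical supports_op hook for a backend with no antialiased kernel:
 * reject GGML_OP_UPSCALE whenever the antialias flag is set in the mode. */
static bool example_backend_supports_op(const struct ggml_tensor * op) {
    switch (op->op) {
        case GGML_OP_UPSCALE: {
            const int32_t mode = op->op_params[0];
            return (mode & GGML_SCALE_FLAG_ANTIALIAS) == 0;
        }
        default:
            return true;
    }
}
```

Returning `false` here is what triggers the scheduler's CPU fallback mentioned in the comment.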
On my end (CPU), the outputs of the fp32 and bf16 450M models looked good, tested on a variety of small images (< 16/32px on one side). I also checked a few personally tuned 1.6B Q3 quants (which should be more sensitive) and the outputs were great - the model didn't go into a repetitive "breaking" state like before! It couldn't have been easy to figure this issue out... Thank you for looking into it!
Force-pushed from 50ba22e to 2385ecf
@SmartestWashingMachine, great to see it working for you! @ngxson, I rolled back the changes related to default marker placement for now, and added a CPU fallback for backends as well.
Debugging of #17290 revealed multiple issues with LFM2-VL. This PR fixes the following issues and makes the output of `llama.cpp` equivalent to PyTorch:

- Use `round_by_factor` to calculate the target `width` and `height` in "smart resize".
- The central issue was the resizing of positional embeddings. The Siglip2 implementation in PyTorch uses `F.interpolate(..., mode="bilinear", align_corners=False, antialias=True)`. Antialiasing only contributes during downscaling, so when the image width or height is less than 256, the scaling of positional embeddings in `llama.cpp` produced numerically different results from PyTorch. A new flag, `GGML_SCALE_FLAG_ANTIALIAS`, has been added for the upscale function, with implementations for CPU and CUDA.

Now the outputs match:

- PyTorch (fp32)
- this PR (`bin/llama-mtmd-cli -m $CKPT/LFM2-VL-1.6B-F32.gguf --mmproj $CKPT/mmproj-LFM2-VL-1.6B-F32.gguf -n 64 -t 4 --image /data/playground/issue_17290/siglip_1024.png -p "OCR." --temp 0.0 --top-k 1`)