mtmd: Add DeepSeekOCR Support #17400
base: master
Conversation
init commit
mtmd: fix vision model processing
testing Vision model loading
mtmd: DeepseekOCR Implement DeepSeek3B-MoE-A570M (LM component)
…ut in deepseek2 model
debug: correct token order
Add native resolution support
- changes concern PR #4
mtmd: quick fix token order
# Conflicts: convert_hf_to_gguf.py, src/llama-model.cpp, src/models/deepseek2.cpp
…rol & all native resolution modes work
First DeepSeek-OCR working implementation
# Conflicts: convert_hf_to_gguf.py, tools/mtmd/clip.h, tools/mtmd/mtmd.cpp
common/arg.cpp (outdated)
```cpp
"- auto (default): automatically select resolution\n"
"- tiny, small, base, large: native resolution\n"
"- gundam, gundam-master: dynamic resolution",
```
IMO these modes can look quite confusing to end users.
I've already seen your logic that calculates the image area to automatically determine the best resolution, and it looks good enough.
So I think it would be better to remove the argument and make everything automatic.
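For illustration, a rough sketch of what such area-based selection might look like; the mode names come from the help text above, but the function name and thresholds here are hypothetical, not the values used in this PR:

```cpp
#include <cstdint>

// Hypothetical sketch of area-based mode selection (not the PR's code);
// the thresholds are illustrative only.
enum dsocr_mode { DSOCR_TINY, DSOCR_SMALL, DSOCR_BASE, DSOCR_LARGE, DSOCR_GUNDAM };

static dsocr_mode dsocr_select_mode(int nx, int ny) {
    const int64_t area = (int64_t) nx * ny;
    if (area <=  512 *  512) return DSOCR_TINY;
    if (area <=  640 *  640) return DSOCR_SMALL;
    if (area <= 1024 * 1024) return DSOCR_BASE;
    if (area <= 1280 * 1280) return DSOCR_LARGE;
    return DSOCR_GUNDAM; // very large images fall back to dynamic (cropped) resolution
}
```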
OK, I'll remove it later.
tools/mtmd/clip.cpp (outdated)
```cpp
    res_imgs->grid_y = 1;
}
else {
    GGML_ABORT("DeepSeek-OCR: Gundam/Gundam-Master haven't been tested yet.\n");
```
@ngxson I've encountered an issue with batching images. To handle images much larger than 1280x1280, DeepSeek-OCR crops them into 640x640 (Gundam) or 1024x1024 (Gundam-Master) sub-images as local views. However, the current framework doesn't support batching multiple images. Technically, it shouldn't be too difficult to add batch support, but I'm concerned about introducing new bugs that affect other models. Do you have any suggestions?
IIRC it should be the same logic as llava-uhd or minicpm-v, where the image is cropped into smaller sub-images.
Batching is not yet supported, but do all sub-images need to be in the same batch?
Otherwise, what we can do is extend clip_image_f32 to include a notion of "nz":
```cpp
struct clip_image_f32 {
    int nx;
    int ny;
    int nz; // can be > 1 for deepseek ocr
    std::vector<float> buf;
};
```
And a memory layout corresponding to what you need in the cgraph (to avoid another ggml_permute, for example).
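As a rough illustration of that layout idea (not code from this PR; it assumes the nz sub-images are stored back-to-back in buf, each as nx*ny*3 floats, and that a ggml_context `ctx` already exists):

```cpp
// Sketch only: with sub-images stored contiguously, the whole stack can
// be exposed to the cgraph as a single 4D tensor, avoiding an extra
// ggml_permute.
struct ggml_tensor * inp = ggml_new_tensor_4d(ctx, GGML_TYPE_F32, nx, ny, 3, nz);
// inp->data can then be filled directly from clip_image_f32::buf.
```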
@sfallah Could you please mark this PR as ready for review and update the llama-mtmd-cli command for testing DeepSeek-OCR (because I removed the --dsocr-mode argument)? Also, I ran the CI locally and it failed on "27 - test-thread-safety"; I suspect this failure is unrelated to the changes made in this PR. Here is the log: ci.txt
Feature Request: #16676
Make sure to read the contributing guidelines before submitting a PR
GGUF Models
sabafallah/DeepSeek-OCR-GGUF
- deepseek-ocr-f32.gguf
- mmproj-deepseek-ocr-f32.gguf
Running the Model
Build llama.cpp (Mac)
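The build commands themselves were not preserved in this extract; a standard CMake build on macOS (where the Metal backend is enabled by default) would look like:

```sh
cmake -B build
cmake --build build --config Release -j
```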
Running llama-mtmd-cli
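Likewise, the exact command was not preserved; a typical llama-mtmd-cli invocation with the GGUF files listed above (file paths and prompt are illustrative) would be:

```sh
./build/bin/llama-mtmd-cli \
    -m deepseek-ocr-f32.gguf \
    --mmproj mmproj-deepseek-ocr-f32.gguf \
    --image page.png \
    -p "Convert the document to markdown."
```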