mtmd: Add DeepSeekOCR Support #17400
Open
sfallah wants to merge 57 commits into ggml-org:master from sfallah:sf/deepseek-ocr
+1,548
−48
Changes from 51 commits
43a130b mtmd: llama.cpp DeepSeekOCR support (sfallah)
b6b9f02 loading sam tensors (sfallah)
85c7cda mtmd: fix vision model processing (bluebread)
578c8d7 Merge pull request #1 from bluebread/sf/deepseek-ocr (sfallah)
2aab52e deepseek-ocr clip-vit model impl (sfallah)
eab28ed mtmd: add DeepSeek-OCR LM support with standard attention (bluebread)
7630587 mtmd: successfully runs DeepSeek-OCR LM in llama-cli (bluebread)
2de3436 mtmd: Fix RoPE type for DeepSeek-OCR LM. (bluebread)
e8b2610 Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into s… (bluebread)
97e0907 loading LM (sfallah)
13dc6fb Merge branch 'sf/deepseek-ocr' into sf/deepseek-ocr (sfallah)
b32bb5e Merge pull request #2 from bluebread/sf/deepseek-ocr (sfallah)
790bbb9 sam warmup working (sfallah)
cec9a5c sam erroneous return corrected (sfallah)
8b3d319 clip-vit: corrected cls_embd concat (sfallah)
1e08157 clip-vit: model convert qkv_proj split (sfallah)
331cea8 corrected combining of image encoders' results (sfallah)
6c0715b fix: update callback for ffn_moe_weighted and add callback for attn_o… (bluebread)
a65ddf5 Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into s… (bluebread)
63a042f concat image_newline and image_seperator tokens (sfallah)
89afda8 visual_model warmup (technically) works (sfallah)
88032f4 window partitioning using standard ggml ops (sfallah)
1268dc3 Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into s… (bluebread)
68b206b sam implementation without using CPU only ops (sfallah)
8bce66d clip: fixed warnings (bluebread)
5e6cf3c Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into s… (bluebread)
7e9fbec mtmd: fix get_rel_pos (bluebread)
0f5587d Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into s… (bluebread)
7b8d735 mtmd: fixed the wrong scaler for get_rel_pos (bluebread)
86f111f image encoding technically works but the output can't be checked sing… (sfallah)
effe669 mtmd: minor changed (bluebread)
f8f66a1 Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into s… (bluebread)
3fcfc3a Merge pull request #3 from bluebread/sf/deepseek-ocr (sfallah)
ee8a148 mtmd: add native resolution support (bluebread)
4cfa15f - image encoding debugged (sfallah)
3f71188 mtmd: correct token order (bluebread)
a594990 Merge pull request #5 from bluebread/dsocr-debug (sfallah)
6dfda99 Merge branch 'sf/deepseek-ocr' into sf/deepseek-ocr (sfallah)
7941f5d Merge pull request #4 from bluebread/sf/deepseek-ocr (sfallah)
206f8ab - dynamic resizing (sfallah)
40e7e6e mtmd: quick fix token order (bluebread)
81533e4 mtmd: fix danling pointer (bluebread)
8810940 Merge pull request #6 from bluebread/sf/deepseek-ocr (sfallah)
a488b49 mtmd: SAM numerically works (bluebread)
ccb2f23 mtmd: debug CLIP-L (vit_pre_ln) (bluebread)
841a4a8 mtmd: debug CLIP-L & first working DeepSeek-OCR model (bluebread)
ed3b7f1 Merge remote-tracking branch 'sfallah/master' into sf/deepseek-ocr (sfallah)
5543094 Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into s… (bluebread)
c5f4c64 mtmd : add --dsocr-mode CLI argument for DeepSeek-OCR resolution cont… (bluebread)
95239f9 mtmd: simplify SAM patch embedding (bluebread)
6b0e7cd Merge pull request #7 from bluebread/sf/deepseek-ocr (sfallah)
6634166 Merge branch 'master' into sf/deepseek-ocr (sfallah)
c914e05 mtmd: adapt Pillow image resizing function (bluebread)
e20857b mtmd: simplify DeepSeek-OCR dynamic resolution preprocessing (bluebread)
43dfc0c Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into s… (bluebread)
b696c54 mtmd: remove --dsocr-mode argument (bluebread)
b26b507 mtmd: refactor code & remove unused helper functions (bluebread)
IMO these modes can look quite confusing for end users.
I have already seen your logic that calculates the image area to automatically determine the best resolution, and it looks good enough.
So I think it would be better to remove the argument and make everything automatic.
ok. I'll remove it later.
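For readers following the thread, here is a rough sketch of what "calculate the area to automatically determine the best resolution" could look like. It is not the code from this PR: the function name `pick_native_side`, the candidate list `k_candidate_sides`, and the "closest area wins" rule are all assumptions made for illustration only.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical candidate square resolutions (side length in pixels).
// These values are assumptions for illustration, not taken from the PR.
static constexpr std::array<int, 4> k_candidate_sides = {512, 640, 1024, 1280};

// Return the candidate side whose square area is closest to the image area.
// On a tie, the smaller candidate wins because it is visited first.
static int pick_native_side(int width, int height) {
    assert(width > 0 && height > 0);
    const int64_t img_area = static_cast<int64_t>(width) * height;

    int     best_side = k_candidate_sides[0];
    int64_t best_diff = -1; // -1 means "no candidate evaluated yet"

    for (int side : k_candidate_sides) {
        const int64_t area = static_cast<int64_t>(side) * side;
        const int64_t diff = area > img_area ? area - img_area : img_area - area;
        if (best_diff < 0 || diff < best_diff) {
            best_diff = diff;
            best_side = side;
        }
    }
    return best_side;
}
```

With a rule like this, the user never has to pick a mode: under the assumed candidates, a 500x500 input maps to 512 and a 2000x2000 input maps to 1280, the largest candidate.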