The API is built with FastAPI and uses Celery for asynchronous task processing.
## Features:
- **No Cloud/external dependencies**: everything you need (PyTorch-based OCR via EasyOCR, plus Ollama) is shipped and configured via `docker-compose`, so no data is sent outside your dev/server environment
- **PDF/Office to Markdown** conversion with very high accuracy, using different OCR strategies including [llama3.2-vision](https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/), [easyOCR](https://github.com/JaidedAI/EasyOCR) and [minicpm-v](https://github.com/OpenBMB/MiniCPM-o?tab=readme-ov-file#minicpm-v-26)
- **PDF/Office to JSON** conversion using Ollama-supported models (e.g. Llama 3.1)
- **LLM Improving OCR results**: Llama is pretty good at fixing spelling and text issues in the OCR output
- **Removing PII**: this tool can be used to remove Personally Identifiable Information from documents; see `examples`
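The "LLM Improving OCR results" feature can be pictured as a post-processing pass over the raw OCR text. The sketch below is only an illustration, not this project's implementation: it assumes a local Ollama server on its default port (`11434`) with the `llama3.1` model pulled, and the prompt wording and the helper names `build_correction_request` / `correct_ocr_text` are hypothetical. It relies only on Ollama's `/api/generate` endpoint with streaming disabled.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"

# Hypothetical prompt; the project's real prompt may differ.
PROMPT_TEMPLATE = (
    "Fix spelling and obvious OCR errors in the text below. "
    "Return only the corrected text.\n\n{text}"
)


def build_correction_request(ocr_text: str, model: str = "llama3.1") -> urllib.request.Request:
    """Build (but do not send) a non-streaming Ollama generate request."""
    payload = {
        "model": model,
        "prompt": PROMPT_TEMPLATE.format(text=ocr_text),
        "stream": False,  # ask for one JSON object instead of a token stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def correct_ocr_text(ocr_text: str, model: str = "llama3.1") -> str:
    """Send the request to Ollama and return the model's corrected text."""
    request = build_correction_request(ocr_text, model)
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With stream=False, the generated text is returned in the "response" field.
    return body["response"]
```

With Ollama running, `correct_ocr_text("Tbe qu1ck brown f0x")` returns whatever correction the model produces. Keeping request construction separate from sending makes the payload easy to inspect without a running server.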
In case of any questions, help requests or just feedback - please [join us on Discord](https://discord.gg/NJzu47Ye3a)!
## Text extraction strategies
### `easyocr`
EasyOCR is available under the Apache 2.0 license. It is a general-purpose OCR engine supporting more than 30 languages, and it probably offers the best performance for English.

Enabled by default. Select it with the `strategy=easyocr` CLI or URL parameter.
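To illustrate passing the strategy as a URL parameter, the sketch below builds a request URL with Python's standard library. The base URL `http://localhost:8000/ocr` and the helper `build_ocr_url` are placeholders, not the project's documented endpoint; only the `strategy` values come from this README. Check the API docs for the real route and any other required parameters.

```python
from urllib.parse import urlencode

# Strategy values as documented in this README.
KNOWN_STRATEGIES = {"easyocr", "minicpm_v", "llama_vision"}

# Placeholder endpoint: replace with the route your deployment actually exposes.
BASE_URL = "http://localhost:8000/ocr"


def build_ocr_url(strategy: str, base_url: str = BASE_URL) -> str:
    """Return base_url with ?strategy=<strategy>, rejecting unknown names early."""
    if strategy not in KNOWN_STRATEGIES:
        raise ValueError(
            f"unknown strategy {strategy!r}; expected one of {sorted(KNOWN_STRATEGIES)}"
        )
    return f"{base_url}?{urlencode({'strategy': strategy})}"
```

For example, `build_ocr_url("easyocr")` returns `"http://localhost:8000/ocr?strategy=easyocr"`. Note that the parameter value for MiniCPM-V is `minicpm_v` with an underscore, so the hyphenated `minicpm-v` is rejected by this check.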
### `minicpm-v`
MiniCPM-V is an Apache-licensed OCR strategy.

Usage of the MiniCPM-o/V model weights must strictly follow the [MiniCPM Model License](https://github.com/OpenBMB/MiniCPM/blob/main/MiniCPM%20Model%20License.md).

The MiniCPM models and weights are completely free for academic research and, after registering via this ["questionnaire"](https://modelbest.feishu.cn/share/base/form/shrcnpV5ZT9EJ6xYjh3Kx0J6v8g), are also available for free commercial use.

Enabled by default. Select it with the `strategy=minicpm_v` CLI or URL parameter.
### `llama_vision`
The Llama 3.2 Vision strategy is licensed under the [Meta Community License Agreement](https://ollama.com/library/llama3.2-vision/blobs/0b4284c1f870). It works well for many languages, but because of its parameter count (90B) it is probably **the slowest** of the strategies.

Enabled by default, and it is also the default strategy. Select it explicitly with the `strategy=llama_vision` CLI or URL parameter.
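Because `llama_vision` is the default, a request that omits `strategy` ends up on what is probably the slowest model. The sketch below shows one way such a fallback could be resolved from request parameters; `resolve_strategy`, `DEFAULT_STRATEGY` and `SUPPORTED_STRATEGIES` are hypothetical names, not the project's code, and the real server logic may differ.

```python
from collections.abc import Mapping
from typing import Optional

# Default according to this README; this helper is a hypothetical illustration.
DEFAULT_STRATEGY = "llama_vision"
SUPPORTED_STRATEGIES = ("easyocr", "minicpm_v", "llama_vision")


def resolve_strategy(params: Mapping[str, str]) -> str:
    """Pick the OCR strategy from request parameters, falling back to the default."""
    requested: Optional[str] = params.get("strategy")
    if not requested:  # missing or empty value -> use the default strategy
        return DEFAULT_STRATEGY
    if requested not in SUPPORTED_STRATEGIES:
        raise ValueError(f"unsupported strategy: {requested!r}")
    return requested
```

Here `resolve_strategy({})` yields `"llama_vision"`, while `resolve_strategy({"strategy": "easyocr"})` yields `"easyocr"`. If turnaround time matters, passing a strategy explicitly avoids silently falling back to the heaviest model.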