README.md: 47 additions & 4 deletions
@@ -7,8 +7,8 @@ The API is built with FastAPI and uses Celery for asynchronous task processing.
## Features:
-- **No Cloud/external dependencies** all you need: PyTorch based OCR (EasyOCR) + Ollama are shipped and configured via `docker-compose`. No data is sent outside your dev/server environment.
-- **PDF/Office to Markdown** conversion with very high accuracy using different OCR strategies including [llama3.2-vision](https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/), [easyOCR](https://github.com/JaidedAI/EasyOCR), [minicpm-v](https://github.com/OpenBMB/MiniCPM-o?tab=readme-ov-file#minicpm-v-26)
+- **No Cloud/external dependencies** all you need: PyTorch based OCR (EasyOCR) + Ollama are shipped and configured via `docker-compose`; no data is sent outside your dev/server environment.
+- **PDF/Office to Markdown** conversion with very high accuracy using different OCR strategies including [llama3.2-vision](https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/), [easyOCR](https://github.com/JaidedAI/EasyOCR), [minicpm-v](https://github.com/OpenBMB/MiniCPM-o?tab=readme-ov-file#minicpm-v-26), and remote URL strategies including [marker-pdf](https://github.com/VikParuchuri/marker)
- **PDF/Office to JSON** conversion using Ollama-supported models (e.g. Llama 3.1)
- **LLM Improving OCR results** Llama is pretty good at fixing spelling and text issues in the OCR text
- **Removing PII** This tool can be used for removing Personally Identifiable Information from documents - see `examples`
@@ -196,6 +196,49 @@ LLama 3.2 Vision Strategy is licensed on [Meta Community License Agreement](http
Enabled by default. Use the `strategy=llama_vision` CLI and URL parameters to select it. It is also the default strategy.
+
+### `remote`
+
+Some OCRs - like [Marker, a state-of-the-art PDF OCR](https://github.com/VikParuchuri/marker) - work really well for more than 50 languages, with great accuracy for Polish and other languages that are, let's say, "difficult" for standard OCR to read.
+
+`marker-pdf` is, however, licensed under the GPL3 license and is **therefore not included** by default in this application (as we're bound to MIT).
+
+The weights for the models are licensed cc-by-nc-sa-4.0, but I will waive that for any organization under $5M USD in gross revenue in the most recent 12-month period AND under $5M in lifetime VC/angel funding raised. You also must not be competitive with the Datalab API. If you want to remove the GPL license requirements (dual-license) and/or use the weights commercially over the revenue limit, check out the options here.
+
+To get it up and running, execute the following steps:
+
+```bash
+mkdir marker-distribution # this should be outside of the `text-extract-api` folder!
+cd marker-distribution
+pip install marker-pdf
+pip install -U uvicorn fastapi python-multipart
+marker_server --port 8002
+```
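Before pointing `text-extract-api` at the remote strategy, it can help to confirm that something is actually listening on the port chosen in the steps above. The following is a minimal sketch using only the Python standard library; the helper name `port_is_open` is hypothetical and not part of either project, and the check only tests TCP reachability, not whether the Marker endpoints respond correctly.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Assumption: marker_server was started locally with `--port 8002` as shown above.
    host, port = "localhost", 8002
    status = "reachable" if port_is_open(host, port) else "not reachable"
    print(f"marker_server at {host}:{port} is {status}")
```

If `marker_server` runs on a different host or port, adjust `host` and `port` accordingly.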
+
+Set the Remote API URL:
+
+**Note:** you might run `marker_server` on a different port or server; in that case, make sure you export the proper env setting before starting the `text-extract-api` server: