Commit 5adb8b3

fix: readme installation manual for marker
1 parent 40757b6 commit 5adb8b3

1 file changed: +31 -0 lines changed

README.md

Lines changed: 31 additions & 0 deletions
@@ -188,6 +188,37 @@ LLama 3.2 Vision Strategy is licensed on [Meta Community License Agreement](http
Enabled by default. Pass `strategy=llama_vision` as the CLI or URL parameter to use it. It is also the default strategy.

### `marker`

[Marker, a state-of-the-art PDF OCR](https://github.com/VikParuchuri/marker), works well for more than 50 languages, with good accuracy for Polish and other languages that are difficult for standard OCR to read.

The `marker-pdf` package is, however, licensed under GPL3 and is **therefore not included** by default in this application (as we're bound to MIT).

Marker's README states the following about its model weights: "The weights for the models are licensed cc-by-nc-sa-4.0, but I will waive that for any organization under $5M USD in gross revenue in the most recent 12-month period AND under $5M in lifetime VC/angel funding raised. You also must not be competitive with the Datalab API. If you want to remove the GPL license requirements (dual-license) and/or use the weights commercially over the revenue limit, check out the options here."

To get it up and running, execute the following steps:
```bash
pip install marker-pdf
pip install -U uvicorn fastapi python-multipart
marker_server --port 8002
```
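Before starting the server, it can help to confirm that the installed commands are actually on your `PATH`. The helper below is a minimal sketch: `require_cmds` is a hypothetical name, not part of Marker or `text-extract-api`. It could be used as `require_cmds marker_server uvicorn && marker_server --port 8002`.

```bash
# Hypothetical pre-flight helper (not part of marker or text-extract-api):
# report which of the given commands are missing from PATH.
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing command: $cmd" >&2
      missing=1
    fi
  done
  # Returns 0 when every command was found, 1 otherwise.
  return "$missing"
}
```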
**Note:** you can run `marker_server` on a different port. In that case, export the matching environment setting before starting the `text-extract-api` server:
```bash
export MARKER_API_URL=http://localhost:8002/marker/upload
```
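If you change the port often, the URL can be derived from the port instead of typed by hand. This is a sketch only: `marker_api_url` is a hypothetical helper, and it assumes the `/marker/upload` path shown above stays the same. An already exported `MARKER_API_URL` takes precedence. One possible use is `export MARKER_API_URL="$(marker_api_url 8010)"`.

```bash
# Hypothetical helper (not part of text-extract-api): print the Marker
# upload URL for a given port. 8002 is the port used in the steps above.
# If MARKER_API_URL is already set and non-empty, it is printed unchanged.
marker_api_url() {
  printf '%s\n' "${MARKER_API_URL:-http://localhost:${1:-8002}/marker/upload}"
}
```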
Pass `strategy=marker` as the CLI or URL parameter to use it. For example:
```bash
curl -X POST -H "Content-Type: multipart/form-data" -F "file=@examples/example-mri.pdf" -F "strategy=marker" -F "ocr_cache=true" -F "prompt=" -F "model=" "http://localhost:8000/ocr/upload"
```
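To try several files or strategies, the same request can be wrapped in a small shell function. The sketch below is illustrative: `ocr_upload`, `OCR_API_URL` and `DRY_RUN` are hypothetical names, not part of `text-extract-api`. The form fields are copied from the example above. The explicit `Content-Type` header is omitted because curl sets the multipart content type itself when `-F` is used. Running `DRY_RUN=1 ocr_upload examples/example-mri.pdf` prints the command without sending it.

```bash
# Hypothetical wrapper around the curl example above.
# Usage: ocr_upload FILE [STRATEGY]   (STRATEGY defaults to "marker")
ocr_upload() {
  if [ -z "${1:-}" ]; then
    echo "usage: ocr_upload FILE [STRATEGY]" >&2
    return 2
  fi
  local file="$1"
  local strategy="${2:-marker}"
  # OCR_API_URL is an assumed override; the default is the URL from the example.
  local url="${OCR_API_URL:-http://localhost:8000/ocr/upload}"
  # When DRY_RUN is set and non-empty, print the command instead of running it.
  ${DRY_RUN:+echo} curl -X POST \
    -F "file=@${file}" \
    -F "strategy=${strategy}" \
    -F "ocr_cache=true" \
    -F "prompt=" \
    -F "model=" \
    "$url"
}
```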
We connect to Marker via its API, rather than linking it at the source-code level, so that this application does not have to share Marker's license (GPL3).

## Getting started with Docker

### Prerequisites
