README.md: 10 additions & 5 deletions
@@ -8,7 +8,7 @@ The API is built with FastAPI and uses Celery for asynchronous task processing.
 ## Features:
 - **No Cloud/external dependencies**: everything you need, PyTorch-based OCR (Marker) + Ollama, is shipped and configured via `docker-compose`; no data is sent outside your dev/server environment
-- **PDF to Markdown** conversion with very high accuracy using different OCR strategies including [marker](https://github.com/VikParuchuri/marker), [surya-ocr](https://github.com/VikParuchuri/surya) or [tesseract](https://github.com/h/pytesseract)
+- **PDF to Markdown** conversion with very high accuracy using different OCR strategies including [marker](https://github.com/VikParuchuri/marker), [llama3.2-vision](https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/), [surya-ocr](https://github.com/VikParuchuri/surya) or [tesseract](https://github.com/h/pytesseract)
 - **PDF to JSON** conversion using Ollama supported models (e.g. Llama 3.1)
 - **LLM Improving OCR results**: Llama is pretty good at fixing spelling and text issues in the OCR text
 - **Removing PII**: this tool can be used for removing Personally Identifiable Information from PDFs - see `examples`
@@ -149,6 +149,7 @@ Then modify the variables inside the file:
 #APP_ENV=production # sets the app into prod mode, otherwise dev mode with auto-reload on code changes
 REDIS_CACHE_URL=redis://localhost:6379/1
 STORAGE_PROFILE_PATH=/storage_profiles
+LLAMA_VISION_PROMPT="You are OCR. Convert image to markdown."
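The new `LLAMA_VISION_PROMPT` variable could be read on the application side roughly as follows. This is only a sketch, not the project's actual code: the helper name `get_llama_vision_prompt`, its `env` parameter, and the fallback to the same default string are assumptions made for illustration.

```python
import os

# Default copied from the .env example in the diff above.
DEFAULT_VISION_PROMPT = "You are OCR. Convert image to markdown."


def get_llama_vision_prompt(env=None):
    """Return the OCR prompt from the environment, or the default.

    `env` is a hypothetical injection point (defaults to os.environ)
    so the lookup can be exercised without touching the real environment.
    """
    env = os.environ if env is None else env
    # Treat an unset or whitespace-only value as "not configured".
    prompt = env.get("LLAMA_VISION_PROMPT", "").strip()
    return prompt or DEFAULT_VISION_PROMPT
```

Falling back to a default keeps the `llama_vision` strategy usable even when the variable is missing from `.env`.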
     ocr_parser = subparsers.add_parser('ocr_upload', help='Upload a file to the OCR endpoint and get the result.')
     ocr_parser.add_argument('--file', type=str, default='examples/rmi-example.pdf', help='Path to the file to upload')
     ocr_parser.add_argument('--ocr_cache', default=True, action='store_true', help='Enable OCR result caching')
+    ocr_parser.add_argument('--disable_ocr_cache', default=True, action='store_true', help='Disable OCR result caching')
     ocr_parser.add_argument('--prompt', type=str, default=None, help='Prompt used for the Ollama model to fix or transform the file')
     ocr_parser.add_argument('--prompt_file', default=None, type=str, help='Prompt file name used for the Ollama model to fix or transform the file')
     ocr_parser.add_argument('--model', type=str, default='llama3.1', help='Model to use for the Ollama endpoint')
-    ocr_parser.add_argument('--strategy', type=str, default='marker', help='OCR strategy to use for the file')
+    ocr_parser.add_argument('--strategy', type=str, default='llama_vision', help='OCR strategy to use for the file')
     ocr_parser.add_argument('--print_progress', default=True, action='store_true', help='Print the progress of the OCR task')
     ocr_parser.add_argument('--storage_profile', type=str, default='default', help='Storage profile to use for the file')
     ocr_parser.add_argument('--storage_filename', type=str, default=None, help='Storage filename to use for the file. You may use some formatting - see the docs')
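Since `--strategy` is a free-form string whose default changes from `marker` to `llama_vision`, a typo in the value would only surface once the request reaches the server. One possible hardening, sketched here under assumptions, is argparse's `choices` parameter. The list below contains only the two strategy names visible in this diff, and the function name `build_strategy_parser` is hypothetical; the real project may accept more values.

```python
import argparse

# Only the two strategy names that appear in this diff; treating them as
# the complete set is an assumption made for this sketch.
KNOWN_STRATEGIES = ['marker', 'llama_vision']


def build_strategy_parser():
    """Hypothetical parser that rejects unknown --strategy values early."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--strategy', type=str, default='llama_vision',
                        choices=KNOWN_STRATEGIES,
                        help='OCR strategy to use for the file')
    return parser
```

With `choices` set, argparse exits with a usage error for an unknown strategy instead of passing it on.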
@@ -175,10 +179,11 @@ def main():
     ocr_parser = subparsers.add_parser('ocr', help='Upload a file to the OCR endpoint and get the result.')
     ocr_parser.add_argument('--file', type=str, default='examples/rmi-example.pdf', help='Path to the file to upload')
     ocr_parser.add_argument('--ocr_cache', default=True, action='store_true', help='Enable OCR result caching')
+    ocr_parser.add_argument('--disable_ocr_cache', default=True, action='store_true', help='Disable OCR result caching')
     ocr_parser.add_argument('--prompt', type=str, default=None, help='Prompt used for the Ollama model to fix or transform the file')
     ocr_parser.add_argument('--prompt_file', default=None, type=str, help='Prompt file name used for the Ollama model to fix or transform the file')
     ocr_parser.add_argument('--model', type=str, default='llama3.1', help='Model to use for the Ollama endpoint')
-    ocr_parser.add_argument('--strategy', type=str, default='marker', help='OCR strategy to use for the file')
+    ocr_parser.add_argument('--strategy', type=str, default='llama_vision', help='OCR strategy to use for the file')
     ocr_parser.add_argument('--print_progress', default=True, action='store_true', help='Print the progress of the OCR task')
     ocr_parser.add_argument('--storage_profile', type=str, default='default', help='Storage profile to use for the file')
     ocr_parser.add_argument('--storage_filename', type=str, default=None, help='Storage filename to use for the file. You may use some formatting - see the docs')
@@ -189,10 +194,11 @@ def main():
     ocr_request_parser = subparsers.add_parser('ocr_request', help='Upload a file to the OCR endpoint via JSON and get the result.')
     ocr_request_parser.add_argument('--file', type=str, default='examples/rmi-example.pdf', help='Path to the file to upload')
     ocr_request_parser.add_argument('--ocr_cache', default=True, action='store_true', help='Enable OCR result caching')
+    ocr_request_parser.add_argument('--disable_ocr_cache', default=True, action='store_true', help='Disable OCR result caching')
     ocr_request_parser.add_argument('--prompt', type=str, default=None, help='Prompt used for the Ollama model to fix or transform the file')
     ocr_request_parser.add_argument('--prompt_file', default=None, type=str, help='Prompt file name used for the Ollama model to fix or transform the file')
     ocr_request_parser.add_argument('--model', type=str, default='llama3.1', help='Model to use for the Ollama endpoint')
-    ocr_request_parser.add_argument('--strategy', type=str, default='marker', help='OCR strategy to use')
+    ocr_request_parser.add_argument('--strategy', type=str, default='llama_vision', help='OCR strategy to use')
     ocr_request_parser.add_argument('--print_progress', default=True, action='store_true', help='Print the progress of the OCR task')
     ocr_request_parser.add_argument('--storage_profile', type=str, default='default', help='Storage profile to use. You may use some formatting - see the docs')
     ocr_request_parser.add_argument('--storage_filename', type=str, default=None, help='Storage filename to use')
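A note on the boolean flags in these parsers: an argument declared with `action='store_true'` and `default=True`, such as `--ocr_cache`, parses to `True` whether or not it is passed, so it cannot be switched off from the command line. The sketch below reproduces that behaviour and shows one possible alternative using `argparse.BooleanOptionalAction` (Python 3.9+). The `--cache` flag name and the `build_parser` helper are hypothetical and not part of this change.

```python
import argparse


def build_parser():
    """Hypothetical parser contrasting two ways to declare a boolean flag."""
    parser = argparse.ArgumentParser()
    # As in the diff: store_true with default=True is True either way.
    parser.add_argument('--ocr_cache', default=True, action='store_true',
                        help='Enable OCR result caching')
    # Assumed alternative: BooleanOptionalAction generates both
    # --cache and --no-cache, so the default can actually be overridden.
    parser.add_argument('--cache', default=True,
                        action=argparse.BooleanOptionalAction,
                        help='Enable or disable OCR result caching')
    return parser


if __name__ == '__main__':
    args = build_parser().parse_args(['--no-cache'])
    print(args.ocr_cache, args.cache)
```

A single toggleable flag would avoid having two options that describe the same setting from opposite directions.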