Commit 70abc69
committed: [fix] env variables fixed
1 parent 81ab2f6 commit 70abc69
File tree: 10 files changed, +55 −14 lines

.env.example

Lines changed: 2 additions & 1 deletion
@@ -2,6 +2,7 @@
 REDIS_CACHE_URL=redis://redis:6379/1
 OLLAMA_HOST=http://ollama:11434
 STORAGE_PROFILE_PATH=/storage_profiles
+LLAMA_VISION_PROMPT="You are OCR. Convert image to markdown."
 
 # CLI settings
 OCR_URL=http://localhost:8000/ocr/upload
@@ -13,4 +14,4 @@ LIST_FILES_URL=http://localhost:8000/storage/list
 LOAD_FILE_URL=http://localhost:8000/storage/load
 DELETE_FILE_URL=http://localhost:8000/storage/delete
 OCR_REQUEST_URL=http://localhost:8000/ocr/request
-OCR_UPLOAD_URL=http://localhost:8000/ocr/upload
+OCR_UPLOAD_URL=http://localhost:8000/ocr/upload
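For reference, the new `LLAMA_VISION_PROMPT` variable is consumed with a plain `os.getenv` fallback, mirroring the default shown in `.env.example` and in `llama_vision.py` (a minimal sketch; the helper name is illustrative, not part of the project):

```python
import os

# Same default prompt as in .env.example and llama_vision.py.
DEFAULT_PROMPT = "You are OCR. Convert image to markdown."

def get_vision_prompt():
    # Environment value wins; otherwise fall back to the built-in default.
    return os.getenv("LLAMA_VISION_PROMPT", DEFAULT_PROMPT)
```

Because `os.getenv` only falls back when the variable is absent, setting the variable to an empty string overrides the default with an empty prompt.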

.env.localhost.example

Lines changed: 1 addition & 0 deletions
@@ -1,5 +1,6 @@
 #APP_ENV=production # sets the app into prod mode, otherwise dev mode with auto-reload on code changes
 REDIS_CACHE_URL=redis://localhost:6379/1
+LLAMA_VISION_PROMPT="You are OCR. Convert image to markdown."
 
 # CLI settings
 OCR_URL=http://localhost:8000/ocr/upload

README.md

Lines changed: 10 additions & 5 deletions
@@ -8,7 +8,7 @@ The API is built with FastAPI and uses Celery for asynchronous task processing.
 
 ## Features:
 - **No Cloud/external dependencies** all you need: PyTorch based OCR (Marker) + Ollama are shipped and configured via `docker-compose`; no data is sent outside your dev/server environment,
-- **PDF to Markdown** conversion with very high accuracy using different OCR strategies including [marker](https://github.com/VikParuchuri/marker), [surya-ocr](https://github.com/VikParuchuri/surya) or [tesseract](https://github.com/h/pytesseract)
+- **PDF to Markdown** conversion with very high accuracy using different OCR strategies including [marker](https://github.com/VikParuchuri/marker) and [llama3.2-vision](https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/), [surya-ocr](https://github.com/VikParuchuri/surya) or [tesseract](https://github.com/h/pytesseract)
 - **PDF to JSON** conversion using Ollama supported models (eg. LLama 3.1)
 - **LLM Improving OCR results** LLama is pretty good at fixing spelling and text issues in the OCR text
 - **Removing PII** This tool can be used for removing Personally Identifiable Information out of PDF - see `examples`
@@ -149,6 +149,7 @@ Then modify the variables inside the file:
 #APP_ENV=production # sets the app into prod mode, otherwise dev mode with auto-reload on code changes
 REDIS_CACHE_URL=redis://localhost:6379/1
 STORAGE_PROFILE_PATH=/storage_profiles
+LLAMA_VISION_PROMPT="You are OCR. Convert image to markdown."
 
 # CLI settings
 OCR_URL=http://localhost:8000/ocr/upload
@@ -215,14 +216,17 @@ pip install -r requirements.txt
 ```
 
 
-### Pull the LLama3.1 model
+### Pull the LLama3.1 and LLama3.2-vision models
 
 You might want to test out [different models supported by LLama](https://ollama.com/library)
 
 ```bash
 python client/cli.py llm_pull --model llama3.1
+python client/cli.py llm_pull --model llama3.2-vision
 ```
 
+These models are required for most features supported by `pdf-extract-api`.
+
 
 ### Upload a File for OCR (converting to Markdown)
 
@@ -247,6 +251,7 @@ For example you must run:
 
 ```bash
 python client/cli.py llm_pull --model llama3.1
+python client/cli.py llm_pull --model llama3.2-vision
 ```
 
 and only after that run this specific prompt query:
@@ -334,7 +339,7 @@ const apiClient = new ApiClient('https://api.doctractor.com/', 'doctractor', 'Ae
 const formData = new FormData();
 formData.append('file', fileInput.files[0]);
 formData.append('prompt', 'Convert file to JSON and return only JSON'); // if not provided, no LLM transformation will happen - just the OCR
-formData.append('strategy', 'marker');
+formData.append('strategy', 'llama_vision');
 formData.append('model', 'llama3.1')
 formData.append('ocr_cache', 'true');
 
@@ -350,7 +355,7 @@ apiClient.uploadFile(formData).then(response => {
 - **Method**: POST
 - **Parameters**:
   - **file**: PDF file to be processed.
-  - **strategy**: OCR strategy to use (`marker` or `tesseract`).
+  - **strategy**: OCR strategy to use (`marker`, `llama_vision` or `tesseract`).
   - **ocr_cache**: Whether to cache the OCR result (true or false).
   - **prompt**: When provided, will be used for Ollama processing the OCR result
   - **model**: When provided along with the prompt - this model will be used for LLM processing
@@ -368,7 +373,7 @@ curl -X POST -H "Content-Type: multipart/form-data" -F "file=@examples/example-m
 - **Method**: POST
 - **Parameters** (JSON body):
   - **file**: Base64 encoded PDF file content.
-  - **strategy**: OCR strategy to use (`marker` or `tesseract`).
+  - **strategy**: OCR strategy to use (`marker`, `llama_vision` or `tesseract`).
   - **ocr_cache**: Whether to cache the OCR result (true or false).
   - **prompt**: When provided, will be used for Ollama processing the OCR result.
   - **model**: When provided along with the prompt - this model will be used for LLM processing.
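The documented `/ocr/request` JSON body can be assembled in a few lines. This is a hedged sketch of the parameter shape only; `build_ocr_request` is a hypothetical helper, not part of the project:

```python
import base64

def build_ocr_request(pdf_bytes, strategy="llama_vision", ocr_cache=True,
                      prompt=None, model=None):
    """Build the JSON body for POST /ocr/request as documented in the README."""
    body = {
        # The endpoint expects the PDF content base64-encoded, as a string.
        "file": base64.b64encode(pdf_bytes).decode("utf-8"),
        "strategy": strategy,
        "ocr_cache": ocr_cache,
    }
    if prompt:
        body["prompt"] = prompt
        # Per the docs, `model` is only meaningful alongside a prompt.
        if model:
            body["model"] = model
    return body
```

The body would then be POSTed to `OCR_REQUEST_URL` (default `http://localhost:8000/ocr/request`) with `Content-Type: application/json`.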

app/ocr_strategies/llama_vision.py

Lines changed: 8 additions & 1 deletion
@@ -3,6 +3,7 @@
 import ollama
 import io
 import os
+import time
 from pdf2image import convert_from_bytes
 
 class LlamaVisionOCRStrategy(OCRStrategy):
@@ -12,7 +13,9 @@ def extract_text_from_pdf(self, pdf_bytes):
         # Convert PDF bytes to images
         images = convert_from_bytes(pdf_bytes)
         extracted_text = ""
-
+        start_time = time.time()
+        ocr_percent_done = 0
+        num_pages = len(images)
         for i, image in enumerate(images):
             # Convert image to base64
             buffered = io.BytesIO()
@@ -25,9 +28,13 @@ def extract_text_from_pdf(self, pdf_bytes):
                 'content': os.getenv('LLAMA_VISION_PROMPT', "You are OCR. Convert image to markdown."),
                 'images': [img_str]
             }], stream=True)
+            num_chunk = 1
             for chunk in response:
+                self.update_state_callback(state='PROGRESS', meta={'progress': str(30 + ocr_percent_done), 'status': 'OCR Processing (page ' + str(i + 1) + ' of ' + str(num_pages) + ') chunk no: ' + str(num_chunk), 'start_time': start_time, 'elapsed_time': time.time() - start_time})  # progress update per streamed chunk
+                num_chunk += 1
                 extracted_text += chunk['message']['content']
 
+            ocr_percent_done += int(20 / num_pages)  # 20% of the overall task progress is budgeted for OCR (assumption carried over from tasks.py)
         except ollama.ResponseError as e:
             print('Error:', e.error)
             raise Exception("Failed to generate text with Llama 3.2 Vision model")
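The progress arithmetic introduced above can be checked in isolation (a sketch with illustrative names): progress starts at 30 and each finished page adds `int(20 / num_pages)`, so integer rounding can leave the final reported figure short of 50.

```python
def page_progress(pages_done, num_pages, base=30, ocr_budget=20):
    # Mirrors the diff: each finished page adds int(ocr_budget / num_pages)
    # on top of the base progress reported before OCR starts.
    return base + pages_done * int(ocr_budget / num_pages)

print(page_progress(3, 3))  # int(20 / 3) == 6, so 30 + 3 * 6 = 48, not 50
```

A floating-point accumulator (or `round(base + ocr_budget * pages_done / num_pages)`) would avoid the rounding shortfall, at the cost of slightly less obvious code.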

app/ocr_strategies/ocr_strategy.py

Lines changed: 12 additions & 0 deletions
@@ -1,4 +1,16 @@
 class OCRStrategy:
+
+    def __init__(self):
+        self.update_state_callback = None
+
+    def set_update_state_callback(self, callback):
+        self.update_state_callback = callback
+
+    def update_state(self, state, meta):
+        if self.update_state_callback:
+            self.update_state_callback(state, meta)
+
     """Base OCR Strategy Interface"""
     def extract_text_from_pdf(self, pdf_bytes):
         raise NotImplementedError("Subclasses must implement this method")

app/tasks.py

Lines changed: 2 additions & 0 deletions
@@ -28,6 +28,8 @@ def ocr_task(self, pdf_bytes, strategy_name, pdf_filename, pdf_hash, ocr_cache,
         raise ValueError(f"Unknown strategy '{strategy_name}'. Available: marker, tesseract, llama_vision")
 
     ocr_strategy = OCR_STRATEGIES[strategy_name]
+    ocr_strategy.set_update_state_callback(self.update_state)
+
     self.update_state(state='PROGRESS', status="File uploaded successfully", meta={'progress': 10})  # Example progress update
 
     extracted_text = None

client/cli.py

Lines changed: 13 additions & 7 deletions
@@ -4,9 +4,12 @@
 import time
 import os
 
-def ocr_upload(file_path, ocr_cache, prompt, prompt_file=None, model='llama3.1', strategy='marker', storage_profile='default', storage_filename=None):
+def ocr_upload(file_path, ocr_cache, prompt, prompt_file=None, model='llama3.1', strategy='llama_vision', storage_profile='default', storage_filename=None):
     ocr_url = os.getenv('OCR_UPLOAD_URL', 'http://localhost:8000/ocr/upload')
     files = {'file': open(file_path, 'rb')}
+    if not ocr_cache:
+        print("OCR cache disabled.")
+
     data = {'ocr_cache': ocr_cache, 'model': model, 'strategy': strategy, 'storage_profile': storage_profile}
 
     if storage_filename:
@@ -37,7 +40,7 @@ def ocr_upload(file_path, ocr_cache, prompt, prompt_file=None, model='llama3.1',
         print(f"Failed to upload file: {response.text}")
         return None
 
-def ocr_request(file_path, ocr_cache, prompt, prompt_file=None, model='llama3.1', strategy='marker', storage_profile='default', storage_filename=None):
+def ocr_request(file_path, ocr_cache, prompt, prompt_file=None, model='llama3.1', strategy='llama_vision', storage_profile='default', storage_filename=None):
     ocr_url = os.getenv('OCR_REQUEST_URL', 'http://localhost:8000/ocr/request')
     with open(file_path, 'rb') as f:
         file_content = base64.b64encode(f.read()).decode('utf-8')
@@ -162,10 +165,11 @@ def main():
     ocr_parser = subparsers.add_parser('ocr_upload', help='Upload a file to the OCR endpoint and get the result.')
     ocr_parser.add_argument('--file', type=str, default='examples/rmi-example.pdf', help='Path to the file to upload')
     ocr_parser.add_argument('--ocr_cache', default=True, action='store_true', help='Enable OCR result caching')
+    ocr_parser.add_argument('--disable_ocr_cache', default=False, action='store_true', help='Disable OCR result caching')
     ocr_parser.add_argument('--prompt', type=str, default=None, help='Prompt used for the Ollama model to fix or transform the file')
     ocr_parser.add_argument('--prompt_file', default=None, type=str, help='Prompt file name used for the Ollama model to fix or transform the file')
     ocr_parser.add_argument('--model', type=str, default='llama3.1', help='Model to use for the Ollama endpoint')
-    ocr_parser.add_argument('--strategy', type=str, default='marker', help='OCR strategy to use for the file')
+    ocr_parser.add_argument('--strategy', type=str, default='llama_vision', help='OCR strategy to use for the file')
     ocr_parser.add_argument('--print_progress', default=True, action='store_true', help='Print the progress of the OCR task')
     ocr_parser.add_argument('--storage_profile', type=str, default='default', help='Storage profile to use for the file')
     ocr_parser.add_argument('--storage_filename', type=str, default=None, help='Storage filename to use for the file. You may use some formatting - see the docs')
@@ -175,10 +179,11 @@ def main():
     ocr_parser = subparsers.add_parser('ocr', help='Upload a file to the OCR endpoint and get the result.')
     ocr_parser.add_argument('--file', type=str, default='examples/rmi-example.pdf', help='Path to the file to upload')
     ocr_parser.add_argument('--ocr_cache', default=True, action='store_true', help='Enable OCR result caching')
+    ocr_parser.add_argument('--disable_ocr_cache', default=False, action='store_true', help='Disable OCR result caching')
     ocr_parser.add_argument('--prompt', type=str, default=None, help='Prompt used for the Ollama model to fix or transform the file')
     ocr_parser.add_argument('--prompt_file', default=None, type=str, help='Prompt file name used for the Ollama model to fix or transform the file')
     ocr_parser.add_argument('--model', type=str, default='llama3.1', help='Model to use for the Ollama endpoint')
-    ocr_parser.add_argument('--strategy', type=str, default='marker', help='OCR strategy to use for the file')
+    ocr_parser.add_argument('--strategy', type=str, default='llama_vision', help='OCR strategy to use for the file')
     ocr_parser.add_argument('--print_progress', default=True, action='store_true', help='Print the progress of the OCR task')
     ocr_parser.add_argument('--storage_profile', type=str, default='default', help='Storage profile to use for the file')
     ocr_parser.add_argument('--storage_filename', type=str, default=None, help='Storage filename to use for the file. You may use some formatting - see the docs')
@@ -189,10 +194,11 @@ def main():
     ocr_request_parser = subparsers.add_parser('ocr_request', help='Upload a file to the OCR endpoint via JSON and get the result.')
     ocr_request_parser.add_argument('--file', type=str, default='examples/rmi-example.pdf', help='Path to the file to upload')
     ocr_request_parser.add_argument('--ocr_cache', default=True, action='store_true', help='Enable OCR result caching')
+    ocr_request_parser.add_argument('--disable_ocr_cache', default=False, action='store_true', help='Disable OCR result caching')
     ocr_request_parser.add_argument('--prompt', type=str, default=None, help='Prompt used for the Ollama model to fix or transform the file')
     ocr_request_parser.add_argument('--prompt_file', default=None, type=str, help='Prompt file name used for the Ollama model to fix or transform the file')
     ocr_request_parser.add_argument('--model', type=str, default='llama3.1', help='Model to use for the Ollama endpoint')
-    ocr_request_parser.add_argument('--strategy', type=str, default='marker', help='OCR strategy to use')
+    ocr_request_parser.add_argument('--strategy', type=str, default='llama_vision', help='OCR strategy to use')
     ocr_request_parser.add_argument('--print_progress', default=True, action='store_true', help='Print the progress of the OCR task')
     ocr_request_parser.add_argument('--storage_profile', type=str, default='default', help='Storage profile to use. You may use some formatting - see the docs')
     ocr_request_parser.add_argument('--storage_filename', type=str, default=None, help='Storage filename to use')
@@ -231,7 +237,7 @@ def main():
 
     if args.command == 'ocr' or args.command == 'ocr_upload':
         print(args)
-        result = ocr_upload(args.file, args.ocr_cache, args.prompt, args.prompt_file, args.model, args.strategy, args.storage_profile, args.storage_filename)
+        result = ocr_upload(args.file, False if args.disable_ocr_cache else args.ocr_cache, args.prompt, args.prompt_file, args.model, args.strategy, args.storage_profile, args.storage_filename)
         if result is None:
             print("Error uploading file.")
             return
@@ -243,7 +249,7 @@ def main():
         if text_result:
             print(text_result)
     elif args.command == 'ocr_request':
-        result = ocr_request(args.file, args.ocr_cache, args.prompt, args.prompt_file, args.model, args.strategy, args.storage_profile, args.storage_filename)
+        result = ocr_request(args.file, False if args.disable_ocr_cache else args.ocr_cache, args.prompt, args.prompt_file, args.model, args.strategy, args.storage_profile, args.storage_filename)
         if result is None:
             print("Error uploading file.")
             return
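The cache flags above rely on a subtle argparse detail: a `store_true` switch must default to `False`, otherwise it is on whether or not the user passes it. A minimal sketch of the pattern used by the CLI:

```python
import argparse

# Sketch of the two cache flags: --ocr_cache enables caching (default on),
# --disable_ocr_cache overrides it. The disable switch must default to
# False; with default=True, the expression below would always report the
# cache as disabled regardless of the command line.
parser = argparse.ArgumentParser()
parser.add_argument('--ocr_cache', default=True, action='store_true')
parser.add_argument('--disable_ocr_cache', default=False, action='store_true')

args = parser.parse_args([])
effective_cache = False if args.disable_ocr_cache else args.ocr_cache
```

An alternative would be a single `--no-ocr-cache` flag with `action='store_false'` and `dest='ocr_cache'`, which avoids reconciling two booleans by hand.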

docker-compose.gpu.yml

Lines changed: 2 additions & 0 deletions
@@ -25,6 +25,7 @@ services:
       - LIST_FILES_URL=${LIST_FILES_URL-http://localhost:8000/storage/list}
       - LOAD_FILE_URL=${LOAD_FILE_URL-http://localhost:8000/storage/load}
       - DELETE_FILE_URL=${DELETE_FILE_URL-http://localhost:8000/storage/delete}
+      - LLAMA_VISION_PROMPT=${LLAMA_VISION_PROMPT-"You are OCR. Convert image to markdown."}
     depends_on:
       - redis
       - ollama
@@ -52,6 +53,7 @@ services:
       - LIST_FILES_URL=${LIST_FILES_URL-http://localhost:8000/storage/list}
       - LOAD_FILE_URL=${LOAD_FILE_URL-http://localhost:8000/storage/load}
       - DELETE_FILE_URL=${DELETE_FILE_URL-http://localhost:8000/storage/delete}
+      - LLAMA_VISION_PROMPT=${LLAMA_VISION_PROMPT-"You are OCR. Convert image to markdown."}
     depends_on:
       - redis
     volumes:

docker-compose.yml

Lines changed: 2 additions & 0 deletions
@@ -22,6 +22,7 @@ services:
       - LIST_FILES_URL=${LIST_FILES_URL-http://localhost:8000/storage/list}
       - LOAD_FILE_URL=${LOAD_FILE_URL-http://localhost:8000/storage/load}
       - DELETE_FILE_URL=${DELETE_FILE_URL-http://localhost:8000/storage/delete}
+      - LLAMA_VISION_PROMPT=${LLAMA_VISION_PROMPT-"You are OCR. Convert image to markdown."}
     depends_on:
       - redis
       - ollama
@@ -42,6 +43,7 @@ services:
       - LIST_FILES_URL=${LIST_FILES_URL-http://localhost:8000/storage/list}
       - LOAD_FILE_URL=${LOAD_FILE_URL-http://localhost:8000/storage/load}
       - DELETE_FILE_URL=${DELETE_FILE_URL-http://localhost:8000/storage/delete}
+      - LLAMA_VISION_PROMPT=${LLAMA_VISION_PROMPT-"You are OCR. Convert image to markdown."}
     depends_on:
       - redis
     volumes:
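The compose files use the `${VAR-default}` substitution form, which mirrors POSIX shell semantics: the default applies only when the variable is unset, while an empty-but-set value is kept as-is. A small sketch of that behavior in plain shell (illustrative variable names):

```shell
# Unset variable: the fallback after `-` is substituted.
unset LLAMA_VISION_PROMPT
echo "${LLAMA_VISION_PROMPT-You are OCR. Convert image to markdown.}"

# Set-but-empty variable: kept as-is (prints an empty line).
LLAMA_VISION_PROMPT=""
echo "${LLAMA_VISION_PROMPT-You are OCR. Convert image to markdown.}"
```

If the intent is to also replace empty values, `${VAR:-default}` is the form to use instead.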

run.sh

Lines changed: 3 additions & 0 deletions
@@ -12,6 +12,9 @@ ollama serve &
 echo "Pulling LLama3.1 model"
 ollama pull llama3.1
 
+echo "Pulling LLama3.2-vision model"
+ollama pull llama3.2-vision
+
 echo "Starting Redis"
 docker run -p 6379:6379 --restart always --detach redis &
 
