Commit da03a37

Merge pull request #110 from janoberst/pr110
Fixing minor spelling issues
2 parents 090924e + 3bd883e commit da03a37

1 file changed, 13 additions and 13 deletions

README.md

Lines changed: 13 additions & 13 deletions
@@ -12,7 +12,7 @@ The API is built with FastAPI and uses Celery for asynchronous task processing.
 - **PDF/Office to JSON** conversion using Ollama supported models (eg. LLama 3.1)
 - **LLM Improving OCR results** LLama is pretty good with fixing spelling and text issues in the OCR text
 - **Removing PII** This tool can be used for removing Personally Identifiable Information out of document - see `examples`
-- **Distributed queue processing** using [Celery](https://docs.celeryq.dev/en/stable/getting-started/introduction.html))
+- **Distributed queue processing** using [Celery](https://docs.celeryq.dev/en/stable/getting-started/introduction.html)
 - **Caching** using Redis - the OCR results can be easily cached prior to LLM processing,
 - **Storage Strategies** switchable storage strategies (Google Drive, Local File System ...)
 - **CLI tool** for sending tasks and processing results
@@ -163,22 +163,22 @@ python client/cli.py ocr_upload --file examples/example-mri.pdf --ocr_cache --pr
 In case of any questions, help requests or just feedback - please [join us on Discord](https://discord.gg/NJzu47Ye3a)!


-## Text extract stratgies
+## Text extract strategies

 ### `easyocr`

-Easy OCR is avaialble on Apache based license. It's general purpose OCR with support for more than 30 langues, probably with the best performance for English.
+Easy OCR is available on Apache based license. It's general purpose OCR with support for more than 30 languages, probably with the best performance for English.

 Enabled by default. Please do use the `strategy=easyocr` CLI and URL parameters to use it.


 ### `minicpm-v`

-MiniCPM-V is Apache based licenseed OCR strategy.
+MiniCPM-V is an Apache based licensed OCR strategy.

 The usage of MiniCPM-o/V model weights must strictly follow [MiniCPM Model License.md](https://github.com/OpenBMB/MiniCPM/blob/main/MiniCPM%20Model%20License.md).

-The models and weights of MiniCPM are completely free for academic research. after filling out a ["questionnaire"](https://modelbest.feishu.cn/share/base/form/shrcnpV5ZT9EJ6xYjh3Kx0J6v8g) for registration, are also available for free commercial use.
+The models and weights of MiniCPM are completely free for academic research. After filling out a ["questionnaire"](https://modelbest.feishu.cn/share/base/form/shrcnpV5ZT9EJ6xYjh3Kx0J6v8g) for registration, are also available for free commercial use.

 Enabled by default. Please do use the `strategy=minicpm_v` CLI and URL parameters to use it.
@@ -254,7 +254,7 @@ cd text-extract-api
 ```

 ### Using `Makefile`
-You can use the `make install` and `make run` command to setup the Docker environment for `text-extract-api`. You can find the manual steps required to do so described below.
+You can use the `make install` and `make run` commands to set up the Docker environment for `text-extract-api`. You can find the manual steps required to do so described below.


 ### Manual setup
@@ -286,9 +286,9 @@ OCR_URL=http://localhost:8000/ocr/upload
 OCR_UPLOAD_URL=http://localhost:8000/ocr/upload
 OCR_REQUEST_URL=http://localhost:8000/ocr/request
 RESULT_URL=http://localhost:8000/ocr/result/
-CLEAR_CACHE_URL=http://localhost:8000/ocr/clear_cach
+CLEAR_CACHE_URL=http://localhost:8000/ocr/clear_cache
 LLM_PULL_API_URL=http://localhost:8000/llm_pull
-LLM_GENEREATE_API_URL=http://localhost:8000/llm_generate
+LLM_GENERATE_API_URL=http://localhost:8000/llm_generate

 CELERY_BROKER_URL=redis://localhost:6379/0
 CELERY_RESULT_BACKEND=redis://localhost:6379/0
@@ -297,7 +297,7 @@ APP_ENV=development # Default to development mode
 ```


-**Note:** In order to properly save the output files you might need to modify `storage_profiles/default.yaml` to change the default storage path according to the volumes path defined in the `docker-compose.yml`
+**Note:** In order to properly save the output files, you might need to modify `storage_profiles/default.yaml` to change the default storage path according to the volumes path defined in the `docker-compose.yml`

 ### Build and Run the Docker Containers
@@ -338,7 +338,7 @@ pip install -e . # install main project requirements
 ```


-The project includes a CLI for interacting with the API. To make it work first run:
+The project includes a CLI for interacting with the API. To make it work, first run:

 ```bash
 cd client
@@ -377,7 +377,7 @@ The difference is just that the first call uses `ocr/upload` - multipart form da

 **Important note:** To use LLM you must first run the **llm_pull** to get the specific model required by your requests.

-For example you must run:
+For example, you must run:

 ```bash
 python client/cli.py llm_pull --model llama3.1
@@ -453,7 +453,7 @@ python llm_generate --prompt "Your prompt here"

 ## API Clients

-You might want to use the decdicated API clients to use `text-extract-api`
+You might want to use the dedicated API clients to use `text-extract-api`.

 ### Typescript
459459
@@ -472,7 +472,7 @@ const formData = new FormData();
472472
formData.append('file', fileInput.files[0]);
473473
formData.append('prompt', 'Convert file to JSON and return only JSON'); // if not provided, no LLM transformation will gonna happen - just the OCR
474474
formData.append('strategy', 'llama_vision');
475-
formData.append('model', 'llama3.1')
475+
formData.append('model', 'llama3.1');
476476
formData.append('ocr_cache', 'true');
477477
478478
apiClient.uploadFile(formData).then(response => {