|
10 | 10 |
|
11 | 11 | > [!IMPORTANT] |
12 | 12 | > |
13 | | ->OmniParse is a platform that ingests/parses any unstructured data into structured, actionable data optimized for GenAI (LLM) applcaitons. Whether working with documents, tables, images, videos, audio files, or web pages, OmniParse ensures your data is clean and ready for GenAI applications, like RAG fineutning etc. |
| 13 | +>OmniParse is a platform that ingests/parses any unstructured data into structured, actionable data optimized for GenAI (LLM) applications. Whether working with documents, tables, images, videos, audio files, or web pages, OmniParse prepares your data to be clean, structured and ready for AI applications, such as RAG, fine-tuning and more. |
14 | 14 |
|
15 | 15 |
|
16 | 16 |
|
|
29 | 29 | ### Problem Statement |
30 | 30 | It's challenging to process data as it comes in different shapes and sizes. OmniParse aims to be an ingestion/parsing platform where you can ingest any type of data, such as documents, images, audio, video, and web content, and get the most structured and actionable output that is GenAI (LLM) friendly. |
31 | 31 |
|
32 | | -## Supported Types |
33 | | - |
34 | | -| Type | Supported Extensions | |
35 | | -|-----------|-----------------------------------------------------| |
36 | | -| Documents | .doc, .docx, .odt, .pdf, .ppt, .pptx | |
37 | | -| Images | .png, .jpg, .jpeg, .tiff, .bmp, .heic | |
38 | | -| Video | .mp4, .mkv, .avi, .mov | |
39 | | -| Audio | .mp3, .wav, .aac | |
40 | | -| Web | dynamic webpages, http://<anything>.com | |
41 | | - |
42 | 32 | ## Installation |
43 | 33 | > Note: The server only works on Linux-based systems. This is due to certain dependencies and system-specific configurations that are not compatible with Windows or macOS. |
44 | 34 | To install OmniParse, you can use `pip`: |
@@ -113,14 +103,22 @@ Arguments: |
113 | 103 | - `--host`: Host IP address (default: 0.0.0.0) |
114 | 104 | - `--port`: Port number (default: 8000) |
115 | 105 |
|
| 106 | +## Supported Data Types |
| 107 | + |
| 108 | +| Type | Supported Extensions | |
| 109 | +|-----------|-----------------------------------------------------| |
| 110 | +| Documents | .doc, .docx, .odt, .pdf, .ppt, .pptx | |
| 111 | +| Images | .png, .jpg, .jpeg, .tiff, .bmp, .heic | |
| 112 | +| Video | .mp4, .mkv, .avi, .mov | |
| 113 | +| Audio | .mp3, .wav, .aac | |
| 114 | +| Web | dynamic webpages, http://<anything>.com | |
116 | 115 |
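
As a rough illustration of the table above, the sketch below maps an input path or URL to one of the listed data types. It is not part of OmniParse: `SUPPORTED_EXTENSIONS` and `classify_input` are hypothetical names, and treating any `http://` or `https://` URL as "Web" is an assumption layered on the table's `http://<anything>.com` entry.

```python
from pathlib import Path
from typing import Optional

# Extension groups copied from the "Supported Data Types" table.
SUPPORTED_EXTENSIONS = {
    "Documents": {".doc", ".docx", ".odt", ".pdf", ".ppt", ".pptx"},
    "Images": {".png", ".jpg", ".jpeg", ".tiff", ".bmp", ".heic"},
    "Video": {".mp4", ".mkv", ".avi", ".mov"},
    "Audio": {".mp3", ".wav", ".aac"},
}


def classify_input(source: str) -> Optional[str]:
    """Hypothetical helper: return the table's type name, or None if unsupported."""
    # Assumption: any http(s) URL counts as a "Web" input.
    if source.lower().startswith(("http://", "https://")):
        return "Web"
    # Compare the file extension case-insensitively against each group.
    suffix = Path(source).suffix.lower()
    for data_type, extensions in SUPPORTED_EXTENSIONS.items():
        if suffix in extensions:
            return data_type
    return None
```

A check like this could run before a file is sent to the server, so that unsupported inputs are rejected early instead of failing during parsing.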
|
117 | 116 |
|
118 | 117 | <details> |
119 | 118 | <summary><h2>API Endpoints</h2></summary> |
120 | 119 |
|
121 | | -> [!IMPORTANT] |
122 | | -> |
123 | 120 | > Client library compatible with Langchain, llamaindex, and haystack integrations coming soon. |
| 121 | +
|
124 | 122 | - [API Endpoints](#api-endpoints) |
125 | 123 | - [Document Parsing](#document-parsing) |
126 | 124 | - [Parse Any Document](#parse-any-document) |
|