added download model script download.py - Adithya S K

adithya-s-k · adithya-s-k · commit 939522a275d4 · 2024-07-05T13:30:26.000+05:30
diff --git a/README.md b/README.md
@@ -95,6 +95,17 @@ python server.py --host 0.0.0.0 --port 8000 --documents --media --web
 - `--media`: Load in Whisper model to transcribe audio and video files.
 - `--web`: Set up selenium crawler.
 
+Download Models:
+To download the models before starting the server, run:
+
+```bash
+python download.py --documents --media --web
+```
+
+- `--documents`: Download all the models used to parse and ingest documents (Surya OCR series of models and Florence-2).
+- `--media`: Download the Whisper model used to transcribe audio and video files.
+- `--web`: Set up the Selenium crawler.
+
 ## Supported Data Types
 
 | Type      | Supported Extensions                                |
@@ -280,14 +291,16 @@ Arguments:
 ## Limitations
 There is a need for a GPU with 8~10 GB minimum VRAM as we are using deep learning models.
 \
+
 Document Parsing Limitations
 \
-[Marker](https://github.com/VikParuchuri/marker) which is the underlying PDF parser will not convert 100% of equations to LaTeX because it has to detect and then convert them.
-Tables are not always formatted 100% correctly; text can be in the wrong column.
-Whitespace and indentations are not always respected.
-Not all lines/spans will be joined properly.
-This works best on digital PDFs that won't require a lot of OCR. It's optimized for speed, and limited OCR is used to fix errors.
-To fit all the models in the GPU, we are using the smallest variants, which might not offer the best-in-class performance.
+- [Marker](https://github.com/VikParuchuri/marker), the underlying PDF parser, will not convert 100% of equations to LaTeX because it has to detect and then convert them.
+- It parses English well but might struggle with languages such as Chinese.
+- Tables are not always formatted 100% correctly; text can be in the wrong column.
+- Whitespace and indentations are not always respected.
+- Not all lines/spans will be joined properly.
+- This works best on digital PDFs that won't require a lot of OCR. It's optimized for speed, and limited OCR is used to fix errors.
+- To fit all the models in the GPU, we are using the smallest variants, which might not offer the best-in-class performance.
 
 ## License
 OmniParse is licensed under the GPL-3.0 license. See `LICENSE` for more information.
diff --git a/download.py b/download.py
@@ -0,0 +1,21 @@
+"""
+Script to download models
+"""
+import argparse
+from omniparse import load_omnimodel
+
+def download_models():
+    """Parse the command-line flags and load the selected model groups."""
+    parser = argparse.ArgumentParser(description="Download models for omniparse")
+
+    parser.add_argument("--documents", action='store_true', help="Download document models")
+    parser.add_argument("--media", action='store_true', help="Download media models")
+    parser.add_argument("--web", action='store_true', help="Download web models")
+    args = parser.parse_args()
+
+    # Each flag is False unless passed, so only the requested model groups are loaded.
+    load_omnimodel(args.documents, args.media, args.web)
+
+
+if __name__ == '__main__':
+    download_models()
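
Note on the flags in `download.py`: each one uses `action='store_true'`, so it defaults to `False` and becomes `True` only when passed on the command line. Below is a minimal sketch of that behaviour, not the project's code. `selected_groups` and `load_models_stub` are hypothetical names introduced here, and the stub stands in for `load_omnimodel` so the example runs without omniparse installed.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the same three flags that download.py defines."""
    parser = argparse.ArgumentParser(description="Download models for omniparse")
    for flag in ("documents", "media", "web"):
        # store_true: the attribute is False unless the flag is present.
        parser.add_argument(f"--{flag}", action="store_true", help=f"Download {flag} models")
    return parser


def selected_groups(argv: list[str]) -> tuple[bool, bool, bool]:
    """Return the (documents, media, web) booleans for a given argument list."""
    args = build_parser().parse_args(argv)
    return args.documents, args.media, args.web


def load_models_stub(documents: bool, media: bool, web: bool) -> list[str]:
    """Hypothetical stand-in for load_omnimodel: lists the groups that were requested."""
    names = ("documents", "media", "web")
    return [name for name, wanted in zip(names, (documents, media, web)) if wanted]


if __name__ == "__main__":
    print(load_models_stub(*selected_groups(["--documents", "--web"])))
    print(load_models_stub(*selected_groups([])))
```

With no flags, all three values are `False`, so the script would call `load_omnimodel(False, False, False)`. What that call does is not shown in this diff; if it turns out to be a silent no-op, printing a hint when nothing is selected may be worth considering.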
diff --git a/server.py b/server.py
@@ -34,7 +34,7 @@ def add(app: FastAPI):
 
 def main():
     # Parse command-line arguments
-    parser = argparse.ArgumentParser(description="Run the marker-api server.")
+    parser = argparse.ArgumentParser(description="Run the omniparse server.")
     parser.add_argument("--host", default="0.0.0.0", help="Host IP address")
     parser.add_argument("--port", type=int, default=8000, help="Port number")
     parser.add_argument("--documents", action='store_true', help="Load document models")