forked from t-redactyl/ocr-llm-agent
-
Notifications
You must be signed in to change notification settings - Fork 0
Home
Udit Asopa edited this page Oct 16, 2025
·
1 revision
Welcome to the Vision Text Extractor wiki! This comprehensive guide will help you get the most out of this multi-provider OCR tool.
- Installation Guide - Complete setup instructions
- Quick Start Tutorial - Get running in 5 minutes
- Configuration - Environment setup and API keys
- Basic Usage - Command-line interface basics
- Advanced Features - Custom prompts, batch processing
- Provider Comparison - Choosing the right AI model
- Document Processing - Business documents, forms, receipts
- Recipe Extraction - Cooking and food industry use cases
- Academic Research - Educational and research applications
- API Reference - Function documentation
- Pixi Tasks - Complete task reference
- Troubleshooting - Common issues and solutions
- Contributing - How to contribute to the project
- Architecture - Technical architecture overview
- Adding New Providers - Extend with more AI models
Vision Text Extractor is a powerful, multi-provider OCR (Optical Character Recognition) tool that uses state-of-the-art vision-language models to extract text from images and documents.
- 3 AI Providers: Choose from Hugging Face SmolVLM (local), Ollama LLaVA (local), or OpenAI GPT-4o (cloud)
- Flexible Input: Support for local files and web URLs
- Custom Prompts: Extract specific information with targeted prompts
- Professional CLI: Built with Typer for excellent user experience
- Easy Setup: Managed dependencies with Pixi
- π Business Documents: Contracts, invoices, forms
- π½οΈ Food Industry: Recipes, menus, nutrition labels
- π° Finance: Receipts, bank statements, tax documents
- π Education: Homework, research papers, lecture notes
- π₯ Healthcare: Prescriptions, lab results, medical forms
- π Real Estate: Property listings, lease agreements
# Install and setup
git clone https://github.com/udit-asopa/vision-text-extractor.git
cd vision-text-extractor
pixi install
pixi run setup
# Extract text from an image
pixi run demo-ocr-huggingface
python main.py path/to/your/image.jpg
# Custom extraction
python main.py receipt.jpg --prompt "Extract total amount and date"New Users: Start with Installation Guide β Quick Start Tutorial
Power Users: Jump to Advanced Features β API Reference
Developers: Check out Architecture β Contributing
Last updated: October 2025