Home

Vision Text Extractor Wiki

Welcome to the Vision Text Extractor wiki! This comprehensive guide will help you get the most out of this multi-provider OCR tool.

🚀 Quick Navigation

Getting Started

Installation Guide - Complete setup instructions
Quick Start Tutorial - Get running in 5 minutes
Configuration - Environment setup and API keys

Usage Guides

Basic Usage - Command-line interface basics
Advanced Features - Custom prompts, batch processing
Provider Comparison - Choosing the right AI model

Tutorials

Document Processing - Business documents, forms, receipts
Recipe Extraction - Cooking and food industry use cases
Academic Research - Educational and research applications

Technical Reference

API Reference - Function documentation
Pixi Tasks - Complete task reference
Troubleshooting - Common issues and solutions

Development

Contributing - How to contribute to the project
Architecture - Technical architecture overview
Adding New Providers - Extend with more AI models

🌟 What is Vision Text Extractor?

Vision Text Extractor is a multi-provider OCR (Optical Character Recognition) tool that uses vision-language models to extract text from images and documents.

Key Features

Three AI Providers: Choose from Hugging Face SmolVLM (local), Ollama LLaVA (local), or OpenAI GPT-4o (cloud)
Flexible Input: Support for local files and web URLs
Custom Prompts: Extract specific information with targeted prompts
Professional CLI: Built with Typer for excellent user experience
Easy Setup: Managed dependencies with Pixi

Use Cases

📄 Business Documents: Contracts, invoices, forms
🍽️ Food Industry: Recipes, menus, nutrition labels
💰 Finance: Receipts, bank statements, tax documents
📚 Education: Homework, research papers, lecture notes
🏥 Healthcare: Prescriptions, lab results, medical forms
🏠 Real Estate: Property listings, lease agreements

🎯 Quick Example

# Install and setup
git clone https://github.com/udit-asopa/vision-text-extractor.git
cd vision-text-extractor
pixi install
pixi run setup

# Extract text from an image
pixi run demo-ocr-huggingface
python main.py path/to/your/image.jpg

# Custom extraction
python main.py receipt.jpg --prompt "Extract total amount and date"
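
The custom-extraction command above can be looped over a folder of images. This is a minimal sketch, not the project's own batch feature. It assumes that main.py prints the extracted text to standard output, which this page does not confirm. The names VTE_CMD and extract_dir are hypothetical, introduced only for illustration.

```shell
# Sketch: run the extractor over every .jpg file in a directory.
# Assumption: main.py writes the extracted text to stdout.

# Command used for a single image. VTE_CMD is a hypothetical override hook.
VTE_CMD=${VTE_CMD:-"python main.py"}

# extract_dir DIR PROMPT (hypothetical helper)
# Writes one .txt file next to each .jpg in DIR.
extract_dir() {
  dir="$1"
  prompt="$2"
  for img in "$dir"/*.jpg; do
    # Skip the literal pattern when the directory has no .jpg files.
    [ -e "$img" ] || continue
    # $VTE_CMD is left unquoted on purpose so "python main.py" splits into words.
    $VTE_CMD "$img" --prompt "$prompt" > "${img%.jpg}.txt"
  done
}
```

For example, `extract_dir receipts "Extract total amount and date"` would write receipts/NAME.txt for each receipts/NAME.jpg, under the assumption stated above.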

📚 Learning Path

New Users: Start with Installation Guide → Quick Start Tutorial

Power Users: Jump to Advanced Features → API Reference

Developers: Check out Architecture → Contributing

Last updated: October 2025
