Udit Asopa edited this page Oct 16, 2025 · 1 revision

Vision Text Extractor Wiki

Welcome to the Vision Text Extractor wiki! This comprehensive guide will help you get the most out of this multi-provider OCR tool.

🚀 Quick Navigation

Getting Started

Usage Guides

Tutorials

Technical Reference

Development

🌟 What is Vision Text Extractor?

Vision Text Extractor is a multi-provider OCR (Optical Character Recognition) tool that uses vision-language models to extract text from images and documents.

Key Features

  • 3 AI Providers: Choose from Hugging Face SmolVLM (local), Ollama LLaVA (local), or OpenAI GPT-4o (cloud)
  • Flexible Input: Supports local files and web URLs
  • Custom Prompts: Extract specific information with targeted prompts
  • Professional CLI: Built with Typer for excellent user experience
  • Easy Setup: Managed dependencies with Pixi
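
The "Flexible Input" feature implies that the tool has to decide whether an argument is a local path or a web address. The sketch below shows one way such a check could look using only the standard library. It is not taken from the project's source; the function name `classify_source` and the example inputs are hypothetical.

```python
from urllib.parse import urlparse


def classify_source(source: str) -> str:
    """Return "url" for http(s) addresses and "file" for everything else."""
    parsed = urlparse(source)
    # Require both a web scheme and a host, so that Windows paths such as
    # "C:\\scans\\page.png" (which parse with scheme "c") still count as files.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    return "file"


# Hypothetical inputs, for illustration only.
for example in ("receipt.jpg", "https://example.com/menu.png"):
    print(example, "->", classify_source(example))
```

A check like this keeps the download step separate from the OCR step: only inputs classified as "url" would need to be fetched before being passed to a provider.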

Use Cases

  • 📄 Business Documents: Contracts, invoices, forms
  • 🍽️ Food Industry: Recipes, menus, nutrition labels
  • 💰 Finance: Receipts, bank statements, tax documents
  • 📚 Education: Homework, research papers, lecture notes
  • 🏥 Healthcare: Prescriptions, lab results, medical forms
  • 🏠 Real Estate: Property listings, lease agreements

🎯 Quick Example

# Install and setup
git clone https://github.com/udit-asopa/vision-text-extractor.git
cd vision-text-extractor
pixi install
pixi run setup

# Extract text from an image
pixi run demo-ocr-huggingface
python main.py path/to/your/image.jpg

# Custom extraction
python main.py receipt.jpg --prompt "Extract total amount and date"
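
For batch jobs, the custom-extraction command above can be assembled from a script. The following is a minimal sketch that assumes only the `python main.py <image> --prompt <text>` form shown above; the helper `build_command` and the file names are hypothetical, and no other flags are assumed to exist.

```python
import shlex
from typing import List, Optional


def build_command(image: str, prompt: Optional[str] = None) -> List[str]:
    """Build the argument list for one extraction run."""
    command = ["python", "main.py", image]
    if prompt is not None:
        # --prompt is the only option used here, as in the example above.
        command += ["--prompt", prompt]
    return command


# Hypothetical file names, for illustration only.
images = ["receipt.jpg", "invoice.png"]
for image in images:
    command = build_command(image, "Extract total amount and date")
    # shlex.join quotes the prompt so the printed line can be pasted into a shell.
    print(shlex.join(command))
```

Each list can be passed directly to `subprocess.run(command, check=True)` from the repository root. Passing a list rather than a single string avoids shell quoting problems when a prompt contains spaces or quotes.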

📚 Learning Path

New Users: Start with Installation Guide → Quick Start Tutorial

Power Users: Jump to Advanced Features → API Reference

Developers: Check out Architecture → Contributing


Last updated: October 2025
