Skip to content

Installation Guide

Udit Asopa edited this page Oct 16, 2025 · 1 revision

Installation Guide

This guide will walk you through installing Vision Text Extractor and setting up all the AI providers.

πŸ“‹ Prerequisites

System Requirements

  • Operating System: Linux, macOS, or Windows
  • Python: 3.10+ (managed automatically by Pixi)
  • Memory: 4GB RAM minimum, 8GB+ recommended for local models
  • Storage: 3GB+ free space for all models
  • Internet: Required for initial setup and cloud providers

Install Pixi

Pixi is our dependency manager that makes installation seamless.

Linux/macOS:

curl -fsSL https://pixi.sh/install.sh | bash

Windows (PowerShell):

iwr -useb https://pixi.sh/install.ps1 | iex

Alternative methods: See pixi.sh

πŸš€ Quick Installation

Step 1: Clone Repository

git clone https://github.com/udit-asopa/vision-text-extractor.git
cd vision-text-extractor

Step 2: Install Dependencies

pixi install

Step 3: Environment Setup

# Copy environment template
pixi run setup-env

# Edit with your API keys (optional for local models)
nano .env  # or use your preferred editor

Step 4: Validate Installation

pixi run test-setup

πŸ€– AI Provider Setup

Option 1: Hugging Face SmolVLM (Recommended for Beginners)

  • βœ… Completely local - no API keys needed
  • βœ… Privacy-focused - data never leaves your machine
  • βœ… Free to use - no usage limits
# Download SmolVLM model (~2GB)
pixi run setup-smolvlm

# Test it works
pixi run demo-ocr-huggingface

Option 2: Ollama LLaVA (Local Alternative)

  • βœ… Local model - privacy-focused
  • βœ… Different architecture - good for comparison
  • βœ… Free to use
# Install Ollama and LLaVA model
pixi run setup-ollama

# Test it works
pixi run demo-ocr-ollama

Option 3: OpenAI GPT-4o (Cloud-based)

  • βœ… Highest accuracy - state-of-the-art performance
  • ❌ Requires API key - costs money per request
  • ❌ Internet required - data sent to OpenAI
# Get API key from https://platform.openai.com/api-keys
# Add to .env file:
echo "OPENAI_API_KEY=your_key_here" >> .env

# Test it works
pixi run demo-ocr-openai

βœ… Verification

Test All Components

# Check dependencies
pixi run test-imports

# Test individual components
pixi run test-components

# Complete setup verification
pixi run setup

Quick Functionality Test

# Test with sample image
pixi run demo-ocr-huggingface

# Test with your own image
python main.py path/to/your/image.jpg

πŸ”§ Advanced Installation

Development Environment

# Enter development mode with Jupyter
pixi shell -e dev
jupyter lab

GPU Acceleration (Optional)

For faster SmolVLM processing with CUDA-compatible GPUs:

  1. Ensure NVIDIA drivers are installed
  2. CUDA will be automatically detected
  3. Model will run on GPU if available

Custom Model Locations

# Set custom Hugging Face cache directory
export HF_HOME=/path/to/custom/cache
pixi run setup-smolvlm

🚨 Troubleshooting

Common Issues

Pixi not found:

# Add to your shell profile (.bashrc, .zshrc, etc.)
export PATH="$HOME/.pixi/bin:$PATH"
source ~/.bashrc  # or restart terminal

Permission errors:

# On Linux/macOS, ensure execute permissions
chmod +x ~/.pixi/bin/pixi

Out of memory with SmolVLM:

# Use smaller batch size or switch to Ollama
pixi run demo-ocr-ollama

OpenAI API errors:

  • Check API key is correct in .env
  • Verify billing is set up on OpenAI account
  • Check rate limits

Getting Help

➑️ Next Steps

After successful installation:

  1. πŸ“– Read Quick Start Tutorial
  2. 🎯 Try Basic Usage examples
  3. πŸ” Explore Use Case Tutorials

Installation complete! Ready to extract text from images. πŸŽ‰