Installation Guide
This guide will walk you through installing Vision Text Extractor and setting up all the AI providers.

Prerequisites:
- Operating System: Linux, macOS, or Windows
- Python: 3.10+ (managed automatically by Pixi)
- Memory: 4GB RAM minimum, 8GB+ recommended for local models
- Storage: 3GB+ free space for all models
- Internet: Required for initial setup and cloud providers
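On Linux, a couple of quick commands give a rough check against these requirements (a sketch; macOS and Windows users can read the same numbers from their system monitor):

```bash
# Rough hardware check: available memory and free disk space
free -h    # total/available RAM (4 GB minimum, 8 GB+ recommended for local models)
df -h .    # free disk space on the current volume (3 GB+ needed for all models)
```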
Install Pixi:

Pixi is our dependency manager that makes installation seamless.
Linux/macOS:

```bash
curl -fsSL https://pixi.sh/install.sh | bash
```

Windows (PowerShell):

```powershell
iwr -useb https://pixi.sh/install.ps1 | iex
```

Alternative methods: see pixi.sh.
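Before continuing, it is worth confirming that the `pixi` binary is reachable (restart your terminal or source your shell profile first if the command is not found):

```bash
# Sanity check: prints the installed Pixi version
pixi --version
```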
Clone the repository and install dependencies:

```bash
git clone https://github.com/udit-asopa/vision-text-extractor.git
cd vision-text-extractor
pixi install
```

Set up the environment file:

```bash
# Copy environment template
pixi run setup-env
# Edit with your API keys (optional for local models)
nano .env  # or use your preferred editor
```

Verify the basic setup:

```bash
pixi run test-setup
```
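For reference, a minimal `.env` might look like the sketch below. Only the OpenAI key is ever needed, and only if you use the OpenAI provider described further down; the local models run without any keys.

```bash
# .env (sketch) -- leave this unset or remove it if you only use local models
OPENAI_API_KEY=your_key_here
```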
Next, set up one or more of the AI providers.

Hugging Face SmolVLM (local):

- ✅ Completely local - no API keys needed
- ✅ Privacy-focused - data never leaves your machine
- ✅ Free to use - no usage limits
```bash
# Download SmolVLM model (~2GB)
pixi run setup-smolvlm
# Test it works
pixi run demo-ocr-huggingface
```
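If you want to confirm the download completed, the model weights are stored in Hugging Face's default cache directory unless you override it (see the `HF_HOME` note further down), so checking its size is a quick sanity test:

```bash
# Default Hugging Face cache location; expect roughly 2 GB after setup-smolvlm
du -sh ~/.cache/huggingface
```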
Ollama LLaVA (local):

- ✅ Local model - privacy-focused
- ✅ Different architecture - good for comparison
- ✅ Free to use
```bash
# Install Ollama and LLaVA model
pixi run setup-ollama
# Test it works
pixi run demo-ocr-ollama
```
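If the demo fails, check that the Ollama server is running locally and that a LLaVA model was actually pulled. The exact model tag that `setup-ollama` installs may differ; this is just a sketch, and it assumes the `ollama` CLI ended up on your PATH:

```bash
# List models known to the local Ollama server; expect a llava entry
ollama list

# Or query the server's HTTP API directly (Ollama's default port is 11434)
curl -s http://localhost:11434/api/tags
```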
OpenAI (cloud):

- ✅ Highest accuracy - state-of-the-art performance
- ❌ Requires API key - costs money per request
- ❌ Internet required - data sent to OpenAI
```bash
# Get API key from https://platform.openai.com/api-keys
# Add to .env file:
echo "OPENAI_API_KEY=your_key_here" >> .env
# Test it works
pixi run demo-ocr-openai
```
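To check the key itself, independently of this project, you can call OpenAI's models endpoint directly; a 401 response means the key is missing or invalid. This sketch assumes `OPENAI_API_KEY` is exported in your current shell (for example by setting it manually, since the project reads it from `.env`):

```bash
# Lists available models; a 401 Unauthorized response indicates a bad key
curl -s https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY" | head -c 300
```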
Verify the installation:

```bash
# Check dependencies
pixi run test-imports
# Test individual components
pixi run test-components
# Complete setup verification
pixi run setup
```

Run a first extraction:

```bash
# Test with sample image
pixi run demo-ocr-huggingface
# Test with your own image
python main.py path/to/your/image.jpg
```
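To run the extractor over a whole folder of images, a simple shell loop is enough (a sketch; it assumes `main.py` takes one image path per invocation, as above, and that your images are JPEGs):

```bash
# Process every .jpg in a directory, one image per run
for img in path/to/images/*.jpg; do
  python main.py "$img"
done
```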
Development mode (optional):

```bash
# Enter development mode with Jupyter
pixi shell -e dev
jupyter lab
```

For faster SmolVLM processing with CUDA-compatible GPUs:
- Ensure NVIDIA drivers are installed
- CUDA will be automatically detected
- Model will run on GPU if available
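To confirm the GPU will actually be used, the checks below are a reasonable sanity test (a sketch; it assumes the Pixi environment includes PyTorch, which SmolVLM relies on):

```bash
# Shows the NVIDIA driver and visible GPUs (fails if drivers are missing)
nvidia-smi

# Asks PyTorch inside the project environment whether CUDA is usable
pixi run python -c "import torch; print(torch.cuda.is_available())"
```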
Custom model cache:

```bash
# Set custom Hugging Face cache directory
export HF_HOME=/path/to/custom/cache
pixi run setup-smolvlm
```

Troubleshooting:

Pixi not found:

```bash
# Add to your shell profile (.bashrc, .zshrc, etc.)
export PATH="$HOME/.pixi/bin:$PATH"
source ~/.bashrc  # or restart terminal
```

Permission errors:

```bash
# On Linux/macOS, ensure execute permissions
chmod +x ~/.pixi/bin/pixi
```

Out of memory with SmolVLM:

```bash
# Use smaller batch size or switch to Ollama
pixi run demo-ocr-ollama
```

OpenAI API errors:
- Check that the API key in .env is correct
- Verify billing is set up on your OpenAI account
- Check rate limits
Getting help:

- Check the Troubleshooting page
- Review GitHub Issues
- Create a new issue if the problem persists
After successful installation:
- Read the Quick Start Tutorial
- Try the Basic Usage examples
- Explore the Use Case Tutorials

Installation complete! Ready to extract text from images.