Skip to content

[TECH-DEBT] 🚀 Performance Optimization: Reduce latency for complex questions (currently ~40s) #123

@anhmtk

Description

@anhmtk

📊 Current Performance

StillMe currently has ~40s latency for complex questions:

Component Time Notes
RAG Retrieval 0.36s ChromaDB semantic search
LLM Inference 6.43s DeepSeek/OpenAI API
Post-processing 33.57s Quality evaluation + rewrite + philosophical depth
Total ~40s End-to-end latency

🎯 Goal

Reduce total latency to <10s for complex questions while maintaining:

  • ✅ Quality and philosophical depth
  • ✅ Zero-tolerance hallucination policy
  • ✅ Complete transparency and citation

🔍 Analysis

Post-processing is the bottleneck (83% of total time):

  • Quality evaluation: Rule-based (fast)
  • Rewrite engine: LLM-based (slow - multiple passes)
  • Philosophical depth: LLM-based (slow - deep analysis)

Potential optimizations:

  1. Parallel rewrite passes - Run rewrite 1 & 2 in parallel where possible
  2. Caching rewrite results - Cache common patterns and templates
  3. Conditional rewrite - Skip rewrite for high-quality initial responses
  4. Streaming response - Return partial results while processing
  5. Batch processing - Process multiple validations in parallel
  6. Optimize LLM calls - Reduce token count, use faster models for simple tasks

💡 Proposed Solutions

Phase 1: Quick Wins (Target: 20-25s)

  • Implement conditional rewrite (only rewrite when necessary)
  • Cache rewrite templates for common patterns
  • Optimize prompt length to reduce LLM inference time

Phase 2: Parallel Processing (Target: 10-15s)

  • Parallel rewrite passes where independent
  • Batch validation checks
  • Stream response to user

Phase 3: Advanced Optimization (Target: <10s)

  • Use faster models for simple tasks
  • Implement response streaming
  • Optimize RAG retrieval with better indexing

🤝 How to Contribute

  1. Profile the code - Identify exact bottlenecks
  2. Propose optimizations - Share ideas in comments
  3. Submit PRs - Implement optimizations with tests
  4. Test performance - Measure improvements

📝 Notes

This is a conscious trade-off - we prioritize quality and philosophical depth over speed. However, we believe we can achieve both with the right optimizations!

Related:

  • See docs/PAPER.md Section 4.6 for current performance analysis
  • See docs/FAQ.md for performance questions
  • See backend/postprocessing/ for current implementation

Metadata

Metadata

Assignees

No one assigned

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions