Labels: good first issue, help wanted, optimization, performance
📊 Current Performance
StillMe currently takes ~40s end-to-end to answer complex questions:
| Component | Time | Notes |
|---|---|---|
| RAG Retrieval | 0.36s | ChromaDB semantic search |
| LLM Inference | 6.43s | DeepSeek/OpenAI API |
| Post-processing | 33.57s | Quality evaluation + rewrite + philosophical depth |
| Total | ~40s | End-to-end latency |
🎯 Goal
Reduce total latency to <10s for complex questions while maintaining:
- ✅ Quality and philosophical depth
- ✅ Zero-tolerance hallucination policy
- ✅ Complete transparency and citation
🔍 Analysis
Post-processing is the bottleneck (83% of total time):
- Quality evaluation: Rule-based (fast)
- Rewrite engine: LLM-based (slow - multiple passes)
- Philosophical depth: LLM-based (slow - deep analysis)
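A rough latency budget shows how much post-processing has to shrink. This assumes RAG retrieval and LLM inference keep their measured times from the table above:

$$
\frac{t_{\text{post}}}{t_{\text{total}}} = \frac{33.57}{0.36 + 6.43 + 33.57} = \frac{33.57}{40.36} \approx 0.83
$$

$$
t_{\text{post}}^{\text{budget}} < 10 - (0.36 + 6.43) = 3.21\ \text{s}
\quad\Longrightarrow\quad
\text{required speedup} \geq \frac{33.57}{3.21} \approx 10.5\times
$$

In other words, under that assumption post-processing alone would need roughly a 10x speedup, unless inference also gets faster.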
Potential optimizations:
- Parallel rewrite passes - Run rewrite 1 & 2 in parallel where possible
- Caching rewrite results - Cache common patterns and templates
- Conditional rewrite - Skip rewrite for high-quality initial responses
- Streaming response - Return partial results while processing
- Batch processing - Process multiple validations in parallel
- Optimize LLM calls - Reduce token count, use faster models for simple tasks
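One way to prototype "caching rewrite results" is a small LRU cache in front of the rewrite call. This is a minimal sketch, not the current implementation: `RewriteCache` and the `rewrite` callable are hypothetical names, and it caches whole outputs keyed by a hash of the whitespace-normalized input, so it only helps when inputs repeat exactly. Caching at the template/pattern level, as the issue suggests, would key on a template identifier instead.

```python
import hashlib
from collections import OrderedDict
from typing import Callable


class RewriteCache:
    """Hypothetical LRU cache for rewrite outputs, keyed by a hash of the input text."""

    def __init__(self, max_entries: int = 1024) -> None:
        self.max_entries = max_entries
        self._store: "OrderedDict[str, str]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(text: str) -> str:
        # Collapse whitespace so inputs that differ only in spacing share one entry.
        normalized = " ".join(text.split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get_or_compute(self, text: str, rewrite: Callable[[str], str]) -> str:
        key = self._key(text)
        if key in self._store:
            self.hits += 1
            self._store.move_to_end(key)  # mark as most recently used
            return self._store[key]
        self.misses += 1
        result = rewrite(text)  # the expensive (e.g. LLM-backed) call
        self._store[key] = result
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict the least recently used entry
        return result
```

`OrderedDict.move_to_end` and `popitem(last=False)` give LRU ordering without extra dependencies. A production cache would also need invalidation when prompts or models change.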
💡 Proposed Solutions
Phase 1: Quick Wins (Target: 20-25s)
- Implement conditional rewrite (only rewrite when necessary)
- Cache rewrite templates for common patterns
- Optimize prompt length to reduce LLM inference time
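Conditional rewrite can be expressed as a gate: run the cheap rule-based evaluation first and call the slow LLM rewrite only when it fails. This is a sketch under assumptions: `QualityReport`, its `score` and `has_missing_citation` fields, and the 0.8 threshold are all hypothetical. Forcing a rewrite whenever citations are missing is one illustrative way to keep the transparency requirement from being traded away for speed.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class QualityReport:
    score: float                # hypothetical 0.0-1.0 score from the rule-based evaluator
    has_missing_citation: bool  # hypothetical flag for uncited claims


def maybe_rewrite(
    draft: str,
    evaluate: Callable[[str], QualityReport],
    rewrite: Callable[[str], str],
    threshold: float = 0.8,
) -> Tuple[str, bool]:
    """Run the expensive rewrite only when the cheap check says it is needed.

    Returns the final text and whether a rewrite was performed.
    """
    report = evaluate(draft)
    # Skip the rewrite only for high-scoring drafts that are fully cited.
    if report.score >= threshold and not report.has_missing_citation:
        return draft, False
    return rewrite(draft), True
```

The threshold would need calibrating against real evaluations, since too low a value silently lowers quality.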
Phase 2: Parallel Processing (Target: 10-15s)
- Parallel rewrite passes where independent
- Batch validation checks
- Stream response to user
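Independent passes and validation checks can be awaited concurrently with `asyncio.gather`, so their API latencies overlap instead of adding up. This is a minimal sketch: `citation_check` and `depth_check` are hypothetical stand-ins that use `asyncio.sleep` to simulate network latency. Only passes whose inputs do not depend on each other's outputs qualify; if rewrite 2 consumes rewrite 1's output, those two must stay sequential.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict, List

Check = Callable[[str], Awaitable[bool]]


async def run_checks_concurrently(text: str, checks: Dict[str, Check]) -> Dict[str, bool]:
    """Run independent validation checks concurrently and collect results by name."""
    names = list(checks)
    results: List[bool] = await asyncio.gather(*(checks[n](text) for n in names))
    return dict(zip(names, results))


# Hypothetical stand-ins for LLM-backed checks; asyncio.sleep simulates API latency.
async def citation_check(text: str) -> bool:
    await asyncio.sleep(0.2)
    return "[" in text


async def depth_check(text: str) -> bool:
    await asyncio.sleep(0.2)
    return len(text.split()) > 3


if __name__ == "__main__":
    start = time.perf_counter()
    out = asyncio.run(run_checks_concurrently(
        "Answer with a source [1].",
        {"citation": citation_check, "depth": depth_check},
    ))
    # Elapsed time should be close to one check's latency (~0.2s), not the sum.
    print(out, f"{time.perf_counter() - start:.2f}s")
```

With real API calls, the achievable overlap also depends on provider rate limits.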
Phase 3: Advanced Optimization (Target: <10s)
- Use faster models for simple tasks
- Implement response streaming
- Optimize RAG retrieval with better indexing
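"Use faster models for simple tasks" amounts to a routing rule. The sketch below is illustrative only: the model identifiers are placeholders rather than real model names, and the task labels, token limits, and 400-word threshold are hypothetical. Deep analysis such as the philosophical-depth pass stays on the stronger model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelChoice:
    name: str
    max_tokens: int


# Placeholder identifiers, not real model names; substitute whatever the deployment uses.
FAST_MODEL = ModelChoice(name="fast-model-placeholder", max_tokens=256)
DEEP_MODEL = ModelChoice(name="deep-model-placeholder", max_tokens=1024)

# Hypothetical labels for mechanical post-processing steps.
SIMPLE_TASKS = {"format_citations", "fix_grammar", "language_check"}


def pick_model(task: str, answer_length_words: int, long_answer_threshold: int = 400) -> ModelChoice:
    """Route short, mechanical tasks to the fast model; everything else to the deep model."""
    if task in SIMPLE_TASKS and answer_length_words <= long_answer_threshold:
        return FAST_MODEL
    return DEEP_MODEL
```

Keeping the router as a pure function makes it easy to unit-test and to log which model handled each step when measuring the impact.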
🤝 How to Contribute
- Profile the code - Identify exact bottlenecks
- Propose optimizations - Share ideas in comments
- Submit PRs - Implement optimizations with tests
- Test performance - Measure improvements
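For profiling and measuring improvements, a per-stage wall-clock timer is often enough to reproduce a breakdown like the table above. This is a minimal sketch: `StageTimer` and the stage names are hypothetical, and the `time.sleep` calls stand in for the real pipeline calls. For function-level detail inside a stage, the standard library's `cProfile` (e.g. `python -m cProfile -s cumtime script.py`) is an alternative.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Dict, Iterator, List


class StageTimer:
    """Collect wall-clock durations per named pipeline stage."""

    def __init__(self) -> None:
        self.samples: Dict[str, List[float]] = defaultdict(list)

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the stage raises.
            self.samples[name].append(time.perf_counter() - start)

    def report(self) -> str:
        """Format total time and share of overall time for each stage."""
        total = sum(sum(v) for v in self.samples.values())
        lines = []
        for name, values in self.samples.items():
            t = sum(values)
            share = 100 * t / total if total else 0.0
            lines.append(f"{name:<16} {t:8.3f}s {share:5.1f}%")
        return "\n".join(lines)


timer = StageTimer()
with timer.stage("rag_retrieval"):
    time.sleep(0.01)  # stand-in for the real retrieval call
with timer.stage("post_processing"):
    time.sleep(0.05)  # stand-in for evaluation + rewrite
print(timer.report())
```

Running the same set of questions before and after a change, and comparing the per-stage totals, makes it clear which optimization moved which number.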
📝 Notes
This is a conscious trade-off: we prioritize quality and philosophical depth over speed. However, we believe the right optimizations can deliver both!
Related:
- See `docs/PAPER.md` Section 4.6 for current performance analysis
- See `docs/FAQ.md` for performance questions
- See `backend/postprocessing/` for current implementation