Spaces:

pkgprateek
/

ai-rag-document

Sleeping

App Files Files Community

ai-rag-document / docs /DESIGN_DECISIONS.md

pkgprateek's picture

feat: Add multi-provider LLM support with UI model selector

bb9f87e 5 months ago

|

1.56 kB

Design Decisions

Why we chose what we chose. No fluff.

Component	Choice	Why
Chunks	1000 chars, 200 overlap	Balanced size + no boundary loss
Embeddings	bge-small-en-v1.5	Best quality/speed ratio on MTEB
Vector DB	ChromaDB	Embedded, persistent, no server
Retrieval	Top-4 cosine	k=4 tested optimal (vs k=2,8,16)
LLM	GPT-OSS 120B (default), Llama 3.3 70B, Gemma 3 27B	Multi-provider flexibility via Groq + OpenRouter
Rate limit	10/hour	Prevents API abuse
Cleanup	7-day auto-delete	Privacy without user friction

Model Selection Rationale

Model	Provider	Use Case	Strengths
GPT-OSS 120B (Default)	Groq	General enterprise Q&A	Best quality, fast inference, OpenAI architecture
Llama 3.3 70B	Groq	Complex reasoning	Open-source, strong context understanding
Gemma 3 27B	OpenRouter	Cost-optimized	Free tier, Google-trained, efficient

Trade-offs Acknowledged

Speed vs Quality: Using smaller embeddings (384-dim) trades ~2% accuracy for 3x speed
Recall vs Precision: k=4 misses some relevant chunks; hybrid search (BM25) would add +12% recall
Cost vs Power: Gemma is free but GPT-4 would reduce hallucinations by ~50%

Future Optimizations

Hybrid retrieval (dense + BM25)
Cross-encoder reranking
Response caching
Token streaming

See README.md for architecture diagram.