feat(eval): RAGAS evaluation framework + RAG pipeline improvements f8b04c3 adeshboudh16 Claude Sonnet 4.6 committed 9 days ago
fix: improve RAG retrieval quality and reduce generator hallucination 98da9ee adeshboudh16 Claude Sonnet 4.6 committed 18 days ago
fix: use LiteLLMClient adapter for Gemini judge in build_judge f30a39a adeshboudh16 Claude Sonnet 4.6 committed 19 days ago
feat: add Gemini native judge backend to eval pipeline e56b2e9 adeshboudh16 Claude Sonnet 4.6 committed 19 days ago
fix: remove dead results list and duplicate load_dotenv in run_eval.py 6dd7d30 adeshboudh16 Claude Sonnet 4.6 committed 21 days ago
feat: split run_eval.py into checkpointed phases with no_reasoning support a7c976e adeshboudh16 committed 21 days ago
fix: EVAL_PRIMARY_MODEL routes graph through osmapi via LiteLLM openai-compat provider af81429 adeshboudh16 committed 21 days ago
feat: EVAL_PRIMARY_MODEL override for graph LLM, applied before civicsetu import c8b183a adeshboudh16 committed 21 days ago
feat: new run_eval.py - single-pass no-phases, 3 batch_score calls total via osmapi a84bdca adeshboudh16 committed 21 days ago
refactor: simplify phase 2 to sequential row-by-row scoring, comment out batch/sleep/thread code 75fe6a6 adeshboudh16 committed 21 days ago
chore: comment out inter-metric rate-limit sleep (not needed for osmapi) f57ca63 adeshboudh16 committed 21 days ago
feat: switch to Ollama embeddings, comment out 60s rate-limit sleeps for osmapi 61efc39 adeshboudh16 committed 21 days ago
feat: switch eval judge to osmapi / qwen3.5-122b-a10b, keep Gemini embeddings d5b414f adeshboudh16 committed 21 days ago
fix: remove dual-key parallel mode - always use single worker with GEMINI_API_KEY_2 b9de807 adeshboudh16 committed 21 days ago
fix: share single genai.Client across workers to prevent global auth state conflict d6cf406 adeshboudh16 committed 21 days ago
fix: log batch start in score_batch_in_thread for visibility into thread pickup 80a155c adeshboudh16 committed 22 days ago
fix: add 120s per-call timeout to Gemini/OpenRouter client + 5-min per-batch deadline 27711f2 adeshboudh16 committed 22 days ago
fix: rate-limit-aware scoring - sleep between metric calls + BATCH_SIZE=1 default 4817cb3 adeshboudh16 Claude Sonnet 4.6 committed 26 days ago
feat: add OpenRouter judge provider to sidestep Gemini 15 RPM free-tier limit dba614f adeshboudh16 Claude Sonnet 4.6 committed 26 days ago
fix: raise max_tokens to 8192 for RAGAS judge LLM 899cd4f adeshboudh16 Claude Sonnet 4.6 committed 26 days ago
fix: RAGAS 0.4.x native API + phase separation (eval-collect / eval-score) 6ce46d8 adeshboudh16 Claude Sonnet 4.6 committed 26 days ago
fix: switch RAGAS judge to LiteLLM + instructor.from_litellm 0ff6975 adeshboudh16 Claude Sonnet 4.6 committed 27 days ago
fix: use new google-genai SDK in build_judge() 1a2b117 adeshboudh16 Claude Sonnet 4.6 committed 27 days ago
fix: RAGAS 0.4.x native API + phase separation (eval-collect / eval-score) 997e371 adeshboudh16 committed 27 days ago
fix: pass llm/embeddings to RAGAS 0.4.x metric constructors 65e6ec2 adeshboudh16 committed 27 days ago
eval: update RAGAS metric imports to 0.4.x API 71219a9 adeshboudh16 Claude Sonnet 4.6 committed 27 days ago
eval: fix runner script quality issues (RAGAS column_map, NaN handling, key validation) 50379a5 adeshboudh16 Claude Sonnet 4.6 committed 27 days ago