muthuk1
Fix #3: Add LLM-as-a-Judge (PASS/FAIL) + BERTScore evaluation, the two hackathon-required accuracy metrics
f486777 verified