Fix #9: Update benchmark runner for 3-pipeline comparison + LLM-as-a-Judge + BERTScore evaluation ddfbb09 verified muthuk1 committed 8 days ago