akseljoonas (HF Staff) committed
Commit f00b1a6 · 1 Parent(s): bacafa4
Files changed (1)
  1. eval/README.md +3 -5
eval/README.md CHANGED
@@ -1,11 +1,11 @@
 # HF-Agent Eval
 
-Rubric-based evaluation pipeline implementing [Rubrics as Rewards](https://arxiv.org/abs/2410.13254) (RaR-Explicit).
+Rubric-based evaluation pipeline implementing the [Rubrics as Rewards](https://arxiv.org/abs/2507.17746) paper (RaR-Explicit formula).
 
 ## Pipeline
 
 ```
-QA pairs → generate_rubrics.py → `eval/task.py@hf-benchmark-with-rubrics` → scores
+QA pairs → generate_rubrics.py → run `inspect-ai eval eval/task.py@hf-benchmark-with-rubrics` → scores
 ```
 
 ### 1. Generate Rubrics (if not already generated)
@@ -27,9 +27,7 @@ python eval/generate_rubrics.py \
 
 **Output:** 7-20 weighted criteria per question (Essential: +5, Important: +3-4, Optional: +1-2, Pitfall: -1 to -2)
 
-### 2. Evaluate Responses (Inspect)
-
-Load your rubric dataset, run a solver, and score with `rubric_scorer` using `inspect-ai`.
+### 2. Response evaluation
 
 Files:
 - `eval/hf_agent_connector.py` contains a lightweight bridge that spins up
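For context, the weighted-criteria scheme described in the README diff above (Essential: +5, Important: +3-4, Optional: +1-2, Pitfall: -1 to -2) can be sketched as a simple weighted aggregation. The normalization used here (sum of satisfied weights over the maximum attainable positive weight) is an illustrative assumption for this sketch, not necessarily the exact RaR-Explicit formula from the paper, and `rubric_score` is a hypothetical helper, not part of the repository.

```python
# Illustrative sketch of weighted rubric aggregation.
# Assumption: a response's score is the sum of weights for satisfied
# criteria, normalized by the maximum attainable positive weight.
# This may differ from the paper's exact RaR-Explicit definition.

def rubric_score(criteria):
    """criteria: list of (weight, satisfied) pairs.

    Positive weights reward criteria that are met; negative "pitfall"
    weights penalize the response when the pitfall is present
    (satisfied=True for a pitfall means the response fell into it).
    """
    total = sum(weight for weight, met in criteria if met)
    max_possible = sum(weight for weight, _ in criteria if weight > 0)
    return total / max_possible if max_possible else 0.0

# Example: one Essential (+5) met, one Important (+3) missed,
# one Optional (+1) met, one Pitfall (-2) triggered.
example = [(5, True), (3, False), (1, True), (-2, True)]
print(round(rubric_score(example), 3))  # (5 + 1 - 2) / (5 + 3 + 1) -> 0.444
```

Clamping the result to a [0, 1] range, or normalizing pitfalls separately, are equally plausible design choices; the repository's `rubric_scorer` in `eval/task.py` is the authoritative implementation.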