Commit History
Sync Web UI [END] logs to use the exact formal score= specification format f63920a
Final strict spec-compliance polish: score precision, empty rewards, updated test assertions 6284048
Fix syntax of [END] STDOUT line to perfectly match Hackathon mandatory format with score= parameter f96532b
Fix Phase 2 OpenEnv validation traps: add grader paths to openenv.yaml and safe parameterless defaults 699f953
Remove mock.py (old debug file with wrong [END] format) ddafe29
Fix: abort [END] lines use rewards=0.01 instead of empty rewards= to prevent evaluator 0.0 score 723407b
delete unnecessary files b7c48de
Varun committed on
Fix punctuation in project title 559f355
Varun committed on
Spec-compliance overhaul: remove difficulty_multiplier, weighted blend scoring, dep_hard fix, [END] format f3fd4ef
Skip benchmark store on fatal API errors (402/401/403) 1ecd7e1
Fix dep_hard Counter bug, add fatal error handling, update README with 14-model benchmark 3466d21
Major grading overhaul: difficulty multiplier, tighter scoring, mastery removal, precision penalties 72b3e8d
Fix state machine bugs and switch to average scoring for discriminative benchmarking cd5104a
Fix score aggregation: use max(rewards) for discriminative multi-turn scoring fe9aa5c
Remove rate limiter (blocks evaluator) and fix score aggregation to clamped sum 3dfb5fe
Clean README FILE 6938d9f
Merge remote-tracking branch 'origin/main' d939216
docs: clean up README for public hackathon submission (hide internal scoring formulas) cff7056
Update README.md 25d3831 unverified
kush-rc committed on
fix(docker): copy /usr/local/bin from builder so uvicorn is on PATH cfda61e
fix(benchmark): Hardening multi-agent environment and strict score compliance 6f95f2a
Clamp scores strictly to (0.01, 0.99) to pass OpenEnv Phase 2 continuous environment score verification checks 829f543
Fix UI score accumulation logic and save benchmark history fc84271
Revert incorrect log parsing changes and fix reward summation logic d270d2a
Fix log formatting to exactly match diagnostic feedback b4f20cf
Ensure /step returns info object perfectly matching OpenEnv spec 09576c0
Fix UI leaderboard fields and commit uv.lock bdb64b6
Fix benchmark output saving: add results dir and print errors 9bb611a
Fix HF_TOKEN parsing for strict validation 46acf43
Fix HF metadata 25c2f1c
Update README.md b8fe4b4 unverified
Varun committed on