Commit History

chore: Apply Bug #2 and Bug #3 strict min/max bound clamping to prevent out of range scores and fix windows encoding
ee547a6

immortalindeed committed on

Sync Web UI [END] logs to use the exact formal score= specification format
f63920a

immortalindeed committed on

Final strict spec-compliance polish: score precision, empty rewards, updated test assertions
6284048

immortalindeed committed on

Fix syntax of [END] STDOUT line to perfectly match Hackathon mandatory format with score= parameter
f96532b

immortalindeed committed on

Fix Phase 2 OpenEnv validation traps: add grader paths to openenv.yaml and safe parameterless defaults
699f953

immortalindeed committed on

Remove mock.py (old debug file with wrong [END] format)
ddafe29

immortalindeed committed on

Fix: abort [END] lines use rewards=0.01 instead of empty rewards= to prevent evaluator 0.0 score
723407b

immortalindeed committed on

delete unnecessary files
b7c48de

Varun committed on

Fix punctuation in project title
559f355

Varun committed on

Spec-compliance overhaul: remove difficulty_multiplier, weighted blend scoring, dep_hard fix, [END] format
f3fd4ef

immortalindeed committed on

Skip benchmark store on fatal API errors (402/401/403)
1ecd7e1

immortalindeed committed on

Fix dep_hard Counter bug, add fatal error handling, update README with 14-model benchmark
3466d21

immortalindeed committed on

Major grading overhaul: difficulty multiplier, tighter scoring, mastery removal, precision penalties
72b3e8d

immortalindeed committed on

Fix state machine bugs and switch to average scoring for discriminative benchmarking
cd5104a

immortalindeed committed on

Fix score aggregation: use max(rewards) for discriminative multi-turn scoring
fe9aa5c

immortalindeed committed on

Remove rate limiter (blocks evaluator) and fix score aggregation to clamped sum
3dfb5fe

immortalindeed committed on

Merge remote-tracking branch 'origin/main'
d939216

immortalindeed committed on

docs: clean up README for public hackathon submission (hide internal scoring formulas)
cff7056

immortalindeed committed on

Update README.md
25d3831

kush-rc committed on

fix(docker): copy /usr/local/bin from builder so uvicorn is on PATH
cfda61e

immortalindeed committed on

fix(benchmark): Hardening multi-agent environment and strict score compliance
6f95f2a

immortalindeed committed on

Clamp scores strictly to (0.01, 0.99) to pass OpenEnv Phase 2 continuous environment score verification checks
829f543

immortalindeed committed on

Fix UI score accumulation logic and save benchmark history
fc84271

immortalindeed committed on

Revert incorrect log parsing changes and fix reward summation logic
d270d2a

immortalindeed committed on

Fix log formatting to exactly match diagnostic feedback
b4f20cf

immortalindeed committed on

Ensure /step returns info object perfectly matching OpenEnv spec
09576c0

immortalindeed committed on

Fix UI leaderboard fields and commit uv.lock
bdb64b6

immortalindeed committed on

Fix benchmark output saving: add results dir and print errors
9bb611a

immortalindeed committed on

Fix HF_TOKEN parsing for strict validation
46acf43

immortalindeed committed on

Update README.md
b8fe4b4

Varun committed on