Commit History

remove ambiguous moderation rows, replace with clear-cut examples
fcce834

avanigupta Claude Opus 4.6 (1M context) committed on

fix easy task test for updated issue types
0e13037

avanigupta Claude Opus 4.6 (1M context) committed on

replace ambiguous salary issue with date format fix
f1b7439

avanigupta Claude Opus 4.6 (1M context) committed on

fix root endpoint to list all 5 tasks
c3f32c9

avanigupta Claude Opus 4.6 (1M context) committed on

remove ambiguous LR fix — identify-only, any valid LR works
a1f98bf

avanigupta Claude Opus 4.6 (1M context) committed on

fix moderation issue row collisions and verify all data
8560706

avanigupta Claude Opus 4.6 (1M context) committed on

add moderation task to Gradio demo replay
887c1aa

avanigupta Claude Opus 4.6 (1M context) committed on

add content moderation task with real OpenAI Moderation data
b99e42b

avanigupta Claude Opus 4.6 (1M context) committed on

add toxic/biased response issue to alignment task
c699b6f

avanigupta Claude Opus 4.6 (1M context) committed on

replace ambiguous fixes with deterministic ones across all tasks
b08652c

avanigupta Claude Opus 4.6 (1M context) committed on

demo only proposes logically inferrable fixes
5de8f8e

avanigupta Claude Opus 4.6 (1M context) committed on

fix grading: reward valid fixes, not just exact matches
5e1f8bb

avanigupta Claude Opus 4.6 (1M context) committed on

update README with alignment task details and issue breakdown
1bd072d

avanigupta Claude Opus 4.6 (1M context) committed on

make alignment issues subtler to challenge frontier models
96d698c

avanigupta Claude Opus 4.6 (1M context) committed on

fix alignment demo trajectory to use correct clean values for fixes
8910a26

avanigupta Claude Opus 4.6 (1M context) committed on

use real NVIDIA HelpSteer data for alignment task
4051320

avanigupta Claude Opus 4.6 (1M context) committed on

improve alignment task: replace label swaps with real contamination
a9620ef

avanigupta Claude Opus 4.6 (1M context) committed on

use real Stanford Alpaca data for alignment task
7479de3

avanigupta Claude Opus 4.6 (1M context) committed on

add alignment data QA task: 12 issues in LLM instruction-tuning data
5cb467d

avanigupta Claude Opus 4.6 (1M context) committed on

Fix port to 8000 for validator compatibility
56f55e9

Varshith B Claude Opus 4.6 (1M context) committed on

Add root-level wrapper files and uv.lock for openenv deployment
0dbc19e

Varshith B Claude Opus 4.6 (1M context) committed on

Merge pull request #1 from varshith15/enhancementsv1
ca01572

Varshith Bathini committed on

remove base_path: /web to fix HF Space iframe 404
85257bc

avanigupta Claude Opus 4.6 (1M context) committed on

add root endpoint for browser/judge friendliness
51adf89

avanigupta Claude Opus 4.6 (1M context) committed on

remove binary PNGs for HF push compatibility
d7c51ad

avanigupta Claude Opus 4.6 (1M context) committed on

use port 7860 for HF Spaces compatibility
671acb9

avanigupta Claude Opus 4.6 (1M context) committed on

minor change for meeting requirement format
92187c5

avanigupta committed on

clean code structure
22369d8

avanigupta committed on

expand datasets to include harder real-world scenarios
5d90461

avanigupta committed on

expand datasets
081eb22

avanigupta committed on

add fix stage+demo
c3002ad

avanigupta committed on

fixes v1: add per step reward
cd11aba

avanigupta committed on

init
4c1a85d

Varshith B committed on