fix: align notebook with 15-day horizon, drop unused replies field f7b5241 vaibhav12332112312 committed 13 days ago
train: batched parallel rollouts on Qwen2.5-3B + parser hardening a6b8df0 vaibhav12332112312 committed 13 days ago
Default repo clone branch to main for training notebooks and HF script. ad48770 anuragredbus committed 13 days ago
Set TASK_HORIZON to 15 days and align graders, UI, and training prompts. 99717c2 anuragredbus committed 13 days ago
train: default HF Job flavor l4x1 -> l40sx1 (48GB VRAM) 76b19bd vaibhav12332112312 committed 13 days ago
train: per-step credit + drop replies + larger batches 9ee7a09 vaibhav12332112312 committed 13 days ago
fix(notebook): py3.11 f-string backslash error in format_obs 56f70b1 vaibhav12332112312 committed 13 days ago
fix(notebook): pin typing_extensions>=4.13.0 to fix pydantic Sentinel ImportError b1bd9cc vaibhav12332112312 committed 13 days ago
fix: restore parse_model_output exception parity with original bare except aeedd8d anuragredbus committed 13 days ago
chore: align train_grpo.ipynb with smoke/syntax patterns for Colab 0587f05 anuragredbus committed 13 days ago
add training/syntax_only.ipynb — kernel + Python syntax only (no project logic) 0e50d91 anuragredbus committed 13 days ago
add train_grpo_smoke notebook; quote pip versions in train_grpo b55c1ff anuragredbus committed 13 days ago
fix: notebook loads Qwen without bitsandbytes on Mac; optional training deps eb1d764 anuragredbus committed 13 days ago
fix: robust notebook setup (no magic shell) + local CWD auto-detect 8d09986 anuragredbus committed 13 days ago
fix: rewrite training notebook for real LoRA fine-tuning on Colab 4a29e22 anuragredbus committed 13 days ago