final-iteration / training

Commit History

train: shrink to weekly horizon + bounded steps
abe4587

vaibhav12332112312 committed on

train: default HF Job flavor l4x1 -> l40sx1 (48GB VRAM)
76b19bd

vaibhav12332112312 committed on

train: per-step credit + drop replies + larger batches
9ee7a09

vaibhav12332112312 committed on

fix(notebook): py3.11 f-string backslash error in format_obs
56f70b1

vaibhav12332112312 committed on

fix(notebook): pin typing_extensions>=4.13.0 to fix pydantic Sentinel ImportError
b1bd9cc

vaibhav12332112312 committed on

fix: restore parse_model_output exception parity with original bare except
aeedd8d

anuragredbus committed on

chore: align train_grpo.ipynb with smoke/syntax patterns for Colab
0587f05

anuragredbus committed on

add training/syntax_only.ipynb — kernel + Python syntax only (no project logic)
0e50d91

anuragredbus committed on

add train_grpo_smoke notebook; quote pip versions in train_grpo
b55c1ff

anuragredbus committed on

fix: notebook loads Qwen without bitsandbytes on Mac; optional training deps
eb1d764

anuragredbus committed on

fix: robust notebook setup (no magic shell) + local CWD auto-detect
8d09986

anuragredbus committed on

fix: rewrite training notebook for real LoRA fine-tuning on Colab
4a29e22

anuragredbus committed on