fix(sft): move max_seq_length + dataset_text_field to SFTTrainer 0ced196 Prasham.Jain Claude Sonnet 4.6 committed 14 days ago
fix(training): follow unsloth's Qwen3 guide exactly c49a155 Prasham.Jain Claude Sonnet 4.6 committed 14 days ago
feat(training): switch from LoRA to QLoRA per mentor recommendation 8580936 Prasham.Jain Claude Sonnet 4.6 committed 14 days ago
fix(training): upgrade to torch 2.5.1+cu124, restore unsloth for Qwen3 ddfe351 Prasham.Jain Claude Sonnet 4.6 committed 14 days ago
fix(training): drop unsloth, use bitsandbytes+PEFT for SFT 68277e2 Prasham.Jain Claude Sonnet 4.6 committed 14 days ago
feat(training): A10G-optimised pipeline - auto train.py, Dockerfile.train, GH Action sync 11f97d8 Prasham.Jain Claude Sonnet 4.6 committed 14 days ago
perf(trajectory_gen): parallel workers + JSONL checkpoint for resume ef5ead6 Prasham.Jain Claude Sonnet 4.6 committed 14 days ago
fix(trajectory_gen): add --scenarios-dir flag for server-free generation 9fa7302 Prasham.Jain Claude Sonnet 4.6 committed 14 days ago
fix(submission): Dockerfile, wire-format fixes, scenario loading, real-scenario MockEnvClient ba93ec0 Prasham.Jain Claude Sonnet 4.6 committed 14 days ago
feat(training): Phase C6 - ablations, training curves, readme finalization e46f00b Prasham.Jain Claude Sonnet 4.6 committed 14 days ago
feat(training): Phase C5 - evaluation harness, baselines, plots, readme table 93e68bc Prasham.Jain Claude Sonnet 4.6 committed 14 days ago
feat(training): Phase C4 - GRPO training, SFT warmstart, rollout, custom trainer 3e29c8b Prasham.Jain Claude Sonnet 4.6 committed 14 days ago
feat(training): Phase C3 - SFT trajectory generator, env clients, mock env 5ae5581 Prasham.Jain Claude Sonnet 4.6 committed 14 days ago
feat(phase-0): foundation - uv project, schemas, mocks, manifest, CI 19e2683 Prasham.Jain Claude Opus 4.7 (1M context) committed 14 days ago