chore(spaces): add HF Space config for training [skip ci] bd95811 Prasham.Jain committed 12 days ago
fix(sft): move max_seq_length + dataset_text_field to SFTTrainer 0ced196 Prasham.Jain Claude Sonnet 4.6 committed 12 days ago
fix(training): follow unsloth's Qwen3 guide exactly c49a155 Prasham.Jain Claude Sonnet 4.6 committed 12 days ago
fix(training): upgrade base image to torch 2.6.0+cu126 15e36fe Prasham.Jain Claude Sonnet 4.6 committed 12 days ago
feat(training): switch from LoRA to QLoRA per mentor recommendation 8580936 Prasham.Jain Claude Sonnet 4.6 committed 12 days ago
fix(training): pin torchao==0.5.0 after unsloth to fix torch.int1 error 4647df7 Prasham.Jain Claude Sonnet 4.6 committed 12 days ago
fix(spaces): switch training Space to A10G Small, tune notebook for 24 GB 421885d Prasham.Jain Claude Sonnet 4.6 committed 12 days ago
fix(spaces): add app_port: 8000 to env Space YAML b78f85d Prasham.Jain Claude Sonnet 4.6 committed 12 days ago
fix(training): upgrade to torch 2.5.1+cu124, restore unsloth for Qwen3 ddfe351 Prasham.Jain Claude Sonnet 4.6 committed 12 days ago
fix(training): drop unsloth, use bitsandbytes+PEFT for SFT 68277e2 Prasham.Jain Claude Sonnet 4.6 committed 12 days ago
fix(docker): pin torchao==0.5.0 + transformers/trl/peft before unsloth install e3da0da Prasham.Jain committed 12 days ago
fix(notebook): Cell 3 load_dataset per split + write JSON (was snapshot_download finding 0 .json files) 127ef78 Prasham.Jain committed 12 days ago
fix(docker): COPY README.md into build context (hatchling metadata requires it) 54e5bb5 Prasham.Jain committed 12 days ago
fix(spaces): orange→yellow (not a valid HF colorFrom value) f81c2bd Prasham.Jain committed 12 days ago
fix(spaces): inject HF Space YAML frontmatter into README on push 23c20a6 Prasham.Jain Claude Sonnet 4.6 committed 12 days ago
chore: add push_to_hf.sh - one-shot deploy to both HF Spaces 042c15f Prasham.Jain Claude Sonnet 4.6 committed 12 days ago
feat(training): A10G-optimised pipeline - auto train.py, Dockerfile.train, GH Action sync 11f97d8 Prasham.Jain Claude Sonnet 4.6 committed 12 days ago
perf(trajectory_gen): parallel workers + JSONL checkpoint for resume ef5ead6 Prasham.Jain Claude Sonnet 4.6 committed 12 days ago
fix(trajectory_gen): add --scenarios-dir flag for server-free generation 9fa7302 Prasham.Jain Claude Sonnet 4.6 committed 12 days ago
fix(submission): Dockerfile, wire-format fixes, scenario loading, real-scenario MockEnvClient ba93ec0 Prasham.Jain Claude Sonnet 4.6 committed 13 days ago
merge(branch-b): data pipeline + rewards + training - B1-B5, C1-C6 5305230 Prasham.Jain committed 13 days ago
merge(branch-a): env core - FastAPI server, 11 tool routes, episode lifecycle, replay visualizer (A1-A5) dec324b Prasham.Jain committed 13 days ago
feat(training): Phase C6 - ablations, training curves, readme finalization e46f00b Prasham.Jain Claude Sonnet 4.6 committed 13 days ago
feat(training): Phase C5 - evaluation harness, baselines, plots, readme table 93e68bc Prasham.Jain Claude Sonnet 4.6 committed 13 days ago
feat(training): Phase C4 - GRPO training, SFT warmstart, rollout, custom trainer 3e29c8b Prasham.Jain Claude Sonnet 4.6 committed 13 days ago
feat(training): Phase C3 - SFT trajectory generator, env clients, mock env 5ae5581 Prasham.Jain Claude Sonnet 4.6 committed 13 days ago
feat(rewards): Phase C2 - composite reward, replay verifier, frozen weights 3553258 Prasham.Jain Claude Sonnet 4.6 committed 13 days ago
feat(rewards): Phase C1 - all 9 reward components implemented d11066d Prasham.Jain Claude Sonnet 4.6 committed 13 days ago
feat(data): Phase B5 - corpus instantiation, HF publish, annotations 18a3fbf Prasham.Jain Claude Sonnet 4.6 committed 13 days ago
feat(data): Phase B4 - 7 ScenarioFamilyGenerators with archetype loading 7a658b7 Prasham.Jain Claude Opus 4.7 (1M context) committed 13 days ago
feat(data): Phase B3 - failure clustering and archetype extraction cd61817 Prasham.Jain Claude Opus 4.7 (1M context) committed 13 days ago
feat(branch-b): B2 GitHub Actions log mining - gh-CLI scraper, anonymizer, throttle, cache a4ff035 Prasham.Jain Claude Opus 4.7 (1M context) committed 13 days ago
feat(branch-b): B1 public dataset ingest - DeFlaker, iDFlakies, FlakeFlagger, LogHub loaders + CLI 54627d8 Prasham.Jain Claude Opus 4.7 (1M context) committed 13 days ago
feat(branch-a): A5 replay visualizer - static HTML/JS viewer mounted under /viz e1814ef Prasham.Jain Claude Opus 4.7 (1M context) committed 13 days ago
feat(branch-a): A3 episode lifecycle - budget enforcement, terminal handling, payload truncation, trace serialization ed51f28 Prasham.Jain Claude Opus 4.7 (1M context) committed 13 days ago
feat(branch-a): A2 tool implementations - route 11 handlers to scenario.tool_outputs with cost charging 272a052 Prasham.Jain Claude Opus 4.7 (1M context) committed 13 days ago
refactor(branch-a): A1 - adopt openenv-core MCPEnvironment + create_app for canonical OpenEnv compliance 4c1ad51 Prasham.Jain Claude Opus 4.7 (1M context) committed 13 days ago
feat(branch-a): A1 server scaffold - FastAPI /reset /step /state /mcp + 11 stub tool handlers 8be6018 Prasham.Jain Claude Opus 4.7 (1M context) committed 13 days ago
feat(phase-0): foundation - uv project, schemas, mocks, manifest, CI 19e2683 Prasham.Jain Claude Opus 4.7 (1M context) committed 13 days ago