training/TRAINING.md: add "Quick start — just run the .sh" subsection 96d773b Don Rishabh Claude Opus 4.7 (1M context) committed 12 days ago
training/TRAINING.md: add upfront "what the .sh launchers do" section e51b5ef Don Rishabh Claude Opus 4.7 (1M context) committed 12 days ago
training/TRAINING.md: fix .sh / .py flag names so the recipe actually runs 8ac18d8 Don Rishabh Claude Opus 4.7 (1M context) committed 12 days ago
Add training/TRAINING.md — end-to-end reproduction recipe 6206e8a Don Rishabh Claude Opus 4.7 (1M context) committed 12 days ago
build_before_after_csv: --min-verbose-accuracy flag ea78734 Don Rishabh Claude Opus 4.7 (1M context) committed 12 days ago
trackio: post-hoc replay of train_metrics.jsonl into a HF Space dashboard 3724e90 Don Rishabh Claude Opus 4.7 (1M context) committed 12 days ago
demo CSVs: add reward_advantage_vs_verbose + accuracy_delta_vs_verbose 7dafc94 Don Rishabh Claude Opus 4.7 (1M context) committed 12 days ago
demo: sample test input dropdown (per-task examples in CSV) bdd9948 Don Rishabh committed 12 days ago
multistep: gradient checkpointing + tighter memory defaults 7ca042f Don Rishabh Claude Opus 4.7 (1M context) committed 12 days ago
tasks_policy: long-context policy-compression tasks e8ef5c3 Don Rishabh Claude Opus 4.7 (1M context) committed 13 days ago
hf_job_train: add ENABLE_THINKING env var (default true) 20f81cc Don Rishabh Claude Opus 4.7 (1M context) committed 13 days ago
train_grpo: drop stale args.turn_limit reference at build_prompt_dataset call site 0e3893d Don Rishabh Claude Opus 4.7 (1M context) committed 13 days ago
v3: multi-turn env, thinking tokens, cross-family Qwen->Llama, multi-step GRPO 67509ac Don Rishabh Claude Opus 4.7 (1M context) committed 13 days ago
profile_baseline: fix wrong TargetGeneration field accesses 3a1b533 Don Rishabh Claude Opus 4.7 (1M context) committed 13 days ago
profile: install hf_transfer (HF_HUB_ENABLE_HF_TRANSFER was set without the package) 8526703 Don Rishabh Claude Opus 4.7 (1M context) committed 13 days ago
profile_baseline: pass required max_output_tokens to generate_batch 581249b Don Rishabh Claude Opus 4.7 (1M context) committed 13 days ago
tasks_tough: add 42 more tough scenarios + baseline profiler fe54c01 Don Rishabh Claude Opus 4.7 (1M context) committed 13 days ago
tasks_tough: add 10 domain-classifier tough scenarios (seed batch) 25d9413 Don Rishabh Claude Opus 4.7 (1M context) committed 13 days ago
eval: use merged v1+v2 task bank (same fix train_grpo.py already had) 450384e Don Rishabh committed 14 days ago
Revert num_generations 10 -> 8 (must divide generation_batch_size=8) 1c3ea4f Don Rishabh committed 14 days ago
Pre-launch fixes: disable Qwen3 thinking, strip think blocks, degenerate-short guard 5abc867 Don Rishabh Claude Opus 4.7 (1M context) committed 14 days ago
GRPO: explicit temperature=0.9 top_p=1.0 (override Qwen3 defaults of 0.6/0.95 for rollout diversity) 1d31f17 Don Rishabh committed 14 days ago
Fall back from Qwen3.5 -> Qwen3 family (transformers==4.56.2 compat) ade2f03 Don Rishabh Claude Opus 4.7 (1M context) committed 14 days ago
Fix v2 smoke failures: load merged task bank + newer transformers for Qwen3.5 070be2b Don Rishabh Claude Opus 4.7 (1M context) committed 14 days ago
v2 stack: Qwen3.5-2B agent/target, Qwen3.5-9B judge, hard tasks, additive reward 3889513 Don Rishabh Claude Opus 4.7 (1M context) committed 14 days ago
eval: --push-to-hub uploads eval JSONL to adapter repo under evals/ 309fb46 Don Rishabh committed 14 days ago
Revert agent loading to TRL + PEFT (Unsloth collides with frozen target) 1da121e Don Rishabh Claude Opus 4.7 (1M context) committed 14 days ago
Drop duplicate make_plots call; train_grpo.py now renders + uploads plots inline e424cfe Don Rishabh committed 14 days ago
Persist training artifacts: upload metrics + plots alongside adapter 156145e Don Rishabh Claude Opus 4.7 (1M context) committed 14 days ago
Switch agent loading to Unsloth FastLanguageModel + fix padding side 02851f3 Don Rishabh Claude Opus 4.7 (1M context) committed 14 days ago
Install openenv-core explicitly (our env imports it; --no-deps skipped it) cef1a55 Don Rishabh committed 14 days ago
Adopt OpenEnv-official install pattern (unsloth_2048.ipynb) 80f9ea6 Don Rishabh Claude Opus 4.7 (1M context) committed 14 days ago
Switch base image to python:3.12-slim + explicit torch 2.7 install f45a05a Don Rishabh Claude Opus 4.7 (1M context) committed 14 days ago
Align HF Jobs deps with spaces_pipeline_env Colab stack cc812a5 Don Rishabh Claude Opus 4.7 (1M context) committed 14 days ago
Bump HF Jobs base image to pytorch 2.5.1 (trl>=0.14 needs FSDPModule) 273aa5a Don Rishabh committed 14 days ago
Fix bash -c invocation: add -- separator to stop hf CLI from eating -l -c as short flags fa930df Don Rishabh committed 15 days ago
Fix hf jobs CLI: use --secrets plural + --detach, optional --push-to-hub aaa0f2f Don Rishabh Claude Opus 4.7 (1M context) committed 15 days ago
Initial commit: Prompt Golf environment for OpenEnv 6850dad Don Rishabh Claude Opus 4.7 (1M context) committed 15 days ago