sync: docs, training page fixes, OpenEnv SFT demo notebook 4904ccb verified sh4shv4t committed on 12 days ago
sync: docs, training page fixes, OpenEnv SFT demo notebook 50e78ff verified sh4shv4t committed on 12 days ago
Relocate training notebooks, add BLOG and Google Colab links (SFT + GRPO HF Job), dashboard updates, and eval artifacts 00a2188 sh4shv4t committed on 12 days ago
fix: increased max completion length to reduce model output truncation 8679498 sh4shv4t committed on 12 days ago
Add GRPO HF job reward/loss curves, dashboard wiring, plot script, and fix grpo_train log_history unwrap bf9f882 sh4shv4t committed on 12 days ago
Add OpenEnv client, compat layer, manifest, scripts, GRPO plot hook, and README 81b4b70 sh4shv4t committed on 12 days ago
fix: reward_fn: robust parse + format gradient; Colab T4 GRPO defaults. 012ae6d sh4shv4t committed on 13 days ago
fix(grpo): normalize dataset kwargs in reward_fn (TRL may pass 1-elem lists) fa5ff62 sh4shv4t committed on 13 days ago
fix(grpo): load SFT as base+PEFT so adapter dirs work (no top-level model_type) d97e357 sh4shv4t committed on 13 days ago
fix(grpo): put reward_weights on GRPOConfig when TRL no longer accepts it on GRPOTrainer 67bde42 sh4shv4t committed on 13 days ago
fix(grpo): pass num_generations into train_grpo and pin generation_batch_size for TRL G divisibility b497689 sh4shv4t committed on 13 days ago
fix: changes to grpo_train pipeline to fix divisibility of per_device_train_batch_size and gradient_accumulation_steps cadee25 sh4shv4t committed on 13 days ago
fix(sft): T4/Colab-friendly defaults (grad checkpoint, batch 2/accum 8) + CLI flags f2cd270 sh4shv4t committed on 13 days ago
fix(sft): pick SFTConfig max_length vs max_seq_length by TRL version at runtime 1820b7c sh4shv4t committed on 13 days ago
fix(sft): TRL 1.0+ uses max_length in SFTConfig, not max_seq_length 63e14b4 sh4shv4t committed on 13 days ago
Add pre-training audit scripts, OpenEnv manifest, and tune Parlay training/env (GRPO 1.5B default, min-reward filters, weighted data gen, hiring ZOPA+drift, veteran/opponent prompts, Docker/docs) df724f2 sh4shv4t committed on 13 days ago
feat: flash-lite for data-gen and flash for UI; remove training page; card tests; --quiet data gen; data/ inspect path; random baseline; GRPO env wrapper; reward fixes (buyer ZOPA, ToM signals); drift + Brier metrics; Bayesian ToM module 15976d0 sh4shv4t committed on 13 days ago
fix: move global declarations before first use (grpo_train, call_gemini) 8ec5193 sh4shv4t committed on 13 days ago
feat: backup existing data + per-episode progress tracking + gemini live-call verification 48756ef sh4shv4t committed on 13 days ago
fix: normalise reward terms for acquisition_term_sheet scale mismatch 5c7939a sh4shv4t committed on 13 days ago
feat: backup pre-2.5 data + add --inspect flag for quality diagnostic run 7ad35af sh4shv4t committed on 13 days ago
feat: streamline parlay for demo mode and add spectator negotiation mechanics 2568517 sh4shv4t committed on 13 days ago
feat: split Gemini 2.5 Flash (demo) and Flash-Lite (data), SFT threshold 0.3, favicon + check_gemini 9d82eed sh4shv4t committed on 14 days ago
refactor(agent): migrate Gemini client from google-generativeai to google-genai dd46a0d sh4shv4t committed on 16 days ago