fix(sft): T4/Colab-friendly defaults (grad checkpoint, batch 2/accum 8) + CLI flags f2cd270 sh4shv4t committed 16 days ago
fix(sft): pick SFTConfig max_length vs max_seq_length by TRL version at runtime 1820b7c sh4shv4t committed 17 days ago
fix(sft): TRL 1.0+ uses max_length in SFTConfig, not max_seq_length 63e14b4 sh4shv4t committed 17 days ago
Add pre-training audit scripts, OpenEnv manifest, and tune Parlay training/env (GRPO 1.5B default, min-reward filters, weighted data gen, hiring ZOPA+drift, veteran/opponent prompts, Docker/docs) df724f2 sh4shv4t committed 17 days ago
feat: streamline parlay for demo mode and add spectator negotiation mechanics 2568517 sh4shv4t committed 17 days ago
feat: split Gemini 2.5 Flash (demo) and Flash-Lite (data), SFT threshold 0.3, favicon + check_gemini 9d82eed sh4shv4t committed 18 days ago
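
Note on 1820b7c: the log does not show how the runtime choice between max_length and max_seq_length is made. The sketch below is one possible approach, not the repository's code. It inspects the config constructor's signature instead of comparing version strings, which is a swapped-in technique. The names pick_length_kwarg, build_config, NewStyleConfig and OldStyleConfig are hypothetical stand-ins so the example runs without TRL installed; in practice the class passed in would be trl.SFTConfig.

```python
import inspect
from dataclasses import dataclass


def pick_length_kwarg(config_cls, candidates=("max_length", "max_seq_length")):
    """Return the first candidate keyword that config_cls's constructor accepts."""
    params = inspect.signature(config_cls.__init__).parameters
    for name in candidates:
        if name in params:
            return name
    raise TypeError(f"{config_cls.__name__} accepts none of {candidates}")


def build_config(config_cls, output_dir, length, **extra):
    """Build a config, passing the sequence length under whichever name is accepted."""
    length_kwarg = pick_length_kwarg(config_cls)
    return config_cls(output_dir=output_dir, **{length_kwarg: length}, **extra)


# Hypothetical stand-ins for the newer and older SFTConfig signatures.
@dataclass
class NewStyleConfig:
    output_dir: str
    max_length: int = 1024


@dataclass
class OldStyleConfig:
    output_dir: str
    max_seq_length: int = 1024


if __name__ == "__main__":
    print(pick_length_kwarg(NewStyleConfig))
    print(pick_length_kwarg(OldStyleConfig))
    print(build_config(OldStyleConfig, "out", 512))
```

Probing the signature avoids hard-coding a version threshold, at the cost of relying on the parameter being declared explicitly in the constructor.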