sync: docs, training page fixes, OpenEnv SFT demo notebook 4904ccb verified sh4shv4t committed 12 days ago
sync: docs, training page fixes, OpenEnv SFT demo notebook 50e78ff verified sh4shv4t committed 12 days ago
sync: docs, training page fixes, OpenEnv SFT demo notebook 8a0b968 verified sh4shv4t committed 12 days ago
sync: docs, training page fixes, OpenEnv SFT demo notebook 9ef99b8 verified sh4shv4t committed 12 days ago
sync: docs, training page fixes, OpenEnv SFT demo notebook 90fedec verified sh4shv4t committed 12 days ago
sync: docs, training page fixes, OpenEnv SFT demo notebook 4d96605 verified sh4shv4t committed 12 days ago
sync: docs, training page fixes, OpenEnv SFT demo notebook d440298 verified sh4shv4t committed 12 days ago
Relocate training notebooks, add BLOG and Google Colab links (SFT + GRPO HF Job), dashboard updates, and eval artifacts 00a2188 sh4shv4t committed 12 days ago
fix: increased max completion length to reduce model output truncation 8679498 sh4shv4t committed 12 days ago
Add GRPO HF job reward/loss curves, dashboard wiring, plot script, and fix grpo_train log_history unwrap bf9f882 sh4shv4t committed 12 days ago
fix: add pydantic/numpy/fastapi to requirements-train.txt for HF Jobs caa9c4f sh4shv4t committed 12 days ago
Add OpenEnv client, compat layer, manifest, scripts, GRPO plot hook, and README 81b4b70 sh4shv4t committed 12 days ago
docs: fix Nash BATNA formula in parlay_hf_article for GitHub Markdown (HTML, no LaTeX) 23036c1 sh4shv4t committed 12 days ago
fix: reward_fn: robust parse + format gradient; colab T4 GRPO defaults. 012ae6d sh4shv4t committed 12 days ago
fix(grpo): normalize dataset kwargs in reward_fn (TRL may pass 1-elem lists) fa5ff62 sh4shv4t committed 12 days ago
fix(grpo): load SFT as base+PEFT so adapter dirs work (no top-level model_type) d97e357 sh4shv4t committed 12 days ago
fix(grpo): put reward_weights on GRPOConfig when TRL no longer accepts it on GRPOTrainer 67bde42 sh4shv4t committed 12 days ago
fix(grpo): pass num_generations into train_grpo and pin generation_batch_size for TRL G divisibility b497689 sh4shv4t committed 12 days ago
fix: changes to grpo_train pipeline to fix divisibility of per_device_train_batch_size and gradient_accumulation_steps cadee25 sh4shv4t committed 12 days ago
fix(sft): T4/Colab-friendly defaults (grad checkpoint, batch 2/accum 8) + CLI flags f2cd270 sh4shv4t committed 12 days ago
fix(sft): pick SFTConfig max_length vs max_seq_length by TRL version at runtime 1820b7c sh4shv4t committed 12 days ago
fix(sft): TRL 1.0+ uses max_length in SFTConfig, not max_seq_length 63e14b4 sh4shv4t committed 12 days ago
Add pre-training audit scripts, OpenEnv manifest, and tune Parlay training/env (GRPO 1.5B default, min-reward filters, weighted data gen, hiring ZOPA+drift, veteran/opponent prompts, Docker/docs) df724f2 sh4shv4t committed 13 days ago
feat: flash-lite for data-gen and flash for UI; remove training page; card tests; --quiet data gen; data/ inspect path; random baseline; GRPO env wrapper; reward fixes (buyer ZOPA, ToM signals); drift + Brier metrics; Bayesian ToM module 15976d0 sh4shv4t committed 13 days ago
fix: move global declarations before first use (grpo_train, call_gemini) 8ec5193 sh4shv4t committed 13 days ago
feat: backup existing data + per-episode progress tracking + gemini live-call verification 48756ef sh4shv4t committed 13 days ago
fix: normalise reward terms for acquisition_term_sheet scale mismatch 5c7939a sh4shv4t committed 13 days ago
feat: backup pre-2.5 data + add --inspect flag for quality diagnostic run 7ad35af sh4shv4t committed 13 days ago
fix: upgrade gemini model string to 2.5-flash-lite + add tom diagnostic script 3f61551 sh4shv4t committed 13 days ago
feat: streamline parlay for demo mode and add spectator negotiation mechanics 2568517 sh4shv4t committed 13 days ago
feat: split Gemini 2.5 Flash (demo) and Flash-Lite (data), SFT threshold 0.3, favicon + check_gemini 9d82eed sh4shv4t committed 14 days ago