Parlay / training

Commit History

fix: replace all 7B references with 1.5B
8111291
verified

sh4shv4t committed on

sync: docs, training page fixes, OpenEnv SFT demo notebook
4904ccb
verified

sh4shv4t committed on

sync: docs, training page fixes, OpenEnv SFT demo notebook
50e78ff
verified

sh4shv4t committed on

Relocate training notebooks, add BLOG and Google Colab links (SFT + GRPO HF Job), dashboard updates, and eval artifacts
00a2188

sh4shv4t committed on

fix(reward func): reward func was converting data to a list
70be177

sh4shv4t committed on

fix: add chat template to GRPO prompts
79d9923

sh4shv4t committed on

fix: increased max completion length to reduce model output truncation
8679498

sh4shv4t committed on

Add GRPO HF job reward/loss curves, dashboard wiring, plot script, and fix grpo_train log_history unwrap
bf9f882

sh4shv4t committed on

feat: added images, new sft notebook, jobs to do grpo
213dee8

sh4shv4t committed on

feat: training results page + SFT Colab notebook
108bc34

sh4shv4t committed on

Add OpenEnv client, compat layer, manifest, scripts, GRPO plot hook, and README
81b4b70

sh4shv4t committed on

fix: reward_fn: robust parse + format gradient; colab T4 GRPO defaults.
012ae6d

sh4shv4t committed on

fix(grpo): normalize dataset kwargs in reward_fn (TRL may pass 1-elem lists)
fa5ff62

sh4shv4t committed on
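The normalization fix above likely amounts to unwrapping single-element lists before the reward logic runs, since TRL can pass dataset columns through to a reward function wrapped this way. A minimal sketch, assuming that reading (the helper names are hypothetical, not the repo's actual code):

```python
def _unwrap(value):
    # TRL may hand a dataset column value through as a 1-element list;
    # unwrap it so downstream reward logic sees the plain scalar/string.
    if isinstance(value, list) and len(value) == 1:
        return value[0]
    return value


def normalize_kwargs(kwargs: dict) -> dict:
    # Apply the unwrap to every keyword the trainer forwards.
    return {key: _unwrap(value) for key, value in kwargs.items()}
```

Genuine multi-element lists are left untouched, so batched columns still come through intact.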

fix(grpo): load SFT as base+PEFT so adapter dirs work (no top-level model_type)
d97e357

sh4shv4t committed on

fix(grpo): put reward_weights on GRPOConfig when TRL no longer accepts it on GRPOTrainer
67bde42

sh4shv4t committed on

fix(grpo): pass num_generations into train_grpo and pin generation_batch_size for TRL G divisibility
b497689

sh4shv4t committed on
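TRL's GRPO trainer requires the generation batch to be divisible by `num_generations` (the group size G), which is what the pinning above addresses. A rough sketch of computing a compatible value; the helper name and the round-down policy are assumptions:

```python
def pin_generation_batch_size(per_device_batch: int,
                              grad_accum: int,
                              num_generations: int) -> int:
    # GRPO samples G completions per prompt, so the generation batch
    # must be a multiple of num_generations. Round the effective batch
    # (per-device batch x accumulation steps) down to the nearest
    # multiple, falling back to G itself when the batch is too small.
    effective = per_device_batch * grad_accum
    if effective < num_generations:
        return num_generations
    return (effective // num_generations) * num_generations
```

On a T4-sized setup (batch 2, accumulation 8, G = 6) this would pin the generation batch to 12 rather than the indivisible 16.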

fix: changes to grpo_train pipeline to fix divisibility of per_device_train_batch_size and gradient_accumulation_steps
cadee25

sh4shv4t committed on

fix(sft): T4/Colab-friendly defaults (grad checkpoint, batch 2/accum 8) + CLI flags
f2cd270

sh4shv4t committed on

fix(sft): pick SFTConfig max_length vs max_seq_length by TRL version at runtime
1820b7c

sh4shv4t committed on
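The runtime switch between the two `SFTConfig` keywords could look roughly like this. The helper is a hypothetical sketch that compares only the major version with the stdlib, assuming the rename landed in TRL 1.0 as the commits above state:

```python
def sft_length_kwarg(trl_version: str, max_len: int) -> dict:
    # TRL 1.0 renamed SFTConfig's max_seq_length to max_length;
    # picking the keyword by installed version lets one training
    # script run on either side of the rename.
    major = int(trl_version.split(".")[0])
    if major >= 1:
        return {"max_length": max_len}
    return {"max_seq_length": max_len}
```

The dict can then be splatted into the config, e.g. `SFTConfig(**sft_length_kwarg(trl.__version__, 1024), ...)`.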

fix(sft): TRL 1.0+ uses max_length in SFTConfig, not max_seq_length
63e14b4

sh4shv4t committed on

Add pre-training audit scripts, OpenEnv manifest, and tune Parlay training/env (GRPO 1.5B default, min-reward filters, weighted data gen, hiring ZOPA+drift, veteran/opponent prompts, Docker/docs)
df724f2

sh4shv4t committed on

feat: flash-lite for data-gen and flash for UI; remove training page; card tests; --quiet data gen; data/ inspect path; random baseline; GRPO env wrapper; reward fixes (buyer ZOPA, ToM signals); drift + Brier metrics; Bayesian ToM module
15976d0

sh4shv4t committed on

fix: trainer notebook improvements
f3d2cd4

sh4shv4t committed on

fix: move global declarations before first use (grpo_train, call_gemini)
8ec5193

sh4shv4t committed on

feat: backup existing data + per-episode progress tracking + gemini live-call verification
48756ef

sh4shv4t committed on

fix: normalise reward terms for acquisition_term_sheet scale mismatch
5c7939a

sh4shv4t committed on
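One plausible reading of this fix is min-max scaling each reward term into [0, 1], so the large dollar figures of an acquisition term sheet don't dominate smaller-scale scenarios in the combined reward. A sketch under that assumption (the exact normalization scheme is not stated in the commit):

```python
def normalize_term(value: float, low: float, high: float) -> float:
    # Clip a raw reward term to its scenario's expected range, then
    # rescale to [0, 1] so terms with very different price scales
    # contribute comparably to the total reward.
    if high <= low:
        return 0.0
    clipped = min(max(value, low), high)
    return (clipped - low) / (high - low)
```

Each scenario would supply its own `low`/`high` bounds, making rewards comparable across environments.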

fix: sys.path issues when running generate_data.py
3791108

sh4shv4t committed on

feat: backup pre-2.5 data + add --inspect flag for quality diagnostic run
7ad35af

sh4shv4t committed on

fix: gemini retry backoff + tom belief diagnostic logging
80b3b2e

sh4shv4t committed on

fix: resolve WebSocket HTTP 403 on OpenEnv env server
f33ad7b

sh4shv4t committed on

feat: streamline parlay for demo mode and add spectator negotiation mechanics
2568517

sh4shv4t committed on

feat: split Gemini 2.5 Flash (demo) and Flash-Lite (data), SFT threshold 0.3, favicon + check_gemini
9d82eed

sh4shv4t committed on

fix: UI bugs; initiated keyless testing
14577ec

sh4shv4t committed on

refactor(agent): migrate Gemini client from google-generativeai to google-genai
dd46a0d

sh4shv4t committed on

feat: project setup
698f4d8

sh4shv4t committed on