Parlay / training / sft_train.py

Commit History

fix: add chat template to GRPO prompts

79d9923

sh4shv4t committed 16 days ago

fix(sft): T4/Colab-friendly defaults (grad checkpoint, batch 2/accum 8) + CLI flags

f2cd270

sh4shv4t committed 16 days ago
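
The defaults named in this commit (batch 2, accumulation 8, gradient checkpointing on) could be exposed as CLI flags roughly as follows. This is a minimal sketch, not the contents of sft_train.py: the flag names `--batch-size`, `--grad-accum`, and `--gradient-checkpointing` and the helper `effective_batch_size` are hypothetical, and only the default values come from the commit message.

```python
import argparse


def build_parser():
    # Hypothetical flag names; only the default values (2, 8, checkpointing on)
    # are taken from the commit message.
    parser = argparse.ArgumentParser(description="Illustrative SFT launcher flags")
    parser.add_argument("--batch-size", type=int, default=2,
                        help="per-device micro-batch size")
    parser.add_argument("--grad-accum", type=int, default=8,
                        help="gradient accumulation steps")
    # BooleanOptionalAction (Python 3.9+) also generates --no-gradient-checkpointing.
    parser.add_argument("--gradient-checkpointing",
                        action=argparse.BooleanOptionalAction, default=True,
                        help="trade extra compute for lower activation memory")
    return parser


def effective_batch_size(args):
    # Samples contributing to each optimizer step on a single device.
    return args.batch_size * args.grad_accum


# Parse an empty argument list to inspect the defaults.
defaults = build_parser().parse_args([])
print(effective_batch_size(defaults))  # 2 * 8 = 16
```

With these defaults each optimizer step still sees 16 samples per device, while only 2 are held in memory at a time, which is the usual reason to pair a small micro-batch with accumulation on a memory-limited GPU such as a T4.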

fix(sft): pick SFTConfig max_length vs max_seq_length by TRL version at runtime

1820b7c

sh4shv4t committed 17 days ago
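
One way to choose between `max_length` and `max_seq_length` at runtime is sketched below. It is an illustration under stated assumptions, not the code from this commit: instead of comparing version strings it inspects which field the config class declares, it assumes the config class is a dataclass, and `OldStyleConfig`, `NewStyleConfig`, and `pick_length_kwarg` are hypothetical stand-ins so the example runs without TRL installed.

```python
import dataclasses


@dataclasses.dataclass
class OldStyleConfig:
    # Stand-in for a config class that still uses the older field name.
    max_seq_length: int = 1024


@dataclasses.dataclass
class NewStyleConfig:
    # Stand-in for a config class that uses the newer field name.
    max_length: int = 1024


def pick_length_kwarg(config_cls, value):
    """Return the sequence-length keyword argument that config_cls accepts."""
    field_names = {f.name for f in dataclasses.fields(config_cls)}
    # Prefer the newer name when both happen to be present.
    for name in ("max_length", "max_seq_length"):
        if name in field_names:
            return {name: value}
    raise TypeError(f"{config_cls.__name__} declares no sequence-length field")


old_cfg = OldStyleConfig(**pick_length_kwarg(OldStyleConfig, 512))
new_cfg = NewStyleConfig(**pick_length_kwarg(NewStyleConfig, 512))
print(old_cfg, new_cfg)
```

Inspecting the fields avoids parsing version strings, at the cost of relying on the config class being introspectable; an explicit version comparison, as the commit message describes, is the alternative.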

fix(sft): TRL 1.0+ uses max_length in SFTConfig, not max_seq_length

63e14b4

sh4shv4t committed 17 days ago

Add pre-training audit scripts, OpenEnv manifest, and tune Parlay training/env (GRPO 1.5B default, min-reward filters, weighted data gen, hiring ZOPA+drift, veteran/opponent prompts, Docker/docs)

df724f2

sh4shv4t committed 17 days ago

feat: streamline parlay for demo mode and add spectator negotiation mechanics

2568517

sh4shv4t committed 17 days ago

feat: split Gemini 2.5 Flash (demo) and Flash-Lite (data), SFT threshold 0.3, favicon + check_gemini

9d82eed

sh4shv4t committed 18 days ago

feat: project setup

698f4d8

sh4shv4t committed 20 days ago