sh4shv4t/Parlay

Commit History

fix: replace all 7B references with 1.5B

8111291
verified

sh4shv4t committed 12 days ago

fix: replace all 7B references with 1.5B

ab8ac88
verified

sh4shv4t committed 12 days ago

sync: docs, training page fixes, OpenEnv SFT demo notebook

4904ccb
verified

sh4shv4t committed 12 days ago

sync: docs, training page fixes, OpenEnv SFT demo notebook

50e78ff
verified

sh4shv4t committed 12 days ago

sync: docs, training page fixes, OpenEnv SFT demo notebook

8a0b968
verified

sh4shv4t committed 12 days ago

sync: docs, training page fixes, OpenEnv SFT demo notebook

9ef99b8
verified

sh4shv4t committed 12 days ago

sync: docs, training page fixes, OpenEnv SFT demo notebook

90fedec
verified

sh4shv4t committed 12 days ago

sync: docs, training page fixes, OpenEnv SFT demo notebook

4d96605
verified

sh4shv4t committed 12 days ago

sync: docs, training page fixes, OpenEnv SFT demo notebook

d440298
verified

sh4shv4t committed 12 days ago

Relocate training notebooks, add BLOG and Google Colab links (SFT + GRPO HF Job), dashboard updates, and eval artifacts

00a2188

sh4shv4t committed 12 days ago

docs: formatting fixes to README.md

1b1c2d9

sh4shv4t committed 12 days ago

feat: made most image results visible

0041fa2

sh4shv4t committed 12 days ago

feat: added reward audit program

0faca0b

sh4shv4t committed 12 days ago

fix(reward func): reward func was converting data to a lit

70be177

sh4shv4t committed 12 days ago

fix: add chat template to GRPO prompts

79d9923

sh4shv4t committed 12 days ago

fix: increased max completion length to reduce model output truncation

8679498

sh4shv4t committed 12 days ago

Add GRPO HF job reward/loss curves, dashboard wiring, plot script, and fix grpo_train log_history unwrap

bf9f882

sh4shv4t committed 12 days ago

fix: add pydantic/numpy/fastapi to requirements-train.txt for HF Jobs

caa9c4f

sh4shv4t committed 12 days ago

feat: added images, new sft notebook, jobs to do grpo

213dee8

sh4shv4t committed 12 days ago

feat: created script to push datasets to huggingface

64b4e71

sh4shv4t committed 12 days ago

feat: training results page + SFT Colab notebook

108bc34

sh4shv4t committed 12 days ago

fix: use valid HF emoji in README YAML front matter (◈ → 🤝)

f2152d2

sh4shv4t committed 12 days ago

Add OpenEnv client, compat layer, manifest, scripts, GRPO plot hook, and README

81b4b70

sh4shv4t committed 12 days ago

docs: fix Nash BATNA formula in parlay_hf_article for GitHub Markdown (HTML, no LaTeX)

23036c1

sh4shv4t committed 12 days ago

fix: reward_fn: robust parse + format gradient; colab T4 GRPO defaults.

012ae6d

sh4shv4t committed 12 days ago

fix(grpo): normalize dataset kwargs in reward_fn (TRL may pass 1-elem lists)

fa5ff62

sh4shv4t committed 12 days ago

fix(grpo): load SFT as base+PEFT so adapter dirs work (no top-level model_type)

d97e357

sh4shv4t committed 12 days ago

fix(grpo): put reward_weights on GRPOConfig when TRL no longer accepts it on GRPOTrainer

67bde42

sh4shv4t committed 12 days ago

fix(grpo): pass num_generations into train_grpo and pin generation_batch_size for TRL G divisibility

b497689

sh4shv4t committed 12 days ago

fix: changes to grpo_train pipeline to fix divisibility of per_device_train_batch_size and gradient_accumulation_steps

cadee25

sh4shv4t committed 12 days ago

fix(sft): T4/Colab-friendly defaults (grad checkpoint, batch 2/accum 8) + CLI flags

f2cd270

sh4shv4t committed 12 days ago

fix(sft): pick SFTConfig max_length vs max_seq_length by TRL version at runtime

1820b7c

sh4shv4t committed 12 days ago

docs: minor corrections in the hf article

f1c9f55

sh4shv4t committed 12 days ago

fix(sft): TRL 1.0+ uses max_length in SFTConfig, not max_seq_length

63e14b4

sh4shv4t committed 12 days ago

docs: added badges to README.md

b64f2c5

sh4shv4t committed 13 days ago

docs: added tags to READMEs

166e0c3

sh4shv4t committed 13 days ago

Add pre-training audit scripts, OpenEnv manifest, and tune Parlay training/env (GRPO 1.5B default, min-reward filters, weighted data gen, hiring ZOPA+drift, veteran/opponent prompts, Docker/docs)

df724f2

sh4shv4t committed 13 days ago

feat: added first draft of the hf article

cf5b410

sh4shv4t committed 13 days ago

feat: flash-lite for data-gen and flash for UI; remove training page; card tests; --quiet data gen; data/ inspect path; random baseline; GRPO env wrapper; reward fixes (buyer ZOPA, ToM signals); drift + Brier metrics; Bayesian ToM module

15976d0

sh4shv4t committed 13 days ago

fix: trainer notebook improvements

f3d2cd4

sh4shv4t committed 13 days ago

fix: move global declarations before first use (grpo_train, call_gemini)

8ec5193

sh4shv4t committed 13 days ago

feat: backup existing data + per-episode progress tracking + gemini live-call verification

48756ef

sh4shv4t committed 13 days ago

fix: normalise reward terms for acquisition_term_sheet scale mismatch

5c7939a

sh4shv4t committed 13 days ago

fix: fixed sys.path issues on running generate_data.py

3791108

sh4shv4t committed 13 days ago

feat: backup pre-2.5 data + add --inspect flag for quality diagnostic run

7ad35af

sh4shv4t committed 13 days ago

fix: upgrade gemini model string to 2.5-flash-lite + add tom diagnostic script

3f61551

sh4shv4t committed 13 days ago

fix: gemini retry backoff + tom belief diagnostic logging

80b3b2e

sh4shv4t committed 13 days ago

fix: resolve WebSocket HTTP 403 on OpenEnv env server

f33ad7b

sh4shv4t committed 13 days ago

feat: streamline parlay for demo mode and add spectator negotiation mechanics

2568517

sh4shv4t committed 13 days ago

feat: split Gemini 2.5 Flash (demo) and Flash-Lite (data), SFT threshold 0.3, favicon + check_gemini

9d82eed

sh4shv4t committed 14 days ago
