rtferraz/tucano2-commerce


tucano2-commerce/notebooks/v4_2_instruct_grpo.ipynb

Commit History

feat(rewards): add sentiment mismatch penalty to prevent extraction reward hacking

b1be31c

rtferraz, Claude Haiku 4.5 committed 7 days ago

fix(probe): use TRL 0.24.0 log keys — rewards/commerce_reward_fn/mean, grad_norm (not train/ prefix)

080fd9a
verified

rtferraz committed 10 days ago

fix(classifier): reorder _classify_task_type — insights before push to prevent reengajamento misclassification

63b1c86
verified

rtferraz committed 10 days ago

fix(rewards): 3 bugs from Cell 8 audit — push length/formal, SQL domain, extraction int check

41eb15f
verified

rtferraz committed 10 days ago

Fix V4.2 audit: show INPUT REVIEW alongside MODEL OUTPUT for proper human scoring

71422f3
verified

rtferraz committed 10 days ago

Fix V4.2: task weights 40/40/10/10, full audit completions, interactive input() scoring

0fc9042
verified

rtferraz committed 10 days ago

Fix V4.2: GDPO + IWU now active in training reward path (not just monitoring)

c95e44c
verified

rtferraz committed 10 days ago

Add V4.2 GRPO training notebook (Gold Standard, 0.5B)

c5f1d2d
verified

rtferraz committed 10 days ago