feat(rewards): add sentiment mismatch penalty to prevent extraction reward hacking b1be31c rtferraz Claude Haiku 4.5 committed 7 days ago
fix(probe): use TRL 0.24.0 log keys: rewards/commerce_reward_fn/mean, grad_norm (not train/ prefix) 080fd9a verified rtferraz committed 10 days ago
fix(classifier): reorder _classify_task_type: insights before push to prevent reengajamento misclassification 63b1c86 verified rtferraz committed 10 days ago
fix(rewards): 3 bugs from Cell 8 audit: push length/formal, SQL domain, extraction int check 41eb15f verified rtferraz committed 10 days ago
Fix V4.2 audit: show INPUT REVIEW alongside MODEL OUTPUT for proper human scoring 71422f3 verified rtferraz committed 10 days ago
Fix V4.2: task weights 40/40/10/10, full audit completions, interactive input() scoring 0fc9042 verified rtferraz committed 10 days ago
Fix V4.2: GDPO + IWU now active in training reward path (not just monitoring) c95e44c verified rtferraz committed 10 days ago
Add V4.2 GRPO training notebook (Gold Standard, 0.5B) c5f1d2d verified rtferraz committed 10 days ago