rtferraz/tucano2-commerce

tucano2-commerce / notebooks

271 kB

2 contributors

History: 20 commits

feat(rewards): add sentiment mismatch penalty to prevent extraction reward hacking

b1be31c 25 days ago

grpo_vertex_v3.ipynb (82.6 kB, about 1 month ago)
apply v3 task-aware thinking controls and delete deprecated notebook

v4_1_instruct_grpo.ipynb (50.3 kB, 29 days ago)
notebooks: add V4.1 GRPO notebook (parser fix, 600 steps, LR 5e-6, constant_with_warmup)

v4_2_instruct_grpo.ipynb (90.8 kB, 25 days ago)
feat(rewards): add sentiment mismatch penalty to prevent extraction reward hacking

v4_instruct_grpo.ipynb (47.7 kB, about 1 month ago)
v4: ROOT CAUSE FIX - use standard PEFT not Unsloth get_peft_model (fused LoRA kernels have dtype bug #4891). Revert to load_in_4bit=True, dtype=None matching V3.