Hugging Face
rtferraz/tucano2-commerce
Revision: b1be31c
Directory: tucano2-commerce/notebooks (271 kB)
2 contributors
History: 20 commits
Latest commit: rtferraz, "feat(rewards): add sentiment mismatch penalty to prevent extraction reward hacking" (b1be31c, 25 days ago)
grpo_vertex_v3.ipynb (82.6 kB): apply v3 task-aware thinking controls and delete deprecated notebook (about 1 month ago)
v4_1_instruct_grpo.ipynb (50.3 kB): notebooks: add V4.1 GRPO notebook (parser fix, 600 steps, LR 5e-6, constant_with_warmup) (29 days ago)
v4_2_instruct_grpo.ipynb (90.8 kB): feat(rewards): add sentiment mismatch penalty to prevent extraction reward hacking (25 days ago)
v4_instruct_grpo.ipynb (47.7 kB): v4: ROOT CAUSE FIX - use standard PEFT not Unsloth get_peft_model (fused LoRA kernels have dtype bug #4891). Revert to load_in_4bit=True, dtype=None matching V3. (about 1 month ago)