rtferraz/tucano2-commerce


tucano2-commerce/notebooks/v4_instruct_grpo.ipynb

Commit History

v4: ROOT CAUSE FIX — use standard PEFT not Unsloth get_peft_model (fused LoRA kernels have dtype bug #4891). Revert to load_in_4bit=True, dtype=None matching V3.

521e1d8
verified

rtferraz committed 12 days ago

v4: fix NF4 fp16/bf16 dtype bug (unsloth #4891) — load_in_4bit=False, 0.5B fits in full bf16 on 24GB

ca397a5
verified

rtferraz committed 12 days ago

v4: fix fp16/bf16 mismatch — disable Unsloth gradient checkpointing (causes dtype conflict in LoRA QKV kernels at 0.5B)

a40d2dc
verified

rtferraz committed 12 days ago

v4 notebook: fix dtype Half/BFloat16 mismatch (explicit bf16), fix tied embeddings path, fix max_length warning

b1bb14c
verified

rtferraz committed 13 days ago

v4 notebook: fix TypeError crash, suppress warnings, update paths to CWD, add V3 task-aware system prompts

631e559
verified

rtferraz committed 13 days ago

Fix total_mem → total_memory in V4 notebook (PyTorch API)

5aa00ff

rtferraz and Claude Sonnet 4.6 committed 13 days ago

Add V4 Instruct-Only GRPO notebook implementing ADR-002

6c7b1ca

rtferraz and Claude Sonnet 4.6 committed 13 days ago
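
Note on commit 5aa00ff: the `total_mem` → `total_memory` fix concerns the attribute name on the object returned by `torch.cuda.get_device_properties`, which reports device memory in bytes as `total_memory`. The sketch below is not taken from the notebook; the helper name `total_memory_gib` and the 24 GiB stand-in object are hypothetical, used only so the snippet also runs on a machine without PyTorch or a GPU.

```python
from types import SimpleNamespace


def total_memory_gib(props) -> float:
    """Convert a device-properties object's `total_memory` (bytes) to GiB.

    `props` is assumed to expose `total_memory`, as the object returned by
    torch.cuda.get_device_properties does. Accessing `props.total_mem`
    instead would raise AttributeError.
    """
    return props.total_memory / 1024**3


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None  # PyTorch not installed; fall back to a stand-in below

    if torch is not None and torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        print(f"{props.name}: {total_memory_gib(props):.1f} GiB")
    else:
        # Hypothetical stand-in mimicking a 24 GiB card (illustration only).
        fake = SimpleNamespace(total_memory=24 * 1024**3)
        print(f"No CUDA device; stand-in reports {total_memory_gib(fake):.1f} GiB")
```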