add: V4.2 Final Report - complete project retrospective with evidence-based analysis 22cca8b verified rtferraz committed on 4 days ago
add: notebook cell insertion script for base vs tuned comparison c641edb verified rtferraz committed on 5 days ago
add: base vs tuned comparison cell for V4.2 final evaluation 0c9199c verified rtferraz committed on 5 days ago
feat(rewards): add sentiment mismatch penalty to prevent extraction reward hacking b1be31c rtferraz Claude Haiku 4.5 committed on 6 days ago
fix(probe): use TRL 0.24.0 log keys - rewards/commerce_reward_fn/mean, grad_norm (not train/ prefix) 080fd9a verified rtferraz committed on 9 days ago
fix(classifier): reorder _classify_task_type - insights before push to prevent reengajamento misclassification 63b1c86 verified rtferraz committed on 9 days ago
fix(rewards): 3 bugs from Cell 8 audit - push length/formal, SQL domain, extraction int check 41eb15f verified rtferraz committed on 9 days ago
Fix V4.2 audit: show INPUT REVIEW alongside MODEL OUTPUT for proper human scoring 71422f3 verified rtferraz committed on 9 days ago
Fix V4.2: task weights 40/40/10/10, full audit completions, interactive input() scoring 0fc9042 verified rtferraz committed on 9 days ago
Fix V4.2: GDPO + IWU now active in training reward path (not just monitoring) c95e44c verified rtferraz committed on 9 days ago
Add V4.2 GRPO training notebook (Gold Standard, 0.5B) c5f1d2d verified rtferraz committed on 9 days ago
docs: add V4.1 run report - detailed evaluation with per-task analysis and V4.2 roadmap 482efc4 verified rtferraz committed on 9 days ago
notebooks: add V4.1 GRPO notebook (parser fix, 600 steps, LR 5e-6, constant_with_warmup) d7a090d verified rtferraz committed on 10 days ago
docs: add V4 run assessment with lessons learned and improvement roadmap cfaf49c verified rtferraz committed on 10 days ago
v4: ROOT CAUSE FIX - use standard PEFT not Unsloth get_peft_model (fused LoRA kernels have dtype bug #4891). Revert to load_in_4bit=True, dtype=None matching V3. 521e1d8 verified rtferraz committed on 12 days ago
v4: fix NF4 fp16/bf16 dtype bug (unsloth #4891) - load_in_4bit=False, 0.5B fits in full bf16 on 24GB ca397a5 verified rtferraz committed on 12 days ago
v4: fix fp16/bf16 mismatch - disable Unsloth gradient checkpointing (causes dtype conflict in LoRA QKV kernels at 0.5B) a40d2dc verified rtferraz committed on 12 days ago
v4 notebook: fix dtype Half/BFloat16 mismatch (explicit bf16), fix tied embeddings path, fix max_length warning b1bb14c verified rtferraz committed on 12 days ago
v4 notebook: fix TypeError crash, suppress warnings, update paths to CWD, add V3 task-aware system prompts 631e559 verified rtferraz committed on 12 days ago
Fix total_mem → total_memory in V4 notebook (PyTorch API) 5aa00ff rtferraz Claude Sonnet 4.6 committed on 12 days ago
Add V4 Instruct-Only GRPO notebook implementing ADR-002 6c7b1ca rtferraz Claude Sonnet 4.6 committed on 12 days ago
ADR-002: V4 Instruct-Only GRPO - revises dual-model plan based on model repo audit 50e0e4d verified rtferraz committed on 12 days ago
Add comprehensive investigation report - performance audit, unexplored alternatives, literature-backed recommendations 4312bfd verified rtferraz committed on 13 days ago
Add session checkpoint: v3 launch decision with full context bead5cb verified rtferraz committed on 14 days ago
apply v3 task-aware thinking controls and delete deprecated notebook 1d514ac rtferraz committed on 14 days ago
Add v3 thinking control patch - task-aware system prompts + think efficiency reward 0f39df7 verified rtferraz committed on 14 days ago
Initial commit: Tucano2-Commerce GRPO v3 training pipeline fa4a874 rtferraz Claude Opus 4.6 committed on 14 days ago
Rename notebooks/grpo_vertex_v3.ipynb to notebooks/DEPRECATED_grpo_vertex_v3.ipynb a62f1dc verified rtferraz committed on 14 days ago
feat: add v3 notebook (.ipynb) - ready for Vertex AI Workbench 6c51e5f verified rtferraz committed on 14 days ago
feat: add GRPO v3 implementation with entropy collapse fixes a6a8b11 verified rtferraz committed on 14 days ago
docs: add ADR-001 next steps with detailed execution plans b47b36b verified rtferraz committed on 14 days ago