Commit History

add: base vs tuned comparison cell for V4.2 final evaluation

0c9199c
verified

rtferraz committed 5 days ago

feat(rewards): add sentiment mismatch penalty to prevent extraction reward hacking

b1be31c

rtferraz and Claude Haiku 4.5 committed 6 days ago

fix(probe): use TRL 0.24.0 log keys — rewards/commerce_reward_fn/mean, grad_norm (not train/ prefix)

080fd9a
verified

rtferraz committed 9 days ago

fix(classifier): reorder _classify_task_type — insights before push to prevent reengajamento misclassification

63b1c86
verified

rtferraz committed 9 days ago

fix(rewards): 3 bugs from Cell 8 audit — push length/formal, SQL domain, extraction int check

41eb15f
verified

rtferraz committed 9 days ago

Fix V4.2 audit: show INPUT REVIEW alongside MODEL OUTPUT for proper human scoring

71422f3
verified

rtferraz committed 9 days ago

Fix V4.2: task weights 40/40/10/10, full audit completions, interactive input() scoring

0fc9042
verified

rtferraz committed 9 days ago

Fix V4.2: GDPO + IWU now active in training reward path (not just monitoring)

c95e44c
verified

rtferraz committed 9 days ago

Add V4.2 GRPO training notebook (Gold Standard, 0.5B)

c5f1d2d
verified

rtferraz committed 9 days ago

notebooks: add V4.1 GRPO notebook (parser fix, 600 steps, LR 5e-6, constant_with_warmup)

d7a090d
verified

rtferraz committed 10 days ago

v4: ROOT CAUSE FIX — use standard PEFT not Unsloth get_peft_model (fused LoRA kernels have dtype bug #4891). Revert to load_in_4bit=True, dtype=None matching V3.

521e1d8
verified

rtferraz committed 12 days ago

v4: fix NF4 fp16/bf16 dtype bug (unsloth #4891) — load_in_4bit=False, 0.5B fits in full bf16 on 24GB

ca397a5
verified

rtferraz committed 12 days ago

v4: fix fp16/bf16 mismatch — disable Unsloth gradient checkpointing (causes dtype conflict in LoRA QKV kernels at 0.5B)

a40d2dc
verified

rtferraz committed 12 days ago

v4 notebook: fix dtype Half/BFloat16 mismatch (explicit bf16), fix tied embeddings path, fix max_length warning

b1bb14c
verified

rtferraz committed 12 days ago

v4 notebook: fix TypeError crash, suppress warnings, update paths to CWD, add V3 task-aware system prompts

631e559
verified

rtferraz committed 12 days ago

Fix total_mem → total_memory in V4 notebook (PyTorch API)

5aa00ff

rtferraz and Claude Sonnet 4.6 committed 12 days ago

Add V4 Instruct-Only GRPO notebook implementing ADR-002

6c7b1ca

rtferraz and Claude Sonnet 4.6 committed 12 days ago

apply v3 task-aware thinking controls and delete deprecated notebook

1d514ac

rtferraz committed 14 days ago

Upload grpo_vertex_v3.ipynb

c9b11b9
verified

rtferraz committed 14 days ago

Rename notebooks/grpo_vertex_v3.ipynb to notebooks/DEPRECATED_grpo_vertex_v3.ipynb

a62f1dc
verified

rtferraz committed 14 days ago

feat: add v3 notebook (.ipynb) — ready for Vertex AI Workbench

6c51e5f
verified

rtferraz committed 14 days ago
