SFT/Alignment - Phase 007-07-MLP8: ethicalabs/Kurtis-EON1-SFT Mix (1 epoch, 200k samples, bf16, LoRA disabled)

#11
by mrs83 - opened
ethicalabs.ai org

Learning from mistakes: when we use LoRA, we freeze the base model and train only a tiny fraction of the parameters. This protects the base model's pre-training knowledge, but after several attempts I noticed bottlenecks in its ability to learn. Increasing the LoRA alpha and r, or adjusting their ratio, didn't help. Considering the base model has been trained on only 5BT, we're far from overfitting.
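To make the "tiny fraction" point concrete, here is a minimal sketch of the parameter arithmetic behind LoRA. The matrix sizes and rank below are hypothetical, not the actual model's dimensions:

```python
# Rough illustration of why LoRA trains only a tiny fraction of parameters.
# The dimensions and rank here are hypothetical examples.

def lora_fraction(d_in: int, d_out: int, r: int) -> float:
    """Fraction of a single weight matrix's parameters that LoRA trains.

    LoRA freezes the full (d_out x d_in) matrix and learns two low-rank
    factors instead: A (r x d_in) and B (d_out x r).
    """
    full_params = d_in * d_out
    lora_params = r * (d_in + d_out)
    return lora_params / full_params

# Example: a 2048 x 2048 projection adapted with rank r=16
frac = lora_fraction(2048, 2048, 16)
print(f"trainable fraction: {frac:.2%}")  # trainable fraction: 1.56%
```

Even with a generous rank, the adapters update only a percent or two of each adapted matrix, which is consistent with the capacity bottleneck described above when the base model itself is still undertrained.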

Training in progress on a single AMD GPU (Radeon AI PRO R9700 32GB)

[Screenshot: training run in progress, 2026-04-02 15:47]

ethicalabs.ai org

Dataset distribution:

| Dataset | Samples | Share |
| --- | ---: | ---: |
| HuggingFaceTB/cosmopedia-v2 | 64,299 | 32.15% |
| teknium/OpenHermes-2.5 | 59,249 | 29.62% |
| mlabonne/FineTome-100k | 12,990 | 6.49% |
| samhog/psychology-10k | 12,129 | 6.06% |
| jondurbin/airoboros-3.2 | 11,883 | 5.94% |
| HuggingFaceH4/ultrafeedback_binarized | 11,773 | 5.89% |
| CohereForAI/aya_dataset | 11,614 | 5.81% |
| fadodr/mental_health_therapy | 10,437 | 5.22% |
| garage-bAInd/Open-Platypus | 3,187 | 1.59% |
| ethicalabs/IdentityShield | 2,439 | 1.22% |

ethicalabs.ai org

This checkpoint captures an intermediate state prior to structural patches in our completion-only loss masking and hybrid attention routing. Due to an attention anomaly, the model effectively trapped itself in a high-confidence feedback loop.

It generates tokens with extremely high probability scores, but the outputs are purely hallucinatory reflections of the user's prompt rather than grounded reasoning. While a fascinating look at unconstrained predictive coding, it does not meet our needs. The bug has been resolved in subsequent weights.
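For readers unfamiliar with completion-only loss masking: the usual approach is to set the labels of prompt tokens to the ignore index (-100 in PyTorch's cross-entropy), so the loss is computed only on the assistant's completion. A minimal sketch, with illustrative token ids and a hypothetical helper name (not the actual training code):

```python
# Sketch of completion-only loss masking: prompt positions get label -100
# (PyTorch's default ignore_index for cross-entropy), so only completion
# tokens contribute to the loss. Token ids below are illustrative.

IGNORE_INDEX = -100

def mask_prompt_labels(input_ids: list[int], prompt_len: int) -> list[int]:
    """Return labels with the first prompt_len positions masked out."""
    return [IGNORE_INDEX] * prompt_len + input_ids[prompt_len:]

# Example: 4 prompt tokens followed by 3 completion tokens
ids = [101, 2054, 2003, 102, 7592, 2088, 102]
labels = mask_prompt_labels(ids, prompt_len=4)
print(labels)  # [-100, -100, -100, -100, 7592, 2088, 102]
```

A bug in this masking (or in how attention is routed across the masked region) can leave the model optimizing against its own prompt echo, which matches the high-confidence feedback loop described above.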
