CRMA Fine-Tuner: QLoRA + gradient stability adapter (CRMA + ZClip + PiSSA) for TinyLlama, Gemma, Mistral — ablation results

#1153
by Fourwheels2512 - opened

Hey leaderboard community! I built a fine-tuning Space that adds gradient stability on top of QLoRA, and wanted to share the ablation results here since many of you evaluate models post-training.

Space: https://huggingface.co/spaces/open-llm-leaderboard/crma-fine-tuner

What is CRMA Fine-Tuner?

CRMA Fine-Tuner is a Hugging Face Space for fine-tuning TinyLlama, Gemma, and Mistral with three stability components layered on top of standard QLoRA:

  • CRMA (Centered Residual Moving Average) — a low-rank adapter that smooths gradient updates using an exponential moving average of residuals
  • ZClip — adaptive gradient clipping based on z-score statistics, replacing fixed max_grad_norm
  • PiSSA — initializes the LoRA adapter from the principal singular values and singular vectors of the base weights (instead of the usual random/zero init), for faster convergence
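
To make the ZClip idea concrete, here is a minimal sketch of z-score-based adaptive clipping. This is my illustration, not the Space's actual code: it tracks a running mean/variance of the gradient norm and clips only when the current norm is a statistical outlier, rather than enforcing a fixed max_grad_norm.

```python
import math

class ZScoreClipper:
    """Hypothetical sketch of z-score gradient clipping (not the Space's code).

    Keeps EMA statistics of the gradient norm and returns a scaling
    coefficient: 1.0 when the norm is in-distribution, < 1.0 when the
    norm is more than `z_threshold` standard deviations above the mean.
    """

    def __init__(self, z_threshold=2.5, ema_decay=0.99):
        self.z = z_threshold
        self.decay = ema_decay
        self.mean = None   # EMA of observed (post-clip) grad norms
        self.var = 0.0     # EMA of squared deviation

    def clip_coef(self, grad_norm):
        if self.mean is None:          # first step: just record, no clipping
            self.mean = grad_norm
            return 1.0
        std = math.sqrt(self.var) + 1e-8
        zscore = (grad_norm - self.mean) / std
        if zscore > self.z:            # outlier: clip back to mean + z * std
            coef = (self.mean + self.z * std) / grad_norm
        else:
            coef = 1.0
        clipped = grad_norm * coef     # update stats with the clipped norm
        self.mean = self.decay * self.mean + (1 - self.decay) * clipped
        self.var = self.decay * self.var + (1 - self.decay) * (clipped - self.mean) ** 2
        return coef
```

In a training loop you would multiply every gradient by `clip_coef(total_norm)` before the optimizer step, the same place `torch.nn.utils.clip_grad_norm_` normally sits.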

The idea came from observing repeated loss spikes during Mistral fine-tuning that standard gradient clipping couldn’t handle cleanly.
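
Based only on the description above (the Space's actual CRMA implementation may differ), the smoothing component can be pictured as an EMA of past gradients with the current gradient's deviation (the "centered residual") shrunk toward that average, so a one-step spike is damped rather than applied at full strength:

```python
class ResidualEMASmoother:
    """Hypothetical illustration of EMA-based gradient smoothing.

    An assumption drawn from the bullet-point description, not the
    Space's real CRMA code: blend=1.0 returns the pure EMA, blend=0.0
    returns the raw gradient unchanged.
    """

    def __init__(self, decay=0.9, blend=0.5):
        self.decay = decay   # EMA decay for the running average
        self.blend = blend   # fraction of the residual to suppress
        self.ema = None

    def smooth(self, grad):
        if self.ema is None:             # first step: nothing to average yet
            self.ema = grad
            return grad
        residual = grad - self.ema       # centered residual w.r.t. the EMA
        smoothed = self.ema + (1 - self.blend) * residual
        self.ema = self.decay * self.ema + (1 - self.decay) * grad
        return smoothed
```

With `decay=0.9, blend=0.5`, a gradient that suddenly jumps from 1.0 to 10.0 is passed through as 5.5, roughly halving the spike while steady-state gradients are untouched.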

Ablation Results (TinyLlama-1.1B on Alpaca-style data)

| Config | Final Loss | Spike Count | Notes |
|---|---|---|---|
| Baseline QLoRA | 1.42 | 7 | Standard setup |
| + ZClip | 1.38 | 3 | Adaptive clipping helps |
| + PiSSA init | 1.35 | 3 | Better initialization |
| + CRMA (full) | 1.31 | 1 | Full stability stack |
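
The table does not define what counts as a "spike", so for reproducing these numbers, here is one common heuristic (an assumption on my part, not necessarily what the Space measures): flag any step whose loss exceeds the trailing window's mean by `k` standard deviations.

```python
import statistics

def count_spikes(losses, window=10, k=3.0):
    """Count loss spikes via a trailing-window z-score heuristic.

    A step counts as a spike when its loss is more than k population
    standard deviations above the mean of the previous `window` losses.
    This is an illustrative definition, not the Space's metric.
    """
    spikes = 0
    for i in range(window, len(losses)):
        recent = losses[i - window:i]
        mu = statistics.mean(recent)
        sd = statistics.pstdev(recent)
        if sd > 0 and losses[i] > mu + k * sd:
            spikes += 1
    return spikes
```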

Why This Might Matter for Leaderboard Submissions

Unstable training tends to carry over into evaluation: loss-spike artifacts can surface as run-to-run variance in benchmark scores. CRMA aims to produce cleaner, more reproducible fine-tunes.

Feedback welcome, especially from anyone who has noticed training instability affecting their leaderboard submissions.

