πŸƒ SSD Training Complete: Gemma 4 E2B IT

Status: ✅ Training Completed

A QLoRA adapter has been trained on google/gemma-4-E2B-it using the SSD (Simple Self-Distillation) technique from Apple's paper (arXiv:2604.01193).

Trained Model

LoRA Adapter: ludsvick/gemma-4-E2B-it-SSD

Training Details:

  • Base model: google/gemma-4-E2B-it (5.1B params, multimodal Gemma 4)
  • Dataset: wrmedford/Gemma-4-E4B-it-SSD (~17K coding problems from LiveCodeBench v6)
  • Method: QLoRA (4-bit NF4 quantization)
  • LoRA rank: 16, alpha: 32, dropout: 0.05
  • Target: all language_model layers (q/k/v/o + gate/up/down projections, all 35 layers); a config sketch follows this list
  • Vision and audio towers are NOT targeted, which preserves the multimodal capability
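
For reference, the hyperparameters above correspond to a peft LoraConfig along these lines. This is a sketch, not the shipped config: the authoritative values live in adapter_config.json, and the module names assume Gemma's standard projection naming.

from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                  # LoRA rank
    lora_alpha=32,
    lora_dropout=0.05,
    bias='none',
    task_type='CAUSAL_LM',
    # Attention + MLP projections; names assume Gemma's conventions
    target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj',
                    'gate_proj', 'up_proj', 'down_proj'],
)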

How to Use

Option 1: Use LoRA adapter directly (memory-efficient)

from peft import PeftModel
from transformers import AutoModelForImageTextToText, AutoTokenizer
import torch

base = AutoModelForImageTextToText.from_pretrained(
    'google/gemma-4-E2B-it',
    torch_dtype=torch.bfloat16,
    device_map='auto',
)
tokenizer = AutoTokenizer.from_pretrained('google/gemma-4-E2B-it')
model = PeftModel.from_pretrained(base, 'ludsvick/gemma-4-E2B-it-SSD')
model.eval()

# Generate at T=0.6 (per SSD paper)
messages = [{'role': 'user', 'content': 'Write a Fibonacci function...'}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors='pt').to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.6,
        do_sample=True,
        top_p=0.95,
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
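
To mirror the training setup more closely, you can instead load the base model in 4-bit NF4 via bitsandbytes before attaching the adapter. A sketch; the double-quantization flag is an assumption (a common QLoRA default), not something this repo confirms:

from transformers import AutoModelForImageTextToText, BitsAndBytesConfig
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',              # matches the NF4 setup above
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,         # assumption: common QLoRA default
)
base = AutoModelForImageTextToText.from_pretrained(
    'google/gemma-4-E2B-it',
    quantization_config=bnb_config,
    device_map='auto',
)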

Option 2: Merge into full model (slower, needs more VRAM)

# merge_and_test.py is a script in this repo; run it from the repo root
from merge_and_test import main
main()  # Merge adapter into the base model and run a test inference
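
If you only need the merge step itself, the core of it is peft's merge_and_unload. A minimal sketch, assuming the base model is loaded in bf16 (not 4-bit) for a clean merge; the output directory name is illustrative:

from peft import PeftModel
from transformers import AutoModelForImageTextToText
import torch

base = AutoModelForImageTextToText.from_pretrained(
    'google/gemma-4-E2B-it', torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, 'ludsvick/gemma-4-E2B-it-SSD')
merged = model.merge_and_unload()        # fold LoRA deltas into the base weights
merged.save_pretrained('gemma-4-E2B-it-SSD-merged')  # illustrative path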

Files in this Repo

  • train_ssd.py: QLoRA training script (the one used to produce this adapter)
  • train_ssd_full.py: full SSD script with on-policy generation
  • train_ssd_sft.py: alternative SFT-only script
  • merge_and_test.py: merge the adapter and run an inference test
  • evaluate_lcb.py: LiveCodeBench evaluation script
  • adapter_model.safetensors: trained LoRA weights (92.2 MB)
  • adapter_config.json: LoRA configuration
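
If you'd rather check the adapter configuration programmatically than read adapter_config.json by hand, peft can load it directly:

from peft import PeftConfig

cfg = PeftConfig.from_pretrained('ludsvick/gemma-4-E2B-it-SSD')
print(cfg.r, cfg.lora_alpha, cfg.target_modules)  # should match the details above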

Next Steps

  1. Test the model: Run merge_and_test.py to verify it generates code correctly
  2. Evaluate: Run evaluate_lcb.py on LiveCodeBench v6 to measure improvement
  3. Compare: Evaluate the base model and the SSD model at T=0.6; expect a gain of 10+ points on hard problems
  4. Deploy: Push the merged model, or serve the LoRA adapter with vLLM (see the sketch after this list)
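
For step 4, serving the adapter with vLLM looks roughly like this. A sketch using vLLM's LoRA support: the adapter name 'ssd' is arbitrary, and whether vLLM supports this particular multimodal architecture is worth verifying first.

from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model='google/gemma-4-E2B-it', enable_lora=True, max_lora_rank=16)
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=512)
outputs = llm.generate(
    ['Write a Fibonacci function in Python.'],
    params,
    lora_request=LoRARequest('ssd', 1, 'ludsvick/gemma-4-E2B-it-SSD'),
)
print(outputs[0].outputs[0].text)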

SSD Method (Reminder)

From Apple's paper:

  1. Generate at high temperature (T ≥ 1.0) for diverse token exploration
  2. SFT on all outputs, regardless of quality, to reshape the distribution
  3. Evaluate at low temperature (T ≤ 0.6) for improved precision with retained diversity

The key insight: training on diverse (even imperfect) samples with high-temp exploration reshapes the model's token distribution to include better solutions while keeping conditional entropy high for continued exploration.
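
In pseudocode-level Python, one SSD round looks roughly like this; sample_completions and sft_finetune are placeholder names for illustration, not functions from this repo:

def ssd_round(model, prompts):
    # 1. Explore: sample diverse completions at high temperature (T >= 1.0)
    completions = sample_completions(model, prompts, temperature=1.0)
    # 2. Distill: SFT on ALL sampled outputs, with no quality filtering
    model = sft_finetune(model, prompts, completions)
    # 3. The distilled model is then evaluated at low temperature (T <= 0.6)
    return model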

Citation

@article{ssd2025,
  title={Embarrassingly Simple Self-Distillation Improves Code Generation},
  author={{Apple ML Team}},
  year={2025},
  eprint={2604.01193},
  archivePrefix={arXiv},
}