Gemma 4 31B Claude Opus Reasoning

Full-parameter fine-tune of google/gemma-4-31B-it on 12,680 Claude Opus 4.6 reasoning traces.

The first full-parameter fine-tune of Gemma 4 31B.

Highlights

  • 89.7% token accuracy after 4 epochs
  • Full-parameter SFT on 8x NVIDIA H200 — all 31B parameters updated, not LoRA
  • 12,680 pure Claude Opus 4.6 traces — consistent reasoning style, no mixed-model data
  • Native Gemma 4 thinking format — uses built-in thinking tokens
  • Runs on a 4090 at Q4_K_M (~17GB VRAM)

Training

| Setting | Value |
|---|---|
| Base | google/gemma-4-31B-it |
| Method | Full-parameter SFT (not LoRA) |
| Framework | TRL SFTTrainer + PyTorch FSDP |
| Hardware | 8x NVIDIA H200 (141GB each) |
| Precision | bf16 |
| Total epochs | 4 (2 at lr=1e-5, then 2 more at lr=5e-6) |
| Sequence length | 8,192 |
| Effective batch size | 10 |

Training Schedule

A two-phase learning-rate schedule:

| Phase | Epochs | Learning rate | Result |
|---|---|---|---|
| Initial | 2 | 1e-5 (cosine) | 80.8% accuracy |
| Continued | 2 | 5e-6 (cosine) | 89.7% accuracy |

Continuing training at a lower learning rate from the warm phase-1 checkpoint improved token accuracy by roughly 9 percentage points.
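The two-phase schedule can be sketched as two back-to-back cosine decays; `steps_per_phase` below is a hypothetical placeholder for the real per-phase step count:

```python
import math

def cosine_lr(step, total_steps, peak_lr, min_lr=0.0):
    """Cosine decay from peak_lr to min_lr over total_steps."""
    progress = step / total_steps
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

def two_phase_lr(step, steps_per_phase=1000):
    """Phase 1 decays from 1e-5; phase 2 restarts the cosine at 5e-6."""
    if step < steps_per_phase:
        return cosine_lr(step, steps_per_phase, 1e-5)
    return cosine_lr(step - steps_per_phase, steps_per_phase, 5e-6)
```

Each phase starts at its peak learning rate and decays toward zero, so the phase-2 restart at 5e-6 is a warm restart rather than a continuation of the phase-1 curve.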

Training Metrics

| Metric | After phase 1 | After phase 2 (final) |
|---|---|---|
| Loss | 27.5 | 13.6 |
| Token accuracy | 80.8% | 89.7% |
| Grad norm | 15.3 | 15.3 |
| Entropy | 0.69 | 0.34 |

Training Data (~12,680 samples)

All Claude Opus 4.6. No mixed-model data.

| Dataset | Samples | Description |
|---|---|---|
| Crownelius/Opus-4.6-Reasoning-3300x | 2,160 | Cleaned Claude Opus 4.6 reasoning — math, code, diverse |
| TeichAI/Claude-Opus-4.6-Reasoning-887x | 887 | Tool-use reasoning + vague prompt handling |
| Roman1111111/claude-opus-4.6-10000x | 9,633 | Math/logic reasoning with verified solutions |
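As a quick sanity check, the per-dataset counts in the table sum to the stated total:

```python
# Per-dataset sample counts from the table above.
sample_counts = {
    "Crownelius/Opus-4.6-Reasoning-3300x": 2_160,
    "TeichAI/Claude-Opus-4.6-Reasoning-887x": 887,
    "Roman1111111/claude-opus-4.6-10000x": 9_633,
}

total = sum(sample_counts.values())
print(total)  # 12680, matching the stated ~12,680 samples
```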

Usage

```python
from transformers import AutoProcessor, AutoModelForCausalLM

# Load model and processor (device_map="auto" shards across available GPUs)
model = AutoModelForCausalLM.from_pretrained(
    "EganAI/gemma4-31b-opus-reasoning",
    torch_dtype="auto",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("EganAI/gemma4-31b-opus-reasoning")

messages = [
    {"role": "user", "content": "Prove that the square root of 2 is irrational."},
]

# Build the prompt with the native thinking format enabled
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
inputs = processor(text=text, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs, max_new_tokens=2048, temperature=1.0, top_p=0.95, top_k=64
)
# Decode only the newly generated tokens; keep special tokens to see the thinking block
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=False))
```

Hardware Requirements

| Format | VRAM | Device |
|---|---|---|
| bf16 | ~62GB | 1x A100/H100 80GB |
| Q8 | ~31GB | 2x RTX 4090 |
| Q4_K_M | ~17GB | RTX 4090 |
| Q3_K_M | ~14GB | RTX 4080 |
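These figures line up with a back-of-the-envelope weights-only estimate (bits per weight for K-quants are approximations, and KV cache and activations add on top):

```python
def weight_vram_gb(n_params, bits_per_weight):
    """Rough VRAM for the weights alone: params * bits / 8, in GB.
    Excludes KV cache, activations, and framework overhead."""
    return n_params * bits_per_weight / 8 / 1e9

N = 31e9  # 31B parameters
print(round(weight_vram_gb(N, 16)))   # 62 -> ~62GB for bf16
print(round(weight_vram_gb(N, 8)))    # 31 -> ~31GB for Q8
print(round(weight_vram_gb(N, 4.5)))  # 17 -> ~17GB for Q4_K_M (approx. 4.5 bits/weight)
```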

Implementation Notes

  • Gemma 4 requires mm_token_type_ids even for text-only training — custom data collator injects zeros
  • SDPA attention only — flash attention is incompatible with Gemma's soft-capping
  • FSDP over DeepSpeed — simpler config for day-zero model support
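The `mm_token_type_ids` workaround can be sketched as a wrapper around any text-only collator; the wrapper and the minimal base collator here are hypothetical illustrations (lists stand in for tensors), with only the field name taken from the note above:

```python
def add_mm_token_type_ids(collate_fn):
    """Wrap a text-only collator so every batch also carries
    mm_token_type_ids filled with zeros (i.e. no image tokens).
    Shapes mirror input_ids; a real implementation would emit tensors."""
    def wrapped(features):
        batch = collate_fn(features)
        batch["mm_token_type_ids"] = [
            [0] * len(row) for row in batch["input_ids"]
        ]
        return batch
    return wrapped

# Hypothetical minimal base collator for illustration.
def base_collate(features):
    return {"input_ids": [f["input_ids"] for f in features]}

collate = add_mm_token_type_ids(base_collate)
batch = collate([{"input_ids": [1, 2, 3]}, {"input_ids": [4, 5]}])
print(batch["mm_token_type_ids"])  # [[0, 0, 0], [0, 0]]
```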

License

Apache 2.0 (same as Gemma 4)
