Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled

A fine-tune of Gemma 4 E4B trained on Claude 4.6 Opus reasoning traces. The goal: take a compact 4B model and teach it to actually think before answering.

💡 What this is

Standard Gemma 4 E4B is already solid. This fine-tune pushes it toward a more deliberate, structured reasoning style by training on ~2.3k high-quality Chain-of-Thought samples distilled from Claude 4.6 Opus.

The model learns to plan inside <think> tags before committing to a final answer — fewer impulsive responses, more structured breakdowns.

```
<think>
1. What is actually being asked here?
2. What are the constraints and edge cases?
3. Step-by-step plan...
4. Verify the logic holds.
</think>

Final answer here.
```
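If you consume the raw output programmatically, you may want to separate the reasoning from the answer. A small illustrative helper (not part of the model or any official tooling) that splits on the closing tag:

```python
def extract_final_answer(output: str) -> str:
    """Return the text after the last closing </think> tag.

    If the model emitted no think block, return the output unchanged.
    """
    tag = "</think>"
    idx = output.rfind(tag)
    return output[idx + len(tag):].strip() if idx != -1 else output.strip()


sample = "<think>\n1. Parse the question.\n2. Plan the steps.\n</think>\n\nFinal answer here."
print(extract_final_answer(sample))  # → Final answer here.
```

Using `rfind` rather than `find` guards against the (rare) case where the model quotes a literal `</think>` inside its own reasoning.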

🗺️ Pipeline

```
google/gemma-4-E4B-it
 │
 ▼
SFT + QLoRA 4-bit (Unsloth)
 │  loss masked to responses only
 ▼
Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled
 │
 ▼
exported as GGUF (Q4_K_M + Q8_0)
```

⚙️ Training Details

| Parameter | Value |
|---|---|
| Base model | google/gemma-4-E4B-it |
| Framework | Unsloth |
| Method | SFT + QLoRA (4-bit) |
| Dataset | nohurry/Opus-4.6-Reasoning-3000x-filtered |
| Hardware | RTX 5060 Ti 16GB |
| LoRA rank / alpha | 16 / 16 |
| Epochs | 3 |
| Max seq length | 2048 |
| Optimizer | adamw_8bit |
| Learning rate | 2e-4 |
| LR scheduler | cosine |
| Loss masking | train_on_responses_only |
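The response-only loss masking can be pictured with a toy example (the token ids below are made up, not real Gemma tokenizer output): labels at prompt positions are set to -100, the index that cross-entropy loss ignores, so gradients flow only through the assistant's response tokens.

```python
# Illustrative sketch of the idea behind response-only training
# (what Unsloth's train_on_responses_only arranges for you).
prompt_ids   = [101, 2054, 2003]   # prompt tokens (hypothetical ids)
response_ids = [1996, 3437, 102]   # response tokens (hypothetical ids)

input_ids = prompt_ids + response_ids
# Mask the prompt: -100 labels contribute nothing to the loss.
labels = [-100] * len(prompt_ids) + list(response_ids)

assert len(labels) == len(input_ids)
print(labels)  # → [-100, -100, -100, 1996, 3437, 102]
```

The model still *sees* the prompt (it stays in `input_ids`); it is only excluded from the training objective, so the model learns to imitate the reasoning traces without also memorizing the prompts.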

📚 Dataset

| Dataset | Description |
|---|---|
| nohurry/Opus-4.6-Reasoning-3000x-filtered | ~2.3k filtered Claude 4.6 Opus reasoning trajectories covering math, logic, and coding |

🚀 Run it

Ollama:

```
ollama run hf.co/arsovskidev/Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled
```

llama.cpp:

```
./llama-cli -hf arsovskidev/Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled \
  --temp 1.0 --top-p 0.95 --top-k 64
```

✅ Good at

  • Multi-step math and logic problems
  • Code problem decomposition and debugging
  • Tasks where showing reasoning is more valuable than raw speed
  • Structured analysis of complex prompts

⚠️ Limitations

  • Text only — multimodal capabilities of the base model are not trained here
  • Small dataset — treat this as a focused reasoning fine-tune, not a general-purpose upgrade
  • Still an LLM — hallucinations happen, especially on factual recall outside the training domain

📜 License

Apache 2.0 + Gemma Terms of Use.

"Claude" is a trademark of Anthropic. This project is not affiliated with or endorsed by Anthropic — the name refers to the reasoning distillation data source only.

🙏 Acknowledgements

Unsloth for making this feasible on consumer hardware, and nohurry for the dataset.

📖 Citation

```bibtex
@misc{arsovskidev_gemma4_opus_distilled,
  title        = {Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled},
  author       = {arsovskidev},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/arsovskidev/Gemma-4-E4B-Claude-4.6-Opus-Reasoning-Distilled}}
}
```