Gemma-4-Opus-Reasoning

A locally fine-tuned Gemma 4 E4B model optimized for structured reasoning, multi-step problem solving, and efficient inference on Apple Silicon and GGUF-compatible runtimes.

Overview

This model was fine-tuned using LoRA on a curated dataset of approximately 11k reasoning-focused examples, merged from two high-quality sources. The result is a compact model that preserves Gemma's conversational fluency while improving logical coherence and response structure.

Property           Value
-----------------  --------------------------------------------------
Base model         mlx-community/gemma-4-e4b-it-4bit
Fine-tuning        LoRA (rank=16, layers=8)
Training steps     15k iterations
Training samples   ~11k conversations
Export format      GGUF (q8_0)
Target runtimes    Apple Silicon (M-series), llama.cpp, Ollama, mlx-lm

Dataset

Training data was built by merging and normalizing two sources:

  1. nohurry/Opus-4.6-Reasoning-3000x-filtered – structured logical reasoning examples
  2. Roman1111111/claude-opus-4.6-10000x – multi-turn problem solving dialogues

The merged dataset is available at: https://huggingface.co/datasets/emanubiz/opus-claude-merged
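The merge script itself is not published; the sketch below shows the general normalization step, mapping both sources into a shared messages schema before concatenating them. The field names (instruction, output, conversations) are illustrative assumptions, not the actual source schemas.

```python
# Illustrative sketch of the merge/normalization step.
# Field names are assumptions -- check the real source schemas.

def normalize_single_turn(record):
    """Map an instruction/response pair into the shared chat schema."""
    return {"messages": [
        {"role": "user", "content": record["instruction"]},
        {"role": "assistant", "content": record["output"]},
    ]}

def normalize_multi_turn(record):
    """Map a multi-turn conversation into the shared chat schema."""
    return {"messages": [
        {"role": turn["role"], "content": turn["content"]}
        for turn in record["conversations"]
    ]}

def merge(single_turn_source, multi_turn_source):
    """Normalize both sources and concatenate them into one dataset."""
    merged = [normalize_single_turn(r) for r in single_turn_source]
    merged += [normalize_multi_turn(r) for r in multi_turn_source]
    return merged
```

Once both sources share one schema, deduplication and shuffling can be applied uniformly before training.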

Capabilities

  • Multi-step reasoning and structured problem solving
  • Math and logic tasks
  • Clear and well-formatted responses
  • Efficient inference on consumer hardware (16 GB RAM)
  • Tool calling via custom chat template (experimental — see Notes)
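Because tool calls are emitted inline as <tool_call> JSON rather than through a native tool-call channel, the client has to extract them from the generated text. A minimal parser sketch (the tag format matches the chat template in this repository; error handling is a suggestion, not part of the model):

```python
import json
import re

# Matches the <tool_call>...</tool_call> wrapper emitted per the chat
# template; DOTALL lets the JSON span multiple lines.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

def extract_tool_calls(text):
    """Return a list of {"name": ..., "arguments": ...} dicts found in
    model output. Malformed JSON is skipped, since tool calling is
    experimental and the model may emit invalid payloads."""
    calls = []
    for raw in TOOL_CALL_RE.findall(text):
        try:
            calls.append(json.loads(raw))
        except json.JSONDecodeError:
            continue
    return calls
```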

Repository Contents

  • gemma4-opus.gguf — quantized model (q8_0)
  • chat_template.jinja — inference template with tool calling support
  • README.md — documentation

Chat Format

This model uses a custom turn-based format: each turn opens with <|turn>role and closes with <turn|>.

Jinja template

{{ bos_token }}
{%- if tools %}
<|turn>system
You are a helpful assistant with access to tools. When you need to use a tool, respond with a tool call in this exact JSON format:
<tool_call>{"name": "FUNCTION_NAME", "arguments": {"param": "value"}}</tool_call>

Available tools:
{%- for t in tools %}
{{ t.function.name }}: {{ t.function.description }}
Parameters: {{ t.function.parameters | tojson }}
{%- endfor %}
<turn|>
{%- endif %}
{%- for m in messages %}
{%- if m.role == 'user' %}
<|turn>user
{{ m.content }}
<turn|>
{%- elif m.role == 'assistant' or m.role == 'model' %}
<|turn>model
{%- if m.tool_calls %}
{%- for tc in m.tool_calls %}
<tool_call>{"name": "{{ tc.function.name }}", "arguments": {{ tc.function.arguments | tojson }}}</tool_call>
{%- endfor %}
{%- endif %}
{%- if m.content %} {{ m.content }}{%- endif %}
<turn|>
{%- elif m.role == 'tool' %}
<|turn>tool
<tool_result>{{ m.content }}</tool_result>
<turn|>
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
<|turn>model
{% endif %}

Example

<|turn>user
What is the square root of 144 plus 13?
<turn|>
<|turn>model
The square root of 144 is 12. Adding 13 gives 25.
<turn|>
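If you render prompts manually (for example against a raw completion endpoint instead of a chat endpoint), the turn format above can be reproduced in a few lines. This sketch covers plain user/model turns only; tool turns are handled by the full Jinja template, and whether a bos_token is needed depends on the runtime.

```python
def build_prompt(messages, bos_token=""):
    """Render messages into the <|turn>role ... <turn|> format and
    append an open model turn as the generation prompt."""
    parts = [bos_token]
    for m in messages:
        # The template treats 'assistant' and 'model' as the same role.
        role = "model" if m["role"] in ("assistant", "model") else m["role"]
        parts.append(f"<|turn>{role}\n{m['content']}\n<turn|>\n")
    parts.append("<|turn>model\n")
    return "".join(parts)
```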

Usage

llama.cpp / llama-server

llama-server \
  -m gemma4-opus.gguf \
  -ngl 99 \
  -c 65536 \
  --chat-template-file chat_template.jinja \
  --port 8080 \
  --host 0.0.0.0
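With the server running, requests can be sent to llama-server's OpenAI-compatible /v1/chat/completions endpoint. A minimal client sketch using only the standard library; the model name is illustrative, and the base URL assumes the port from the command above:

```python
import json
import urllib.request

def build_request(messages, tools=None, temperature=0.7):
    """Build an OpenAI-compatible chat-completions payload."""
    payload = {"model": "gemma4-opus",
               "messages": messages,
               "temperature": temperature}
    if tools:
        payload["tools"] = tools
    return payload

def chat(messages, base_url="http://localhost:8080"):
    """POST to llama-server and return the assistant's reply text."""
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(build_request(messages)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because the chat template is loaded server-side via --chat-template-file, the client only sends plain role/content messages.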

Ollama

Modelfile:

FROM ./gemma4-opus.gguf
TEMPLATE """<|turn>user
{{ .Prompt }}<turn|>
<|turn>model
{{ .Response }}<turn|>"""
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER top_k 64
PARAMETER stop "<turn|>"

Note: Ollama's Modelfile format does not support the full Jinja template. For tool calling, use llama-server with --chat-template-file.

Notes

  • Tool calling is experimental. The model was not fine-tuned on tool-use data; instead, the chat template injects the tool definitions into a system turn and relies on the base Gemma 4 model's pre-training. Simple single-tool calls work; complex multi-tool chains may be inconsistent.
  • The base model was quantized from a 4-bit MLX checkpoint, not from full-precision weights — some quality loss was already present before GGUF conversion.
  • Internal reasoning tokens (think blocks) are removed via training data preprocessing, not post-processing.
  • Not evaluated on standard benchmarks.
  • q8_0 quantization was chosen as a balance between output quality and memory footprint.
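As a rough rule of thumb for sizing, q8_0 stores 8 bits per weight plus a 16-bit scale per 32-weight block, about 8.5 bits per weight before metadata. A quick estimator (the parameter count below is an example, not a claim about this checkpoint):

```python
def gguf_size_gb(n_params, bits_per_weight=8.5):
    """Rough on-disk size: q8_0 uses 8 bits per weight plus a 16-bit
    scale per 32-weight block (~8.5 bits/weight), ignoring metadata."""
    return n_params * bits_per_weight / 8 / 1e9

# Example: an 8-billion-parameter model at q8_0 is roughly 8.5 GB.
size = gguf_size_gb(8e9)
```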

License

Same license as the base Gemma model: https://ai.google.dev/gemma/terms

Author

Created by emanubiz

Acknowledgements

  • Google DeepMind (Gemma)
  • mlx-lm and llama.cpp communities
  • Opus reasoning dataset contributors
  • Anthropic Claude