🌟 Gemma 4 - 26B A4B x Claude Opus 4.6

🌟 This is a merge of the Gemma 4 - 26B A4B x Claude Opus 4.6 model from TeichAI with the mradermacher model.

Build Environment & Features:

  • Fine-tuning Framework: Unsloth
  • Reasoning Effort: High
  • This model bridges the gap between Google's exceptional open-weights architecture and Claude 4.6's profound reasoning capabilities, leveraging cutting-edge fine-tuning environments.

Gemma 4 Benchmarks

💡 Model Introduction

Gemma 4 - 26B A4B x Claude Opus 4.6 is a highly capable model fine-tuned on top of the powerful Gemma 4 architecture. It was trained to absorb state-of-the-art reasoning distillation data, primarily sourced from Claude 4.6 Opus interactions.

By utilizing datasets where the reasoning effort was explicitly set to High, this model excels in breaking down complex problems and delivering precise, nuanced solutions across a variety of demanding domains.

🗺️ Training Pipeline Overview

Base Model (unsloth/gemma-4-26B-A4B-it)
 │
 ▼
Supervised Fine-Tuning (SFT) + High-Effort Reasoning Datasets
 │
 ▼
Final Model (Gemma 4 - 26B A4B x Claude Opus 4.6)

📋 Stage Details & Benchmarks

Performance vs Size:

Deep Dive Analysis: For more comprehensive insights regarding the base capabilities of the Gemma 4 architecture, please refer to this Analysis Document.

🔹 Supervised Fine-Tuning (Meeting Claude)

  • Objective: To inject high-density reasoning logic and establish a strict format for complex problem-solving.
  • Methodology: We utilized Unsloth for highly efficient memory and compute optimization during the fine-tuning process. The model was trained extensively on various reasoning trajectories from Claude Opus 4.6 to adopt a structured and efficient thinking pattern.

📚 All Datasets Used

The dataset consists of high-quality, high-effort reasoning distillation data:

| Dataset Name | Description / Purpose |
| --- | --- |
| TeichAI/Claude-Opus-4.6-Reasoning-887x | Core Claude 4.6 Opus reasoning trajectories. |
| TeichAI/Claude-Sonnet-4.6-Reasoning-1100x | Additional high-density reasoning instances from Claude 4.6 Sonnet. |
| TeichAI/claude-4.5-opus-high-reasoning-250x | Legacy high-intensity reasoning distillation. |
| TeichAI/Claude-Opus-4.6-Reasoning-500x | Additional Opus 4.6 reasoning traces targeting domain diversity. |
| Crownelius/Opus-4.6-Reasoning-2100x-formatted | Crownelius's extensively formatted Opus reasoning dataset for structural reinforcement. |

🌟 Core Skills & Capabilities

Thanks to its robust base model and high-effort reasoning distillation, this model is highly optimized for the following use cases:

  1. 💻 Coding: Advanced code generation, debugging, and software architecture planning.
  2. 🔬 Science: Deep scientific reasoning, hypothesis evaluation, and analytical problem-solving.
  3. 🔎 Deep Research: Navigating complex, multi-step research queries and synthesizing vast amounts of information.
  4. 🧠 General Purpose: Highly capable instruction-following for everyday tasks requiring high logical coherence.

Best Practices

For the best performance, use these configurations and best practices:

1. Sampling Parameters

Use the following standardized sampling configuration across all use cases:

  • temperature=1.0
  • top_p=0.95
  • top_k=64
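As a sketch, these values map directly onto keyword arguments of a standard Hugging Face `transformers` `generate()` call (the parameter names below follow the `GenerationConfig` API; model and tokenizer loading are omitted):

```python
# Recommended sampling settings from this card, expressed as keyword
# arguments for a Hugging Face `transformers` generate() call.
# do_sample=True is required for temperature/top_p/top_k to take effect.
SAMPLING_KWARGS = {
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 64,
    "do_sample": True,
}

# Usage (model/tokenizer loading omitted):
# outputs = model.generate(**inputs, **SAMPLING_KWARGS)
```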

2. Thinking Mode Configuration

Unlike Gemma 3, these models use the standard system, assistant, and user roles. To properly manage the thinking process, use the following control tokens:

  • Trigger Thinking: Thinking is enabled by including the <|think|> token at the start of the system prompt. To disable thinking, remove the token.
  • Standard Generation: When thinking is enabled, the model will output its internal reasoning followed by the final answer using this structure:
    <|channel>thought\n[Internal reasoning]<channel|>
  • Disabled Thinking Behavior: For all models except for the E2B and E4B variants, if thinking is disabled, the model will still generate the tags but with an empty thought block:
    <|channel>thought\n<channel|>[Final answer]

Note that many libraries like Transformers and llama.cpp handle the complexities of the chat template for you.
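When not relying on a chat template, the control flow above can be handled manually. The following is a minimal sketch; the token spellings are copied verbatim from this section and may differ from the final chat template:

```python
# Toggle thinking by prepending the trigger token to the system prompt,
# and split a raw completion into (internal reasoning, final answer).
# Token strings are copied verbatim from this card (assumption: they
# match the deployed template exactly).
THINK_TOKEN = "<|think|>"
THOUGHT_START = "<|channel>thought\n"
THOUGHT_END = "<channel|>"

def build_system_prompt(instructions: str, thinking: bool = True) -> str:
    """Enable thinking by placing the trigger token at the start."""
    return (THINK_TOKEN + instructions) if thinking else instructions

def split_thought(output: str) -> tuple[str, str]:
    """Return (internal reasoning, final answer) from a raw completion."""
    if THOUGHT_START in output and THOUGHT_END in output:
        body = output.split(THOUGHT_START, 1)[1]
        thought, answer = body.split(THOUGHT_END, 1)
        return thought, answer
    return "", output
```

For example, `split_thought("<|channel>thought\nAdd 2 and 2.<channel|>4")` yields `("Add 2 and 2.", "4")`.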

3. Multi-Turn Conversations

  • No Thinking Content in History: In multi-turn conversations, the historical model output should only include the final response. Thoughts from previous model turns must not be added before the next user turn begins.
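A small sketch of this history-cleaning step (the token spellings are copied from this card; the message layout is an assumption):

```python
# Drop thought blocks from prior assistant turns before resending history.
# Token strings are copied verbatim from this card.
def strip_thoughts(text: str) -> str:
    start, end = "<|channel>thought\n", "<channel|>"
    while start in text and end in text:
        head, rest = text.split(start, 1)
        _, tail = rest.split(end, 1)
        text = head + tail
    return text

history = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "<|channel>thought\nAdd them.<channel|>4"},
]
clean_history = [
    {**msg, "content": strip_thoughts(msg["content"])} for msg in history
]
# clean_history[1]["content"] is now just "4"
```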

4. Modality Order

  • For optimal performance with multimodal inputs, place image and/or audio content before the text in your prompt.
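For example, in an OpenAI-style multimodal message the image part would come first (the exact content schema depends on your serving stack; this layout and the URL are illustrative assumptions):

```python
# Image content placed before the text part within the same user message.
message = {
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/chart.png"},  # placeholder
        {"type": "text", "text": "Summarize the trend in this chart."},
    ],
}
```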

5. Variable Image Resolution

Aside from variable aspect ratios, Gemma 4 supports variable image resolution through a configurable visual token budget, which controls how many tokens are used to represent an image. A higher token budget preserves more visual detail at the cost of additional compute, while a lower budget enables faster inference for tasks that don't require fine-grained understanding.

  • The supported token budgets are: 70, 140, 280, 560, and 1120.
    • Use lower budgets for classification, captioning, or video understanding, where faster inference and processing many frames outweigh fine-grained detail.
    • Use higher budgets for tasks like OCR, document parsing, or reading small text.
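The budget choice can be sketched as a simple heuristic. The budget values come from this card; the mapping below and the `visual_token_budget` processor kwarg are assumptions, since how the budget is passed is stack-specific:

```python
# Supported visual token budgets (from this card) and a simple heuristic
# for picking one. The visual_token_budget kwarg below is hypothetical.
SUPPORTED_BUDGETS = (70, 140, 280, 560, 1120)

def pick_budget(task: str) -> int:
    """Low budgets for coarse tasks, the maximum for fine-grained reading."""
    coarse = {"classification", "captioning", "video"}
    return 140 if task in coarse else 1120

# inputs = processor(images=img, text=prompt,
#                    visual_token_budget=pick_budget("ocr"))
```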

6. Audio

Use the following prompt structures for audio processing:

  • Audio Speech Recognition (ASR)
Transcribe the following speech segment in {LANGUAGE} into {LANGUAGE} text.

Follow these specific instructions for formatting the answer:
* Only output the transcription, with no newlines.
* When transcribing numbers, write the digits, i.e. write 1.7 and not one point seven, and write 3 instead of three.
  • Automatic Speech Translation (AST)
Transcribe the following speech segment in {SOURCE_LANGUAGE}, then translate it into {TARGET_LANGUAGE}.
When formatting the answer, first output the transcription in {SOURCE_LANGUAGE}, then one newline, then output the string '{TARGET_LANGUAGE}: ', then the translation in {TARGET_LANGUAGE}.
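The two templates above can be filled in programmatically; the builder functions below simply substitute the language placeholders (the prompt wording is copied from this card):

```python
def asr_prompt(language: str) -> str:
    """ASR prompt with the card's formatting instructions."""
    return (
        f"Transcribe the following speech segment in {language} into "
        f"{language} text.\n\n"
        "Follow these specific instructions for formatting the answer:\n"
        "* Only output the transcription, with no newlines.\n"
        "* When transcribing numbers, write the digits, i.e. write 1.7 "
        "and not one point seven, and write 3 instead of three."
    )

def ast_prompt(source: str, target: str) -> str:
    """AST prompt: transcribe in the source language, then translate."""
    return (
        f"Transcribe the following speech segment in {source}, then "
        f"translate it into {target}.\n"
        f"When formatting the answer, first output the transcription in "
        f"{source}, then one newline, then output the string "
        f"'{target}: ', then the translation in {target}."
    )
```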

7. Audio and Video Length

All models support image inputs and can process videos as sequences of frames; the E2B and E4B variants additionally support audio inputs. Audio supports a maximum length of 30 seconds. Video supports a maximum of 60 seconds, assuming frames are processed at one frame per second.
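A minimal sketch of clipping media to these windows (the limits come from this section; the 1 fps rate is the assumption stated above):

```python
# Clip media to the supported windows: 30 s of audio, and 60 video
# frames at the assumed rate of one frame per second.
MAX_AUDIO_SECONDS = 30
MAX_VIDEO_SECONDS = 60

def frames_to_sample(video_seconds: float, fps: float = 1.0) -> int:
    """How many frames to extract from a video of the given length."""
    return int(min(video_seconds, MAX_VIDEO_SECONDS) * fps)

def audio_window(audio_seconds: float) -> float:
    """Seconds of audio to keep from the start of the clip."""
    return min(audio_seconds, MAX_AUDIO_SECONDS)
```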

🙏 Acknowledgements

  • Google: For providing an exceptional open weights model. Read more about Gemma 4 on the Google Innovation Blog.
  • Unsloth: For assembling ready-to-use, cutting-edge fine-tuning environments that make this work possible.
  • Crownelius: For creating and sharing his awesome Opus reasoning dataset with the community.

📖 Citation

If you use this model in your research or projects, please cite:

@misc{teichai_gemma4_26b_a4b_opus_distilled,
  title        = {Gemma-4-26B-A4B-it-Claude-Opus-Distill},
  author       = {TeichAI},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/TeichAI/gemma-4-26B-A4B-it-Claude-Opus-Distill}}
}