Instructions to use WhoTookMyAmogusNickname/FINAL-Bench_Darwin-28B-Opus-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use WhoTookMyAmogusNickname/FINAL-Bench_Darwin-28B-Opus-GGUF with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="WhoTookMyAmogusNickname/FINAL-Bench_Darwin-28B-Opus-GGUF",
    filename="Darwin-28B-Opus-Q2_K.gguf",
)

llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use WhoTookMyAmogusNickname/FINAL-Bench_Darwin-28B-Opus-GGUF with llama.cpp:
Install from brew
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf WhoTookMyAmogusNickname/FINAL-Bench_Darwin-28B-Opus-GGUF:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf WhoTookMyAmogusNickname/FINAL-Bench_Darwin-28B-Opus-GGUF:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf WhoTookMyAmogusNickname/FINAL-Bench_Darwin-28B-Opus-GGUF:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf WhoTookMyAmogusNickname/FINAL-Bench_Darwin-28B-Opus-GGUF:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf WhoTookMyAmogusNickname/FINAL-Bench_Darwin-28B-Opus-GGUF:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf WhoTookMyAmogusNickname/FINAL-Bench_Darwin-28B-Opus-GGUF:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf WhoTookMyAmogusNickname/FINAL-Bench_Darwin-28B-Opus-GGUF:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf WhoTookMyAmogusNickname/FINAL-Bench_Darwin-28B-Opus-GGUF:Q4_K_M
Use Docker
docker model run hf.co/WhoTookMyAmogusNickname/FINAL-Bench_Darwin-28B-Opus-GGUF:Q4_K_M
- LM Studio
- Jan
- vLLM
How to use WhoTookMyAmogusNickname/FINAL-Bench_Darwin-28B-Opus-GGUF with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "WhoTookMyAmogusNickname/FINAL-Bench_Darwin-28B-Opus-GGUF"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "WhoTookMyAmogusNickname/FINAL-Bench_Darwin-28B-Opus-GGUF",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
Use Docker
docker model run hf.co/WhoTookMyAmogusNickname/FINAL-Bench_Darwin-28B-Opus-GGUF:Q4_K_M
- Ollama
How to use WhoTookMyAmogusNickname/FINAL-Bench_Darwin-28B-Opus-GGUF with Ollama:
ollama run hf.co/WhoTookMyAmogusNickname/FINAL-Bench_Darwin-28B-Opus-GGUF:Q4_K_M
- Unsloth Studio
How to use WhoTookMyAmogusNickname/FINAL-Bench_Darwin-28B-Opus-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for WhoTookMyAmogusNickname/FINAL-Bench_Darwin-28B-Opus-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for WhoTookMyAmogusNickname/FINAL-Bench_Darwin-28B-Opus-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for WhoTookMyAmogusNickname/FINAL-Bench_Darwin-28B-Opus-GGUF to start chatting
- Pi
How to use WhoTookMyAmogusNickname/FINAL-Bench_Darwin-28B-Opus-GGUF with Pi:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf WhoTookMyAmogusNickname/FINAL-Bench_Darwin-28B-Opus-GGUF:Q4_K_M
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent

# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "WhoTookMyAmogusNickname/FINAL-Bench_Darwin-28B-Opus-GGUF:Q4_K_M" }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
- Hermes Agent
How to use WhoTookMyAmogusNickname/FINAL-Bench_Darwin-28B-Opus-GGUF with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf WhoTookMyAmogusNickname/FINAL-Bench_Darwin-28B-Opus-GGUF:Q4_K_M
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default WhoTookMyAmogusNickname/FINAL-Bench_Darwin-28B-Opus-GGUF:Q4_K_M
Run Hermes
hermes
- Docker Model Runner
How to use WhoTookMyAmogusNickname/FINAL-Bench_Darwin-28B-Opus-GGUF with Docker Model Runner:
docker model run hf.co/WhoTookMyAmogusNickname/FINAL-Bench_Darwin-28B-Opus-GGUF:Q4_K_M
- Lemonade
How to use WhoTookMyAmogusNickname/FINAL-Bench_Darwin-28B-Opus-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull WhoTookMyAmogusNickname/FINAL-Bench_Darwin-28B-Opus-GGUF:Q4_K_M
Run and chat with the model
lemonade run user.FINAL-Bench_Darwin-28B-Opus-GGUF-Q4_K_M
List all available models
lemonade list
GGUF quantizations of FINAL-Bench/Darwin-28B-Opus
llama.cpp commit used for conversion: 27aef3d
llama.cpp build used for quantization: b8983
Current quants are static (non-imatrix); if you want imatrix quants, please open a discussion.
Original model card below:
Darwin-28B-Opus — Qwen3.6-27B × Opus-Distilled Evolutionary Merge
Qwen3.6-27B dense · 27.6B parameters · Hybrid Linear/Full Attention · BF16 · Thinking Mode · Apache 2.0
Darwin V7 evolutionary merge: Father × Opus-distilled Mother → 88.89% on GPQA Diamond (3-stage adaptive evaluation)
Abstract
Darwin-28B-Opus is the first reasoning model of the Darwin series built on the Qwen3.6-generation backbone. Produced by the Darwin V7 evolutionary breeding engine from two publicly available parents, it combines the strong bilingual reasoning of Qwen3.6-27B with distilled, Claude Opus 4-style chain-of-thought behaviour.
On the GPQA Diamond graduate-level reasoning benchmark (198 PhD-level questions), Darwin-28B-Opus scores 88.89 % under the standard 3-stage adaptive evaluation, slightly edging out its larger MoE sibling Darwin-36B-Opus (88.4 %) and clearly surpassing its Qwen3.5-generation counterpart Darwin-27B-Opus (86.9 %).
🧬 Model Lineage
| Role | Model | Role in the Merge |
|---|---|---|
| Father (父) | Qwen/Qwen3.6-27B | Qwen3.6-generation dense backbone with hybrid linear/full attention. |
| Mother (母) | rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled | Claude Opus reasoning-distilled variant of the same backbone (Jackrong-style distillation, 14 k traces). |
| Offspring | Darwin-28B-Opus (this model) | Darwin V7 evolutionary merge; Qwen3.6 architecture retained, Opus reasoning style inherited. |
Why 28B? The 28B label denotes the Qwen3.6-generation member of the Darwin lineup (+1 over the Qwen3.5-era Darwin-27B-Opus). The actual parameter count is 27.6 B, and the architecture exactly follows Qwen3.6-27B.
⚙️ Technical Specifications
| Component | Value |
|---|---|
| Architecture | Qwen3_5ForConditionalGeneration (Qwen3.6 generation, hybrid linear + full attention) |
| Parameters | 27.6 B (BF16) |
| Hidden size | 5 120 |
| Intermediate size | 17 408 |
| Head dim | 256 |
| Layers | 64 (3 linear : 1 full attention, full_attention_interval = 4) |
| Precision | bfloat16 |
| Context length | Inherited from base (long-chain reasoning supported) |
| License | Apache 2.0 |
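The hybrid layer layout can be read off the table: with full_attention_interval = 4, every fourth layer uses full attention and the remaining three use linear attention. A small illustrative sketch of that pattern (my own; not taken from the released config, and the exact indexing convention may differ):
# Illustrative sketch only: assumes every 4th layer is full attention (interval = 4)
# and the other layers are linear attention; the real config may index differently.
NUM_LAYERS = 64
FULL_ATTENTION_INTERVAL = 4

layer_types = [
    "full_attention" if (i + 1) % FULL_ATTENTION_INTERVAL == 0 else "linear_attention"
    for i in range(NUM_LAYERS)
]

print(layer_types.count("full_attention"))    # 16 full-attention layers
print(layer_types.count("linear_attention"))  # 48 linear-attention layers (3 : 1 ratio)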
🏆 Benchmark — GPQA Diamond (198 questions)
Darwin-28B-Opus is evaluated under our standard 3-stage adaptive evaluation protocol, identical to the protocol used across the Darwin series.
| Stage | Decoding Protocol | Cost | Accuracy |
|---|---|---|---|
| Stage 1 | Single-shot greedy baseline | 1× | 74.75 % (148 / 198) |
| Stage 2 | Majority vote ×8 at temperature 0.7 on Stage-1 wrongs | 8× | 83.84 % (166 / 198) |
| Stage 3 | Adaptive ensemble refinement (close-tie tiebreaker + iterative MTI on residual hard questions) | ≈ 20× | 🥇 88.89 % (176 / 198) |
Key performance indicators:
- Stage 1 → Stage 3: +14.14 %p through adaptive protocol
- vs Darwin-27B-Opus (86.9 %): +1.99 %p
- vs Darwin-36B-Opus (88.4 %): +0.49 %p
- vs Darwin-31B-Opus (85.9 %): +2.99 %p
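As a quick sanity check, the reported percentages follow directly from the question counts in the table above:
# Reported (correct, total) pairs from the table above.
stages = {"Stage 1": (148, 198), "Stage 2": (166, 198), "Stage 3": (176, 198)}
for name, (correct, total) in stages.items():
    print(f"{name}: {100 * correct / total:.2f} %")
# Stage 1: 74.75 %
# Stage 2: 83.84 %
# Stage 3: 88.89 %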
🚀 Usage
Standard inference (Stage 1 baseline)
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tok = AutoTokenizer.from_pretrained(
"FINAL-Bench/Darwin-28B-Opus",
trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
"FINAL-Bench/Darwin-28B-Opus",
torch_dtype=torch.bfloat16,
device_map="auto",
trust_remote_code=True,
)
messages = [
{"role": "user",
"content": "Solve: If f(x) = x³ − 3x + 2, find all critical points and classify them."}
]
text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=False)
print(tok.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
Enhanced accuracy (Stage 2-3 adaptive)
For leaderboard-grade accuracy, combine:
- Stage 1 greedy baseline,
- Stage 2 maj@8 temperature sampling on low-confidence answers,
- Stage 3 adaptive refinement on still-disputed answers.
Reference implementation is provided in the Darwin-series evaluation harness.
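Since that harness is not bundled with this repository, the following is a minimal sketch of the Stage-2 step only: majority vote over 8 samples at temperature 0.7, applied to questions the greedy pass got wrong. It reuses the tok and model objects from the snippet above and assumes a hypothetical extract_answer helper; the actual Darwin evaluation code may differ.
from collections import Counter

def extract_answer(generation: str) -> str:
    # Hypothetical helper: pull the final answer out of a generated reasoning trace.
    # The real harness's extraction logic is not published here.
    return generation.strip().splitlines()[-1]

def majority_vote(question: str, n_samples: int = 8, temperature: float = 0.7) -> str:
    # Stage 2: sample n answers and return the most common one.
    votes = []
    for _ in range(n_samples):
        text = tok.apply_chat_template(
            [{"role": "user", "content": question}],
            tokenize=False, add_generation_prompt=True,
        )
        inputs = tok(text, return_tensors="pt").to(model.device)
        out = model.generate(
            **inputs, max_new_tokens=2048,
            do_sample=True, temperature=temperature,
        )
        completion = tok.decode(out[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)
        votes.append(extract_answer(completion))
    return Counter(votes).most_common(1)[0][0]

# Apply only to questions the Stage-1 greedy baseline answered incorrectly;
# ties and still-unresolved items would then go to Stage-3 adaptive refinement.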
🎯 Recommended Use-Cases
- Graduate-level STEM reasoning (GPQA / science qualifying exams)
- Mathematical problem solving (MATH, AIME-style problems)
- Code generation and debugging (HumanEval, MBPP)
- Complex multi-step chain-of-thought tasks
- Bilingual reasoning (strong English + Korean; also Chinese / Japanese)
⚠️ Limitations
- At 27.6 B parameters in bfloat16, full inference requires ≈ 55 GB of VRAM (e.g., a single A100-80GB or B200); see the back-of-the-envelope estimate after this list.
- Optimised for English first, with secondary support for Korean, Chinese, and Japanese.
- Deep Opus-style reasoning traces tend to be verbose — control with max_new_tokens as needed.
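The ≈ 55 GB figure is just the weight memory implied by the parameter count (before KV cache and activation overhead):
# Rough weight-memory estimate: 27.6 B parameters × 2 bytes each (bfloat16).
params = 27.6e9
bytes_per_param = 2  # bfloat16
print(f"{params * bytes_per_param / 1e9:.1f} GB")  # ≈ 55.2 GB for the weights alone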
📚 Citation
@misc{darwin28b_opus_2026,
title = {Darwin-28B-Opus: Evolutionary Merging of Qwen3.6-27B with Claude-Opus-Distilled Reasoning},
author = {FINAL-Bench / Darwin Research Team},
year = {2026},
howpublished = {\url{https://huggingface.co/FINAL-Bench/Darwin-28B-Opus}},
note = {Darwin V7 · Mother-centric Ratio Interpolation merge · 88.89 % GPQA Diamond (3-stage)}
}
🔗 Related Darwin Models
- Darwin-36B-Opus — MoE 36B, Qwen3.6-35B-A3B × Opus distilled, GPQA 88.4 %
- Darwin-31B-Opus — 31B dense, multilingual-strong reasoning, GPQA 85.9 %
- Darwin-27B-Opus — 27B dense (Qwen3.5 generation), GPQA 86.9 %
- Darwin-9B-NEG — 9B with Native Entropy Gating, GPQA 84.3 %
- Darwin-9B-Opus — the Qwen3.5-9B Darwin member
- Darwin-4B-Genesis — smallest Darwin member
Darwin V7 · Qwen3.6 generation flagship · Sealed 2026-04-25 · FINAL-Bench
Quantizations available: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit