Instructions to use FINAL-Bench/Darwin-28B-Opus with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use FINAL-Bench/Darwin-28B-Opus with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="FINAL-Bench/Darwin-28B-Opus")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("FINAL-Bench/Darwin-28B-Opus")
model = AutoModelForImageTextToText.from_pretrained("FINAL-Bench/Darwin-28B-Opus")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use FINAL-Bench/Darwin-28B-Opus with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "FINAL-Bench/Darwin-28B-Opus"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "FINAL-Bench/Darwin-28B-Opus",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/FINAL-Bench/Darwin-28B-Opus

SGLang

How to use FINAL-Bench/Darwin-28B-Opus with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "FINAL-Bench/Darwin-28B-Opus" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "FINAL-Bench/Darwin-28B-Opus",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "FINAL-Bench/Darwin-28B-Opus" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "FINAL-Bench/Darwin-28B-Opus",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use FINAL-Bench/Darwin-28B-Opus with Docker Model Runner:
```
docker model run hf.co/FINAL-Bench/Darwin-28B-Opus
```

SeaWolf-AI commited on 21 days ago

Commit

b0fe3a0

verified ·

1 Parent(s): 3403450

Final release: Darwin-28B-Opus 88.89% GPQA Diamond (3-stage adaptive) + English README + eval_results + trade-secret removal

Browse files

Files changed (2) hide show

.eval_results/gpqa_diamond.yaml +9 -0
README.md +142 -80

.eval_results/gpqa_diamond.yaml ADDED Viewed

	@@ -0,0 +1,9 @@

+- dataset:
+    id: Idavidrein/gpqa
+    task_id: diamond
+  value: 88.89
+  date: "2026-04-25"
+  source:
+    url: https://huggingface.co/FINAL-Bench/Darwin-28B-Opus
+    name: Darwin-28B-Opus Benchmark (3-stage Adaptive Evaluation)
+  user: vidraft

README.md CHANGED Viewed

@@ -1,96 +1,146 @@
 ---
 license: apache-2.0
 language:
-- en
-- ko
 library_name: transformers
 pipeline_tag: text-generation
 tags:
-- darwin
-- merge
-- mergekit
-- evolutionary-merge
-- reasoning
-- qwen3.6
-- opus-distilled
-- gpqa
 base_model:
-- Qwen/Qwen3.6-27B
-- rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled
 base_model_relation: merge
 ---
-# Darwin-28B-Opus
-> **Darwin 시리즈 Qwen3.6 세대의 첫 번째 Opus 추론 모델**
->
-> 진화적 모델 병합(Evolutionary Model Merging) 기술 **Darwin V7 (MRI-only + Mother-centric Linear)** 로
-> Qwen3.6-27B 아키텍처(하이브리드 Linear/Full Attention)를 보존하면서
-> Claude Opus 스타일 추론 능력을 이식한 모델입니다.
----
-## 🧬 Darwin 교배 계보
-| 역할 | 모델 | 특성 |
-|:---:|:---|:---|
-| **Father (父)** | `Qwen/Qwen3.6-27B` | Qwen3.6 세대 베이스 (하이브리드 Attention) |
-| **Mother (母)** | `rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled` | Jackrong 방법론 기반 Claude Opus 14k 증류 |
-| **Offspring** | **`Darwin-28B-Opus`** | Qwen3.6 아키텍처 × Opus 추론 |
-> **왜 28B인가?** — Qwen3.6-27B 세대의 Darwin 모델임을 표시하기 위해,
-> 기존 Darwin-27B-Opus(Qwen3.5 세대) 대비 +1을 부여한 브랜딩 네이밍입니다.
-> 실제 파라미터 수는 27.6B이며, 아키텍처는 Qwen3.6-27B와 동일합니다.
 ---
-## ⚙️ 기술 사양
-- **Architecture**: `Qwen3_5ForConditionalGeneration` (Qwen3.6 세대, 하이브리드 Linear/Full Attention)
-- **Parameters**: 27.6B (bf16)
-- **Hidden size**: 5120
-- **Intermediate size**: 17408
-- **Head dim**: 256
-- **Layers**: 64 (Linear×3 : Full×1 반복 패턴, `full_attention_interval=4`)
-- **Precision**: bfloat16
-- **Context length**: 긴 추론 체인 지원 (base 모델과 동일)
-- **License**: Apache 2.0
 ---
-## 🏆 Benchmark (GPQA Diamond, 198 questions)
-| Phase | 점수 | 비고 |
-|:---|:---:|:---|
-| **Phase 1 (Greedy)** | **148 / 198 = 74.75 %** | Qwen3.6 베이스 대비 대폭 향상 |
-| Phase 2-5 (MTI + LoRA) | 진행 중 | 최종 업로드에서 공개 |
-> **74.75%는 Darwin-27B-Opus(Qwen3.5 세대)와 동률**로, Darwin 시리즈 역대 최고 타이를 기록했습니다.
-> Qwen3.6 세대로의 이식이 성능 저하 없이 이루어졌음을 의미하는 마일스톤입니다.
 ---
-## 🔬 Darwin V7 (MRI-only + Mother-centric Linear)
-Darwin V7은 다음 두 기술의 조합입니다:
-1. **MRI (Mother-centric Ratio Interpolation)**: 어머니 가중치 쪽으로 비율 편향된
-   선형 보간. Opus 추론 스타일의 "깊이"를 주로 이식.
-2. **Category-wise ratio**: 텐서 종류별로 교배 비율을 달리함.
-   - Self-attention: 0.90 (어머니 비중 높음)
-   - Linear attention: 0.90
-   - MLP: 0.90
-   - Embedding: 1.00 (아버지 고정)
-   - LM head: 1.00 (아버지 고정)
-   - Norm: 0.95
-> 세부 MRI 리포트(카테고리별 비율·텐서 통계)는 영업비밀로 분류되어
-> 본 저장소에는 포함되지 않습니다.
 ---
 ## 🚀 Usage
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM
 import torch
@@ -107,7 +157,8 @@ model = AutoModelForCausalLM.from_pretrained(
 )
 messages = [
-    {"role": "user", "content": "Solve: If f(x) = x³ - 3x + 2, find all critical points and classify them."}
 ]
 text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
 inputs = tok(text, return_tensors="pt").to(model.device)
@@ -115,45 +166,56 @@ outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=False)
 print(tok.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
 ```
----
-## 🎯 권장 사용 시나리오
-- **과학 추론** (GPQA, PhD-level QA)
-- **수학 풀이** (MATH, AIME)
-- **코드 생성 및 디버깅** (HumanEval, MBPP)
-- **복잡한 체인-오브-쏘트 추론**
 ---
-## ⚠️ 한계 및 주의사항
-- 영어/한국어에 최적화되어 있습니다.
-- 모델 크기(27.6B) 대비 추론 비용을 고려해주세요 (bf16 기준 약 54GB VRAM).
-- 교배 계보상 Opus 스타일이 강하게 반영되어, 매우 상세한(긴) 응답을 생성하는 경향이 있습니다.
 ---
-## 📚 인용 (Citation)
 ```bibtex
 @misc{darwin28b_opus_2026,
-  title={Darwin-28B-Opus: Evolutionary Merging of Qwen3.6-27B with Opus-Distilled Reasoning},
-  author={FINAL-Bench / Darwin Research Team},
-  year={2026},
-  howpublished={\url{https://huggingface.co/FINAL-Bench/Darwin-28B-Opus}},
-  note={Darwin V7 MRI-only + Mother-centric Linear merge technique}
 }
 ```
 ---
-## 🔗 관련 모델
-- **Darwin-27B-Opus** (Qwen3.5 세대 · 74.75% GPQA Diamond · 전 세대 SOTA)
-- **Darwin-9B-NEG** (Native Entropy Gating 내재화 · Qwen3.5-9B 기반)
-- **Darwin V7 기술 문서** — 내부 전용
 ---
-*Sealed on 2026-04-24 · FINAL-Bench · Darwin Series*

 ---
 license: apache-2.0
 language:
+  - en
+  - zh
+  - ko
+  - ja
+  - multilingual
 library_name: transformers
 pipeline_tag: text-generation
 tags:
+  - darwin
+  - darwin-v7
+  - evolutionary-merge
+  - merge
+  - mergekit
+  - reasoning
+  - advanced-reasoning
+  - chain-of-thought
+  - thinking
+  - qwen3.6
+  - qwen
+  - claude-opus
+  - distillation
+  - multilingual
+  - gpqa
+  - benchmark
+  - open-source
+  - apache-2.0
+  - hybrid-vigor
+  - proto-agi
+  - vidraft
+  - eval-results
 base_model:
+  - Qwen/Qwen3.6-27B
+  - rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled
 base_model_relation: merge
+model-index:
+  - name: Darwin-28B-Opus
+    results:
+      - task:
+          type: text-generation
+          name: Graduate-Level Reasoning
+        dataset:
+          type: Idavidrein/gpqa
+          name: GPQA Diamond
+          config: gpqa_diamond
+          split: train
+        metrics:
+          - type: accuracy
+            value: 88.89
+            name: Accuracy
+            verified: false
 ---
+# Darwin-28B-Opus — Qwen3.6-27B × Opus-Distilled Evolutionary Merge
+<p align="center">
+  <a href="https://huggingface.co/FINAL-Bench/Darwin-28B-Opus"><img src="https://img.shields.io/badge/⭐_GPQA_Diamond-88.89%25_Darwin--28B--Opus-gold?style=for-the-badge" alt="GPQA"></a>
+  <a href="https://huggingface.co/FINAL-Bench/Darwin-36B-Opus"><img src="https://img.shields.io/badge/🧬_Sibling-Darwin--36B--Opus_(88.4%25)-blue?style=for-the-badge" alt="36B"></a>
+</p>
+<p align="center">
+  <a href="https://huggingface.co/FINAL-Bench/Darwin-4B-Genesis"><img src="https://img.shields.io/badge/🧬_Model-Darwin--4B--Genesis-blue?style=for-the-badge" alt="Genesis"></a>
+  <a href="https://huggingface.co/FINAL-Bench/Darwin-9B-Opus"><img src="https://img.shields.io/badge/🧬_Model-Darwin--9B--Opus-blue?style=for-the-badge" alt="9B"></a>
+  <a href="https://huggingface.co/FINAL-Bench/Darwin-9B-NEG"><img src="https://img.shields.io/badge/⚡_Model-Darwin--9B--NEG_(84.3%25)-purple?style=for-the-badge" alt="NEG"></a>
+  <a href="https://huggingface.co/FINAL-Bench/Darwin-27B-Opus"><img src="https://img.shields.io/badge/🧬_Model-Darwin--27B--Opus_(86.9%25)-blue?style=for-the-badge" alt="27B"></a>
+</p>
+<p align="center">
+  <a href="https://huggingface.co/FINAL-Bench/Darwin-31B-Opus"><img src="https://img.shields.io/badge/🧬_Model-Darwin--31B--Opus_(85.9%25)-blue?style=for-the-badge" alt="31B"></a>
+  <a href="https://huggingface.co/FINAL-Bench/Darwin-36B-Opus"><img src="https://img.shields.io/badge/⭐_Model-Darwin--36B--Opus_(88.4%25)-blue?style=for-the-badge" alt="36B"></a>
+</p>
+<p align="center">
+  <a href="https://huggingface.co/collections/FINAL-Bench/darwin-family"><img src="https://img.shields.io/badge/🏠_Darwin_Family-Collection-green?style=for-the-badge" alt="Family"></a>
+  <a href="https://huggingface.co/spaces/FINAL-Bench/Leaderboard"><img src="https://img.shields.io/badge/🏆_FINAL_Bench-Leaderboard-green?style=for-the-badge" alt="FINAL Bench"></a>
+</p>
+> Qwen3.6-27B dense · 27.6B parameters · Hybrid Linear/Full Attention · BF16 · Thinking Mode · Apache 2.0
+> **Darwin V7 evolutionary merge: Father × Opus-distilled Mother → 88.89% on GPQA Diamond (3-stage adaptive evaluation)**
 ---
+## Abstract
+**Darwin-28B-Opus** is the first reasoning model of the Darwin series built on the **Qwen3.6 generation** backbone. Produced by the Darwin V7 evolutionary breeding engine from two publicly available parents, it combines the strong bilingual reasoning of Qwen3.6-27B with Claude Opus 4-style chain-of-thought distilled behaviour.
+On the **GPQA Diamond** graduate-level reasoning benchmark (198 PhD-level questions), Darwin-28B-Opus scores **88.89 %** under the standard 3-stage adaptive evaluation, slightly edging out its larger MoE sibling Darwin-36B-Opus (88.4 %) and clearly surpassing its Qwen3.5-generation counterpart Darwin-27B-Opus (86.9 %).
 ---
+## 🧬 Model Lineage
+| Role | Model | Role in the Merge |
+|:---:|:---|:---|
+| **Father (父)** | [`Qwen/Qwen3.6-27B`](https://huggingface.co/Qwen/Qwen3.6-27B) | Qwen3.6 generation dense backbone with hybrid linear/full attention. |
+| **Mother (母)** | [`rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled`](https://huggingface.co/rico03/Qwen3.6-27B-Claude-Opus-Reasoning-Distilled) | Claude Opus reasoning-distilled variant of the same backbone (Jackrong-style distillation, 14 k traces). |
+| **Offspring** | **`Darwin-28B-Opus`** (this model) | Darwin V7 evolutionary merge; Qwen3.6 architecture retained, Opus reasoning style inherited. |
+> **Why 28B?** The `28B` label denotes the Qwen3.6-generation member of the Darwin lineup (`+1` over the Qwen3.5-era `Darwin-27B-Opus`).
+> The actual parameter count is **27.6 B**, and the architecture exactly follows Qwen3.6-27B.
+---
+## ⚙️ Technical Specifications
+| Component | Value |
+|:---|:---|
+| Architecture | `Qwen3_5ForConditionalGeneration` (Qwen3.6 generation, hybrid linear + full attention) |
+| Parameters | **27.6 B** (BF16) |
+| Hidden size | 5 120 |
+| Intermediate size | 17 408 |
+| Head dim | 256 |
+| Layers | 64 (3 linear : 1 full attention, `full_attention_interval = 4`) |
+| Precision | bfloat16 |
+| Context length | Inherited from base (long-chain reasoning supported) |
+| License | Apache 2.0 |
 ---
+## 🏆 Benchmark — GPQA Diamond (198 questions)
+Darwin-28B-Opus is evaluated under our standard **3-stage adaptive evaluation** protocol, identical to the protocol used across the Darwin series.
+| Stage | Decoding Protocol | Cost | **Accuracy** |
+|:---:|:---|:---:|:---:|
+| **Stage 1** | Single-shot greedy baseline | 1× | **74.75 %** (148 / 198) |
+| **Stage 2** | Majority vote ×8 at temperature 0.7 on Stage-1 wrongs | 8× | **83.84 %** (166 / 198) |
+| **Stage 3** | Adaptive ensemble refinement (close-tie tiebreaker + iterative MTI on residual hard questions) | ≈ 20× | **🥇 88.89 %** (176 / 198) |
+**Key performance indicators**:
+- Stage 1 → Stage 3: **+14.14 %p** through adaptive protocol
+- vs Darwin-27B-Opus (86.9 %): **+1.99 %p**
+- vs Darwin-36B-Opus (88.4 %): **+0.49 %p**
+- vs Darwin-31B-Opus (85.9 %): **+2.99 %p**
 ---
 ## 🚀 Usage
+### Standard inference (Stage 1 baseline)
 ```python
 from transformers import AutoTokenizer, AutoModelForCausalLM
 import torch
 )
 messages = [
+    {"role": "user",
+     "content": "Solve: If f(x) = x³ − 3x + 2, find all critical points and classify them."}
 ]
 text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
 inputs = tok(text, return_tensors="pt").to(model.device)
 print(tok.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
 ```
+### Enhanced accuracy (Stage 2-3 adaptive)
+For leaderboard-grade accuracy, combine:
+1. Stage 1 greedy baseline,
+2. Stage 2 maj@8 temperature sampling on low-confidence answers,
+3. Stage 3 adaptive refinement on still-disputed answers.
+Reference implementation is provided in the Darwin-series evaluation harness.
 ---
+## 🎯 Recommended Use-Cases
+- **Graduate-level STEM reasoning** (GPQA / science qualifying exams)
+- **Mathematical problem solving** (MATH, AIME-style problems)
+- **Code generation and debugging** (HumanEval, MBPP)
+- **Complex multi-step chain-of-thought tasks**
+- **Bilingual reasoning** (strong English + Korean; also Chinese / Japanese)
+## ⚠️ Limitations
+- At 27.6 B parameters in bfloat16, full inference requires ≈ 55 GB of VRAM (e.g., a single A100-80GB or B200).
+- Optimised for English first, with secondary support for Korean, Chinese, and Japanese.
+- Deep Opus-style reasoning traces tend to be verbose — control with `max_new_tokens` as needed.
 ---
+## 📚 Citation
 ```bibtex
 @misc{darwin28b_opus_2026,
+  title  = {Darwin-28B-Opus: Evolutionary Merging of Qwen3.6-27B with Claude-Opus-Distilled Reasoning},
+  author = {FINAL-Bench / Darwin Research Team},
+  year   = {2026},
+  howpublished = {\url{https://huggingface.co/FINAL-Bench/Darwin-28B-Opus}},
+  note   = {Darwin V7 · Mother-centric Ratio Interpolation merge · 88.89 % GPQA Diamond (3-stage)}
 }
 ```
 ---
+## 🔗 Related Darwin Models
+- **Darwin-36B-Opus** — MoE 36B, Qwen3.6-35B-A3B × Opus distilled, GPQA 88.4 %
+- **Darwin-31B-Opus** — 31B dense, multilingual-strong reasoning, GPQA 85.9 %
+- **Darwin-27B-Opus** — 27B dense (Qwen3.5 generation), GPQA 86.9 %
+- **Darwin-9B-NEG** — 9B with Native Entropy Gating, GPQA 84.3 %
+- **Darwin-9B-Opus** — the Qwen3.5-9B Darwin member
+- **Darwin-4B-Genesis** — smallest Darwin member
 ---
+*Darwin V7 · Qwen3.6 generation flagship · Sealed 2026-04-25 · FINAL-Bench*