SeaWolf-AI committed 24 days ago

Commit f119914 (verified) · 1 Parent(s): e56adcf

Add model card with parent lineage (Qwen3.6 + hesamation Opus)

Files changed (1)

README.md +145 -0 (ADDED)

	@@ -0,0 +1,145 @@

+---
+license: apache-2.0
+language:
+  - en
+  - ko
+  - multilingual
+base_model:
+  - Qwen/Qwen3.6-35B-A3B
+  - hesamation/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled
+tags:
+  - darwin-v7
+  - evolutionary-merge
+  - mri-guided
+  - slerp
+  - qwen3.6
+  - moe
+  - a3b
+  - reasoning
+  - thinking
+  - opus-series
+  - hybrid-vigor
+library_name: transformers
+pipeline_tag: text-generation
+---
+# Darwin-36B-Opus
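+**Darwin Opus series: Qwen3.6 generation (A3B MoE)**
+An evolutionary merge model based on Qwen3.6-35B-A3B. The Father (the unmodified base) and the Mother (Claude Opus 4.6 Reasoning Distilled) are merged automatically by the Darwin V7 engine using **MRI prescription + CMA-ES evolution + SLERP**.
+## 🧬 Lineage (Parents)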
+### 🔵 Father — Base Stability
+- **[Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B)**
+  - 35B MoE (3B active), 40 layers
+  - Hybrid attention: **Gated DeltaNet 75% + Gated Attention 25%**
+  - GPQA 86.0% / MMLU-Pro 85.2% / AIME26 92.7% (official)
+### 🔴 Mother — Reasoning Distillation
+- **[hesamation/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled](https://huggingface.co/hesamation/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled)**
+  - SFT on the Father, distilling Claude Opus 4.6 chain-of-thought (CoT)
+  - LoRA rank=32, 2 epochs, 762 steps, 14,233 CoT samples
+  - MMLU-Pro (70 limit-5): **75.71%** (+32.85 percentage points vs Father base)
+  - qwen3-thinking template, response-only masking
+## 🔬 Darwin V7 breeding method
+```
+Phase 0: Auto-Profile (architecture compatibility check) → COMPATIBLE ✓
+Phase 1: MRI Scan (per-tensor norm/entropy/std + probe)
+Phase 2a: CMA-ES Evolution (500 steps, 8-block genome)
+          → proxy score 0.8403
+Phase 2b: Real SLERP Merge (top-5 candidate evaluation)
+          → method=SLERP  ratio=0.416  mri_trust=0.783
+Phase 3: Health check (perplexity + smoke gen)     → healthy ✓
+Phase 4: Upload
+```
+### Merge formula
+```
+Final ratio for each tensor:
+  final_ratio = mri_ratio × 0.783 + genome_ratio × 0.217
+- 0.416 = global blend ratio (Mother 41.6% + Father 58.4%)
+- 0.783 = MRI prescription trust (weight given to the norm/entropy-based prescription)
+- genome optimized by evolution over 8 blocks × 40 layers
+```
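The per-tensor formula above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the Darwin V7 implementation: the function name `final_ratio` and the example tensor names and input ratios are hypothetical. Only the weights 0.783 and 0.217 (that is, 1 − 0.783) come from the README.

```python
MRI_TRUST = 0.783  # mri_trust value reported in the README's merge log


def final_ratio(mri_ratio: float, genome_ratio: float,
                mri_trust: float = MRI_TRUST) -> float:
    """Blend the MRI-prescribed ratio with the evolved genome ratio for one tensor.

    The genome weight is taken as (1 - mri_trust), which matches the 0.217
    coefficient in the README when mri_trust is 0.783.
    """
    return mri_ratio * mri_trust + genome_ratio * (1.0 - mri_trust)


# Hypothetical per-tensor inputs (mri_ratio, genome_ratio), for illustration only.
examples = {
    "layers.0.attn": (0.30, 0.50),
    "layers.39.mlp": (0.55, 0.20),
}

for name, (mri, genome) in examples.items():
    print(f"{name}: final_ratio = {final_ratio(mri, genome):.4f}")
```

Because the two weights sum to 1, the result always lies between the two input ratios, so a tensor for which both sources agree keeps that shared ratio unchanged.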
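+### Why SLERP?
+The weights of the two models are vectors on a high-dimensional curved surface. Linear interpolation (a plain average) leaves the manifold and lands at a meaningless point, whereas **spherical linear interpolation (SLERP)** moves smoothly along the surface and preserves the characteristics of both parents.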
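A minimal, pure-Python sketch of SLERP between two flattened weight vectors makes the contrast with plain averaging concrete. It is not the Darwin V7 code: the function `slerp`, the `eps` threshold, and the fallback to linear interpolation for near-parallel or zero vectors are assumptions borrowed from common merge-tool practice.

```python
import math


def slerp(t, a, b, eps=1e-8):
    """Spherical linear interpolation between vectors a and b.

    t=0 returns a, t=1 returns b. Falls back to linear interpolation when
    either vector is (near) zero or the two are (anti)parallel.
    """
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a < eps or norm_b < eps:
        return [(1 - t) * x + t * y for x, y in zip(a, b)]

    # Angle between the two directions, with the cosine clamped for safety.
    dot = sum(x * y for x, y in zip(a, b))
    cos_omega = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if abs(sin_omega) < eps:
        return [(1 - t) * x + t * y for x, y in zip(a, b)]

    weight_a = math.sin((1 - t) * omega) / sin_omega
    weight_b = math.sin(t * omega) / sin_omega
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]


# Two orthogonal unit vectors standing in for Father and Mother weights.
father = [1.0, 0.0]
mother = [0.0, 1.0]

halfway_slerp = slerp(0.5, father, mother)
halfway_linear = [(x + y) / 2 for x, y in zip(father, mother)]

print("SLERP norm: ", math.hypot(*halfway_slerp))
print("linear norm:", math.hypot(*halfway_linear))
```

For these unit vectors the SLERP midpoint stays on the unit circle, while the linear average shrinks toward the origin. That shrinkage is the "leaving the surface" effect the paragraph describes.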
+## 🏷️ Series positioning
+| Darwin Opus model | Father | Mother | GPQA |
+|-----------------|--------|--------|:----:|
+| Darwin-27B-Opus | Qwen3.5-27B | Jackrong Claude-4.6-Opus distilled | 86.9 |
+| Darwin-31B-Opus | Gemma2-27B × various | Opus variants | 85.9 |
+| Darwin-35B-A3B-Opus | Qwen3.5-35B-A3B | Jackrong Opus distilled | (measurement in progress) |
+| **Darwin-36B-Opus** | **Qwen3.6-35B-A3B** | **hesamation Qwen3.6 Opus distilled** | **(measurement in progress)** |
+The `36B` in the name marks the **Qwen3.6 generation** (the actual parameter count is 36.0B).
+## 🧠 Architecture
+- **Architecture**: Qwen3.5MoE (Qwen3.6 reuses the 3.5 codebase)
+- **Total params**: 36.0B
+- **Active params**: ~3B (MoE sparse)
+- **Layers**: 40
+- **Hidden size**: 2048
+- **Experts**: 256 routed, top-8 activation
+- **Hybrid attention**: 75% Gated DeltaNet + 25% Gated Attention
+- **Chat template**: `<|im_start|>assistant\n<think>\n` (thinking mode default)
+## 💡 Usage
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+import torch
+# Load the tokenizer and the merged model (bf16, placed automatically across available devices)
+tok = AutoTokenizer.from_pretrained("FINAL-Bench/Darwin-36B-Opus", trust_remote_code=True)
+model = AutoModelForCausalLM.from_pretrained(
+    "FINAL-Bench/Darwin-36B-Opus",
+    torch_dtype=torch.bfloat16,
+    device_map="auto",
+    trust_remote_code=True,
+)
+# Render the chat template to text, then tokenize it
+messages = [{"role": "user", "content": "What is the derivative of sin(x²)?"}]
+text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = tok(text, return_tensors="pt").to(model.device)
+# Sample a response; the token budget has to cover the <think> reasoning as well as the answer
+outputs = model.generate(**inputs, max_new_tokens=2048, temperature=0.6, do_sample=True)
+# Decode only the newly generated tokens, skipping the prompt
+print(tok.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
+```
+## ⚠️ Evaluation notes
+This is a reasoning model (it uses `<think></think>`), so when extracting the answer:
+```python
+# Take only the part of the model response after </think> as the answer
+idx = response.rfind("</think>")
+answer_part = response[idx + len("</think>"):].strip() if idx >= 0 else response
+```
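The extraction step above can be wrapped in a small helper so an evaluation harness can call it on every response. This is a sketch under stated assumptions: the name `extract_answer` and the sample strings are hypothetical, and unlike the README snippet it also strips whitespace when no `</think>` tag is present.

```python
def extract_answer(response: str, close_tag: str = "</think>") -> str:
    """Return the text after the last closing think tag.

    If the tag is missing (for example, the reasoning was truncated or the
    model answered directly), the whole response is returned, stripped.
    """
    idx = response.rfind(close_tag)
    if idx < 0:
        return response.strip()
    return response[idx + len(close_tag):].strip()


# Hypothetical responses, for illustration only.
with_reasoning = "Chain rule: d/dx sin(u) = cos(u) * u'.\n</think>\n\n2x * cos(x^2)"
without_reasoning = "  2x * cos(x^2)  "

print(extract_answer(with_reasoning))
print(extract_answer(without_reasoning))
```

Using `rfind` rather than `find` means that if the reasoning itself quotes `</think>`, only the text after the final tag is scored.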
+## 🏗️ Build
+- **Engine**: Darwin V7 (FINAL-Bench proprietary)
+- **Hardware**: 2× NVIDIA B200 (merge GPUs)
+- **Evolution**: 500 steps in ~15 minutes
+- **Cache ID**: `merged_6edaacaf`
+- **Proxy fitness (arc_challenge)**: 0.8403
+- **Commit**: `e56adcfb` (2026-04-22)
+## 📜 License
+Apache 2.0 (inherited from the Qwen3.6 license)
+## 🙏 Credits
+- Qwen Team (Father base)
+- @hesamation (Mother: Opus distillation)
+- Anthropic Claude Opus 4.6 (Teacher)
+- FINAL-Bench / VIDRAFT_LAB (Darwin V7 engine + breeding)