---
license: apache-2.0
base_model:
- FINAL-Bench/Darwin-9B-Opus
tags:
- darwin
- darwin-v8
- darwin-neg
- native-entropy-gating
- NEG
- reasoning
- self-regulated-reasoning
- advanced-reasoning
- thinking
- qwen3.5
- qwen
- gpqa
- benchmark
- open-source
- apache-2.0
- hybrid-vigor
- proto-agi
- vidraft
- eval-results
language:
- en
- zh
- ko
- ja
- multilingual
pipeline_tag: text-generation
library_name: transformers
model-index:
- name: Darwin-9B-NEG
results:
- task:
type: text-generation
name: Graduate-Level Reasoning
dataset:
type: Idavidrein/gpqa
name: GPQA Diamond
config: gpqa_diamond
split: train
metrics:
- type: accuracy
value: 84.34
name: Accuracy
verified: false
---
# Darwin-9B-NEG — The First Native Entropy Gating Model
<p align="center">
<a href="https://huggingface.co/FINAL-Bench/Darwin-9B-NEG"><img src="https://img.shields.io/badge/⭐_GPQA_Diamond-84.34%25_Darwin--9B--NEG-gold?style=for-the-badge" alt="GPQA"></a>
<a href="https://huggingface.co/FINAL-Bench/Darwin-9B-Opus"><img src="https://img.shields.io/badge/🧬_Base-Darwin--9B--Opus-blue?style=for-the-badge" alt="Base"></a>
</p>
<p align="center">
<a href="https://huggingface.co/FINAL-Bench/Darwin-4B-Genesis"><img src="https://img.shields.io/badge/🧬_Model-Darwin--4B--Genesis-blue?style=for-the-badge" alt="Genesis"></a>
<a href="https://huggingface.co/FINAL-Bench/Darwin-9B-Opus"><img src="https://img.shields.io/badge/🧬_Model-Darwin--9B--Opus-blue?style=for-the-badge" alt="9B"></a>
<a href="https://huggingface.co/FINAL-Bench/Darwin-27B-Opus"><img src="https://img.shields.io/badge/🧬_Model-Darwin--27B--Opus-blue?style=for-the-badge" alt="27B"></a>
<a href="https://huggingface.co/FINAL-Bench/Darwin-31B-Opus"><img src="https://img.shields.io/badge/🧬_Model-Darwin--31B--Opus-blue?style=for-the-badge" alt="31B"></a>
<a href="https://huggingface.co/FINAL-Bench/Darwin-36B-Opus"><img src="https://img.shields.io/badge/🧬_Model-Darwin--36B--Opus-blue?style=for-the-badge" alt="36B"></a>
</p>
<p align="center">
<a href="https://huggingface.co/collections/FINAL-Bench/darwin-family"><img src="https://img.shields.io/badge/🏠_Darwin_Family-Collection-green?style=for-the-badge" alt="Family"></a>
<a href="https://huggingface.co/spaces/FINAL-Bench/Leaderboard"><img src="https://img.shields.io/badge/🏆_FINAL_Bench-Leaderboard-green?style=for-the-badge" alt="FINAL Bench"></a>
</p>
> Qwen3.5-9B backbone · 8.95B parameters · BF16 · Thinking Mode · Apache 2.0
> **The first NEG-enabled model — self-regulating reasoning with no extra library.**
---
## Abstract
**Darwin-9B-NEG** is the first model in the Darwin series to feature **Native Entropy Gating (NEG)** — a proprietary Darwin architectural innovation that embeds a sense of *self-confidence* directly into the model weights. Unlike external multi-turn iteration (MTI) techniques that require 3×–8× extra inference, NEG operates *inside* the single decoding loop and activates in fewer than 5 % of generation steps, lifting reasoning accuracy **by more than 12 percentage points at 1× inference cost**.
On the **GPQA Diamond** PhD-level reasoning benchmark (198 questions), Darwin-9B-NEG scores **84.34 %** with the full 3-stage ensemble protocol — surpassing even the published Qwen3.5-9B leaderboard result (81.7 %).
---
## What Makes Darwin-9B-NEG Different
### 🧬 Darwin Series — Evolutionary Model Merging
The Darwin family is produced by **Darwin V7**, an evolutionary breeding engine that recombines two parent LLMs into a single descendant, preserving hybrid vigour across reasoning and knowledge capabilities. **Darwin-9B-Opus** — this model's base — is the Qwen3.5-family member of the Darwin series, previously published as a stand-alone reasoning model.
### ⚡ NEG — Native Entropy Gating (Darwin V8)
**NEG** is a proprietary Darwin technology that gives the language model an architecturally internalised *self-confidence sense*. Two tiny learnable modules ride alongside the transformer:
- **NEG-Head** (≈ 4 M params, ~ 0.05 % of total weights) predicts, at each step, the entropy of the next-token distribution from the last hidden state.
- **NEG-Gate** (1 learnable threshold) decides, on a per-token basis, whether the model is "confident enough" to commit to its top choice, or whether it should restrict its choice to a narrow top-k subset.
Because NEG is carried *inside* the model weights themselves, there is nothing extra to ship or to install: standard `transformers` loading with `trust_remote_code=True` attaches the modules automatically. The model file *is* the feature.
**Why it matters**
- **1× inference cost** — no multi-sample voting, no multi-turn loops
- **< 5 % gate activation** — negligible latency overhead versus the base model
- **+12.63 %p on GPQA Diamond** vs. the NEG-free Darwin-9B-Opus baseline (same greedy decoding, same prompt, same tokens)
- **Single-file deployment** — drop in to vLLM / SGLang / TGI / `transformers`, no new engine required
- **No trade-secret leaks** — the merge recipe is kept internal; only the final model weights are released under Apache 2.0
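The per-token gating decision described above can be sketched in plain Python. The entropy threshold and top-k width below are illustrative stand-ins; in the real model both are learned NEG-Gate parameters and the entropy comes from the NEG-Head prediction, not an exact computation:

```python
import math

def entropy(probs):
    """Shannon entropy (nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def neg_gate(probs, threshold=1.5, top_k=3):
    """If entropy exceeds the threshold, restrict the distribution
    to its top-k tokens and renormalise; otherwise pass it through."""
    if entropy(probs) <= threshold:
        return probs  # confident: leave the distribution untouched
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(ranked[:top_k])
    masked = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(masked)
    return [p / total for p in masked]

confident = [0.9, 0.05, 0.03, 0.02]      # low entropy: gate stays open
uncertain = [0.3, 0.25, 0.2, 0.15, 0.1]  # high entropy: gate fires
```

Because the gate passes confident steps through unchanged, the low activation rate reported below translates directly into negligible overhead.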
---
## 🏗️ Architecture Overview
```
Input Text
     │
     ▼
[Darwin-9B-Opus backbone (frozen during NEG training)]
     Transformer Layers × 32
     │
     last hidden state ─────────────┐
     │                              │
     ▼                              ▼
  LM Head                        NEG-Head
     │                              │
 base logits               predicted entropy
     │                              │
     └────────▶ NEG-Gate ◀──────────┘
                   │
             guided logits
                   │
               next token
```
### Key Specifications
| Component | Value |
|:---|:---|
| Architecture | Qwen3.5 decoder-only transformer (32 layers, hidden 4096) |
| Total parameters | 8.95 B (base) + ≈ 4 M (NEG modules) |
| NEG-Head | 2-layer MLP with softplus output |
| NEG-Gate | top-k masking gate with learnable entropy threshold |
| Precision | bfloat16 |
| Context length | inherited from Darwin-9B-Opus |
| License | Apache 2.0 |
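The table only pins down the NEG-Head's rough shape. A minimal stdlib-only sketch of a 2-layer MLP ending in softplus is below; the bottleneck width, tanh activation, and random weights are assumptions for illustration, and only the softplus output (which keeps the predicted entropy non-negative) is stated in the spec:

```python
import math
import random

HIDDEN, BOTTLENECK = 4096, 64  # hidden size from the spec; bottleneck assumed
rng = random.Random(0)

# hypothetical NEG-Head weights: a 2-layer MLP ending in softplus
W1 = [[rng.gauss(0, 0.01) for _ in range(BOTTLENECK)] for _ in range(HIDDEN)]
W2 = [rng.gauss(0, 0.01) for _ in range(BOTTLENECK)]

def softplus(x):
    # numerically stable log(1 + e^x)
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)

def neg_head(hidden_state):
    """Predict next-token entropy from the last hidden state."""
    h = [math.tanh(sum(hidden_state[i] * W1[i][j] for i in range(HIDDEN)))
         for j in range(BOTTLENECK)]
    return softplus(sum(h[j] * W2[j] for j in range(BOTTLENECK)))

state = [rng.gauss(0, 1) for _ in range(HIDDEN)]
```

At roughly `HIDDEN × BOTTLENECK` weights this lands in the same few-million-parameter budget the card quotes for the NEG modules.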
---
## 🏆 Benchmark Results — GPQA Diamond (198 PhD-level questions)
Darwin-9B-NEG ships **three decoding modes** from the *same* model weights, allowing users to trade inference cost for accuracy:
| Mode | Decoding Protocol | Inference Cost | **Accuracy** |
|:---:|:---|:---:|:---:|
| **0 · Baseline** | Darwin-9B-Opus greedy (NEG disabled) | 1× | 51.01 % |
| **1 · Pure NEG** | greedy decoding **with NEG enabled** | **1×** | **63.64 %** |
| **2 · Permutation** | NEG + choice-order permutation (4 orderings, majority) | 4× | 76.26 % |
| **3 · Ensemble Refinement** | NEG + permutation + temperature-sampled ensemble | ≈ 20× | **🥇 84.34 %** |
**Improvements:**
- Pure NEG (mode 1) vs. baseline: **+12.63 %p at identical inference cost**
- Ensemble (mode 3) vs. baseline: **+33.33 %p**
- Ensemble vs. Qwen3.5-9B leaderboard score (81.7 %): **+2.64 %p**
> **Gate activation rate**: 4.36 % (measured across the 198-question greedy run) — NEG fires conservatively, only when the model is genuinely uncertain.
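The quoted percentage-point deltas follow directly from the table:

```python
# sanity-check the percentage-point improvements quoted above
baseline, pure_neg, ensemble, leaderboard = 51.01, 63.64, 84.34, 81.70

delta_neg = round(pure_neg - baseline, 2)          # pure NEG vs. baseline
delta_ensemble = round(ensemble - baseline, 2)     # ensemble vs. baseline
delta_vs_board = round(ensemble - leaderboard, 2)  # ensemble vs. leaderboard
```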
---
## 🚀 Usage
### Quick start — Pure NEG greedy (mode 1, recommended default)
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tok = AutoTokenizer.from_pretrained(
    "FINAL-Bench/Darwin-9B-NEG",
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "FINAL-Bench/Darwin-9B-NEG",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # attaches the NEG modules automatically
)

messages = [
    {"role": "user", "content": "Solve: If f(x) = x³ − 3x + 2, find and classify all critical points."}
]
text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(text, return_tensors="pt").to(model.device)

# Mode 1 (Pure NEG): greedy decoding with the gate enabled
outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=False)
print(tok.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```
### Using the bundled NEG loader helper
`modeling_darwin_neg.py` is shipped inside the repo and provides a convenience loader:
```python
from modeling_darwin_neg import load_darwin_neg

model = load_darwin_neg(
    "FINAL-Bench/Darwin-9B-NEG",
    hf_token="hf_xxx",
)
```
### Mode selection
- **Mode 1 (Pure NEG)**: default `do_sample=False`, NEG is always on.
- **Mode 2 (Permutation)**: shuffle the option order 4 times, greedy each, majority-vote.
- **Mode 3 (Ensemble)**: production protocol combining permutation, temperature sampling and second-opinion re-query (internal; reproduction scripts are released separately).
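The Mode 2 protocol can be sketched as follows. `answer_fn` is a hypothetical stand-in for a model call that returns the chosen option's position; the real harness prompts the model with each reordered choice list:

```python
import random
from collections import Counter

def permute_and_vote(question, options, answer_fn, n_perms=4, seed=0):
    """Mode 2 sketch: shuffle the option order, query once per ordering,
    map each chosen position back to the original option, majority-vote."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_perms):
        order = list(range(len(options)))
        rng.shuffle(order)
        shuffled = [options[i] for i in order]
        picked = answer_fn(question, shuffled)  # model's chosen position
        votes.append(order[picked])             # map back to original index
    return Counter(votes).most_common(1)[0][0]

# toy "model" that always picks the longest option, regardless of ordering
longest = lambda q, opts: max(range(len(opts)), key=lambda i: len(opts[i]))
opts = ["a", "bb", "cccc", "ddd"]
```

The shuffle-and-remap step is what removes position bias: an answer must win across orderings, not just in one fixed slot.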
---
## 🧬 Model Lineage
```
Qwen/Qwen3.5-9B   +   (Opus-distilled sibling)
          ╲               ╱
     Darwin V7 evolutionary merge
               │
               ▼
     Darwin-9B-Opus ── stand-alone reasoning model (Apache 2.0)
               │
     NEG-Head / NEG-Gate training (Darwin V8)
               │
               ▼
     Darwin-9B-NEG ── THIS MODEL
```
- **Base**: [FINAL-Bench/Darwin-9B-Opus](https://huggingface.co/FINAL-Bench/Darwin-9B-Opus) (weights frozen during NEG training)
- **Technology generation**: Darwin V8 (Native Entropy Gating) — successor to Darwin V7 (evolutionary merging)
---
## 🎯 Recommended Use-Cases
- **Graduate-level STEM reasoning** — physics, chemistry, biology, mathematics (GPQA-style)
- **Mathematical problem solving** (MATH, AIME-style)
- **Code reasoning and debugging** (HumanEval-style)
- **Complex chain-of-thought** tasks where a small reasoning model with a big boost is desired
## ⚠️ Limitations
- Optimised for English first, with secondary support for Korean / Chinese / Japanese.
- At 8.95 B parameters, knowledge coverage is smaller than the larger Darwin models (27B / 31B / 36B) — for pure world-knowledge tasks consider Darwin-36B-Opus.
- The Ensemble mode (84.34 %) uses ≈ 20× inference; choose Pure NEG (mode 1) for cost-sensitive deployments.
---
## 📚 Citation
```bibtex
@misc{darwin9b_neg_2026,
title = {Darwin-9B-NEG: Native Entropy Gating for Self-Regulated Reasoning at 1x Inference Cost},
author = {FINAL-Bench / Darwin Research Team},
year = {2026},
howpublished = {\url{https://huggingface.co/FINAL-Bench/Darwin-9B-NEG}},
note = {Darwin V8 — Native Entropy Gating technology generation}
}
```
---
## 🔗 Related Darwin Models
- **Darwin-36B-Opus** — MoE 36B, Qwen3.6-35B-A3B × Opus distilled, GPQA 88.4 %
- **Darwin-31B-Opus** — 31B multilingual-strong reasoning
- **Darwin-27B-Opus** — 27B dense, GPQA 86.9 %
- **Darwin-28B-Opus** — Qwen3.6-27B × rico03 Opus distilled (new 2026-04)
- **Darwin-9B-Opus** — this model's base, Qwen3.5-9B family
- **Darwin-4B-Genesis** — smallest member, Gemma4 family
---
This model is introduced in [Darwin Family](https://arxiv.org/abs/2605.14386).
*Darwin V8 · Sealed 2026-04-24 · FINAL-Bench*