---
license: apache-2.0
base_model:
  - FINAL-Bench/Darwin-9B-Opus
tags:
  - darwin
  - darwin-v8
  - darwin-neg
  - native-entropy-gating
  - NEG
  - reasoning
  - self-regulated-reasoning
  - advanced-reasoning
  - thinking
  - qwen3.5
  - qwen
  - gpqa
  - benchmark
  - open-source
  - apache-2.0
  - hybrid-vigor
  - proto-agi
  - vidraft
  - eval-results
language:
  - en
  - zh
  - ko
  - ja
  - multilingual
pipeline_tag: text-generation
library_name: transformers
model-index:
  - name: Darwin-9B-NEG
    results:
      - task:
          type: text-generation
          name: Graduate-Level Reasoning
        dataset:
          type: Idavidrein/gpqa
          name: GPQA Diamond
          config: gpqa_diamond
          split: train
        metrics:
          - type: accuracy
            value: 84.34
            name: Accuracy
            verified: false
---

# Darwin-9B-NEG — The First Native Entropy Gating Model

<p align="center">
  <a href="https://huggingface.co/FINAL-Bench/Darwin-9B-NEG"><img src="https://img.shields.io/badge/⭐_GPQA_Diamond-84.34%25_Darwin--9B--NEG-gold?style=for-the-badge" alt="GPQA"></a>
  <a href="https://huggingface.co/FINAL-Bench/Darwin-9B-Opus"><img src="https://img.shields.io/badge/🧬_Base-Darwin--9B--Opus-blue?style=for-the-badge" alt="Base"></a>
</p>

<p align="center">
  <a href="https://huggingface.co/FINAL-Bench/Darwin-4B-Genesis"><img src="https://img.shields.io/badge/🧬_Model-Darwin--4B--Genesis-blue?style=for-the-badge" alt="Genesis"></a>
  <a href="https://huggingface.co/FINAL-Bench/Darwin-9B-Opus"><img src="https://img.shields.io/badge/🧬_Model-Darwin--9B--Opus-blue?style=for-the-badge" alt="9B"></a>
  <a href="https://huggingface.co/FINAL-Bench/Darwin-27B-Opus"><img src="https://img.shields.io/badge/🧬_Model-Darwin--27B--Opus-blue?style=for-the-badge" alt="27B"></a>
  <a href="https://huggingface.co/FINAL-Bench/Darwin-31B-Opus"><img src="https://img.shields.io/badge/🧬_Model-Darwin--31B--Opus-blue?style=for-the-badge" alt="31B"></a>
  <a href="https://huggingface.co/FINAL-Bench/Darwin-36B-Opus"><img src="https://img.shields.io/badge/🧬_Model-Darwin--36B--Opus-blue?style=for-the-badge" alt="36B"></a>
</p>

<p align="center">
  <a href="https://huggingface.co/collections/FINAL-Bench/darwin-family"><img src="https://img.shields.io/badge/🏠_Darwin_Family-Collection-green?style=for-the-badge" alt="Family"></a>
  <a href="https://huggingface.co/spaces/FINAL-Bench/Leaderboard"><img src="https://img.shields.io/badge/🏆_FINAL_Bench-Leaderboard-green?style=for-the-badge" alt="FINAL Bench"></a>
</p>

> Qwen3.5-9B backbone · 8.95B parameters · BF16 · Thinking Mode · Apache 2.0
> **The first NEG-enabled model — self-regulating reasoning with no extra library.**

---

## Abstract

**Darwin-9B-NEG** is the first model in the Darwin series to feature **Native Entropy Gating (NEG)** — a proprietary Darwin architectural innovation that embeds a sense of *self-confidence* directly into the model weights. Unlike external multi-turn iteration (MTI) techniques that cost 3×–8× the inference compute, NEG operates *inside* the single decoding loop and activates in fewer than 5 % of generation steps, lifting reasoning accuracy **by more than 12 percentage points at 1× inference cost**.

On the **GPQA Diamond** PhD-level reasoning benchmark (198 questions), Darwin-9B-NEG scores **84.34 %** with the full 3-stage ensemble protocol — surpassing even the published Qwen3.5-9B leaderboard result (81.7 %).

---

## What Makes Darwin-9B-NEG Different

### 🧬 Darwin Series — Evolutionary Model Merging
The Darwin family is produced by **Darwin V7**, an evolutionary breeding engine that recombines two parent LLMs into a single descendant, preserving hybrid vigour across reasoning and knowledge capabilities. **Darwin-9B-Opus** — this model's base — is the Qwen3.5-family member of the Darwin series, previously published as a stand-alone reasoning model.

### ⚡ NEG — Native Entropy Gating (Darwin V8)
**NEG** is a proprietary Darwin technology that gives the language model an architecturally internalised *self-confidence sense*. Two tiny learnable modules ride alongside the transformer:

- **NEG-Head** (≈ 4 M params, ≈ 0.05 % of total weights) predicts, at each step, the entropy of the next-token distribution (defined below) from the last hidden state.
- **NEG-Gate** (1 learnable threshold) decides, on a per-token basis, whether the model is "confident enough" to commit to its top choice, or whether it should restrict its choice to a narrow top-k subset.
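
For reference, the entropy in question is the standard Shannon entropy of the next-token distribution:

$$
H_t = -\sum_{v \in \mathcal{V}} p_t(v)\,\log p_t(v), \qquad p_t = \operatorname{softmax}(z_t)
$$

where $z_t$ are the LM-head logits at decoding step $t$ and $\mathcal{V}$ is the vocabulary: low $H_t$ means a peaked, confident distribution, while high $H_t$ signals uncertainty.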

Because NEG is carried *inside* the model weights themselves, there is nothing extra to ship or to install: standard `transformers` loading with `trust_remote_code=True` attaches the modules automatically. The model file *is* the feature.
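
The repo's `modeling_darwin_neg.py` is authoritative; nothing below is the shipped implementation. Still, a minimal PyTorch sketch may make the mechanism concrete. The names (`NEGHead`, `neg_gate`), the MLP width, the top-k of 8, and the choice to *sample* within the top-k when uncertain are all illustrative assumptions:

```python
import torch
import torch.nn as nn

class NEGHead(nn.Module):
    """Illustrative 2-layer MLP that regresses next-token entropy from the
    last hidden state; the softplus keeps the prediction non-negative."""
    def __init__(self, hidden_size: int = 4096, inner: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, inner),
            nn.GELU(),
            nn.Linear(inner, 1),
            nn.Softplus(),
        )

    def forward(self, last_hidden: torch.Tensor) -> torch.Tensor:
        # (batch, hidden_size) -> (batch,) predicted entropy
        return self.mlp(last_hidden).squeeze(-1)

def neg_gate(logits: torch.Tensor, pred_entropy: torch.Tensor,
             threshold: float, top_k: int = 8) -> torch.Tensor:
    """Illustrative gate: commit to the argmax when predicted entropy is
    under the threshold; otherwise choose within the top-k subset (here by
    sampling, one plausible reading of "restrict to a top-k subset")."""
    confident = pred_entropy <= threshold               # (batch,) bool
    greedy = logits.argmax(dim=-1)                      # (batch,)
    top = torch.topk(logits, top_k, dim=-1)             # values/indices: (batch, k)
    probs = torch.softmax(top.values, dim=-1)
    pick = torch.multinomial(probs, num_samples=1)      # (batch, 1)
    sampled = top.indices.gather(-1, pick).squeeze(-1)  # (batch,)
    return torch.where(confident, greedy, sampled)
```

In the released model the threshold is a learned parameter and both modules are attached automatically by the remote code; no manual wiring is needed.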

**Why it matters**
- **1× inference cost** — no multi-sample voting, no multi-turn loops
- **< 5 % gate activation** — negligible latency overhead versus the base model
- **+12.63 %p on GPQA Diamond** vs. the NEG-free Darwin-9B-Opus baseline (same greedy decoding, same prompt, same tokens)
- **Single-file deployment** — drops into vLLM / SGLang / TGI / `transformers`; no new engine required
- **No trade-secret leaks** — the merge recipe is kept internal; only the final model weights are released under Apache 2.0

---

## 🏗️ Architecture Overview

```
Input Text
      │
      ▼
[Darwin-9B-Opus backbone (frozen during NEG training)]
      │
      ▼
Transformer Layers × 32
      │
      ▼
last hidden state ──┐
    │               │
    ▼               ▼
 LM Head         NEG-Head
    │               │
  base logits    predicted entropy
    │               │
    └──▶ NEG-Gate ◀─┘
            │
            ▼
      guided logits
            │
            ▼
       next token
```

### Key Specifications

| Component | Value |
|:---|:---|
| Architecture | Qwen3.5 decoder-only transformer (32 layers, hidden 4096) |
| Total parameters | 8.95 B (base) + ≈ 4 M (NEG modules) |
| NEG-Head | 2-layer MLP with softplus output |
| NEG-Gate | top-k masking gate with learnable entropy threshold |
| Precision | bfloat16 |
| Context length | inherited from Darwin-9B-Opus |
| License | Apache 2.0 |

---

## 🏆 Benchmark Results — GPQA Diamond (198 PhD-level questions)

Darwin-9B-NEG ships **three decoding modes** from the *same* model weights, allowing users to trade inference cost for accuracy:

| Mode | Decoding Protocol | Inference Cost | **Accuracy** |
|:---:|:---|:---:|:---:|
| **0 · Baseline** | Darwin-9B-Opus greedy (NEG disabled) | 1× | 51.01 % |
| **1 · Pure NEG** | greedy decoding **with NEG enabled** | **1×** | **63.64 %** |
| **2 · Permutation** | NEG + choice-order permutation (4 orderings, majority) | 4× | 76.26 % |
| **3 · Ensemble Refinement** | NEG + permutation + temperature-sampled ensemble | ≈ 20× | **🥇 84.34 %** |

**Improvements:**
- Pure NEG (mode 1) vs. baseline: **+12.63 %p at identical inference cost**
- Ensemble (mode 3) vs. baseline: **+33.33 %p**
- Ensemble vs. Qwen3.5-9B leaderboard score (81.7 %): **+2.64 %p**

> **Gate activation rate**: 4.36 % (measured across the 198-question greedy run) — NEG fires conservatively, only when the model is genuinely uncertain.

---

## 🚀 Usage

### Quick start — Pure NEG greedy (mode 1, the default)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tok = AutoTokenizer.from_pretrained(
    "FINAL-Bench/Darwin-9B-NEG",
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "FINAL-Bench/Darwin-9B-NEG",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "user", "content": "Solve: If f(x) = x³ − 3x + 2, find and classify all critical points."}
]
text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=False)
print(tok.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```

### Using the bundled NEG loader helper

`modeling_darwin_neg.py` is shipped inside the repo and provides a convenience loader:

```python
from modeling_darwin_neg import load_darwin_neg

model = load_darwin_neg(
    "FINAL-Bench/Darwin-9B-NEG",
    hf_token="hf_xxx",
)
```
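
Since the helper lives inside the model repo rather than on PyPI, it must be on the local Python path before the import will work; one way to fetch it (a sketch, assuming the file sits at the repo root):

```python
import os
import sys

from huggingface_hub import hf_hub_download

# Download modeling_darwin_neg.py from the model repo and make it importable.
helper_path = hf_hub_download("FINAL-Bench/Darwin-9B-NEG", "modeling_darwin_neg.py")
sys.path.insert(0, os.path.dirname(helper_path))
```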

### Mode selection

- **Mode 1 (Pure NEG)**: default `do_sample=False`, NEG is always on.
- **Mode 2 (Permutation)**: shuffle the option order 4 times, decode each greedily, and majority-vote (sketched after this list).
- **Mode 3 (Ensemble)**: production protocol combining permutation, temperature sampling and second-opinion re-query (internal; reproduction scripts are released separately).
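
To make mode 2 concrete, here is a minimal sketch under stated assumptions: the prompt layout, the naive `extract_choice` parser, and the four-way shuffle are all illustrative, and the separately released reproduction scripts define the exact protocol.

```python
import collections
import random
import re

def extract_choice(reply: str) -> str:
    """Naive answer parser (an assumption): take the last standalone A-D letter."""
    hits = re.findall(r"\b([A-D])\b", reply)
    return hits[-1] if hits else "A"

def permutation_vote(model, tok, question: str, options: list, n_perms: int = 4) -> str:
    """Sketch of mode 2: greedy-decode under shuffled option orders,
    then majority-vote over the chosen option texts."""
    votes = collections.Counter()
    for _ in range(n_perms):
        order = random.sample(range(len(options)), k=len(options))
        prompt = question + "\n" + "\n".join(
            f"{chr(ord('A') + i)}) {options[j]}" for i, j in enumerate(order))
        text = tok.apply_chat_template(
            [{"role": "user", "content": prompt}],
            tokenize=False, add_generation_prompt=True)
        inputs = tok(text, return_tensors="pt").to(model.device)
        out = model.generate(**inputs, max_new_tokens=2048, do_sample=False)
        reply = tok.decode(out[0][inputs.input_ids.shape[-1]:],
                           skip_special_tokens=True)
        letter = extract_choice(reply)
        # Map the displayed letter back to the original option text.
        votes[options[order[ord(letter) - ord("A")]]] += 1
    return votes.most_common(1)[0][0]
```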

---

## 🧬 Model Lineage

```
Qwen/Qwen3.5-9B   +   (Opus-distilled sibling)
         ╲                ╱
          Darwin V7 evolutionary merge
                  │
                  ▼
          Darwin-9B-Opus  ── stand-alone reasoning model (Apache 2.0)
                  │
                  ▼
          NEG-Head / NEG-Gate training (Darwin V8)
                  │
                  ▼
          Darwin-9B-NEG  ── THIS MODEL
```

- **Base**: [FINAL-Bench/Darwin-9B-Opus](https://huggingface.co/FINAL-Bench/Darwin-9B-Opus) (weights frozen during NEG training)
- **Technology generation**: Darwin V8 (Native Entropy Gating) — successor to Darwin V7 (evolutionary merging)

---

## 🎯 Recommended Use-Cases

- **Graduate-level STEM reasoning** — physics, chemistry, biology, mathematics (GPQA-style)
- **Mathematical problem solving** (MATH, AIME-style)
- **Code reasoning and debugging** (HumanEval-style)
- **Complex chain-of-thought** tasks where a small reasoning model with a big boost is desired

## ⚠️ Limitations

- Optimised for English first, with secondary support for Korean / Chinese / Japanese.
- At 8.95 B parameters, knowledge coverage is smaller than the larger Darwin models (27B / 31B / 36B) — for pure world-knowledge tasks consider Darwin-36B-Opus.
- The Ensemble mode (84.34 %) uses ≈ 20× inference; choose Pure NEG (mode 1) for cost-sensitive deployments.

---

## 📚 Citation

```bibtex
@misc{darwin9b_neg_2026,
  title  = {Darwin-9B-NEG: Native Entropy Gating for Self-Regulated Reasoning at 1x Inference Cost},
  author = {FINAL-Bench / Darwin Research Team},
  year   = {2026},
  howpublished = {\url{https://huggingface.co/FINAL-Bench/Darwin-9B-NEG}},
  note   = {Darwin V8 — Native Entropy Gating technology generation}
}
```

---

## 🔗 Related Darwin Models

- **Darwin-36B-Opus** — MoE 36B, Qwen3.6-35B-A3B × Opus distilled, GPQA 88.4 %
- **Darwin-31B-Opus** — 31B multilingual-strong reasoning
- **Darwin-27B-Opus** — 27B dense, GPQA 86.9 %
- **Darwin-28B-Opus** — Qwen3.6-27B × rico03 Opus distilled (new 2026-04)
- **Darwin-9B-Opus** — this model's base, Qwen3.5-9B family
- **Darwin-4B-Genesis** — smallest member, Gemma4 family

---
This model is introduced in the [Darwin Family](https://arxiv.org/abs/2605.14386) paper.

*Darwin V8 · Sealed 2026-04-24 · FINAL-Bench*