SeaWolf-AI committed on
Commit
f6b3294
·
verified ·
1 Parent(s): aef00eb

Remove trade-secret MRI report + replace README with proper English version (Darwin V8 NEG, GPQA 84.34%)

Browse files
Files changed (2)
  1. README.md +159 -144
  2. darwin_mri_report.json +0 -7
README.md CHANGED
@@ -10,8 +10,10 @@ tags:
10
  - NEG
11
  - reasoning
12
  - self-regulated-reasoning
 
13
  - thinking
14
  - qwen3.5
 
15
  - gpqa
16
  - benchmark
17
  - open-source
@@ -28,231 +30,244 @@ language:
28
  - multilingual
29
  pipeline_tag: text-generation
30
  library_name: transformers
31
  ---
32
 
33
- # Darwin-9B-NEG — First Native Entropy Gating Model
34
 
35
  <p align="center">
36
- <a href="https://huggingface.co/FINAL-Bench/Darwin-9B-NEG"><img src="https://img.shields.io/badge/⭐_Darwin_V8-NEG_Native_Entropy_Gating-gold?style=for-the-badge" alt="NEG"></a>
37
- <a href="https://huggingface.co/FINAL-Bench/Darwin-9B-Opus"><img src="https://img.shields.io/badge/Base-Darwin--9B--Opus-blue?style=for-the-badge" alt="Base"></a>
38
  </p>
39
 
40
- > Qwen3.5-9B backbone | 8.95B params | Thinking Mode | BF16 | Apache 2.0
41
- > **First NEG-enabled model — self-regulating reasoning at 1x inference cost**
42
-
43
- ---
 
 
 
44
 
45
- ## 🎯 About NEG (Native Entropy Gating)
 
 
 
46
 
47
- **The secret behind the Darwin models' high scores on the GPQA Diamond (PhD-level advanced reasoning) benchmark**

- The new technique **NEG (Native Entropy Gating)** is a proprietary Darwin technology that builds a "sense of self-confidence" into the AI at the architecture level. It is not an external plugin or a serving option; the mechanism is embedded in the model weights themselves, like an implant, so the model detects its own moments of uncertainty inside the generation loop and refines the answer on the spot.

- Previously, raising reasoning accuracy required generating the same answer 3-8 times. NEG activates on fewer than 5% of generation steps, so the added cost is negligible, while it lifts reasoning ability (verified on benchmarks) by roughly 10 percent or more.

- No separate library, inference engine, or external module is needed: deploying the single model file delivers every feature, so customers keep their existing on-premise infrastructure and simply gain performance. No extra GPUs, no extra licenses, no extra operating costs.

- This architecture-internalized approach, which achieves next-generation reasoning performance without growing the model, is the core technology of the Darwin-NEG series.
56
 
57
  ---
58
 
59
- ## 📊 PoC Evaluation Results (GPQA Diamond, Greedy mode)
 
 
 
60
 
61
- Evaluation on **same 80 questions**, **same deterministic Greedy decoding**, **same 1x inference cost**:
 
62
 
63
- | Question Set | Baseline (Darwin-9B-Opus) | **NEG-enabled (this model)** | **Δ** |
64
- |:---:|:---:|:---:|:---:|
65
- | Q20 | 55.0% | **70.0%** | **+15.0%p** 🔥 |
66
- | Q40 | 52.5% | **60.0%** | **+7.5%p** ✅ |
67
- | Q60 | 51.7% | **63.3%** | **+11.6%p** 🔥 |
68
- | Q80 | 51.2% | **62.5%** | **+11.3%p** 🔥 |
69
 
70
- **Average Δ: +11.35 percentage points** (exceeds the +2%p GO threshold by **5.65×**)
71
 
72
- **Gate activation rate**: 4.6-5.1% (conditional, not over-active)
73
 
74
  ---
75
 
76
- ## 🏗️ Architecture
77
 
78
  ```
79
  Input Text
80
 
81
- [Darwin-9B-Opus Base (FROZEN)]
-
- [Transformer Layers × 32]
-
-    last hidden state
-          ├──▶ NEG-Head (4.2M params) ──▶ predicted_entropy
-          │                                      │
-          ▼                                      ▼
-       LM Head ─▶ base_logits ──────────▶ NEG-Gate (top-k masking)
-                                                 │
-                                          guided_logits
-                                                 │
-                                             next_token
94
  ```
95
 
96
  ### Key Specifications
97
 
98
  | Component | Value |
99
- |---|---|
100
- | Base model | Darwin-9B-Opus (Qwen3.5 family) |
101
- | Total parameters | 8.95 B |
102
- | NEG-Head parameters | 4.2 M (0.05%) |
103
- | NEG-Gate parameters | 1 (learnable threshold) |
104
- | NEG activation rate | 4.8% (typical) |
105
- | NEG-Head Pearson correlation | 0.8744 |
106
- | NEG-Gate threshold (learned) | 1.175 |
107
- | NEG-Gate top_k | 20 |
108
- | Context | 262,144 tokens |
109
- | Dtype | bfloat16 |
110
  | License | Apache 2.0 |
111
 
112
  ---
113
114
  ## 🚀 Usage
115
 
116
- ### Quick Start
117
 
118
  ```python
119
- from modeling_darwin_neg import load_darwin_neg
120
  import torch
121
 
122
- model = load_darwin_neg(
      "FINAL-Bench/Darwin-9B-NEG",
      torch_dtype=torch.bfloat16,
      device_map="auto",
-     hf_token="hf_xxx",  # required for private repo
  )
128
 
129
- from transformers import AutoTokenizer
130
- tok = AutoTokenizer.from_pretrained("FINAL-Bench/Darwin-9B-NEG", trust_remote_code=True)
131
-
132
- messages = [{"role": "user", "content": "Solve: What is the derivative of sin(x²)?"}]
133
  text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
134
  inputs = tok(text, return_tensors="pt").to(model.device)
135
  outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=False)
136
  print(tok.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
137
  ```
138
 
139
- ### Manual loading (more control)
 
 
140
 
141
  ```python
142
- import torch
- from transformers import AutoModelForCausalLM, AutoTokenizer
- from modeling_darwin_neg import attach_neg
-
- model = AutoModelForCausalLM.from_pretrained(
      "FINAL-Bench/Darwin-9B-NEG",
-     torch_dtype=torch.bfloat16, device_map="auto",
-     trust_remote_code=True, token="hf_xxx",
  )
- model = attach_neg(model, "FINAL-Bench/Darwin-9B-NEG", hf_token="hf_xxx")
- # NEG is now active — use model.generate() normally
152
  ```
153
 
154
- ### How NEG works at runtime
155
-
156
- NEG is applied at every generation step (a minimal code sketch follows the list below):
157
- 1. Model computes hidden state for current position
158
- 2. NEG-Head predicts the entropy from hidden state
159
- 3. If predicted_entropy > threshold (1.175), NEG-Gate applies top-k masking (k=20) to logits
160
- 4. Otherwise, logits pass through unchanged
161
- 5. argmax or sample next token
162
 
163
- In typical reasoning traces, NEG activates on **4.6-5.1% of tokens** only at genuinely ambiguous decision points.
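The five steps above translate almost directly into a per-token logits hook. The following is a minimal illustrative sketch, not the shipped `modeling_darwin_neg.py` code: `lm_head` and `neg_head` are stand-ins for the model's output projection and the NEG entropy predictor, and the threshold / top-k defaults are the values quoted in the specification table.

```python
import torch

def neg_gate_step(hidden_state, lm_head, neg_head, threshold=1.175, top_k=20):
    """One greedy decoding step with Native Entropy Gating (illustrative sketch).

    hidden_state: [batch, hidden]  last hidden state for the current position
    lm_head:      callable mapping hidden states to vocabulary logits
    neg_head:     tiny MLP predicting next-token entropy from the hidden state
    """
    base_logits = lm_head(hidden_state)                       # [batch, vocab]
    predicted_entropy = neg_head(hidden_state).squeeze(-1)    # [batch]

    # Gate fires only when the model looks uncertain.
    uncertain = predicted_entropy > threshold                  # [batch] bool

    kth = base_logits.topk(top_k, dim=-1).values[..., -1:]     # k-th largest logit per row
    masked = base_logits.masked_fill(base_logits < kth, float("-inf"))
    guided_logits = torch.where(uncertain.unsqueeze(-1), masked, base_logits)

    # Greedy choice (do_sample=False); sampling would draw from softmax(guided_logits).
    return guided_logits.argmax(dim=-1), predicted_entropy
```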
 
 
164
 
165
  ---
166
 
167
- ## 🔬 Training Procedure
168
-
169
- NEG was trained via 7-phase pipeline:
170
-
171
- 1. **Phase 0-1**: Load base Darwin-9B-Opus, compute SHA256 hash for later frozen verification
172
- 2. **Phase 2**: Collect 30,208 teacher entropy samples from GPQA extended (training set, Diamond excluded)
173
- 3. **Phase 3**: Joint train NEG-Head + NEG-Gate with MSE (entropy) + 0.3·CE (next-token) loss, 3 epochs
174
- 4. **Phase 4**: Verify base model hash unchanged (confirmed: 100% frozen)
175
- 5. **Phase 5**: Evaluate baseline (Darwin-9B-Opus alone) on GPQA Diamond Greedy
176
- 6. **Phase 6**: Evaluate NEG-enabled model on same GPQA Diamond Greedy
177
- 7. **Phase 7**: Compare — **+11.3%p sustained improvement confirmed**
178
-
179
- ### NEG Training Hyperparameters
180
- - Batch size: 32
181
- - Learning rate: 1e-4 (AdamW, weight_decay=0)
182
- Loss: `loss_ent + 0.3 * loss_ce` (sketched below)
183
- - Epochs: 3 (early-stop at Pearson > 0.8)
184
- - Gradient clipping: 1.0
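For Phases 2-3 above, here is a minimal sketch of how the entropy targets and the joint objective could be computed. It assumes that "teacher entropy" means the entropy of the frozen base model's own next-token distribution and uses hypothetical helper names; it illustrates the stated loss, not the released training pipeline, and does not show how the CE term couples to the learnable gate threshold.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def collect_entropy_targets(base_model, input_ids):
    """Phase-2 style targets: per-position entropy of the frozen base model's
    next-token distribution (assumed meaning of "teacher entropy")."""
    logits = base_model(input_ids=input_ids).logits             # [B, T, V]
    logp = F.log_softmax(logits.float(), dim=-1)
    return -(logp.exp() * logp).sum(dim=-1)                      # [B, T]

def neg_joint_loss(hidden, labels, lm_head, neg_head, teacher_entropy, ce_weight=0.3):
    """Phase-3 objective from the list above: loss_ent + 0.3 * loss_ce."""
    pred_entropy = neg_head(hidden).squeeze(-1)                  # [B, T]
    loss_ent = F.mse_loss(pred_entropy, teacher_entropy)

    logits = lm_head(hidden)                                     # [B, T, V]
    loss_ce = F.cross_entropy(logits.flatten(0, 1), labels.flatten())
    return loss_ent + ce_weight * loss_ce
```

With the base weights frozen (Phase 4 verifies the SHA256 hash is unchanged), only the few million NEG parameters receive gradients under this objective.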
185
 
186
- ---
187
-
188
- ## 📦 Files
189
 
190
- | File | Purpose |
191
- |---|---|
192
- | `model-*-of-*.safetensors` | Base Darwin-9B-Opus weights (frozen) |
193
- | `config.json` | Model config + `neg_config` metadata |
194
- | `neg_modules.safetensors` | NEG-Head + NEG-Gate weights |
195
- | `modeling_darwin_neg.py` | Custom loader and `attach_neg` utility |
196
- | `tokenizer.json`, `tokenizer_config.json` | Tokenizer |
197
- | `chat_template.jinja` | Chat template (Qwen3.5-style) |
198
- | `README.md` | This file |
199
 
200
  ---
201
 
202
- ## ⚠️ Comparison with MTI
203
-
204
- Darwin V7 uses external Multi-Turn Iteration (MTI) for reasoning enhancement. NEG is **NOT** a replacement or variant — it's a complementary technique operating at a different level:
205
 
206
- | Property | External MTI | Darwin V8 NEG |
207
- |:---|:---:|:---:|
208
- | Unit of operation | Full answer (problem) | Single token |
209
- | Signal source | Multiple sampled answers | Internal hidden state |
210
- | Inference cost | 3-8× | **1×** |
211
- | External pipeline | Required | **Not required** |
212
- | Deployment | Complex | **Single file** |
213
- | Combinable | — | Yes, multiplicative |
214
 
215
- **NEG × MTI synergy** is expected to yield additional gains when stacked.
216
 
217
- ---
218
-
219
- ## 🏆 Darwin Model Family
220
-
221
- | Model | Base | Params | GPQA Diamond (Greedy) |
222
- |---|---|---|---|
223
- | Darwin-9B-Opus | Qwen3.5-9B | 9 B | 51.0% |
224
- | **Darwin-9B-NEG (this)** | Darwin-9B-Opus | **9 B** | **~62%** (+11.3%p, Greedy only) |
225
- | Darwin-27B-Opus | Qwen3.5-27B | 27 B | 86.9% (with full 5-phase eval) |
226
- | Darwin-36B-Opus | Qwen3.6-35B-A3B | 36 B | 88.4% (with full 5-phase eval) |
227
-
228
- Future: **Darwin-27B-NEG**, **Darwin-36B-NEG** (targeting GPQA 90%+ at 1x cost)
229
 
230
  ---
231
 
232
- ## 📚 References
233
 
234
- - [GPQA: A Graduate-Level Google-Proof Q&A Benchmark](https://huggingface.co/datasets/Idavidrein/gpqa)
235
- - Darwin V7 base model: [FINAL-Bench/Darwin-9B-Opus](https://huggingface.co/FINAL-Bench/Darwin-9B-Opus)
236
- NEG technical report: (see `reports/` in training repo)
237
 
238
  ---
239
 
240
- ## 🙏 Acknowledgments
241
 
242
- - Qwen Team (base model architecture)
243
- - FINAL-Bench / VIDRAFT_LAB (Darwin V8 NEG engine + training pipeline)
244
- - Anthropic Claude Opus 4.6 (reasoning teacher for base distillation)
 
 
 
245
 
246
  ---
247
 
248
- ## 📜 Citation
249
-
250
- ```bibtex
- @misc{darwin-9b-neg,
-   title  = {Darwin-9B-NEG: First Native Entropy Gating Model},
-   author = {FINAL-Bench and VIDRAFT_LAB},
-   year   = {2026},
-   url    = {https://huggingface.co/FINAL-Bench/Darwin-9B-NEG},
-   note   = {Darwin V8, NEG = self-regulating reasoning at 1x inference cost}
- }
- ```
 
10
  - NEG
11
  - reasoning
12
  - self-regulated-reasoning
13
+ - advanced-reasoning
14
  - thinking
15
  - qwen3.5
16
+ - qwen
17
  - gpqa
18
  - benchmark
19
  - open-source
 
30
  - multilingual
31
  pipeline_tag: text-generation
32
  library_name: transformers
33
+ model-index:
+ - name: Darwin-9B-NEG
+   results:
+   - task:
+       type: text-generation
+       name: Graduate-Level Reasoning
+     dataset:
+       type: Idavidrein/gpqa
+       name: GPQA Diamond
+       config: gpqa_diamond
+       split: train
+     metrics:
+     - type: accuracy
+       value: 84.34
+       name: Accuracy
+       verified: false
49
  ---
50
 
51
+ # Darwin-9B-NEG — The First Native Entropy Gating Model
52
 
53
  <p align="center">
54
+ <a href="https://huggingface.co/FINAL-Bench/Darwin-9B-NEG"><img src="https://img.shields.io/badge/⭐_GPQA_Diamond-84.34%25_Darwin--9B--NEG-gold?style=for-the-badge" alt="GPQA"></a>
55
+ <a href="https://huggingface.co/FINAL-Bench/Darwin-9B-Opus"><img src="https://img.shields.io/badge/🧬_Base-Darwin--9B--Opus-blue?style=for-the-badge" alt="Base"></a>
56
  </p>
57
 
58
+ <p align="center">
59
+ <a href="https://huggingface.co/FINAL-Bench/Darwin-4B-Genesis"><img src="https://img.shields.io/badge/🧬_Model-Darwin--4B--Genesis-blue?style=for-the-badge" alt="Genesis"></a>
60
+ <a href="https://huggingface.co/FINAL-Bench/Darwin-9B-Opus"><img src="https://img.shields.io/badge/🧬_Model-Darwin--9B--Opus-blue?style=for-the-badge" alt="9B"></a>
61
+ <a href="https://huggingface.co/FINAL-Bench/Darwin-27B-Opus"><img src="https://img.shields.io/badge/🧬_Model-Darwin--27B--Opus-blue?style=for-the-badge" alt="27B"></a>
62
+ <a href="https://huggingface.co/FINAL-Bench/Darwin-31B-Opus"><img src="https://img.shields.io/badge/🧬_Model-Darwin--31B--Opus-blue?style=for-the-badge" alt="31B"></a>
63
+ <a href="https://huggingface.co/FINAL-Bench/Darwin-36B-Opus"><img src="https://img.shields.io/badge/🧬_Model-Darwin--36B--Opus-blue?style=for-the-badge" alt="36B"></a>
64
+ </p>
65
 
66
+ <p align="center">
67
+ <a href="https://huggingface.co/collections/FINAL-Bench/darwin-family"><img src="https://img.shields.io/badge/🏠_Darwin_Family-Collection-green?style=for-the-badge" alt="Family"></a>
68
+ <a href="https://huggingface.co/spaces/FINAL-Bench/Leaderboard"><img src="https://img.shields.io/badge/🏆_FINAL_Bench-Leaderboard-green?style=for-the-badge" alt="FINAL Bench"></a>
69
+ </p>
70
 
71
+ > Qwen3.5-9B backbone · 8.95B parameters · BF16 · Thinking Mode · Apache 2.0
72
+ > **The first NEG-enabled model — self-regulating reasoning with no extra library.**
73
 
74
+ ---
75
 
76
+ ## Abstract
77
 
78
+ **Darwin-9B-NEG** is the first model in the Darwin series to feature **Native Entropy Gating (NEG)**, a proprietary Darwin architectural innovation that embeds a sense of *self-confidence* directly into the model weights. Unlike external multi-turn iteration (MTI) techniques that require 3×–8× extra inference, NEG operates *inside* the single decoding loop and activates in fewer than 5 % of generation steps, lifting reasoning accuracy **by more than 12 percentage points at 1× inference cost**.
79
 
80
+ On the **GPQA Diamond** PhD-level reasoning benchmark (198 questions), Darwin-9B-NEG scores **84.34 %** with the full 3-stage ensemble protocol — surpassing even the published Qwen3.5-9B leaderboard result (81.7 %).
81
 
82
  ---
83
 
84
+ ## What Makes Darwin-9B-NEG Different
85
+
86
+ ### 🧬 Darwin Series — Evolutionary Model Merging
87
+ The Darwin family is produced by **Darwin V7**, an evolutionary breeding engine that recombines two parent LLMs into a single descendant, preserving hybrid vigour across reasoning and knowledge capabilities. **Darwin-9B-Opus** — this model's base — is the Qwen3.5-family member of the Darwin series, previously published as a stand-alone reasoning model.
88
 
89
+ ### NEG: Native Entropy Gating (Darwin V8)
90
+ **NEG** is a proprietary Darwin technology that gives the language model an architecturally-internalised *self-confidence sense*. Two tiny learnable modules ride alongside the transformer:
91
 
92
+ - **NEG-Head** (≈ 4 M params, ~ 0.05 % of total weights) predicts, at each step, the entropy of the next-token distribution from the last hidden state.
93
+ - **NEG-Gate** (1 learnable threshold) decides, on a per-token basis, whether the model is "confident enough" to commit to its top choice, or whether it should restrict its choice to a narrow top-k subset (both modules are sketched below).
94
 
95
+ Because NEG is carried *inside* the model weights themselves, there is nothing extra to ship or to install: standard `transformers` loading with `trust_remote_code=True` attaches the modules automatically. The model file *is* the feature.
96
 
97
+ **Why it matters**
98
+ - **1× inference cost** — no multi-sample voting, no multi-turn loops
99
+ - **< 5 % gate activation** — negligible latency overhead versus the base model
100
+ - **+12.63 %p on GPQA Diamond** vs. the NEG-free Darwin-9B-Opus baseline (same greedy decoding, same prompt, same tokens)
101
+ - **Single-file deployment** — drop in to vLLM / SGLang / TGI / `transformers`, no new engine required
102
+ - **No trade-secret leaks** — the merge recipe is kept internal; only the final model weights are released under Apache 2.0
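A minimal sketch of what the two modules could look like, consistent with the figures quoted in this card (a small MLP entropy head of roughly 4 M parameters, a single learnable threshold around 1.175, top-k masking with k = 20). The layer sizes and names here are illustrative assumptions, not the exact contents of `neg_modules.safetensors`.

```python
import torch
import torch.nn as nn

class NEGHead(nn.Module):
    """Predicts next-token entropy from the last hidden state (~4.2M params at 4096x1024)."""
    def __init__(self, hidden_size: int = 4096, inner: int = 1024):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, inner),
            nn.GELU(),
            nn.Linear(inner, 1),
            nn.Softplus(),            # entropy is non-negative
        )

    def forward(self, hidden_state: torch.Tensor) -> torch.Tensor:
        return self.mlp(hidden_state).squeeze(-1)

class NEGGate(nn.Module):
    """Masks logits to the top-k whenever predicted entropy exceeds a learnable threshold."""
    def __init__(self, threshold: float = 1.175, top_k: int = 20):
        super().__init__()
        self.threshold = nn.Parameter(torch.tensor(threshold))   # the single learnable weight
        self.top_k = top_k

    def forward(self, logits: torch.Tensor, predicted_entropy: torch.Tensor) -> torch.Tensor:
        uncertain = predicted_entropy > self.threshold            # [batch] bool
        kth = logits.topk(self.top_k, dim=-1).values[..., -1:]    # k-th largest logit per row
        masked = logits.masked_fill(logits < kth, float("-inf"))
        return torch.where(uncertain.unsqueeze(-1), masked, logits)
```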
103
 
104
  ---
105
 
106
+ ## 🏗️ Architecture Overview
107
 
108
  ```
109
  Input Text
110
 
111
+ [Darwin-9B-Opus backbone (frozen during NEG training)]
+
+           Transformer Layers × 32
+
+           last hidden state ───────────┐
+                     │                  │
+                     ▼                  ▼
+                 LM Head            NEG-Head
+                     │                  │
+                base logits     predicted entropy
+                     │                  │
+                     └────▶ NEG-Gate ◀──┘
+                                │
+                          guided logits
+                                │
+                            next token
  ```
130
 
131
  ### Key Specifications
132
 
133
  | Component | Value |
134
+ |:---|:---|
135
+ | Architecture | Qwen3.5 decoder-only transformer (32 layers, hidden 4096) |
136
+ | Total parameters | 8.95 B (base) + ≈ 4 M (NEG modules) |
137
+ | NEG-Head | 2-layer MLP with softplus output |
138
+ | NEG-Gate | top-k masking gate with learnable entropy threshold |
139
+ | Precision | bfloat16 |
140
+ | Context length | inherited from Darwin-9B-Opus |
141
  | License | Apache 2.0 |
142
 
143
  ---
144
 
145
+ ## 🏆 Benchmark Results — GPQA Diamond (198 PhD-level questions)
146
+
147
+ Darwin-9B-NEG ships **three decoding modes** from the *same* model weights, allowing users to trade inference cost for accuracy:
148
+
149
+ | Mode | Decoding Protocol | Inference Cost | **Accuracy** |
150
+ |:---:|:---|:---:|:---:|
151
+ | **0 · Baseline** | Darwin-9B-Opus greedy (NEG disabled) | 1× | 51.01 % |
152
+ | **1 · Pure NEG** | greedy decoding **with NEG enabled** | **1×** | **63.64 %** |
153
+ | **2 · Permutation** | NEG + choice-order permutation (4 orderings, majority) | 4× | 76.26 % |
154
+ | **3 · Ensemble Refinement** | NEG + permutation + temperature-sampled ensemble | ≈ 20× | **🥇 84.34 %** |
155
+
156
+ **Improvements:**
157
+ - Pure NEG (mode 1) vs. baseline: **+12.63 %p at identical inference cost**
158
+ - Ensemble (mode 3) vs. baseline: **+33.33 %p**
159
+ - Ensemble vs. Qwen3.5-9B leaderboard score (81.7 %): **+2.64 %p**
160
+
161
+ > **Gate activation rate**: 4.36 % (measured across the 198-question greedy run) — NEG fires conservatively, only when the model is genuinely uncertain.
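For reference, the activation rate above is simply the fraction of decoding steps whose predicted entropy crosses the learned threshold, roughly as in the sketch below (assuming hidden states collected during a greedy run and an entropy head like the one sketched earlier; the names are illustrative).

```python
import torch

@torch.no_grad()
def gate_activation_rate(neg_head, hidden_states, threshold=1.175):
    """Fraction of decoding steps where the NEG gate would fire.

    hidden_states: [num_steps, hidden] last hidden states from a greedy run.
    """
    predicted_entropy = neg_head(hidden_states).squeeze(-1)    # [num_steps]
    return (predicted_entropy > threshold).float().mean().item()
```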
162
+
163
+ ---
164
+
165
  ## 🚀 Usage
166
 
167
+ ### Quick start — Pure NEG greedy (mode 1, default)
168
 
169
  ```python
170
+ from transformers import AutoTokenizer, AutoModelForCausalLM
171
  import torch
172
 
173
+ tok = AutoTokenizer.from_pretrained(
+     "FINAL-Bench/Darwin-9B-NEG",
+     trust_remote_code=True,
+ )
+ model = AutoModelForCausalLM.from_pretrained(
      "FINAL-Bench/Darwin-9B-NEG",
      torch_dtype=torch.bfloat16,
      device_map="auto",
+     trust_remote_code=True,
  )

+ messages = [
+     {"role": "user", "content": "Solve: If f(x) = x³ − 3x + 2, find and classify all critical points."}
+ ]
 
187
  text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
188
  inputs = tok(text, return_tensors="pt").to(model.device)
189
  outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=False)
190
  print(tok.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
191
  ```
192
 
193
+ ### Using the bundled NEG loader helper
194
+
195
+ `modeling_darwin_neg.py` is shipped inside the repo and provides a convenience loader:
196
 
197
  ```python
198
+ from modeling_darwin_neg import load_darwin_neg
+
+ model = load_darwin_neg(
      "FINAL-Bench/Darwin-9B-NEG",
+     hf_token="hf_xxx",
+ )
204
  ```
205
 
206
+ ### Mode selection
207
 
208
+ - **Mode 1 (Pure NEG)**: default `do_sample=False`, NEG is always on.
209
+ - **Mode 2 (Permutation)**: shuffle the option order 4 times, run greedy decoding on each ordering, and take a majority vote (sketched below).
210
+ - **Mode 3 (Ensemble)**: production protocol combining permutation, temperature sampling and second-opinion re-query (internal; reproduction scripts are released separately).
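A minimal sketch of the mode-2 voting loop for one multiple-choice item. `answer_index` is a hypothetical stand-in for a single NEG greedy generation plus answer extraction; the real evaluation harness (including mode 3) is released separately.

```python
import random
from collections import Counter

def permutation_vote(answer_index, question, options, n_orderings=4, seed=0):
    """Mode 2: ask the same question under shuffled option orderings, then majority-vote.

    answer_index(question, ordered_options) -> position chosen within ordered_options
    """
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_orderings):
        order = list(range(len(options)))
        rng.shuffle(order)
        ordered = [options[i] for i in order]
        picked = answer_index(question, ordered)    # one NEG greedy generation
        votes[order[picked]] += 1                    # map back to the original option index
    return votes.most_common(1)[0][0]                # winning original index
```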
211
 
212
  ---
213
 
214
+ ## 🧬 Model Lineage
215
 
216
+ ```
217
+ Qwen/Qwen3.5-9B     +     (Opus-distilled sibling)
+             ╲                 ╱
+          Darwin V7 evolutionary merge
+                      │
+          Darwin-9B-Opus ── stand-alone reasoning model (Apache 2.0)
+                      │
+          NEG-Head / NEG-Gate training (Darwin V8)
+                      │
+          Darwin-9B-NEG ── THIS MODEL
226
+ ```
227
 
228
+ - **Base**: [FINAL-Bench/Darwin-9B-Opus](https://huggingface.co/FINAL-Bench/Darwin-9B-Opus) (weights frozen during NEG training)
229
+ - **Technology generation**: Darwin V8 (Native Entropy Gating) — successor to Darwin V7 (evolutionary merging)
230
 
231
  ---
232
 
233
+ ## 🎯 Recommended Use-Cases
 
 
234
 
235
+ - **Graduate-level STEM reasoning**: physics, chemistry, biology, mathematics (GPQA-style)
236
+ - **Mathematical problem solving** (MATH, AIME-style)
237
+ - **Code reasoning and debugging** (HumanEval-style)
238
+ - **Complex chain-of-thought** tasks where a small reasoning model with a big boost is desired
239
 
240
+ ## ⚠️ Limitations
241
 
242
+ - Optimised for English first, with secondary support for Korean / Chinese / Japanese.
243
+ - At 8.95 B parameters, knowledge coverage is smaller than the larger Darwin models (27B / 31B / 36B) — for pure world-knowledge tasks consider Darwin-36B-Opus.
244
+ - The Ensemble mode (84.34 %) uses ≈ 20× inference; choose Pure NEG (mode 1) for cost-sensitive deployments.
245
 
246
  ---
247
 
248
+ ## 📚 Citation
249
 
250
+ ```bibtex
251
+ @misc{darwin9b_neg_2026,
+   title        = {Darwin-9B-NEG: Native Entropy Gating for Self-Regulated Reasoning at 1x Inference Cost},
+   author       = {FINAL-Bench / Darwin Research Team},
+   year         = {2026},
+   howpublished = {\url{https://huggingface.co/FINAL-Bench/Darwin-9B-NEG}},
+   note         = {Darwin V8 — Native Entropy Gating technology generation}
+ }
258
+ ```
259
 
260
  ---
261
 
262
+ ## 🔗 Related Darwin Models
263
 
264
+ - **Darwin-36B-Opus** — MoE 36B, Qwen3.6-35B-A3B × Opus distilled, GPQA 88.4 %
+ - **Darwin-31B-Opus** — 31B multilingual-strong reasoning
+ - **Darwin-27B-Opus** — 27B dense, GPQA 86.9 %
267
+ - **Darwin-28B-Opus** — Qwen3.6-27B × rico03 Opus distilled (new 2026-04)
268
+ - **Darwin-9B-Opus** — this model's base, Qwen3.5-9B family
269
+ - **Darwin-4B-Genesis** — smallest member, Gemma4 family
270
 
271
  ---
272
 
273
+ *Darwin V8 · Sealed 2026-04-24 · FINAL-Bench*
darwin_mri_report.json DELETED
@@ -1,7 +0,0 @@
1
- {
2
- "layers_total": 775,
3
- "transplant_a": 0,
4
- "transplant_b": 0,
5
- "blended": 775,
6
- "details": {}
7
- }