---
license: apache-2.0
base_model:
- FINAL-Bench/Darwin-9B-Opus
tags:
- darwin
- darwin-v8
- darwin-neg
- native-entropy-gating
- NEG
- reasoning
- self-regulated-reasoning
- advanced-reasoning
- thinking
- qwen3.5
- qwen
- gpqa
- benchmark
- open-source
- apache-2.0
- hybrid-vigor
- proto-agi
- vidraft
- eval-results
language:
- en
- zh
- ko
- ja
- multilingual
pipeline_tag: text-generation
library_name: transformers
model-index:
- name: Darwin-9B-NEG
  results:
  - task:
      type: text-generation
      name: Graduate-Level Reasoning
    dataset:
      type: Idavidrein/gpqa
      name: GPQA Diamond
      config: gpqa_diamond
      split: train
    metrics:
    - type: accuracy
      value: 84.34
      name: Accuracy
      verified: false
---

# Darwin-9B-NEG: The First Native Entropy Gating Model

<p align="center">
<a href="https://huggingface.co/FINAL-Bench/Darwin-9B-NEG"><img src="https://img.shields.io/badge/⭐_GPQA_Diamond-84.34%25_Darwin--9B--NEG-gold?style=for-the-badge" alt="GPQA"></a>
<a href="https://huggingface.co/FINAL-Bench/Darwin-9B-Opus"><img src="https://img.shields.io/badge/🧬_Base-Darwin--9B--Opus-blue?style=for-the-badge" alt="Base"></a>
</p>

<p align="center">
<a href="https://huggingface.co/FINAL-Bench/Darwin-4B-Genesis"><img src="https://img.shields.io/badge/🧬_Model-Darwin--4B--Genesis-blue?style=for-the-badge" alt="Genesis"></a>
<a href="https://huggingface.co/FINAL-Bench/Darwin-9B-Opus"><img src="https://img.shields.io/badge/🧬_Model-Darwin--9B--Opus-blue?style=for-the-badge" alt="9B"></a>
<a href="https://huggingface.co/FINAL-Bench/Darwin-27B-Opus"><img src="https://img.shields.io/badge/🧬_Model-Darwin--27B--Opus-blue?style=for-the-badge" alt="27B"></a>
<a href="https://huggingface.co/FINAL-Bench/Darwin-31B-Opus"><img src="https://img.shields.io/badge/🧬_Model-Darwin--31B--Opus-blue?style=for-the-badge" alt="31B"></a>
<a href="https://huggingface.co/FINAL-Bench/Darwin-36B-Opus"><img src="https://img.shields.io/badge/🧬_Model-Darwin--36B--Opus-blue?style=for-the-badge" alt="36B"></a>
</p>

<p align="center">
<a href="https://huggingface.co/collections/FINAL-Bench/darwin-family"><img src="https://img.shields.io/badge/🏠_Darwin_Family-Collection-green?style=for-the-badge" alt="Family"></a>
<a href="https://huggingface.co/spaces/FINAL-Bench/Leaderboard"><img src="https://img.shields.io/badge/🏆_FINAL_Bench-Leaderboard-green?style=for-the-badge" alt="FINAL Bench"></a>
</p>

> Qwen3.5-9B backbone · 8.95B parameters · BF16 · Thinking Mode · Apache 2.0
>
> **The first NEG-enabled model: self-regulating reasoning with no extra library.**

---

## Abstract

**Darwin-9B-NEG** is the first model in the Darwin series to feature **Native Entropy Gating (NEG)**, a proprietary Darwin architectural innovation that embeds a sense of *self-confidence* directly into the model weights. Unlike external multi-turn iteration (MTI) techniques, which require 3×–8× extra inference, NEG operates *inside* the single decoding loop and activates in fewer than 5 % of generation steps. On GPQA Diamond it lifts accuracy by **12.63 percentage points (51.01 % → 63.64 %) at 1× inference cost**, relative to the NEG-free Darwin-9B-Opus baseline.

On the **GPQA Diamond** PhD-level reasoning benchmark (198 questions), Darwin-9B-NEG scores **84.34 %** with the full 3-stage ensemble protocol (≈ 20× inference cost), surpassing the published Qwen3.5-9B leaderboard result (81.7 %).

---

## What Makes Darwin-9B-NEG Different

### 🧬 Darwin Series: Evolutionary Model Merging

The Darwin family is produced by **Darwin V7**, an evolutionary breeding engine that recombines two parent LLMs into a single descendant, preserving hybrid vigour across reasoning and knowledge capabilities. **Darwin-9B-Opus**, this model's base, is the Qwen3.5-family member of the Darwin series and was previously published as a stand-alone reasoning model.

### ⚡ NEG: Native Entropy Gating (Darwin V8)

**NEG** is a proprietary Darwin technology that gives the language model an architecturally internalised *sense of self-confidence*. Two small learnable modules run alongside the transformer:

- **NEG-Head** (≈ 4 M parameters, about 0.05 % of total weights) predicts, at each decoding step, the entropy of the next-token distribution from the last hidden state.
- **NEG-Gate** (a single learnable threshold) decides, per token, whether the model is confident enough to commit to its top choice or should restrict its choice to a narrow top-k subset.
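
For reference, the quantity NEG-Head is described as predicting is the Shannon entropy of the next-token distribution. A minimal pure-Python sketch of that quantity, computed from raw logits with a numerically stable softmax, follows. Natural-log units (nats) are an assumption, since the card does not state the base.

```python
import math


def next_token_entropy(logits: list[float]) -> float:
    """Shannon entropy (in nats) of softmax(logits)."""
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    # Zero-probability terms contribute nothing (p * log p -> 0 as p -> 0).
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# A flat distribution over 4 tokens is maximally uncertain: ln(4) nats.
print(round(next_token_entropy([0.0, 0.0, 0.0, 0.0]), 3))  # 1.386
# A sharply peaked distribution has entropy close to 0.
print(next_token_entropy([20.0, 0.0, 0.0, 0.0]) < 1e-6)    # True
```

A high value means the probability mass is spread over many candidates, which is the situation in which the gate is meant to intervene.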

Because NEG is carried *inside* the model weights, there is nothing extra to ship or install: standard `transformers` loading with `trust_remote_code=True` attaches the modules automatically. The model file *is* the feature.

**Why it matters**

- **1× inference cost**: no multi-sample voting, no multi-turn loops.
- **< 5 % gate activation** (4.36 % of steps in the GPQA Diamond greedy run): negligible latency overhead versus the base model.
- **+12.63 %p on GPQA Diamond** vs. the NEG-free Darwin-9B-Opus baseline (same greedy decoding, same prompt, same tokens).
- **Single-file deployment**: drops into vLLM / SGLang / TGI / `transformers` with no new engine required.
- **No trade-secret leaks**: the merge recipe is kept internal; only the final model weights are released under Apache 2.0.

---

## 🏗️ Architecture Overview

```
Input Text
    ↓
[Darwin-9B-Opus backbone (frozen during NEG training)]
    ↓
Transformer Layers × 32
    ↓
last hidden state ──────┐
    │                   │
    ▼                   ▼
 LM Head            NEG-Head
    │                   │
base logits     predicted entropy
    │                   │
    └───▶ NEG-Gate ◀────┘
              │
              ▼
        guided logits
              │
              ▼
          next token
```
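
The data flow in the diagram can be sketched as one greedy decoding step. This is a minimal pure-Python illustration under stated assumptions, not the code in `modeling_darwin_neg.py`: `lm_head`, `neg_head`, `threshold` and `top_k` are hypothetical stand-ins, and the gate is modelled as plain top-k masking (keep the `top_k` highest logits when the predicted entropy exceeds the threshold, otherwise pass the logits through unchanged).

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def neg_gate(logits: Vector, predicted_entropy: float,
             threshold: float, top_k: int) -> tuple[list[float], bool]:
    """Hypothetical NEG-Gate: returns (guided_logits, fired)."""
    if predicted_entropy <= threshold:
        # Confident step: pass the base logits through unchanged.
        return list(logits), False
    # Uncertain step: keep the top_k highest logits and mask the rest to -inf.
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    keep = set(ranked[:top_k])
    guided = [x if i in keep else -math.inf for i, x in enumerate(logits)]
    return guided, True


def decode_step(hidden: Vector,
                lm_head: Callable[[Vector], list[float]],
                neg_head: Callable[[Vector], float],
                threshold: float, top_k: int) -> tuple[int, bool]:
    """One greedy step: hidden state -> LM Head + NEG-Head -> NEG-Gate -> token id."""
    base_logits = lm_head(hidden)         # "base logits" in the diagram
    predicted_entropy = neg_head(hidden)  # "predicted entropy" in the diagram
    guided, fired = neg_gate(base_logits, predicted_entropy, threshold, top_k)
    next_token = max(range(len(guided)), key=lambda i: guided[i])  # greedy argmax
    return next_token, fired


# Toy stand-ins: an identity "LM Head" and a NEG-Head stub reporting high entropy.
token, fired = decode_step([0.1, 2.0, 1.9, -1.0], lm_head=list,
                           neg_head=lambda h: 1.5, threshold=1.0, top_k=2)
print(token, fired)  # 1 True
```

Masking never removes the highest logit, so under greedy argmax this sketch picks the same token the base logits would; the mask only narrows the choice when sampling. Counting the steps where `fired` is `True` over a generation gives a per-step activation rate, which is presumably how a figure like the 4.36 % reported for the released model is measured.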

### Key Specifications

| Component | Value |
|:---|:---|
| Architecture | Qwen3.5 decoder-only transformer (32 layers, hidden 4096) |
| Total parameters | 8.95 B (base) + ≈ 4 M (NEG modules) |
| NEG-Head | 2-layer MLP with softplus output |
| NEG-Gate | top-k masking gate with learnable entropy threshold |
| Precision | bfloat16 |
| Context length | inherited from Darwin-9B-Opus |
| License | Apache 2.0 |
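
The spec table describes NEG-Head as a 2-layer MLP with a softplus output; softplus keeps the predicted entropy non-negative, as an entropy must be. The sketch below is a hypothetical pure-Python forward pass: the ReLU hidden activation, the inner width and the random initialisation are assumptions, since the card states none of them. `param_count` shows that an inner width of 1024 over the 4096-dimensional hidden state would come to about 4.2 M parameters, in line with the ≈ 4 M quoted for the NEG modules.

```python
import math
import random


def softplus(x: float) -> float:
    """Numerically stable softplus, log(1 + e^x), which is always >= 0."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


class NegHeadSketch:
    """Hypothetical 2-layer MLP: hidden state -> ReLU layer -> softplus scalar."""

    def __init__(self, d_model: int, d_inner: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 1.0 / math.sqrt(d_model)) for _ in range(d_model)]
                   for _ in range(d_inner)]
        self.b1 = [0.0] * d_inner
        self.w2 = [rng.gauss(0.0, 1.0 / math.sqrt(d_inner)) for _ in range(d_inner)]
        self.b2 = 0.0

    def __call__(self, hidden: list[float]) -> float:
        # Layer 1: affine map followed by ReLU (the activation is an assumption).
        h = [max(0.0, sum(w * x for w, x in zip(row, hidden)) + b)
             for row, b in zip(self.w1, self.b1)]
        # Layer 2: affine map to a scalar, then softplus so the output is >= 0.
        return softplus(sum(w * x for w, x in zip(self.w2, h)) + self.b2)


def param_count(d_model: int, d_inner: int) -> int:
    """Weights plus biases of both layers."""
    return d_model * d_inner + d_inner + d_inner + 1


head = NegHeadSketch(d_model=8, d_inner=4)
print(head([0.5] * 8) >= 0.0)   # True
print(param_count(4096, 1024))  # 4196353
```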

---

## 🏆 Benchmark Results: GPQA Diamond (198 PhD-level questions)

Darwin-9B-NEG ships **three decoding modes** (1 to 3) from the *same* model weights, allowing users to trade inference cost for accuracy. Mode 0 is the NEG-free baseline, shown for reference:

| Mode | Decoding Protocol | Inference Cost | **Accuracy** |
|:---:|:---|:---:|:---:|
| **0 · Baseline** | Darwin-9B-Opus greedy (NEG disabled) | 1× | 51.01 % |
| **1 · Pure NEG** | greedy decoding **with NEG enabled** | **1×** | **63.64 %** |
| **2 · Permutation** | NEG + choice-order permutation (4 orderings, majority vote) | 4× | 76.26 % |
| **3 · Ensemble Refinement** | NEG + permutation + temperature-sampled ensemble | ≈ 20× | **🥇 84.34 %** |

**Improvements:**

- Pure NEG (mode 1) vs. baseline: **+12.63 %p at identical inference cost**
- Ensemble (mode 3) vs. baseline: **+33.33 %p**
- Ensemble (mode 3) vs. Qwen3.5-9B leaderboard score (81.7 %): **+2.64 %p**
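
Because GPQA Diamond has 198 questions, one question is worth about 0.51 percentage points. The short check below reproduces the table and the improvement figures. The correct-answer counts are back-calculated from the reported percentages (each is the only integer that rounds to the reported value) and are not stated in the card.

```python
# GPQA Diamond accuracy = correct / 198. The counts are back-calculated from the
# reported percentages (an assumption; the card reports only percentages).
N = 198
correct = {"baseline": 101, "pure_neg": 126, "permutation": 151, "ensemble": 167}

acc = {mode: round(100 * c / N, 2) for mode, c in correct.items()}
print(acc)
# {'baseline': 51.01, 'pure_neg': 63.64, 'permutation': 76.26, 'ensemble': 84.34}

# Percentage-point (%p) differences quoted under "Improvements".
print(round(acc["pure_neg"] - acc["baseline"], 2))  # 12.63
print(round(acc["ensemble"] - acc["baseline"], 2))  # 33.33
print(round(acc["ensemble"] - 81.7, 2))             # 2.64 (vs. Qwen3.5-9B leaderboard)

# One question is worth 100 / 198 %p; Pure NEG answers 25 more questions correctly.
print(round(100 / N, 2), correct["pure_neg"] - correct["baseline"])  # 0.51 25
```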

> **Gate activation rate**: 4.36 % (measured across the 198-question greedy run). NEG fires conservatively, only when the predicted entropy exceeds the learned threshold.

---

## 🚀 Usage

### Quick start: Pure NEG greedy (mode 1, default)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "FINAL-Bench/Darwin-9B-NEG"

# trust_remote_code=True lets transformers load the NEG modules shipped in the repo.
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {"role": "user", "content": "Solve: If f(x) = x³ − 3x + 2, find and classify all critical points."}
]
# Render the chat template and append the assistant-turn prefix.
text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(text, return_tensors="pt").to(model.device)

# Mode 1 (Pure NEG): plain greedy decoding; NEG is applied inside the model.
outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=False)
# Decode only the newly generated tokens, not the prompt.
print(tok.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```
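
### Using the bundled NEG loader helper

`modeling_darwin_neg.py` ships inside the model repo and provides a convenience loader. It is a file in the repo rather than an installed package, so download it first to make it importable:

```python
import os
import sys
from huggingface_hub import hf_hub_download

# Download the helper from the model repo and put its folder on the import path.
helper = hf_hub_download("FINAL-Bench/Darwin-9B-NEG", "modeling_darwin_neg.py")
sys.path.insert(0, os.path.dirname(helper))
from modeling_darwin_neg import load_darwin_neg

model = load_darwin_neg("FINAL-Bench/Darwin-9B-NEG", hf_token="hf_xxx")  # your HF access token
```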

### Mode selection

- **Mode 1 (Pure NEG)**: the default; use `do_sample=False` (greedy decoding). NEG is always on.
- **Mode 2 (Permutation)**: shuffle the option order 4 times, decode each ordering greedily, and take the majority vote.
- **Mode 3 (Ensemble)**: the production protocol, combining permutation, temperature sampling and a second-opinion re-query (internal; reproduction scripts are released separately).
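
Mode 2 as described (4 option orderings, a greedy answer for each, majority vote) can be sketched as follows. Which 4 orderings are used is not stated, so the sketch assumes cyclic rotations of the options. `permutation_vote` and `answer_fn` are hypothetical names: `answer_fn` stands for a callable that returns the position of the chosen option in the order it was shown (in practice a wrapper around greedy `model.generate` plus answer parsing). Breaking ties by first occurrence is also an assumption.

```python
from collections import Counter
from typing import Callable, Sequence


def permutation_vote(question: str,
                     options: Sequence[str],
                     answer_fn: Callable[[str, list[str]], int],
                     n_orderings: int = 4) -> int:
    """Majority vote over rotated option orders; returns an index into `options`."""
    n = len(options)
    votes = []
    for shift in range(n_orderings):
        # Rotate the options so each ordering presents them in a different position.
        order = [(i + shift) % n for i in range(n)]
        shown = [options[j] for j in order]
        picked = answer_fn(question, shown)
        # Map the position chosen in the shown list back to the original index.
        votes.append(order[picked])
    # most_common orders equal counts by first occurrence, so ties go to the earliest vote.
    return Counter(votes).most_common(1)[0][0]


opts = ["1 eV", "2 eV", "3 eV", "4 eV"]
# An answerer that tracks content picks the same option text in every ordering.
print(permutation_vote("Which gap?", opts, lambda q, shown: shown.index("2 eV")))  # 1
```

Rotating the options means an answerer that always picks the first displayed position spreads its four votes over four different options, which is one way choice-order permutation can dilute position bias.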

---

## 🧬 Model Lineage

```
Qwen/Qwen3.5-9B   +   (Opus-distilled sibling)
          ╲                   ╱
      Darwin V7 evolutionary merge
                    ▼
             Darwin-9B-Opus ── stand-alone reasoning model (Apache 2.0)
                    ▼
NEG-Head / NEG-Gate training (Darwin V8)
                    ▼
             Darwin-9B-NEG ── THIS MODEL
```

- **Base**: [FINAL-Bench/Darwin-9B-Opus](https://huggingface.co/FINAL-Bench/Darwin-9B-Opus) (weights frozen during NEG training)
- **Technology generation**: Darwin V8 (Native Entropy Gating), successor to Darwin V7 (evolutionary merging)

---

## 🎯 Recommended Use-Cases

- **Graduate-level STEM reasoning**: physics, chemistry, biology, mathematics (GPQA-style)
- **Mathematical problem solving** (MATH, AIME-style)
- **Code reasoning and debugging** (HumanEval-style)
- **Complex chain-of-thought tasks** where a small reasoning model needs a large accuracy boost

## ⚠️ Limitations

- Optimised for English first, with secondary support for Korean, Chinese and Japanese.
- At 8.95 B parameters, knowledge coverage is smaller than that of the larger Darwin models (27B / 31B / 36B); for pure world-knowledge tasks, consider Darwin-36B-Opus.
- The Ensemble mode (84.34 %) uses ≈ 20× inference; choose Pure NEG (mode 1) for cost-sensitive deployments.

---

## 📚 Citation

```bibtex
@misc{darwin9b_neg_2026,
  title        = {Darwin-9B-NEG: Native Entropy Gating for Self-Regulated Reasoning at 1x Inference Cost},
  author       = {{FINAL-Bench / Darwin Research Team}},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/FINAL-Bench/Darwin-9B-NEG}},
  note         = {Darwin V8: Native Entropy Gating technology generation}
}
```

---

## 🔗 Related Darwin Models

- **Darwin-36B-Opus**: MoE 36B, Qwen3.6-35B-A3B × Opus distilled, GPQA 88.4 %
- **Darwin-31B-Opus**: 31B, strong multilingual reasoning
- **Darwin-28B-Opus**: Qwen3.6-27B × rico03 Opus distilled (new 2026-04)
- **Darwin-27B-Opus**: 27B dense, GPQA 86.9 %
- **Darwin-9B-Opus**: this model's base, Qwen3.5-9B family
- **Darwin-4B-Genesis**: smallest member, Gemma4 family

---

This model is introduced in [Darwin Family](https://arxiv.org/abs/2605.14386).

*Darwin V8 · Sealed 2026-04-24 · FINAL-Bench*