🧬 Darwin Family: Zero Gradient Steps, GPQA Diamond 88.89%
How far can we push LLM reasoning *without* training?
Our team at VIDRAFT submitted this paper to Daily Papers yesterday, and it's currently #3. Huge thanks to everyone who upvoted; sharing the core ideas below.
Darwin Family is a training-free evolutionary merging framework. By recombining the weight spaces of existing LLM checkpoints, with zero gradient-based training, it reaches frontier-level reasoning.
- Darwin-28B-Opus: GPQA Diamond 88.89%
- Zero gradient steps: not a single B200 or H200 hour needed
- Consistent gains across the 4B → 35B scale range
- Cross-architecture breeding between Transformer and Mamba families
- Stable recursive multi-generation evolution
# Three Core Mechanisms
① **14-dim Adaptive Merge Genome**: fine-grained recombination at both the component level (Attention / FFN / MLP / LayerNorm / Embedding) and the block level, expanding the search space of prior evolutionary-merge methods.
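One way to picture such a genome, as a minimal sketch: the 5-component / 9-block split, the field names, and the mutation operator below are our illustration, not the paper's exact encoding.

```python
import random
from dataclasses import dataclass, field

# Hypothetical layout of the 14 genes: 5 component-level + 9 block-level
# interpolation weights between two parent checkpoints.
COMPONENTS = ("attention", "ffn", "mlp", "layernorm", "embedding")
N_BLOCK_GENES = 14 - len(COMPONENTS)

@dataclass
class MergeGenome:
    """One candidate in the evolutionary merge search.

    Each gene in [0, 1] is the mixing weight for parent A vs. parent B
    at that component or block granularity (illustrative encoding).
    """
    component_genes: dict = field(
        default_factory=lambda: {c: random.random() for c in COMPONENTS})
    block_genes: list = field(
        default_factory=lambda: [random.random() for _ in range(N_BLOCK_GENES)])

    def mutate(self, sigma: float = 0.05) -> "MergeGenome":
        """Gaussian mutation of every gene, clipped back into [0, 1]."""
        clip = lambda w: min(1.0, max(0.0, w + random.gauss(0.0, sigma)))
        return MergeGenome(
            component_genes={c: clip(w) for c, w in self.component_genes.items()},
            block_genes=[clip(w) for w in self.block_genes],
        )
```

An evolutionary loop would then repeatedly mutate and recombine such genomes, merging the parent weights per-gene and keeping the candidates that score best on a reasoning benchmark.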
② **MRI-Trust Fusion**: we diagnose each layer's reasoning contribution via an **MRI (Model Reasoning Importance)** signal and fuse it with evolutionary search through a **learnable trust parameter**. Trust the diagnostic too much and search collapses; ignore it and search becomes inefficient. Darwin learns the balance from data.
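The trust trade-off can be sketched as a convex combination of the two signals. The function name and the fixed-scalar form are our simplification; in Darwin the trust parameter is learned rather than hand-set.

```python
def fused_fitness(search_score: float, mri_score: float, trust: float) -> float:
    """Blend raw evolutionary-search fitness with the MRI diagnostic.

    trust = 1.0 follows the diagnostic entirely (search can collapse onto it);
    trust = 0.0 ignores it (search becomes inefficient). Here trust is a
    fixed scalar purely for illustration.
    """
    if not 0.0 <= trust <= 1.0:
        raise ValueError("trust must lie in [0, 1]")
    return (1.0 - trust) * search_score + trust * mri_score
```

With `trust = 0.5`, a candidate scoring 0.8 on search fitness and 0.4 on the MRI diagnostic lands at 0.6; moving `trust` shifts the selection pressure between the two signals.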
Separately, we evaluated 9 SOTA models (GPT-5.2, Claude Opus 4.6, Gemini 3 Pro, etc.) across 1,800 assessments in FINAL Bench and found a 39.2-percentage-point gap between recognizing potential errors (MA = 0.694) and actually finding and fixing them (ER = 0.302).
MARL (Model-Agnostic Runtime Middleware for LLMs) was built to close this metacognitive gap. It decomposes a single LLM call into a 5-stage expert pipeline (Hypothesis → Solver → Auditor → Adversarial Verifier → Synthesizer), transforming "answer in one shot" into "think, doubt, correct, and rewrite."
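The staging can be sketched as a simple chain, where each expert consumes the previous stage's output. The wiring below is our illustration of the idea, not MARL's actual internals; in practice each stage would be its own prompted LLM call.

```python
from typing import Callable

# The five stages named in the pipeline, applied in order.
STAGES = ("hypothesis", "solver", "auditor", "adversarial_verifier", "synthesizer")

def run_pipeline(question: str, experts: dict) -> str:
    """Chain the five experts sequentially: the answer is drafted, solved,
    audited, adversarially attacked, and finally rewritten."""
    text = question
    for name in STAGES:
        expert: Callable[[str], str] = experts[name]
        text = expert(text)
    return text
```

A single `client.chat.completions.create(...)` call would, behind the middleware, fan out into these five stages before one synthesized answer is returned.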
No weight modification: it works instantly with GPT-5.4, Claude, Gemini, Llama, or any OpenAI-API-compatible LLM by changing one line, `base_url`. It ships with 9 domain-specific emergence engines (invention, pharma, genomics, chemistry, ecology, law, and more; 5,538 expert data items), activated by a simple tag such as `model="gpt-5.4::pharma"`.
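The `::` tag convention can be sketched as a one-line split; the helper name below is ours, and this is an illustration of the routing, not MARL's actual implementation.

```python
def split_model_tag(model: str) -> tuple:
    """Split a tagged model id like 'gpt-5.4::pharma' into the upstream
    model name and the domain-engine tag (None when no tag is present)."""
    base, _, engine = model.partition("::")
    return base, (engine or None)
```

Client-side, the standard OpenAI Python SDK already accepts a `base_url` argument (e.g. `OpenAI(base_url="http://localhost:8000/v1", ...)`, assuming a local MARL deployment at that hypothetical address), so pointing it at the middleware is genuinely the only change.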
```
pip install marl-middleware
```
MARL is also officially registered on ClawHub, the skill marketplace of OpenClaw (an AI agent platform with 260K+ developers and 3,200+ skills). It's the first middleware in the Reasoning Enhancement category. One command, `clawhub install marl-middleware`, gives your AI agent a metacognition upgrade.