Jhon (sabowsla)
AI & ML interests: None yet
Recent Activity
reacted to SeaWolf-AI's post with 🔥 about 4 hours ago
🧬 Darwin-27B-Opus: 86.9% on GPQA Diamond, World #5, Zero Training
We are excited to share Darwin-27B-Opus, a 27B model that achieved 86.9% on GPQA Diamond, ranking #5 globally on the HuggingFace leaderboard, without a single gradient update.
How? Darwin breeds pretrained models through evolutionary FFN crossbreeding. The father (Qwen3.5-27B) provides the reasoning architecture; the mother (Claude 4.6 Opus Reasoning Distilled) contributes structured chain-of-thought knowledge. CMA-ES automatically discovers optimal per-layer blending ratios; no human tuning is required.
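The post doesn't include code, but the recipe can be sketched. Below is a minimal, hypothetical illustration of per-layer FFN blending with the ratios optimized by CMA-ES (via the `cma` package); the `.mlp.` naming, layer indexing, and `evaluate()` scorer are assumptions, not the actual Darwin implementation:

```python
# Minimal sketch of evolutionary FFN crossbreeding between two parents.
# Assumes two state dicts with identical tensor names/shapes and a
# user-supplied evaluate() that scores a merged model on a benchmark.
import cma
import torch

def merge_ffn(father_sd, mother_sd, layer_ratios):
    """Blend FFN tensors per layer; keep all other tensors from the father."""
    child = {name: t.clone() for name, t in father_sd.items()}
    for name, t in father_sd.items():
        if ".mlp." in name:                      # hypothetical FFN naming
            layer = int(name.split(".")[2])      # e.g. "model.layers.12.mlp...."
            r = float(layer_ratios[layer])       # 0 = all father, 1 = all mother
            child[name] = (1 - r) * t + r * mother_sd[name]
    return child

def fitness(ratios, father_sd, mother_sd, evaluate):
    child = merge_ffn(father_sd, mother_sd, ratios.clip(0, 1))
    return -evaluate(child)                      # CMA-ES minimizes

# CMA-ES searches the per-layer blend ratios: no gradients, no training data.
n_layers = 48                                    # assumed layer count
es = cma.CMAEvolutionStrategy(n_layers * [0.5], 0.2)
# while not es.stop():
#     candidates = es.ask()
#     es.tell(candidates, [fitness(torch.tensor(c), father, mother, evaluate)
#                          for c in candidates])
```

Each generation merges, evaluates, and keeps what scores best, which is why the search needs only inference compute rather than a training run.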
The result surpasses the original Qwen3.5-27B (85.5%), GLM-5.1 (744B, 86.2%), and Qwen3.5-122B (86.6%). A 27B model outperforming a 744B one, with zero training, zero data, one GPU, and ~2 hours of compute.
We also confirmed hybrid vigor on Korean benchmarks: Darwin-27B-KR (a second-generation offspring) surpassed both parents on CLIcK, winning 7 out of 11 categories. The evolutionary optimizer independently assigned 93% of the FFN from the Korean-specialized mother while preserving 93% of the attention from the reasoning-specialized father, autonomously validating our core principle: FFN carries knowledge, attention carries reasoning.
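In code terms, that assignment amounts to routing components by role. The sketch below uses hypothetical `.mlp.`/`.self_attn.` tensor names and the approximate 93% ratios reported above; it is an illustration of the idea, not the released merge:

```python
# Toy illustration of the discovered component-wise assignment:
# FFN ~93% from the Korean-specialized mother (knowledge),
# attention ~93% from the reasoning-specialized father (reasoning).
def component_merge(father_sd, mother_sd):
    child = {}
    for name, t in father_sd.items():
        if ".mlp." in name:             # FFN carries knowledge -> favor mother
            child[name] = 0.07 * t + 0.93 * mother_sd[name]
        elif ".self_attn." in name:     # attention carries reasoning -> favor father
            child[name] = 0.93 * t + 0.07 * mother_sd[name]
        else:
            child[name] = t.clone()
    return child
```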
Public release: 10 days in, 300+ community derivatives, 120K+ downloads.
Links:
Darwin-27B-Opus: https://huggingface.co/FINAL-Bench/Darwin-27B-Opus
Article: https://huggingface.co/blog/FINAL-Bench/darwin-gpqa
Darwin Family Collection: https://huggingface.co/collections/FINAL-Bench/darwin-family
If foundation models are raw ore, Darwin is the forge. We are just getting started. 🔥
new activity about 15 hours ago
mlx-community/gemma-4-26B-A4B-it-heretic-4bit: Can we get it in 2-bit?
reacted to SeaWolf-AI's post 5 days ago
🧬 Darwin V6: Diagnostic-Guided Evolutionary Model Merging
We are releasing Darwin-31B-Opus, a reasoning-enhanced model merging Google's Gemma-4-31B-it and TeichAI's Claude Opus Distill using the Darwin V6 engine.
Model: https://huggingface.co/FINAL-Bench/Darwin-31B-Opus
Demo: https://huggingface.co/spaces/FINAL-Bench/Darwin-31B-Opus
🔬 What Darwin V6 Does
Conventional merging tools (mergekit, etc.) apply a single ratio to all tensors. Set ratio=0.5 and all 1,188 tensors blend identically, with no distinction between which tensors matter for reasoning versus coding.
Darwin V6 diagnoses both parents at the tensor level before merging. It measures Shannon entropy, standard deviation, and L2 norm for every tensor, then passes 5 diagnostic probes (REASONING, CODE, MATH, KNOWLEDGE, LANGUAGE) through the model to determine layer-wise functional importance. Each of the 1,188 tensors receives an independent optimal ratio.
combined = static(entropy/std/norm) x 0.4 + probe(cosine_distance) x 0.6
final_ratio = mri_ratio x mri_trust + genome_ratio x (1 - mri_trust)
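A minimal sketch of how those diagnostics could be computed per tensor, assuming a histogram-based Shannon entropy and using `probe_distance` as a stand-in for the cosine distance between the parents' activations on the five probes (all function names are mine, not Darwin's):

```python
import torch
import torch.nn.functional as F

def static_score(t: torch.Tensor) -> float:
    """Toy static diagnostic combining Shannon entropy, std, and L2 norm."""
    hist = torch.histc(t.float(), bins=256)
    p = hist / hist.sum()
    p = p[p > 0]
    entropy = -(p * p.log()).sum().item()
    return entropy * t.std().item() * t.norm().item()   # toy aggregation

def probe_distance(father_act: torch.Tensor, mother_act: torch.Tensor) -> float:
    """Cosine distance between the parents' activations on one probe prompt."""
    return 1.0 - F.cosine_similarity(father_act.flatten(),
                                     mother_act.flatten(), dim=0).item()

def combined_score(static: float, probe: float) -> float:
    # combined = static(entropy/std/norm) x 0.4 + probe(cosine_distance) x 0.6
    return 0.4 * static + 0.6 * probe
```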
When one parent is overwhelmingly superior for a tensor (ratio < 0.15 or > 0.85), Darwin transplants it directly without interpolation. The mri_trust parameter itself is optimized by CMA-ES evolutionary search, so optimal transplant intensity is determined automatically. After merging, a Health Check compares the child against both parents layer-by-layer to detect interference or function loss.
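Putting the trust formula and the transplant rule together, the per-tensor decision could look like the following sketch, assuming ratio 0 means all-father and 1 means all-mother; per the post, `mri_trust` itself comes out of the CMA-ES search:

```python
def final_ratio(mri_ratio: float, genome_ratio: float, mri_trust: float) -> float:
    """final_ratio = mri_ratio x mri_trust + genome_ratio x (1 - mri_trust)."""
    return mri_ratio * mri_trust + genome_ratio * (1.0 - mri_trust)

def blend_tensor(father_t, mother_t, ratio: float):
    """Interpolate, or transplant outright when one parent clearly dominates."""
    if ratio < 0.15:                 # father overwhelmingly superior: transplant
        return father_t.clone()
    if ratio > 0.85:                 # mother overwhelmingly superior: transplant
        return mother_t.clone()
    return (1.0 - ratio) * father_t + ratio * mother_t
```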
🧬 Parent Models
Father: google/gemma-4-31B-it
Mother: TeichAI/gemma-4-31B-it-Claude-Opus-Distill
🧬 Results
Compared under identical conditions (same 50 questions, same seed, greedy, thinking mode):
Father: 60.0% (30/50)
Darwin-31B-Opus: 66.0% (33/50), a +10% relative improvement
ARC-Challenge: 82.89% (loglikelihood, zero-shot, 200 questions)
Optimal genome found by evolution:
ffn_ratio=0.93: FFN layers strongly favor the Mother (Claude Opus Distill)
block_5 (L50-L59)=0.86, and more...
Organizations: None yet