Demian L. P.

very-cooluser

AI & ML interests

Anything that can run on ~3GB of memory is an instant thumbs-up from me

Recent Activity

reacted to SeaWolf-AI's post with 🔥 about 19 hours ago

🧬 Darwin-27B-Opus: 86.9% on GPQA Diamond — World #5, Zero Training We are excited to share Darwin-27B-Opus, a 27B model that achieved 86.9% on GPQA Diamond — ranking #5 globally on the HuggingFace leaderboard — without a single gradient update. How? Darwin breeds pretrained models through evolutionary FFN crossbreeding. The father (Qwen3.5-27B) provides the reasoning architecture; the mother (Claude 4.6 Opus Reasoning Distilled) contributes structured chain-of-thought knowledge. CMA-ES automatically discovers optimal per-layer blending ratios — no human tuning required. The result surpasses the original Qwen3.5-27B (85.5%), GLM-5.1 (744B, 86.2%), and Qwen3.5-122B (86.6%). A 27B model outperforming 744B — with zero training, zero data, one GPU, ~2 hours. We also confirmed hybrid vigor on Korean benchmarks: Darwin-27B-KR (2nd generation offspring) surpassed both parents on CLIcK, winning 7 out of 11 categories. The evolutionary optimizer independently assigned 93% of FFN from the Korean-specialized mother while preserving 93% of attention from the reasoning-specialized father — autonomously validating our core principle: FFN carries knowledge, Attention carries reasoning. 📊 Public release: 10 days → 300+ community derivatives, 120K+ downloads. 🔗 Links: Darwin-27B-Opus: https://huggingface.co/FINAL-Bench/Darwin-27B-Opus article: https://huggingface.co/blog/FINAL-Bench/darwin-gpqa Darwin Family Collection: https://huggingface.co/collections/FINAL-Bench/darwin-family If foundation models are raw ore, Darwin is the forge. We are just getting started. 🔥
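The post describes blending two parent models with a separate ratio per layer. As a rough illustration only, the sketch below shows plain linear interpolation of matching weights. It is not the Darwin implementation: the names `blend_layer`, `crossbreed`, and the `layer0.*` keys are hypothetical, the weights are toy lists, and the ratios are fixed by hand, whereas the post says CMA-ES searches for them.

```python
from typing import Dict, List

# Hypothetical toy representation: parameter name -> flat list of weights.
Weights = Dict[str, List[float]]


def blend_layer(father: List[float], mother: List[float], ratio: float) -> List[float]:
    """Linear interpolation: ratio=0.0 keeps the father, ratio=1.0 keeps the mother."""
    if len(father) != len(mother):
        raise ValueError("parents must have matching shapes")
    return [(1.0 - ratio) * f + ratio * m for f, m in zip(father, mother)]


def crossbreed(father: Weights, mother: Weights, ratios: Dict[str, float]) -> Weights:
    """Blend each parameter with its own ratio; a missing ratio leaves the father unchanged."""
    return {
        name: blend_layer(father[name], mother[name], ratios.get(name, 0.0))
        for name in father
    }


# Assumed toy parents and hand-picked ratios (an optimizer would choose these instead).
father = {"layer0.ffn": [1.0, 1.0], "layer0.attn": [1.0, 1.0]}
mother = {"layer0.ffn": [0.0, 0.0], "layer0.attn": [0.0, 0.0]}
ratios = {"layer0.ffn": 0.75, "layer0.attn": 0.25}

child = crossbreed(father, mother, ratios)
```

Here the FFN weights lean toward the mother and the attention weights toward the father, which mirrors the split the post reports without reproducing its actual numbers or search procedure.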

reacted to anakin87's post with ❤️ 2 days ago

🌀 Let LLMs wander - Engineering RL Environments Reinforcement Learning Environments are little worlds where models can act, get rewards, and learn. I've been exploring how to design them, figuring out what works and what doesn't. If you want to learn how to build them, I recorded a practical intro video. You'll also see how to turn Liquid AI LFM2-2.6B into a Tic-tac-toe master 🙂 🎥 Engineering RL Environments video: https://www.youtube.com/watch?v=71V3fTaUp2Q --- 🌱 LLM RL Environments Lil Course: https://github.com/anakin87/llm-rl-environments-lil-course 🤗🕹️ Play against the trained model: https://huggingface.co/spaces/anakin87/LFM2-2.6B-mr-tictactoe 📚 HF collection (datasets + models): https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe
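The idea of an environment where a model acts and gets rewards can be sketched with a tiny Tic-tac-toe example. This is an assumption-laden illustration, not code from the linked course: the class name `TicTacToeEnv`, the `reset`/`step` interface, and the reward values (+1 win, 0 draw or ongoing, -1 illegal move) are all hypothetical choices.

```python
from typing import List, Optional, Tuple

# All eight winning index triples on a 3x3 board stored as a flat list.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


class TicTacToeEnv:
    """Hypothetical minimal environment; rewards are for the player who just moved."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> List[str]:
        self.board = [" "] * 9
        self.player = "X"
        self.done = False
        return list(self.board)

    def winner(self) -> Optional[str]:
        for a, b, c in WIN_LINES:
            if self.board[a] != " " and self.board[a] == self.board[b] == self.board[c]:
                return self.board[a]
        return None

    def step(self, action: int) -> Tuple[List[str], float, bool]:
        if self.done:
            raise RuntimeError("episode finished; call reset()")
        # Assumed rule: an illegal move ends the episode with a penalty.
        if not 0 <= action < 9 or self.board[action] != " ":
            self.done = True
            return list(self.board), -1.0, True
        self.board[action] = self.player
        if self.winner() == self.player:
            self.done = True
            return list(self.board), 1.0, True
        if " " not in self.board:
            self.done = True
            return list(self.board), 0.0, True  # draw
        self.player = "O" if self.player == "X" else "X"
        return list(self.board), 0.0, False
```

A training loop would call `reset()`, feed the board to the model, and pass the chosen cell index to `step()` until the episode ends.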

reacted to Shrijanagain's post with 🔥 26 days ago

Surya-1.1T: Scaling Beyond Human-Level Reasoning via 146 Trillion Token Pre-training Author: SKT AI LABS Affiliation: SKT AI Labs / Project Surya Model Architecture: Optimized Dense Transformer Parameters: 1.1 Trillion Training Tokens: 146 Trillion Want to collaborate with us? Friends, let's start the journey: we have collected 146 trillion tokens and completed pre-training, but we need to make the model more powerful. Whitepaper - https://github.com/SHRIJANAGAIN/PROFF


Organizations

None yet

upvoted a collection about 1 month ago

Qwen3-VL

Collection

37 items • Updated Dec 31, 2025 • 691

upvoted 3 papers 3 months ago

Qwen3-TTS Technical Report

Paper • 2601.15621 • Published Jan 22 • 74

AT^2PO: Agentic Turn-based Policy Optimization via Tree Search

Paper • 2601.04767 • Published Jan 8 • 28

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

Paper • 2601.05242 • Published Jan 8 • 230

