Mike White

seleven11

AI & ML interests

None yet

Organizations

None yet

upvoted 2 papers 28 days ago

Mixture-of-Depths Attention

Paper • 2603.15619 • Published 28 days ago • 80

Attention Residuals

Paper • 2603.15031 • Published 29 days ago • 179

liked a dataset 2 months ago

LLM360/guru-RL-92k

Viewer • Updated Aug 20, 2025 • 91.9k • 1.77k • 46

liked 3 datasets 3 months ago

liked 2 datasets 5 months ago

HuggingFaceTB/smollm-corpus

Viewer • Updated Sep 6, 2024 • 237M • 42.8k • 448

Leon-Leee/unofficial-pyedu

Viewer • Updated Mar 12, 2025 • 7.68M • 272 • 4

upvoted an article 5 months ago

Article

SmolLM - blazingly fast and remarkably powerful

Jul 16, 2024 • 450

liked a Space 5 months ago

The Smol Training Playbook

📚

3.1k

The secrets to building world-class LLMs

liked 3 datasets 6 months ago

m-a-p/COIG-CQIA

Viewer • Updated Apr 18, 2024 • 44.7k • 8.72k • 711

BAAI/COIG

Viewer • Updated Jul 12, 2023 • 276k • 547 • 458

YeungNLP/firefly-train-1.1M

Viewer • Updated Apr 10, 2023 • 1.65M • 1.41k • 340

upvoted an article 8 months ago

Article

Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment

Feb 11, 2025 • 117

liked a Space 10 months ago

Predict Memory

🧮

107

Calculate and visualize memory usage for model training

liked a Space about 1 year ago

The Ultra-Scale Playbook

🌌

3.78k

The ultimate guide to training LLM on large GPU Clusters

liked 2 models over 1 year ago

Qwen/Qwen2-7B-Instruct

Text Generation • 8B • Updated Aug 21, 2024 • 435k • 686

Alibaba-NLP/gte-Qwen2-7B-instruct

liked a model almost 2 years ago

Qwen/Qwen2-72B-Instruct

Text Generation • 73B • Updated Oct 8, 2024 • 70.1k • 718

liked a model over 2 years ago

meta-llama/Llama-2-13b-hf

Text Generation • Updated Apr 17, 2024 • 25.7k • 622