MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU Paper • 2604.05091 • Published 10 days ago • 45
TriAttention: Efficient Long Reasoning with Trigonometric KV Compression Paper • 2604.04921 • Published 10 days ago • 107
Mistral Small 4 Collection A state-of-the-art open-weight model with a granular Mixture-of-Experts architecture that fuses instruct, reasoning, and agentic skills. • 3 items • Updated about 1 month ago • 66
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration Paper • 2602.05400 • Published Feb 5 • 352
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published Jan 8 • 230
Parallax: Efficient LLM Inference Service over Decentralized Environment Paper • 2509.26182 • Published Sep 30, 2025 • 1
Audio2Face-3D Collection Open-weight Audio2Face-3D and Audio2Emotion networks and a sample dataset for training and evaluation • 7 items • Updated about 22 hours ago • 17
INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats Paper • 2510.25602 • Published Oct 29, 2025 • 80
Cerebras REAP Collection Sparse MoE models compressed using the REAP (Router-weighted Expert Activation Pruning) method • 30 items • Updated Feb 25 • 136
SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights Paper • 2509.22944 • Published Sep 26, 2025 • 81
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models Paper • 2510.04618 • Published Oct 6, 2025 • 131