Nemotron-Post-Training-v3 Collection Collection of datasets used in the post-training phase of Nemotron Nano and Super v3. • 28 items • Updated 6 days ago • 119
Open Pangram Collection Open models and datasets based on Pangram's ICLR 2026 EditLens paper, licensed for noncommercial use only under CC BY-NC-SA 4.0 • 4 items • Updated 19 days ago • 8
CodeScout Collection RL-trained code search agents (1.7B, 4B, 14B) that outperform 2–18× larger models using only a Unix terminal. 📄 arxiv.org/abs/2603.17829 • 12 items • Updated 25 days ago • 7
Article Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries • Mar 10 • 124
Distil Efficiency Benchmarks Collection Collection of models used in the blog post www.distillabs.ai/blog/the-10x-inference-tax-you-dont-have-to-pay • 9 items • Updated Mar 2 • 3
Quantized Qwen3.5 Collection Verified models. Compatible with Transformers v5.3 and vLLM v0.16.1rc1 (nightly). Under evaluation. • 9 items • Updated Mar 12 • 9
REAM Collection Compressed MoE models with a reduced number of experts. See additional models at https://huggingface.co/bknyaz. • 11 items • Updated 3 days ago • 5
Cerebras REAP Collection Sparse MoE models compressed using the REAP (Router-weighted Expert Activation Pruning) method • 30 items • Updated Feb 25 • 135
gliner2 family Collection GLiNER2 extends the original GLiNER architecture to support multi-task information extraction with a schema-driven interface. • 4 items • Updated Feb 10 • 42
Article From Golden Gate Bridge to Broken JSON: Why Anthropic's SAE Steering Fails for Structured Output • Feb 7 • 22
Article Tensor Parallelism (TP) in Transformers: 5 Minutes to Understand • Dec 4, 2025 • 68
The Bestiary Collection Decensored language models made using Heretic (https://github.com/p-e-w/heretic) • 6 items • Updated Nov 16, 2025 • 110