In a Training Loop 🔄

Stefan Schweter PRO

stefan-it

https://schweter.bayern

AI & ML interests

Flair Library 💕, NER & PoS Tagging, LM Pretraining (mostly encoder-only & encoder-decoder), Historical Language Models, German Language Models, Bavarian NLP 🥨, xLSTM

Recent Activity

liked a model 4 days ago

NX-AI/xlstm_scaling_laws

upvoted an article 5 days ago

How we OCR'ed 30,000 papers using Codex, open OCR models and Jobs

liked a model 9 days ago

netflix/void-model

upvoted an article 5 days ago

Article

How we OCR'ed 30,000 papers using Codex, open OCR models and Jobs

5 days ago • 40

upvoted a collection 10 days ago

Gemma 4

8 items • Updated 10 days ago • 573

upvoted a collection 18 days ago

fiNERweb

A multilingual dataset for NER covering 91 languages and 25 scripts • 3 items • Updated Dec 16, 2025 • 3

upvoted a paper 20 days ago

F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World

Paper • 2603.19223 • Published 24 days ago • 31

upvoted 2 collections 20 days ago

Nemotron-Post-Training-v3

Collection of datasets used in the post-training phase of Nemotron Nano and Super v3. • 28 items • Updated 6 days ago • 119

Nemotron-Cascade 2

Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation • 4 items • Updated 6 days ago • 47

upvoted a changelog 23 days ago

Hugging Face Changelog

Protected Spaces with Public URLs

23 days ago • 119

upvoted a collection 25 days ago

Olmo Hybrid

6 items • Updated Mar 5 • 24

upvoted a paper 25 days ago

Omnilingual MT: Machine Translation for 1,600 Languages

Paper • 2603.16309 • Published 26 days ago • 21

upvoted 2 articles 26 days ago

Article

State of Open Source on Hugging Face: Spring 2026

26 days ago • 78

Article

Efficient LLM Pretraining: Packed Sequences and Masked Attention

Oct 7, 2024 • 70

upvoted 2 papers 26 days ago

Information Asymmetry across Language Varieties: A Case Study on Cantonese-Mandarin and Bavarian-German QA

Paper • 2603.14782 • Published 28 days ago • 1

Indirect Question Answering in English, German and Bavarian: A Challenging Task for High- and Low-Resource Languages Alike

Paper • 2603.15130 • Published 27 days ago • 1

upvoted a paper 27 days ago

Effective Distillation to Hybrid xLSTM Architectures

Paper • 2603.15590 • Published 27 days ago • 33

upvoted 2 articles about 1 month ago

Article

Ulysses Sequence Parallelism: Training with Million-Token Contexts

Mar 9 • 26

Article

FlashHead: Accelerating Language Model Inference ~ Efficient drop-in replacement for the classification head

Mar 11 • 2

upvoted a paper about 1 month ago

Flash-KMeans: Fast and Memory-Efficient Exact K-Means

Paper • 2603.09229 • Published Mar 10 • 82

upvoted a collection about 1 month ago

Nemotron-Pre-Training-Datasets

Large scale pre-training datasets used in the Nemotron family of models. • 12 items • Updated 6 days ago • 137

upvoted a paper about 1 month ago

Lost in Backpropagation: The LM Head is a Gradient Bottleneck

Paper • 2603.10145 • Published Mar 10 • 12

upvoted a collection about 1 month ago

NVIDIA Nemotron v3

Open, Production-ready Enterprise Models • 15 items • Updated 6 days ago • 265