[to-read]
- A Survey of Small Language Models (arXiv:2410.20011)
- TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters (arXiv:2410.23168)
- What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective (arXiv:2410.23743)
- GPT or BERT: why not both? (arXiv:2410.24159)
- Physics in Next-token Prediction (arXiv:2411.00660)
- PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance (arXiv:2411.02327)
- OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models (arXiv:2411.04905)
- Hymba: A Hybrid-head Architecture for Small Language Models (arXiv:2411.13676)
- (untitled) (arXiv:2410.21276)
- Transformers without Normalization (arXiv:2503.10622)