MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU Paper • 2604.05091 • Published 10 days ago • 45
TriAttention: Efficient Long Reasoning with Trigonometric KV Compression Paper • 2604.04921 • Published 10 days ago • 107
Mistral Small 4 Collection A state-of-the-art open-weight model with a granular Mixture-of-Experts architecture that fuses instruct, reasoning, and agentic skills. • 3 items • Updated about 1 month ago • 66
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration Paper • 2602.05400 • Published Feb 5 • 352
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published Jan 8 • 230
Parallax: Efficient LLM Inference Service over Decentralized Environment Paper • 2509.26182 • Published Sep 30, 2025 • 1
Audio2Face-3D Collection Open-weight Audio2Face-3D and Audio2Emotion networks and a sample dataset for training and evaluation • 7 items • Updated about 22 hours ago • 17
INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats Paper • 2510.25602 • Published Oct 29, 2025 • 80
Cerebras REAP Collection Sparse MoE models compressed using the REAP (Router-weighted Expert Activation Pruning) method • 30 items • Updated Feb 25 • 136
SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights Paper • 2509.22944 • Published Sep 26, 2025 • 81
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models Paper • 2510.04618 • Published Oct 6, 2025 • 131