- Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights
  Paper • 2603.12228 • Published • 12
- Meta-Reinforcement Learning with Self-Reflection for Agentic Search
  Paper • 2603.11327 • Published • 10
- Training Language Models via Neural Cellular Automata
  Paper • 2603.10055 • Published • 8
- Attention Sinks Are Provably Necessary in Softmax Transformers: Evidence from Trigger-Conditional Tasks
  Paper • 2603.11487 • Published • 2
Collections including paper arxiv:2603.12228
- AgentConductor: Topology Evolution for Multi-Agent Competition-Level Code Generation
  Paper • 2602.17100 • Published • 4
- GroupGPT: A Token-efficient and Privacy-preserving Agentic Framework for Multi-User Chat Assistant
  Paper • 2603.01059 • Published • 1
- Multi-Domain Riemannian Graph Gluing for Building Graph Foundation Models
  Paper • 2603.00618 • Published
- Heterogeneous Agent Collaborative Reinforcement Learning
  Paper • 2603.02604 • Published • 194
- Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning
  Paper • 2311.11077 • Published • 29
- Tensor Product Attention Is All You Need
  Paper • 2501.06425 • Published • 90
- LoRA: Low-Rank Adaptation of Large Language Models
  Paper • 2106.09685 • Published • 60
- ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
  Paper • 2403.03853 • Published • 66
- OpenClaw-RL: Train Any Agent Simply by Talking
  Paper • 2603.10165 • Published • 151
- Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights
  Paper • 2603.12228 • Published • 12
- Efficient Memory Management for Large Language Model Serving with PagedAttention
  Paper • 2309.06180 • Published • 53
- 1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs
  Paper • 2410.16144 • Published • 5
- I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
  Paper • 2503.18878 • Published • 120
- Truth Neurons
  Paper • 2505.12182 • Published • 8
- Resa: Transparent Reasoning Models via SAEs
  Paper • 2506.09967 • Published • 22
- Why Can't Transformers Learn Multiplication? Reverse-Engineering Reveals Long-Range Dependency Pitfalls
  Paper • 2510.00184 • Published • 17
- BitNet: Scaling 1-bit Transformers for Large Language Models
  Paper • 2310.11453 • Published • 107
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
  Paper • 2310.11511 • Published • 80
- In-Context Learning Creates Task Vectors
  Paper • 2310.15916 • Published • 43
- Matryoshka Diffusion Models
  Paper • 2310.15111 • Published • 45