LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training Paper • 2406.16554 • Published Jun 24, 2024 • 1
Adaptive Fast-and-Slow Visual Program Reasoning for Long-Form VideoQA Paper • 2509.17743 • Published Sep 22, 2025
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published Nov 6, 2025 • 242
OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment Paper • 2601.01576 • Published Jan 4 • 19
Beyond Scaling: Measuring and Predicting the Upper Bound of Knowledge Retention in Language Model Pre-Training Paper • 2502.04066 • Published Feb 6, 2025
LLMEval-Fair: A Large-Scale Longitudinal Study on Robust and Fair Evaluation of Large Language Models Paper • 2508.05452 • Published Aug 7, 2025
LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation Paper • 2506.04078 • Published Jun 4, 2025 • 1
MOVA: Towards Scalable and Synchronized Video-Audio Generation Paper • 2602.08794 • Published Feb 9 • 159
SciAgentGym: Benchmarking Multi-Step Scientific Tool-use in LLM Agents Paper • 2602.12984 • Published Feb 13 • 7
LatentMem: Customizing Latent Memory for Multi-Agent Systems Paper • 2602.03036 • Published Feb 3 • 15
LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training Paper • 2411.15708 • Published Nov 24, 2024
Iterative Value Function Optimization for Guided Decoding Paper • 2503.02368 • Published Mar 4, 2025 • 15
Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts Paper • 2503.05447 • Published Mar 7, 2025 • 8
Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models Paper • 2503.16779 • Published Mar 21, 2025 • 1
Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts Paper • 2406.11256 • Published Jun 17, 2024
Speed Always Wins: A Survey on Efficient Architectures for Large Language Models Paper • 2508.09834 • Published Aug 13, 2025 • 53
DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models Paper • 2512.24165 • Published Dec 30, 2025 • 52
VA-π: Variational Policy Alignment for Pixel-Aware Autoregressive Generation Paper • 2512.19680 • Published Dec 22, 2025 • 12