Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2506.07900

MiniCPM4: Ultra-Efficient LLMs on End Devices

MiniCPM4: Ultra-Efficient LLMs on End Devices

Paper • 2506.07900 • Published Jun 9, 2025 • 96
openbmb/MiniCPM-SALA

Text Generation • 9B • Updated 18 days ago • 1.03k • 673
openbmb/MiniCPM4.1-8B

Text Generation • 8B • Updated Oct 24, 2025 • 25.5k • 390
openbmb/MiniCPM4.1-8B-GGUF

Text Generation • 8B • Updated Sep 5, 2025 • 166 • 16

Grove MoE: Towards Efficient and Superior MoE LLMs with Adjugate Experts

Paper • 2508.07785 • Published Aug 11, 2025 • 29
MoBE: Mixture-of-Basis-Experts for Compressing MoE-based LLMs

Paper • 2508.05257 • Published Aug 7, 2025 • 13
SmallThinker: A Family of Efficient Large Language Models Natively Trained for Local Deployment

Paper • 2507.20984 • Published Jul 28, 2025 • 58
MiniCPM4: Ultra-Efficient LLMs on End Devices

Paper • 2506.07900 • Published Jun 9, 2025 • 96

MiniCPM4: Ultra-Efficient LLMs on End Devices

Paper • 2506.07900 • Published Jun 9, 2025 • 96

SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models

Paper • 2412.11605 • Published Dec 16, 2024 • 18
Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published Dec 13, 2024 • 108
Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization

Paper • 2412.17739 • Published Dec 23, 2024 • 41
SKETCH: Structured Knowledge Enhanced Text Comprehension for Holistic Retrieval

Paper • 2412.15443 • Published Dec 19, 2024 • 10

LLM Pruning and Distillation in Practice: The Minitron Approach

Paper • 2408.11796 • Published Aug 21, 2024 • 60
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering

Paper • 2408.09174 • Published Aug 17, 2024 • 53
To Code, or Not To Code? Exploring Impact of Code in Pre-training

Paper • 2408.10914 • Published Aug 20, 2024 • 45
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications

Paper • 2408.11878 • Published Aug 20, 2024 • 64

Packing Input Frame Context in Next-Frame Prediction Models for Video Generation

Paper • 2504.12626 • Published Apr 17, 2025 • 51
Qwen3 Technical Report

Paper • 2505.09388 • Published May 14, 2025 • 339
Qwen-Image Technical Report

Paper • 2508.02324 • Published Aug 4, 2025 • 274
DINOv3

Paper • 2508.10104 • Published Aug 13, 2025 • 305

MiniCPM4: Ultra-Efficient LLMs on End Devices

Paper • 2506.07900 • Published Jun 9, 2025 • 96

Small Language Models (SLMs)

google/gemma-3n-E4B-it-litert-preview

Image-Text-to-Text • Updated May 26, 2025 • 1.48k
google/gemma-3n-E2B-it-litert-preview

Image-Text-to-Text • Updated May 20, 2025 • 579
openbmb/MiniCPM4-0.5B

Text Generation • 0.4B • Updated Oct 20, 2025 • 6.6k • 77
microsoft/Phi-4-mini-instruct

Text Generation • Updated Dec 10, 2025 • 1.34M • 725

interesting architecture

FAN: Fourier Analysis Networks

Paper • 2410.02675 • Published Oct 3, 2024 • 29
Tensor Product Attention Is All You Need

Paper • 2501.06425 • Published Jan 11, 2025 • 90
Scalable-Softmax Is Superior for Attention

Paper • 2501.19399 • Published Jan 31, 2025 • 25
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling

Paper • 2502.09509 • Published Feb 13, 2025 • 9

EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

Paper • 2402.17485 • Published Feb 27, 2024 • 194
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding

Paper • 2403.15377 • Published Mar 22, 2024 • 29
nvidia/parakeet-tdt-0.6b-v2

Automatic Speech Recognition • Updated 7 days ago • 171k • 1.46k
google/gemma-3n-E4B-it-litert-preview

Image-Text-to-Text • Updated May 26, 2025 • 1.48k

MiniCPM4: Ultra-Efficient LLMs on End Devices

MiniCPM4: Ultra-Efficient LLMs on End Devices

Paper • 2506.07900 • Published Jun 9, 2025 • 96
openbmb/MiniCPM-SALA

Text Generation • 9B • Updated 18 days ago • 1.03k • 673
openbmb/MiniCPM4.1-8B

Text Generation • 8B • Updated Oct 24, 2025 • 25.5k • 390
openbmb/MiniCPM4.1-8B-GGUF

Text Generation • 8B • Updated Sep 5, 2025 • 166 • 16

Packing Input Frame Context in Next-Frame Prediction Models for Video Generation

Paper • 2504.12626 • Published Apr 17, 2025 • 51
Qwen3 Technical Report

Paper • 2505.09388 • Published May 14, 2025 • 339
Qwen-Image Technical Report

Paper • 2508.02324 • Published Aug 4, 2025 • 274
DINOv3

Paper • 2508.10104 • Published Aug 13, 2025 • 305

Grove MoE: Towards Efficient and Superior MoE LLMs with Adjugate Experts

Paper • 2508.07785 • Published Aug 11, 2025 • 29
MoBE: Mixture-of-Basis-Experts for Compressing MoE-based LLMs

Paper • 2508.05257 • Published Aug 7, 2025 • 13
SmallThinker: A Family of Efficient Large Language Models Natively Trained for Local Deployment

Paper • 2507.20984 • Published Jul 28, 2025 • 58
MiniCPM4: Ultra-Efficient LLMs on End Devices

Paper • 2506.07900 • Published Jun 9, 2025 • 96

MiniCPM4: Ultra-Efficient LLMs on End Devices

Paper • 2506.07900 • Published Jun 9, 2025 • 96

MiniCPM4: Ultra-Efficient LLMs on End Devices

Paper • 2506.07900 • Published Jun 9, 2025 • 96

Small Language Models (SLMs)

google/gemma-3n-E4B-it-litert-preview

Image-Text-to-Text • Updated May 26, 2025 • 1.48k
google/gemma-3n-E2B-it-litert-preview

Image-Text-to-Text • Updated May 20, 2025 • 579
openbmb/MiniCPM4-0.5B

Text Generation • 0.4B • Updated Oct 20, 2025 • 6.6k • 77
microsoft/Phi-4-mini-instruct

Text Generation • Updated Dec 10, 2025 • 1.34M • 725

SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models

Paper • 2412.11605 • Published Dec 16, 2024 • 18
Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published Dec 13, 2024 • 108
Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization

Paper • 2412.17739 • Published Dec 23, 2024 • 41
SKETCH: Structured Knowledge Enhanced Text Comprehension for Holistic Retrieval

Paper • 2412.15443 • Published Dec 19, 2024 • 10

interesting architecture

FAN: Fourier Analysis Networks

Paper • 2410.02675 • Published Oct 3, 2024 • 29
Tensor Product Attention Is All You Need

Paper • 2501.06425 • Published Jan 11, 2025 • 90
Scalable-Softmax Is Superior for Attention

Paper • 2501.19399 • Published Jan 31, 2025 • 25
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling

Paper • 2502.09509 • Published Feb 13, 2025 • 9

LLM Pruning and Distillation in Practice: The Minitron Approach

Paper • 2408.11796 • Published Aug 21, 2024 • 60
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering

Paper • 2408.09174 • Published Aug 17, 2024 • 53
To Code, or Not To Code? Exploring Impact of Code in Pre-training

Paper • 2408.10914 • Published Aug 20, 2024 • 45
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications

Paper • 2408.11878 • Published Aug 20, 2024 • 64

EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

Paper • 2402.17485 • Published Feb 27, 2024 • 194
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding

Paper • 2403.15377 • Published Mar 22, 2024 • 29
nvidia/parakeet-tdt-0.6b-v2

Automatic Speech Recognition • Updated 7 days ago • 171k • 1.46k
google/gemma-3n-E4B-it-litert-preview

Image-Text-to-Text • Updated May 26, 2025 • 1.48k

Previous
1
2
Next

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs