Collections
Discover the best community collections!

Collections including paper arxiv:1810.04805

Semlan112/Hug
Updated
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 26
- MiniMaxAI/MiniMax-M2.5
  Text Generation • 229B • Updated • 919k • 1.46k
- Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled
  Image-Text-to-Text • 28B • Updated • 573k • 2.73k

- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 20
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 26
- Attention Is All You Need
  Paper • 1706.03762 • Published • 120
- Lookahead Anchoring: Preserving Character Identity in Audio-Driven Human Animation
  Paper • 2510.23581 • Published • 42

- Neural Machine Translation by Jointly Learning to Align and Translate
  Paper • 1409.0473 • Published • 7
- Attention Is All You Need
  Paper • 1706.03762 • Published • 120
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 26
- Hierarchical Reasoning Model
  Paper • 2506.21734 • Published • 50

- DeBERTa: Decoding-enhanced BERT with Disentangled Attention
  Paper • 2006.03654 • Published • 3
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Paper • 1810.04805 • Published • 26
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
  Paper • 1907.11692 • Published • 10
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 20

- High-Resolution Image Synthesis with Latent Diffusion Models
  Paper • 2112.10752 • Published • 17
- Adding Conditional Control to Text-to-Image Diffusion Models
  Paper • 2302.05543 • Published • 58
- Proximal Policy Optimization Algorithms
  Paper • 1707.06347 • Published • 11
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 64