
Collections including paper arxiv:2506.20920

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Paper • 2406.17557 • Published Jun 25, 2024 • 102
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26, 2025 • 78

Reasoning Introduces New Poisoning Attacks Yet Makes Them More Complicated

Paper • 2509.05739 • Published Sep 6, 2025 • 2
Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers

Paper • 2509.03059 • Published Sep 3, 2025 • 25
Universal Deep Research: Bring Your Own Model and Strategy

Paper • 2509.00244 • Published Aug 29, 2025 • 14
<think> So let's replace this phrase with insult... </think> Lessons learned from generation of toxic texts with LLMs

Paper • 2509.08358 • Published Sep 10, 2025 • 13

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26, 2025 • 78

MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings

Paper • 2405.19504 • Published May 29, 2024 • 3
HiWave: Training-Free High-Resolution Image Generation via Wavelet-Based Diffusion Sampling

Paper • 2506.20452 • Published Jun 25, 2025 • 18
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26, 2025 • 78
The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm

Paper • 2507.18553 • Published Jul 24, 2025 • 41

GR00T N1: An Open Foundation Model for Generalist Humanoid Robots

Paper • 2503.14734 • Published Mar 18, 2025 • 7
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation

Paper • 2401.02117 • Published Jan 4, 2024 • 33
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics

Paper • 2506.01844 • Published Jun 2, 2025 • 158
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding

Paper • 2506.16035 • Published Jun 19, 2025 • 89

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26, 2025 • 78
HuggingFaceFW/finewiki

Viewer • Updated Oct 22, 2025 • 61.6M • 6.73k • 292
nhagar/fineweb_urls

Viewer • Updated May 15, 2025 • 24.5B • 1.18k • 2
PleIAs/common_corpus

Viewer • Updated Feb 19 • 69.9k • 185k • 390

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26, 2025 • 78
SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7, 2025 • 207
The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset

Paper • 2303.03915 • Published Mar 7, 2023 • 7
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4, 2025 • 258

ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation

Paper • 2506.18095 • Published Jun 22, 2025 • 66
FreedomIntelligence/ShareGPT-4o-Image

Viewer • Updated Jul 1, 2025 • 92.3k • 652 • 97
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26, 2025 • 78

Hugging Face Science team papers

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26, 2025 • 78
SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7, 2025 • 207
YourBench: Easy Custom Evaluation Sets for Everyone

Paper • 2504.01833 • Published Apr 2, 2025 • 23
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4, 2025 • 258

Training optimization

The Curse of Depth in Large Language Models

Paper • 2502.05795 • Published Feb 9, 2025 • 40
Transformers without Normalization

Paper • 2503.10622 • Published Mar 13, 2025 • 172
Parallel Scaling Law for Language Models

Paper • 2505.10475 • Published May 15, 2025 • 83
Learning to Skip the Middle Layers of Transformers

Paper • 2506.21103 • Published Jun 26, 2025 • 18

