Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2412.17847

Research on AI Ecosystem Data

Research papers leveraging AI ecosystem data

The ATOM Report: Measuring the Open Language Model Ecosystem

Paper • 2604.07190 • Published 12 days ago • 5
Economies of Open Intelligence: Tracing Power & Participation in the Model Ecosystem

Paper • 2512.03073 • Published Nov 27, 2025 • 7
Anatomy of a Machine Learning Ecosystem: 2 Million Models on Hugging Face

Paper • 2508.06811 • Published Aug 9, 2025 • 6
Bridging the Data Provenance Gap Across Text, Speech and Video

Paper • 2412.17847 • Published Dec 19, 2024 • 12

Data and other things

MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval

Paper • 2412.14475 • Published Dec 19, 2024 • 58
How to Synthesize Text Data without Model Collapse?

Paper • 2412.14689 • Published Dec 19, 2024 • 53
Token-Budget-Aware LLM Reasoning

Paper • 2412.18547 • Published Dec 24, 2024 • 46
WavePulse: Real-time Content Analytics of Radio Livestreams

Paper • 2412.17998 • Published Dec 23, 2024 • 11

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 30
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 15
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

interest_need_read

感兴趣热门论文集合

ProcessBench: Identifying Process Errors in Mathematical Reasoning

Paper • 2412.06559 • Published Dec 9, 2024 • 86
Maya: An Instruction Finetuned Multilingual Multimodal Model

Paper • 2412.07112 • Published Dec 10, 2024 • 28
OpenAI o1 System Card

Paper • 2412.16720 • Published Dec 21, 2024 • 37
Diving into Self-Evolving Training for Multimodal Reasoning

Paper • 2412.17451 • Published Dec 23, 2024 • 42

Data Provenance

Bridging the Data Provenance Gap Across Text, Speech and Video

Paper • 2412.17847 • Published Dec 19, 2024 • 12
Consent in Crisis: The Rapid Decline of the AI Data Commons

Paper • 2407.14933 • Published Jul 20, 2024 • 15
Data Authenticity, Consent, & Provenance for AI are all broken: what will it take to fix them?

Paper • 2404.12691 • Published Apr 19, 2024 • 2
The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI

Paper • 2310.16787 • Published Oct 25, 2023 • 5

Research on AI Ecosystem Data

Research papers leveraging AI ecosystem data

The ATOM Report: Measuring the Open Language Model Ecosystem

Paper • 2604.07190 • Published 12 days ago • 5
Economies of Open Intelligence: Tracing Power & Participation in the Model Ecosystem

Paper • 2512.03073 • Published Nov 27, 2025 • 7
Anatomy of a Machine Learning Ecosystem: 2 Million Models on Hugging Face

Paper • 2508.06811 • Published Aug 9, 2025 • 6
Bridging the Data Provenance Gap Across Text, Speech and Video

Paper • 2412.17847 • Published Dec 19, 2024 • 12

interest_need_read

感兴趣热门论文集合

ProcessBench: Identifying Process Errors in Mathematical Reasoning

Paper • 2412.06559 • Published Dec 9, 2024 • 86
Maya: An Instruction Finetuned Multilingual Multimodal Model

Paper • 2412.07112 • Published Dec 10, 2024 • 28
OpenAI o1 System Card

Paper • 2412.16720 • Published Dec 21, 2024 • 37
Diving into Self-Evolving Training for Multimodal Reasoning

Paper • 2412.17451 • Published Dec 23, 2024 • 42

Data and other things

MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval

Paper • 2412.14475 • Published Dec 19, 2024 • 58
How to Synthesize Text Data without Model Collapse?

Paper • 2412.14689 • Published Dec 19, 2024 • 53
Token-Budget-Aware LLM Reasoning

Paper • 2412.18547 • Published Dec 24, 2024 • 46
WavePulse: Real-time Content Analytics of Radio Livestreams

Paper • 2412.17998 • Published Dec 23, 2024 • 11

Data Provenance

Bridging the Data Provenance Gap Across Text, Speech and Video

Paper • 2412.17847 • Published Dec 19, 2024 • 12
Consent in Crisis: The Rapid Decline of the AI Data Commons

Paper • 2407.14933 • Published Jul 20, 2024 • 15
Data Authenticity, Consent, & Provenance for AI are all broken: what will it take to fix them?

Paper • 2404.12691 • Published Apr 19, 2024 • 2
The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI

Paper • 2310.16787 • Published Oct 25, 2023 • 5

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 30
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 15
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs