Collections including paper arxiv:2601.21639

Towards Pixel-Level VLM Perception via Simple Points Prediction

Paper • 2601.19228 • Published Jan 27 • 18

Post-LayerNorm Is Back: Stable, ExpressivE, and Deep

Paper • 2601.19895 • Published Jan 27 • 27

Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision

Paper • 2601.19798 • Published Jan 27 • 43

OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models

Paper • 2601.21639 • Published Jan 29 • 51

PubTables-1M: Towards comprehensive table extraction from unstructured documents

Paper • 2110.00061 • Published Sep 30, 2021 • 3

Optimized Table Tokenization for Table Structure Recognition

Paper • 2305.03393 • Published May 5, 2023 • 1

Qwen3-VL Technical Report

Paper • 2511.21631 • Published Nov 26, 2025 • 161

PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model

Paper • 2510.14528 • Published Oct 16, 2025 • 124

AI Paper of the Day

A collection of papers that I think are interesting, one added each day

Can Large Language Models Understand Context?

Paper • 2402.00858 • Published Feb 1, 2024 • 24

OLMo: Accelerating the Science of Language Models

Paper • 2402.00838 • Published Feb 1, 2024 • 85

Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18, 2024 • 153

SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity

Paper • 2401.17072 • Published Jan 30, 2024 • 25

C-RADIOv4 (Tech Report)

Paper • 2601.17237 • Published Jan 24 • 10

OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models

Paper • 2601.21639 • Published Jan 29 • 51

MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

Paper • 2509.22186 • Published Sep 26, 2025 • 159

CommonForms: A Large, Diverse Dataset for Form Field Detection

Paper • 2509.16506 • Published Sep 20, 2025 • 22

Automated Structured Radiology Report Generation with Rich Clinical Context

Paper • 2510.00428 • Published Oct 1, 2025 • 8

Extract-0: A Specialized Language Model for Document Information Extraction

Paper • 2509.22906 • Published Sep 26, 2025

DocLLM: A layout-aware generative language model for multimodal document understanding

Paper • 2401.00908 • Published Dec 31, 2023 • 191

COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training

Paper • 2401.00849 • Published Jan 1, 2024 • 17

LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 51

LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing

Paper • 2311.00571 • Published Nov 1, 2023 • 42
