VLM + OCR - a Bk9x Collection

Updated 29 days ago

5CD-AI/Vintern-1B-v2

Image-Text-to-Text • 0.9B • Updated Jan 17, 2025 • 357 • 81
erax-ai/EraX-VL-7B-V1.0

Image-Text-to-Text • 8B • Updated Jan 15, 2025 • 414 • 43
granite-docling-258M demo

Space • Running on Zero • Agents • Featured • 271
Extract and convert document content from images
datalab-to/chandra

Image-Text-to-Text • 9B • Updated 22 days ago • 115k • 516
deepseek-ai/DeepSeek-OCR

Image-Text-to-Text • 3B • Updated Nov 4, 2025 • 2.02M • 3.21k
Multimodal OCR3

Space • Running on Zero • MCP • 68
Chandra-OCR / Nanonets-OCR2 / olmOCR-2 / Dots.OCR
lightonai/LightOnOCR-2-1B

Image-Text-to-Text • 1B • Updated 10 days ago • 733k • 657
HuggingFaceFW/finepdfs

Viewer • Updated 14 days ago • 476M • 22k • 848
baidu/Qianfan-OCR

Image-Text-to-Text • 5B • Updated 22 days ago • 148k • 1.16k

Note: A 4B model that performs direct image-to-Markdown conversion and supports a broad range of prompt-driven tasks, from structured document parsing and table extraction to chart understanding, document question answering, and key information extraction.