HunyuanOCR GGUF (Experimental)

This repository contains GGUF quantized weights for HunyuanOCR, an expert end-to-end OCR VLM powered by Hunyuan's native multimodal architecture.

Quantized and verified by bombman.

Highlights

  • Specialization: Expert in document parsing, receipts, and multilingual OCR.
  • Quantization: Q8_0 (high precision, reduced file size).
  • Inference Speed: ~160-180 tokens/sec on NVIDIA RTX 4060 Ti (16GB).
  • Accuracy: Verified on Thai/English restaurant receipts with near-perfect structure extraction.

Files

  • HunyuanOCR-Q8_0.gguf: Quantized LLM (The "Brain").
  • HunyuanOCR-mmproj-f16.gguf: Multimodal projector (The "Eyes"). Note: Must be used together with the LLM.

Quick Start (llama.cpp)

To run this model on Linux/Windows via llama-cli, use the following command:

./llama-cli \
    -m HunyuanOCR-Q8_0.gguf \
    --mmproj HunyuanOCR-mmproj-f16.gguf \
    --image your_receipt.jpg \
    -p "<|hy_begin▁of▁sentence|>Please perform a full OCR on this image and extract all text.<|hy_User|>" \
    -ngl 99 --temp 0 --repeat-penalty 1.1
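
For serving over HTTP instead of one-shot CLI runs, llama.cpp's llama-server can load the same model/projector pair and expose an OpenAI-compatible endpoint. The snippet below is a sketch, assuming a recent llama.cpp build with multimodal server support; the port number and your_receipt.jpg are placeholders:

```shell
# Launch an OpenAI-compatible server with the same model pair
./llama-server \
    -m HunyuanOCR-Q8_0.gguf \
    --mmproj HunyuanOCR-mmproj-f16.gguf \
    -ngl 99 --port 8080 &

# From another shell: send the receipt as a base64 data URI
IMG=$(base64 -w0 your_receipt.jpg)
curl -s http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "messages": [{
        "role": "user",
        "content": [
          {"type": "image_url",
           "image_url": {"url": "data:image/jpeg;base64,'"$IMG"'"}},
          {"type": "text",
           "text": "Please perform a full OCR on this image and extract all text."}
        ]
      }],
      "temperature": 0
    }'
```

When going through the chat endpoint, the server applies the model's chat template itself, so the raw `<|hy_...|>` tokens from the llama-cli prompt above are not needed in the request body.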
Model Details

  • Format: GGUF
  • Model size: 0.5B params
  • Architecture: hunyuan-dense