HunyuanOCR GGUF (Experimental)

This repository contains GGUF quantized weights for HunyuanOCR, an expert end-to-end OCR VLM powered by Hunyuan's native multimodal architecture.

Quantized and verified by bombman.

Highlights

  • Specialization: Expert in document parsing, receipts, and multilingual OCR.
  • Quantization: Q8_0 (high precision, reduced file size).
  • Inference Speed: ~160-180 tokens/sec on NVIDIA RTX 4060 Ti (16GB).
  • Accuracy: Verified on Thai/English restaurant receipts with near-perfect structure extraction.

Files

  • HunyuanOCR-Q8_0.gguf: Quantized LLM (The "Brain").
  • HunyuanOCR-mmproj-f16.gguf: Multimodal projector (The "Eyes"). Note: Must be used together with the LLM.

Quick Start (llama.cpp)

To run this model on Linux/Windows via llama-cli, use the following command:

./llama-cli \
    -m HunyuanOCR-Q8_0.gguf \
    --mmproj HunyuanOCR-mmproj-f16.gguf \
    --image your_receipt.jpg \
    -p "<|hy_begin▁of▁sentence|>Please perform a full OCR on this image and extract all text.<|hy_User|>" \
    -ngl 99 --temp 0 --repeat-penalty 1.1
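
For serving over HTTP instead of one-shot CLI runs, llama.cpp's llama-server can load the same model/projector pair and expose an OpenAI-compatible endpoint. The snippet below is a sketch, assuming a recent llama.cpp build with multimodal server support; the port number and your_receipt.jpg are placeholders:

```shell
# Launch an OpenAI-compatible server with the same model pair
./llama-server \
    -m HunyuanOCR-Q8_0.gguf \
    --mmproj HunyuanOCR-mmproj-f16.gguf \
    -ngl 99 --port 8080 &

# From another shell: send the receipt as a base64 data URI
IMG=$(base64 -w0 your_receipt.jpg)
curl -s http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "messages": [{
        "role": "user",
        "content": [
          {"type": "image_url",
           "image_url": {"url": "data:image/jpeg;base64,'"$IMG"'"}},
          {"type": "text",
           "text": "Please perform a full OCR on this image and extract all text."}
        ]
      }],
      "temperature": 0
    }'
```

When going through the chat endpoint, the server applies the model's chat template itself, so the raw `<|hy_...|>` tokens from the llama-cli prompt above are not needed in the request body.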
Model Details

  • Format: GGUF
  • Model size: 0.5B params
  • Architecture: hunyuan-dense