HunyuanOCR GGUF (Experimental)
This repository contains GGUF quantized weights for HunyuanOCR, an expert end-to-end OCR VLM powered by Hunyuan's native multimodal architecture.
Quantized and verified by bombman.
Highlights
- Specialization: Document parsing, receipt recognition, and multilingual OCR.
- Quantization: Q8_0 (high precision at reduced size).
- Inference Speed: ~160-180 tokens/sec on an NVIDIA RTX 4060 Ti (16 GB).
- Accuracy: Verified on Thai/English restaurant receipts with near-perfect structure extraction.
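For a rough sense of what Q8_0 means for file size: llama.cpp's Q8_0 format stores weights in blocks of 32 int8 values plus one fp16 scale, i.e. 34 bytes per 32 weights (~8.5 bits/weight). A back-of-the-envelope estimator (the parameter count below is illustrative, not HunyuanOCR's actual count, and real files are slightly larger because some tensors stay at higher precision):

```python
def q8_0_size_bytes(n_params: int) -> int:
    """Estimate Q8_0 tensor data size: each block of 32 weights
    takes 32 int8 quants + 1 fp16 scale = 34 bytes."""
    blocks = (n_params + 31) // 32  # round up to whole blocks
    return blocks * 34

# Example: a hypothetical 1B-parameter model
print(q8_0_size_bytes(1_000_000_000) / 1e9)  # ~1.06 GB
```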
Files
- HunyuanOCR-Q8_0.gguf: Quantized LLM (the "Brain").
- HunyuanOCR-mmproj-f16.gguf: Multimodal projector (the "Eyes"). Note: must be used together with the LLM.
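After downloading, a quick sanity check is to confirm both files carry the GGUF magic bytes: every GGUF file begins with `b"GGUF"` followed by a little-endian uint32 format version. A minimal checker (filenames are the ones listed above):

```python
import struct

def check_gguf(path: str) -> int:
    """Verify the GGUF magic and return the format version."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"{path}: not a GGUF file (magic={magic!r})")
        (version,) = struct.unpack("<I", f.read(4))
        return version

# e.g. check_gguf("HunyuanOCR-Q8_0.gguf")
```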
Quick Start (llama.cpp)
To run this model on Linux/Windows via llama-cli, use the following command:
```bash
./llama-cli \
  -m HunyuanOCR-Q8_0.gguf \
  --mmproj HunyuanOCR-mmproj-f16.gguf \
  --image your_receipt.jpg \
  -p "<|hy_begin▁of▁sentence|>Please perform a full OCR on this image and extract all text.<|hy_User|>" \
  -ngl 99 --temp 0 --repeat-penalty 1.1
```
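The same file pair should also work with llama-server (launched with `-m` and `--mmproj` as above), which exposes an OpenAI-compatible `/v1/chat/completions` endpoint; a client then inlines the image as a base64 data URI. A sketch of building such a request payload (the prompt text and endpoint URL are illustrative assumptions, not part of this repo):

```python
import base64

def build_ocr_request(image_path: str, prompt: str) -> dict:
    """Build an OpenAI-style chat payload with the image inlined
    as a base64 data URI, the format llama-server accepts."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                {"type": "text", "text": prompt},
            ],
        }],
        "temperature": 0,
    }

# POST this as JSON to e.g. http://localhost:8080/v1/chat/completions
```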