---
license: mit
library_name: transformers
tags:
- florence-2
- vision-language
- image-captioning
- ocr
- object-detection
- onnx
- int8
- quantized
base_model: microsoft/Florence-2-base-ft
pipeline_tag: image-to-text
---

# Florence-2 base-ft: ONNX (INT8 dynamic-quantized)

ONNX export of [microsoft/Florence-2-base-ft](https://huggingface.co/microsoft/Florence-2-base-ft) with post-export INT8 dynamic quantization applied to all four sub-models. It needs roughly half the disk space and inference RAM of the fp16 variant.

This repo is a converted artifact. Training credit belongs to Microsoft Research.

## What this repo contains

```
config.json
generation_config.json
preprocessor_config.json
tokenizer.json
tokenizer_config.json
vocab.json
merges.txt
special_tokens_map.json
vision_encoder_quantized.onnx
encoder_model_quantized.onnx
decoder_model_quantized.onnx
embed_tokens_quantized.onnx
```

Total: ~270 MB. All four ONNX files are required at inference.
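
Since inference needs all four ONNX graphs alongside the tokenizer and preprocessor files, a quick pre-flight check on a local copy can catch an incomplete download. The following is a minimal sketch using only the standard library. The file names come from the listing above; `missing_files` and its directory argument are illustrative names, not something this repo ships.

```python
from pathlib import Path

# File names copied from the repo listing above.
ONNX_FILES = [
    "vision_encoder_quantized.onnx",
    "encoder_model_quantized.onnx",
    "decoder_model_quantized.onnx",
    "embed_tokens_quantized.onnx",
]
SUPPORT_FILES = [
    "config.json",
    "generation_config.json",
    "preprocessor_config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "vocab.json",
    "merges.txt",
    "special_tokens_map.json",
]


def missing_files(model_dir):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [
        name
        for name in ONNX_FILES + SUPPORT_FILES
        if not (root / name).is_file()
    ]
```

Calling `missing_files("path/to/local/copy")` returns an empty list when the copy is complete, and otherwise the names that still need to be fetched.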

## How it was produced

1. Export to fp32 ONNX:

   ```
   optimum-cli export onnx \
     --model microsoft/Florence-2-base-ft \
     --task image-to-text \
     --trust-remote-code \
     <fp32-output>
   ```

2. Apply dynamic INT8 quantization to each `.onnx` file using `onnxruntime.quantization.quantize_dynamic` (weight-only, per-channel).

Toolchain: `optimum 1.24.0`, `transformers 4.45.2`, `onnxruntime 1.19.x`.
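
Step 2 could look roughly like the sketch below. `quantize_dynamic` and `QuantType` are real names in `onnxruntime.quantization`, but the rest is an assumption: the fp32 file names are guessed by dropping the `_quantized` suffix from the names published in this repo, `QuantType.QInt8` is one plausible weight type, and the directory arguments are placeholders. Beyond "per-channel", the exact options used for this repo are not recorded here.

```python
from pathlib import Path

# Assumed fp32 file stems (the published names minus "_quantized").
SUBMODELS = ["vision_encoder", "encoder_model", "decoder_model", "embed_tokens"]


def quantized_path(fp32_path, out_dir):
    """Map <name>.onnx to <out_dir>/<name>_quantized.onnx."""
    return Path(out_dir) / f"{Path(fp32_path).stem}_quantized.onnx"


def quantize_all(fp32_dir, out_dir):
    """Run dynamic INT8 quantization over each exported sub-model."""
    # Imported lazily so the path helper works without onnxruntime installed.
    from onnxruntime.quantization import QuantType, quantize_dynamic

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for name in SUBMODELS:
        src = Path(fp32_dir) / f"{name}.onnx"
        quantize_dynamic(
            model_input=str(src),
            model_output=str(quantized_path(src, out_dir)),
            per_channel=True,             # matches "per-channel" in step 2
            weight_type=QuantType.QInt8,  # assumption, not confirmed by this repo
        )
```

A call such as `quantize_all("fp32-output", "int8-output")` would then write one `*_quantized.onnx` file per sub-model, using the naming seen in this repo.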

## When to pick quantized vs fp16

- This repo (**INT8**): CPU, NPU (OpenVINO EP), mobile, browser (transformers.js). ~270 MB.
- [`Heliosoph/florence-2-base-ft-fp16-onnx`](https://huggingface.co/Heliosoph/florence-2-base-ft-fp16-onnx): GPU, maximum quality. ~520 MB.

**Known degradation:** Dense OCR over small text loses noticeable accuracy at INT8. Captioning and object detection are largely unaffected. Test on your workload before committing.
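
One way to act on "test on your workload" is to score OCR output from both variants against a hand-checked transcription using character error rate (edit distance divided by reference length). The sketch below is standard-library Python. The function names are illustrative, and producing the model outputs is left to your own inference code.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(
                min(
                    prev[j] + 1,         # delete ca
                    curr[j - 1] + 1,     # insert cb
                    prev[j - 1] + cost,  # substitute or match
                )
            )
        prev = curr
    return prev[-1]


def char_error_rate(reference, hypothesis):
    """Edit distance normalised by the length of the reference text."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Running `char_error_rate` on the INT8 and fp16 outputs for a handful of representative images gives a concrete number for the accuracy gap on your own data.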

## Task prompts

Identical to the fp16 variant. See [`Heliosoph/florence-2-base-ft-fp16-onnx`](https://huggingface.co/Heliosoph/florence-2-base-ft-fp16-onnx) for the full list.

## License

**MIT**, same as upstream. A `LICENSE` file is included.