---
license: mit
library_name: transformers
tags:
- florence-2
- vision-language
- image-captioning
- ocr
- object-detection
- onnx
- int8
- quantized
base_model: microsoft/Florence-2-base-ft
pipeline_tag: image-to-text
---
# Florence-2 base-ft — ONNX (INT8 dynamic-quantized)
ONNX export of [microsoft/Florence-2-base-ft](https://huggingface.co/microsoft/Florence-2-base-ft) with post-export INT8 dynamic quantization applied to all four sub-models. It needs roughly half the disk space and inference RAM of the fp16 variant (~270 MB versus ~520 MB on disk).
Converted artifact. Training credit: Microsoft Research.
## What this repo contains
```
config.json
generation_config.json
preprocessor_config.json
tokenizer.json
tokenizer_config.json
vocab.json
merges.txt
special_tokens_map.json
vision_encoder_quantized.onnx
encoder_model_quantized.onnx
decoder_model_quantized.onnx
embed_tokens_quantized.onnx
```
Total: ~270 MB. All four ONNX files are required at inference.
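Since inference fails if any of the four sub-models is absent, it can help to check a local copy before opening sessions. The following is a minimal sketch, not part of this repo: `missing_onnx_files` and `load_sessions` are hypothetical helper names, and the choice of `CPUExecutionProvider` as the default is an assumption.

```python
from pathlib import Path

# File names copied from the listing above.
ONNX_FILES = [
    "vision_encoder_quantized.onnx",
    "encoder_model_quantized.onnx",
    "decoder_model_quantized.onnx",
    "embed_tokens_quantized.onnx",
]


def missing_onnx_files(model_dir):
    """Return the required ONNX file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in ONNX_FILES if not (root / name).is_file()]


def load_sessions(model_dir, providers=("CPUExecutionProvider",)):
    """Open one InferenceSession per sub-model, keyed by file stem."""
    # Lazy import so the file check above has no third-party dependency.
    import onnxruntime as ort

    missing = missing_onnx_files(model_dir)
    if missing:
        raise FileNotFoundError(f"missing ONNX files: {missing}")
    root = Path(model_dir)
    return {
        Path(name).stem: ort.InferenceSession(
            str(root / name), providers=list(providers)
        )
        for name in ONNX_FILES
    }
```

The returned dictionary only holds the sessions; chaining the vision encoder, token embeddings, encoder, and decoder into a generation loop is left out of this sketch.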
## How it was produced
1. Export to fp32 ONNX:
```
optimum-cli export onnx \
--model microsoft/Florence-2-base-ft \
--task image-to-text \
--trust-remote-code \
<fp32-output>
```
2. Apply dynamic INT8 quantization to each `.onnx` file using `onnxruntime.quantization.quantize_dynamic` (weight-only, per-channel).
Toolchain: `optimum 1.24.0`, `transformers 4.45.2`, `onnxruntime 1.19.x`.
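Step 2 could look roughly like the sketch below. It is an illustration under stated assumptions rather than the exact script used for this repo: the helper names are hypothetical, the fp32 files are assumed to be named like the quantized ones without the `_quantized` suffix, and `QuantType.QInt8` is an assumed weight type (the text above only says INT8, per-channel).

```python
from pathlib import Path


def quantized_path(fp32_path, out_dir):
    """Map e.g. encoder_model.onnx to <out_dir>/encoder_model_quantized.onnx."""
    src = Path(fp32_path)
    return Path(out_dir) / f"{src.stem}_quantized{src.suffix}"


def quantize_directory(fp32_dir, out_dir):
    """Apply dynamic INT8 quantization to every .onnx file in fp32_dir."""
    # Lazy import: onnxruntime is only needed when quantization actually runs.
    from onnxruntime.quantization import QuantType, quantize_dynamic

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(Path(fp32_dir).glob("*.onnx")):
        dst = quantized_path(src, out_dir)
        quantize_dynamic(
            model_input=str(src),
            model_output=str(dst),
            per_channel=True,             # per-channel, as stated in step 2
            weight_type=QuantType.QInt8,  # assumption: signed INT8 weights
        )
        written.append(dst)
    return written
```

Calling `quantize_directory("<fp32-output>", "<int8-output>")` with your own paths would write one `*_quantized.onnx` file per exported sub-model.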
## When to pick quantized vs fp16
This repo (**INT8**): CPU, NPU (OpenVINO EP), mobile, browser (transformers.js). ~270 MB.
[`Heliosoph/florence-2-base-ft-fp16-onnx`](https://huggingface.co/Heliosoph/florence-2-base-ft-fp16-onnx): GPU, maximum quality. ~520 MB.
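One way to encode this choice is to look at which ONNX Runtime execution providers are available. The sketch below is a rough heuristic, not a recommendation from the repo: the function names are hypothetical, and treating the listed providers as "a GPU is present" is an assumption.

```python
INT8_REPO = "Heliosoph/florence-2-base-ft-quantized-onnx"
FP16_REPO = "Heliosoph/florence-2-base-ft-fp16-onnx"

# Assumption: any of these ONNX Runtime providers indicates a usable GPU.
GPU_PROVIDERS = {
    "CUDAExecutionProvider",
    "TensorrtExecutionProvider",
    "DmlExecutionProvider",
}


def pick_repo(available_providers):
    """Return the fp16 repo when a GPU provider is present, else the INT8 repo."""
    if GPU_PROVIDERS & set(available_providers):
        return FP16_REPO
    return INT8_REPO


def pick_repo_for_this_machine():
    """Query the local ONNX Runtime build and apply pick_repo to the result."""
    import onnxruntime as ort  # lazy import; only needed for the live query

    return pick_repo(ort.get_available_providers())
```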
**Known degradation:** Dense OCR over small text loses noticeable accuracy at INT8. Captioning and object detection are largely unaffected. Test on your workload before committing.
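A simple way to test OCR on your own images is to compare the INT8 output against a reference transcription (or the fp16 output) with a character error rate. The sketch below is one possible metric, chosen here as an assumption; the function names are hypothetical, and it only compares strings you have already obtained from the models.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance divided by reference length; 0.0 means identical text."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)
```

Averaging `character_error_rate` over a sample of representative images gives a rough number to weigh against the size and memory savings.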
## Task prompts
Identical to the fp16 variant — see [`Heliosoph/florence-2-base-ft-fp16-onnx`](https://huggingface.co/Heliosoph/florence-2-base-ft-fp16-onnx) for the full list.
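For quick reference, a few of the task prompts documented in the upstream microsoft/Florence-2-base-ft model card can be kept in a small lookup table. This is a partial, illustrative subset, not the full list; the dictionary keys and the `task_prompt` helper are hypothetical names introduced for this sketch.

```python
# Subset of task prompts from the upstream Florence-2 model card.
# The dictionary keys are hypothetical labels chosen for this sketch.
TASK_PROMPTS = {
    "caption": "<CAPTION>",
    "detailed_caption": "<DETAILED_CAPTION>",
    "object_detection": "<OD>",
    "ocr": "<OCR>",
    "ocr_with_region": "<OCR_WITH_REGION>",
}


def task_prompt(task):
    """Return the prompt token for a task label, or raise ValueError."""
    try:
        return TASK_PROMPTS[task]
    except KeyError:
        known = ", ".join(sorted(TASK_PROMPTS))
        raise ValueError(f"unknown task {task!r}; expected one of: {known}") from None
```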
## License
**MIT** — same as upstream. `LICENSE` file included.