Update README.md
README.md
CHANGED
---
license: mit
library_name: transformers
tags:
- florence-2
- vision-language
- image-captioning
- ocr
- object-detection
- onnx
- int8
- quantized
base_model: microsoft/Florence-2-base-ft
pipeline_tag: image-to-text
---

# Florence-2 base-ft — ONNX (INT8 dynamic-quantized)

ONNX export of [microsoft/Florence-2-base-ft](https://huggingface.co/microsoft/Florence-2-base-ft) with post-export INT8 dynamic quantization applied to all four sub-models. It takes roughly half the disk space and inference RAM of the fp16 variant.

This repo is a converted artifact. Training credit: Microsoft Research.

## What this repo contains

```
config.json
generation_config.json
preprocessor_config.json
tokenizer.json
tokenizer_config.json
vocab.json
merges.txt
special_tokens_map.json

vision_encoder_quantized.onnx
encoder_model_quantized.onnx
decoder_model_quantized.onnx
embed_tokens_quantized.onnx
```

Total: ~270 MB. All four ONNX files are required at inference.
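
Since inference needs all four graphs, it can help to verify a local copy before creating any sessions. The sketch below is one possible way to do that with the file names listed above. `missing_onnx_files` and `open_sessions` are hypothetical helper names, the local directory is an assumption, and `onnxruntime` is imported only when sessions are actually opened.

```python
from pathlib import Path

# File names as listed in this repo.
REQUIRED_ONNX = (
    "vision_encoder_quantized.onnx",
    "encoder_model_quantized.onnx",
    "decoder_model_quantized.onnx",
    "embed_tokens_quantized.onnx",
)


def missing_onnx_files(model_dir):
    """Return the required ONNX file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_ONNX if not (root / name).is_file()]


def open_sessions(model_dir, providers=("CPUExecutionProvider",)):
    """Open one onnxruntime session per sub-model (hypothetical helper)."""
    import onnxruntime as ort  # lazy import: only needed when loading graphs

    missing = missing_onnx_files(model_dir)
    if missing:
        raise FileNotFoundError(f"missing ONNX files: {missing}")
    root = Path(model_dir)
    return {
        name: ort.InferenceSession(str(root / name), providers=list(providers))
        for name in REQUIRED_ONNX
    }
```

Calling `missing_onnx_files("path/to/download")` first gives a readable error before any graph is loaded.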

## How it was produced

1. Export to fp32 ONNX:
   ```
   optimum-cli export onnx \
     --model microsoft/Florence-2-base-ft \
     --task image-to-text \
     --trust-remote-code \
     <fp32-output>
   ```
2. Apply dynamic INT8 quantization to each `.onnx` file using `onnxruntime.quantization.quantize_dynamic` (weight-only, per-channel).

Toolchain: `optimum 1.24.0`, `transformers 4.45.2`, `onnxruntime 1.19.x`.
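
Step 2 could look roughly like the sketch below. `quantize_dynamic`, `QuantType`, and the `per_channel` flag come from `onnxruntime.quantization`. The directory arguments, the `quantized_name` and `quantize_all` helpers, and the choice of `QuantType.QInt8` are assumptions for illustration, not the exact script used for this repo.

```python
from pathlib import Path


def quantized_name(onnx_path):
    """Map e.g. decoder_model.onnx -> decoder_model_quantized.onnx."""
    p = Path(onnx_path)
    return f"{p.stem}_quantized{p.suffix}"


def quantize_all(fp32_dir, out_dir):
    """Dynamically quantize every .onnx file in fp32_dir into out_dir."""
    # Lazy import so the naming helper works without onnxruntime installed.
    from onnxruntime.quantization import QuantType, quantize_dynamic

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(Path(fp32_dir).glob("*.onnx")):
        dst = out / quantized_name(src)
        quantize_dynamic(
            model_input=str(src),
            model_output=str(dst),
            per_channel=True,             # per-channel weight scales, as stated in step 2
            weight_type=QuantType.QInt8,  # assumption: signed 8-bit weights
        )
        written.append(dst)
    return written
```

Dynamic quantization needs no calibration data, since only weights are converted ahead of time, so the function takes just an input and an output directory.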

## When to pick quantized vs fp16

- This repo (**INT8**, ~270 MB): CPU, NPU (OpenVINO EP), mobile, browser (transformers.js).
- [`Heliosoph/florence-2-base-ft-fp16-onnx`](https://huggingface.co/Heliosoph/florence-2-base-ft-fp16-onnx) (fp16, ~520 MB): GPU, maximum quality.

**Known degradation:** Dense OCR over small text loses noticeable accuracy at INT8. Captioning and object detection are largely unaffected. Test on your workload before committing.
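
One way to run that check for OCR is to compare the text from the INT8 and fp16 models against a reference transcription using character error rate (CER). The sketch below is a plain-Python implementation. The function names and sample strings are illustrative only, and producing the model outputs themselves is left out.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from a
                curr[j - 1] + 1,     # insert a character into a
                prev[j - 1] + cost,  # substitute, or keep a match
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Made-up strings standing in for a ground-truth line and two model outputs.
reference = "Invoice No. 10482"
fp16_text = "Invoice No. 10482"
int8_text = "Invoice No. 1O482"  # letter O where the digit 0 should be

print(f"fp16 CER: {cer(reference, fp16_text):.3f}")
print(f"int8 CER: {cer(reference, int8_text):.3f}")
```

Averaging CER over a sample of your own images gives a single number per variant to compare before committing.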

## Task prompts

Identical to the fp16 variant. See [`Heliosoph/florence-2-base-ft-fp16-onnx`](https://huggingface.co/Heliosoph/florence-2-base-ft-fp16-onnx) for the full list.

## License

**MIT**, same as upstream. `LICENSE` file included.