---
license: mit
library_name: transformers
tags:
  - florence-2
  - vision-language
  - image-captioning
  - ocr
  - object-detection
  - onnx
  - int8
  - quantized
base_model: microsoft/Florence-2-base-ft
pipeline_tag: image-to-text
---

# Florence-2 base-ft — ONNX (INT8 dynamic-quantized)

ONNX export of [microsoft/Florence-2-base-ft](https://huggingface.co/microsoft/Florence-2-base-ft) with post-export INT8 dynamic quantization applied to all four sub-models. It needs roughly half the disk space and inference RAM of the fp16 variant.

This is a converted artifact; the model itself was trained by Microsoft Research.

## What this repo contains

```
config.json
generation_config.json
preprocessor_config.json
tokenizer.json
tokenizer_config.json
vocab.json
merges.txt
special_tokens_map.json

vision_encoder_quantized.onnx
encoder_model_quantized.onnx
decoder_model_quantized.onnx
embed_tokens_quantized.onnx
```

Total: ~270 MB. All four ONNX files are required at inference.
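
Because all four files are needed, it can help to check for them before building any sessions. The following is a minimal sketch, not part of this repo: `missing_onnx_files` and `load_sessions` are hypothetical helper names, and the CPU execution provider is only an assumed default.

```python
from pathlib import Path

# File names copied from the listing above.
REQUIRED_ONNX = (
    "vision_encoder_quantized.onnx",
    "encoder_model_quantized.onnx",
    "decoder_model_quantized.onnx",
    "embed_tokens_quantized.onnx",
)


def missing_onnx_files(model_dir):
    """Return the required ONNX file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_ONNX if not (root / name).is_file()]


def load_sessions(model_dir, providers=("CPUExecutionProvider",)):
    """Open one InferenceSession per sub-model, keyed by file stem."""
    # Imported lazily so the file check above works without onnxruntime.
    import onnxruntime as ort

    missing = missing_onnx_files(model_dir)
    if missing:
        raise FileNotFoundError(f"missing ONNX files: {', '.join(missing)}")
    root = Path(model_dir)
    return {
        Path(name).stem: ort.InferenceSession(
            str(root / name), providers=list(providers)
        )
        for name in REQUIRED_ONNX
    }
```

Calling `load_sessions("path/to/this/repo")` would fail early with the list of missing files rather than partway through generation.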

## How it was produced

1. Export to fp32 ONNX:
   ```
   optimum-cli export onnx \
       --model microsoft/Florence-2-base-ft \
       --task image-to-text \
       --trust-remote-code \
       <fp32-output>
   ```
2. Apply dynamic INT8 quantization to each `.onnx` file using `onnxruntime.quantization.quantize_dynamic` with per-channel weight quantization. Weights are quantized ahead of time; activations are quantized at run time.

Toolchain: `optimum 1.24.0`, `transformers 4.45.2`, `onnxruntime 1.19.x`.
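
Step 2 could look roughly like the sketch below. It is an illustration, not the exact script used for this repo: the helper names and directory arguments are hypothetical, the fp32 file names are assumed to be the quantized names without the `_quantized` suffix, and `weight_type` is left at the library default.

```python
from pathlib import Path


def quantized_path(fp32_path, out_dir):
    """Map <name>.onnx to <out_dir>/<name>_quantized.onnx."""
    src = Path(fp32_path)
    return Path(out_dir) / f"{src.stem}_quantized{src.suffix}"


def quantize_directory(fp32_dir, out_dir):
    """Dynamically quantize every .onnx file in fp32_dir into out_dir."""
    # Imported lazily so quantized_path() is usable without onnxruntime.
    from onnxruntime.quantization import quantize_dynamic

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(Path(fp32_dir).glob("*.onnx")):
        dst = quantized_path(src, out)
        # per_channel=True follows the description above;
        # weight_type is not set, so the library default applies.
        quantize_dynamic(str(src), str(dst), per_channel=True)
        written.append(dst)
    return written
```

Running `quantize_directory("<fp32-output>", "<int8-output>")` would write one `*_quantized.onnx` file per exported sub-model.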

## When to pick quantized vs fp16

- This repo (**INT8**), ~270 MB: CPU, NPU (OpenVINO EP), mobile, browser (transformers.js).
- [`Heliosoph/florence-2-base-ft-fp16-onnx`](https://huggingface.co/Heliosoph/florence-2-base-ft-fp16-onnx) (**fp16**), ~520 MB: GPU, maximum quality.

**Known degradation:** Dense OCR over small text loses noticeable accuracy at INT8. Captioning and object detection are largely unaffected. Test on your workload before committing.
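
One way to test an OCR workload is to run the same images through both variants and measure how far the INT8 text drifts from the fp16 text. The sketch below computes a character error rate from plain strings; the metric and the function names are illustrative choices, not something this repo ships or was evaluated with.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def char_error_rate(reference, hypothesis):
    """Edit distance normalised by reference length; 0.0 means identical."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)
```

Treat the fp16 output (or a ground-truth transcript, if you have one) as `reference` and the INT8 output as `hypothesis`, then average over a sample of your own images.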

## Task prompts

Identical to the fp16 variant — see [`Heliosoph/florence-2-base-ft-fp16-onnx`](https://huggingface.co/Heliosoph/florence-2-base-ft-fp16-onnx) for the full list.

## License

**MIT** — same as upstream. `LICENSE` file included.