flyingbertman committed · Commit d818ad0 · verified · 1 Parent(s): 6c13d00

Update README.md

Files changed (1): README.md (+67 -0)
@@ -1,3 +1,70 @@
---
license: mit
library_name: transformers
tags:
- florence-2
- vision-language
- image-captioning
- ocr
- object-detection
- onnx
- int8
- quantized
base_model: microsoft/Florence-2-base-ft
pipeline_tag: image-to-text
---

# Florence-2 base-ft: ONNX (INT8 dynamic-quantized)

ONNX export of [microsoft/Florence-2-base-ft](https://huggingface.co/microsoft/Florence-2-base-ft) with post-export INT8 dynamic quantization applied to all four sub-models. It needs roughly half the disk space and inference RAM of the fp16 variant.

This is a converted artifact. Training credit: Microsoft Research.

## What this repo contains

```
config.json
generation_config.json
preprocessor_config.json
tokenizer.json
tokenizer_config.json
vocab.json
merges.txt
special_tokens_map.json

vision_encoder_quantized.onnx
encoder_model_quantized.onnx
decoder_model_quantized.onnx
embed_tokens_quantized.onnx
```

Total: ~270 MB. All four ONNX files are required at inference.
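
Because inference needs all four ONNX files, it can help to check a local copy before loading anything. The following is a minimal sketch, not part of the original repo. The helper name `missing_onnx_files` is hypothetical, and the file names are copied from the listing above.

```python
from pathlib import Path

# File names copied from the repo listing above.
REQUIRED_ONNX_FILES = (
    "vision_encoder_quantized.onnx",
    "encoder_model_quantized.onnx",
    "decoder_model_quantized.onnx",
    "embed_tokens_quantized.onnx",
)


def missing_onnx_files(model_dir):
    """Return the required ONNX file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_ONNX_FILES if not (root / name).is_file()]
```

An empty list means the directory holds every sub-model. A non-empty list names the files still to download.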

## How it was produced

1. Export to fp32 ONNX:
   ```
   optimum-cli export onnx \
     --model microsoft/Florence-2-base-ft \
     --task image-to-text \
     --trust-remote-code \
     <fp32-output>
   ```
2. Apply dynamic INT8 quantization to each `.onnx` file using `onnxruntime.quantization.quantize_dynamic` (weight-only, per-channel).

Toolchain: `optimum 1.24.0`, `transformers 4.45.2`, `onnxruntime 1.19.x`.
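
Step 2 could look roughly like the sketch below. It is an illustration under assumptions, not the exact script used for this repo. The helper names `quantized_path` and `quantize_all` are hypothetical, the `_quantized` suffix is inferred from the file names in this repo, and `QuantType.QInt8` is an assumed weight type (the text above only says INT8).

```python
from pathlib import Path


def quantized_path(fp32_path):
    """Map e.g. encoder_model.onnx to encoder_model_quantized.onnx in the same folder."""
    p = Path(fp32_path)
    return p.with_name(f"{p.stem}_quantized{p.suffix}")


def quantize_all(fp32_dir):
    """Dynamically quantize every .onnx file in fp32_dir that is not already an output."""
    # Imported lazily so the path helper above works without onnxruntime installed.
    from onnxruntime.quantization import QuantType, quantize_dynamic

    for src in sorted(Path(fp32_dir).glob("*.onnx")):
        if src.stem.endswith("_quantized"):
            continue  # skip outputs left over from a previous run
        quantize_dynamic(
            model_input=str(src),
            model_output=str(quantized_path(src)),
            per_channel=True,             # per-channel, as stated in step 2
            weight_type=QuantType.QInt8,  # assumption: signed INT8 weights
        )
```

Calling `quantize_all("<fp32-output>")` on the export directory from step 1 would write each quantized file next to its fp32 source.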

## When to pick quantized vs fp16

- This repo (**INT8**): CPU, NPU (OpenVINO EP), mobile, browser (transformers.js). ~270 MB.
- [`Heliosoph/florence-2-base-ft-fp16-onnx`](https://huggingface.co/Heliosoph/florence-2-base-ft-fp16-onnx): GPU, maximum quality. ~520 MB.
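
For the CPU and OpenVINO targets listed above, an ONNX Runtime session can be opened with an explicit provider list. This is a minimal sketch. The preference order and the helper names `pick_providers` and `open_session` are assumptions made for illustration, and the OpenVINO provider is only available in ONNX Runtime builds that ship it.

```python
# Preference order is an assumption for this sketch: OpenVINO first, CPU as fallback.
PREFERRED_PROVIDERS = ("OpenVINOExecutionProvider", "CPUExecutionProvider")


def pick_providers(available):
    """Keep the preferred providers that are actually available, in preference order."""
    chosen = [name for name in PREFERRED_PROVIDERS if name in available]
    return chosen or ["CPUExecutionProvider"]


def open_session(onnx_path):
    """Open an ONNX Runtime session for one of the four ONNX files."""
    import onnxruntime as ort  # imported lazily; requires onnxruntime

    providers = pick_providers(ort.get_available_providers())
    return ort.InferenceSession(str(onnx_path), providers=providers)
```

Each of the four files would get its own session, for example `open_session("vision_encoder_quantized.onnx")`.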

**Known degradation:** Dense OCR over small text is noticeably less accurate at INT8. Captioning and object detection are largely unaffected. Test on your workload before committing.
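
One rough way to test a workload is to run the same images through both variants and compare the decoded text. The sketch below assumes you already have both outputs as strings. It uses `difflib` similarity as a stand-in metric, and the `0.9` threshold and both function names are hypothetical choices, not values from this repo.

```python
from difflib import SequenceMatcher


def text_similarity(reference, candidate):
    """Similarity in [0, 1] between two decoded strings; 1.0 means identical."""
    return SequenceMatcher(None, reference, candidate).ratio()


def flag_regressions(pairs, threshold=0.9):
    """Return indices of (fp16_text, int8_text) pairs whose similarity is below threshold."""
    return [
        i
        for i, (fp16_text, int8_text) in enumerate(pairs)
        if text_similarity(fp16_text, int8_text) < threshold
    ]
```

Images whose indices come back from `flag_regressions` are the ones worth inspecting by hand, especially dense OCR samples.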

## Task prompts

Identical to the fp16 variant; see [`Heliosoph/florence-2-base-ft-fp16-onnx`](https://huggingface.co/Heliosoph/florence-2-base-ft-fp16-onnx) for the full list.

## License

**MIT**, same as upstream. A `LICENSE` file is included.