shreyask commited on
Commit
291d326
·
verified ·
1 Parent(s): 60d9ed2

Add ONNX models for WebGPU inference (INT4 quantized)

Browse files
.gitattributes CHANGED
@@ -33,3 +33,12 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
 
 
 
 
 
 
 
 
 
 
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ coord_decoder.onnx.data filter=lfs diff=lfs merge=lfs -text
37
+ coord_encoder.onnx.data filter=lfs diff=lfs merge=lfs -text
38
+ decoder_step.onnx.data filter=lfs diff=lfs merge=lfs -text
39
+ embed_tokens.onnx.data filter=lfs diff=lfs merge=lfs -text
40
+ encoder.onnx.data filter=lfs diff=lfs merge=lfs -text
41
+ img_projector.onnx.data filter=lfs diff=lfs merge=lfs -text
42
+ segm_head.onnx.data filter=lfs diff=lfs merge=lfs -text
43
+ size_decoder.onnx.data filter=lfs diff=lfs merge=lfs -text
44
+ size_encoder.onnx.data filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,93 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ library_name: onnxruntime
3
+ tags:
4
+ - onnx
5
+ - webgpu
6
+ - vision
7
+ - object-detection
8
+ - segmentation
9
+ - falcon-perception
10
+ base_model: tiiuae/falcon-perception
11
+ license: apache-2.0
12
+ ---
13
+
14
+ # Falcon Perception — ONNX (WebGPU)
15
+
16
+ ONNX export of [tiiuae/falcon-perception](https://huggingface.co/tiiuae/falcon-perception) (0.6B parameters) for in-browser inference via WebGPU.
17
+
18
+ The encoder and decoder weights are **INT4 quantized** (MatMulNBits, block_size=128) for efficient browser delivery.
19
+
20
+ ## Model Files
21
+
22
+ | File | Description | Size |
23
+ |------|-------------|------|
24
+ | `encoder.onnx` | 28-layer transformer backbone (INT4 quantized) | 357 MB |
25
+ | `decoder_step.onnx` | Single autoregressive decode step with KV cache (INT4 quantized) | 261 MB |
26
+ | `embed_tokens.onnx` | Token embedding lookup | 256 MB |
27
+ | `segm_head.onnx` | Segmentation mask projection | 9 MB |
28
+ | `coord_decoder.onnx` | Coordinate prediction head | 96 MB |
29
+ | `size_decoder.onnx` | Size prediction head | 96 MB |
30
+ | `coord_encoder.onnx` | Coordinate Fourier encoding | 2 MB |
31
+ | `size_encoder.onnx` | Size Fourier encoding | 2 MB |
32
+ | `img_projector.onnx` | Image patch projection | 3 MB |
33
+
34
+ **Total download: ~1.1 GB** (vs 2.5 GB fp32 original)
35
+
36
+ ## Architecture
37
+
38
+ Falcon Perception is an early-fusion vision-language model that performs open-vocabulary object detection and segmentation. It processes image patches and text tokens in a unified transformer with hybrid attention masking (bidirectional for images, causal for text).
39
+
40
+ The model outputs a structured chain-of-perception sequence per detected object:
41
+ ```
42
+ <coord> → <size> → <seg>
43
+ ```
44
+
45
+ ## Inference Pipeline
46
+
47
+ ```
48
+ 1. Tokenize text query → token IDs
49
+ 2. Process image → pixel patches
50
+ 3. embed_tokens(token_ids) → token embeddings
51
+ 4. img_projector(pixel_patches) → image features
52
+ 5. Scatter image features into token sequence
53
+ 6. encoder(embeddings, freqs, mask) → logits + hidden states
54
+ 7. Autoregressive decode loop:
55
+ a. Sample next token from logits
56
+ b. decoder_step(token, kv_cache, ...) → next logits
57
+ c. If <coord> token: coord_decoder(hidden) → xy coordinates
58
+ d. If <size> token: size_decoder(hidden) → hw sizes
59
+ e. If <seg> token: segm_head(hidden, hr_features) → binary mask
60
+ ```
61
+
62
+ ## Conversion Details
63
+
64
+ - **Source model**: [tiiuae/falcon-perception](https://huggingface.co/tiiuae/falcon-perception) (Apache 2.0)
65
+ - **Quantization**: INT4 weight-only (MatMulNBits, asymmetric, block_size=128)
66
+ - **ONNX opset**: 18
67
+ - **Modifications for ONNX compatibility**:
68
+ - FlexAttention → F.scaled_dot_product_attention with dense bool mask
69
+ - Triton squared_relu_gate kernel → pure PyTorch: `relu(gate).pow(2) * up`
70
+ - Complex-valued RoPE → real cos/sin rotation
71
+ - masked_scatter/masked_select → torch.where + index gather
72
+ - AnyUp FlexCrossAttention → SDPA with precomputed window mask
73
+
74
+ ## Usage with ONNX Runtime Web (WebGPU)
75
+
76
+ ```javascript
77
+ import { InferenceSession } from 'onnxruntime-web/webgpu';
78
+
79
+ const encoder = await InferenceSession.create('./encoder.onnx', {
80
+ executionProviders: ['webgpu'],
81
+ });
82
+ ```
83
+
84
+ ## Citation
85
+
86
+ ```bibtex
87
+ @article{falcon-perception,
88
+ title={Falcon Perception},
89
+ author={TII},
90
+ year={2025},
91
+ url={https://huggingface.co/tiiuae/falcon-perception}
92
+ }
93
+ ```
config.json ADDED
@@ -0,0 +1,22 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "dim": 1024,
3
+ "n_layers": 28,
4
+ "n_heads": 16,
5
+ "head_dim": 128,
6
+ "n_kv_heads": 8,
7
+ "vocab_size": 65536,
8
+ "max_seq_len": 8192,
9
+ "segm_out_dim": 256,
10
+ "coord_out_dim": 2048,
11
+ "size_out_dim": 2048,
12
+ "coord_token_id": 240,
13
+ "size_token_id": 241,
14
+ "seg_token_id": 262,
15
+ "eos_id": 11,
16
+ "img_id": 227,
17
+ "image_cls_token_id": 244,
18
+ "img_end_id": 230,
19
+ "spatial_patch_size": 16,
20
+ "temporal_patch_size": 1,
21
+ "channel_size": 3
22
+ }
coord_decoder.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7f0a36046c2563c63f3ce464cf4da56dd79ec61516bef648c4f874acbc49f611
3
+ size 4638
coord_decoder.onnx.data ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:07362b54cbb7090e30dc1111e9925e3e1a4eeece425223b8c3ea58d413d681c9
3
+ size 100663296
coord_encoder.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b15410de867662baaddb21bafaefcea1937ce5674287a4225a11aa58e51229f8
3
+ size 5976
coord_encoder.onnx.data ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:042846aa31d0228d32ca884f03641c3b4aebd336bd031e68f31356f25d7bd96f
3
+ size 2162688
decoder_step.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:265fe991bb746cbde9dcb7d3cc8d67ea9e2c8d2df3a073bf106d30474e2f39ae
3
+ size 1648113
decoder_step.onnx.data ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d989ca541bde9f5e70b4bd7aca306c03a3ddd135bb244b30419c01ae363482a8
3
+ size 271601664
embed_tokens.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:593507aecd0df988dd097dcde68f06b16be35399498bf4bd480886bf20cef36f
3
+ size 2023
embed_tokens.onnx.data ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8c7d0bd8187672104870bf5df069a54b99224ccf2529a294fe931c2674ba3b2b
3
+ size 268435456
encoder.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:11b57580e10663ac8b4df623813dabee62d3b29c0d8434d95b5f61b452a38434
3
+ size 3254084
encoder.onnx.data ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0b4e02b005e3433ace446773995d384a6413886b7fbe4f383b1f5e2af9515564
3
+ size 370929664
img_projector.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0ca7ed8981b49e317c6841f01029c60940a80e8e4c8beb2e7f95bf306e856569
3
+ size 1943
img_projector.onnx.data ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6b16e7c7f6cb80bb9094b2f3330508ee312772d8b9b0a699e614d9064bb9b7dd
3
+ size 3145728
segm_head.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:acaf34cdfb5f1a0e4d554acf48db4fc9311095358c32034abfae965a954c3903
3
+ size 2226
segm_head.onnx.data ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f624d0964e6e8735872aad32a83c411cca783b5cb8a4ce1b46d33f2dfd55fd86
3
+ size 9502720
size_decoder.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c140235c9884c354194cf6127478f7cfe5b30bf94287e455745fe912e401bb4e
3
+ size 4636
size_decoder.onnx.data ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:928cecb7ed10d6254f2d98bb3d694a50c256a674b94627d1757700825e57b4c8
3
+ size 100663296
size_encoder.onnx ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f80fcfe963917733d4c11152cb5b0442578d7e64f0c8e6ade86a18fb8ac7e43b
3
+ size 5965
size_encoder.onnx.data ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7cd9dac526ebc6be6c38b5992256ea6fa70d7584ee6763bb31ea10dd9a0a4164
3
+ size 2162688
special_tokens_map.json ADDED
@@ -0,0 +1,380 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "absence_token": "<|absence|>",
3
+ "additional_special_tokens": [
4
+ "<|pad|>",
5
+ ">>ABSTRACT<<",
6
+ ">>INTRODUCTION<<",
7
+ ">>SUMMARY<<",
8
+ ">>COMMENT<<",
9
+ ">>ANSWER<<",
10
+ ">>QUESTION<<",
11
+ ">>DOMAIN<<",
12
+ ">>PREFIX<<",
13
+ ">>SUFFIX<<",
14
+ ">>MIDDLE<<",
15
+ "<|finetune_right_pad_id|>",
16
+ "<|start_header_id|>",
17
+ "<|end_header_id|>",
18
+ "<|eom_id|>",
19
+ "<|eot_id|>",
20
+ "<|begin_of_text|>",
21
+ ">>TITLE<<",
22
+ "<tool_response>",
23
+ "</tool_response>",
24
+ "<tool_call>",
25
+ "</tool_call>",
26
+ "<schema>",
27
+ "</schema>",
28
+ "<scratch_pad>",
29
+ "</scratch_pad>",
30
+ "<thinking>",
31
+ "</thinking>",
32
+ "<explanation>",
33
+ "</explanation>",
34
+ "<file_sep>",
35
+ "<repo_name>",
36
+ ">>UNUSED_119<<",
37
+ ">>UNUSED_120<<",
38
+ "<|image|>",
39
+ "<|image_row_sep|>",
40
+ "<|start_of_image|>",
41
+ "<|end_of_image|>",
42
+ "<|start_of_video|>",
43
+ "<|end_of_video|>",
44
+ "<|frame_sep|>",
45
+ "<|start_of_turn|>",
46
+ "<|end_of_turn|>",
47
+ "<|start_of_diffusion_query|>",
48
+ "<|end_of_diffusion_query|>",
49
+ "<|diffusion_query|>",
50
+ "<|object|>",
51
+ "<|coord|>",
52
+ "<|size|>",
53
+ "<|perceive|>",
54
+ "<|image_mask_token|>",
55
+ "<|image_cls|>",
56
+ "<|image_reg_1|>",
57
+ "<|image_reg_2|>",
58
+ "<|image_reg_3|>",
59
+ "<|image_reg_4|>",
60
+ "<|image_reg_5|>",
61
+ "<|image_reg_6|>",
62
+ "<|image_reg_7|>",
63
+ "<|image_reg_8|>",
64
+ "<|DET|>",
65
+ "<|POINTING|>",
66
+ "<|OCR_GROUNDING|>",
67
+ "<|OCR_DOC_PARSER|>",
68
+ "<|OCR_PLAIN|>",
69
+ "<|REF_SEG|>",
70
+ "<|POINT_REF_SEG|>",
71
+ "<|CAPTION|>",
72
+ "<|DETAILED_CAPTION|>",
73
+ "<|seg|>",
74
+ "<|end_of_query|>",
75
+ "<|start_of_query|>",
76
+ "<|task_sep|>",
77
+ "<|SEMANTIC_SEG_TASK|>",
78
+ "<|semantic_seg|>",
79
+ "<|presence|>",
80
+ "<|absence|>",
81
+ ">>UNUSED_258<<",
82
+ ">>UNUSED_259<<",
83
+ ">>UNUSED_260<<",
84
+ ">>UNUSED_261<<",
85
+ ">>UNUSED_262<<",
86
+ ">>UNUSED_263<<",
87
+ ">>UNUSED_264<<",
88
+ ">>UNUSED_265<<",
89
+ ">>UNUSED_266<<",
90
+ ">>UNUSED_267<<",
91
+ ">>UNUSED_268<<",
92
+ ">>UNUSED_269<<",
93
+ ">>UNUSED_270<<",
94
+ ">>UNUSED_271<<",
95
+ ">>UNUSED_272<<",
96
+ ">>UNUSED_273<<",
97
+ ">>UNUSED_274<<",
98
+ ">>UNUSED_275<<",
99
+ ">>UNUSED_276<<",
100
+ ">>UNUSED_277<<",
101
+ ">>UNUSED_278<<",
102
+ ">>UNUSED_279<<",
103
+ ">>UNUSED_280<<",
104
+ ">>UNUSED_281<<",
105
+ ">>UNUSED_282<<",
106
+ ">>UNUSED_283<<",
107
+ ">>UNUSED_284<<",
108
+ ">>UNUSED_285<<",
109
+ ">>UNUSED_286<<",
110
+ ">>UNUSED_287<<",
111
+ ">>UNUSED_288<<",
112
+ ">>UNUSED_289<<",
113
+ ">>UNUSED_290<<",
114
+ ">>UNUSED_291<<",
115
+ ">>UNUSED_292<<",
116
+ ">>UNUSED_293<<",
117
+ ">>UNUSED_294<<",
118
+ ">>UNUSED_295<<",
119
+ ">>UNUSED_296<<",
120
+ ">>UNUSED_297<<",
121
+ ">>UNUSED_298<<",
122
+ ">>UNUSED_299<<",
123
+ ">>UNUSED_300<<",
124
+ ">>UNUSED_301<<",
125
+ ">>UNUSED_302<<",
126
+ ">>UNUSED_303<<",
127
+ ">>UNUSED_304<<",
128
+ ">>UNUSED_305<<",
129
+ ">>UNUSED_306<<",
130
+ ">>UNUSED_307<<",
131
+ ">>UNUSED_308<<",
132
+ ">>UNUSED_309<<",
133
+ ">>UNUSED_310<<",
134
+ ">>UNUSED_311<<",
135
+ ">>UNUSED_312<<",
136
+ ">>UNUSED_313<<",
137
+ ">>UNUSED_314<<",
138
+ ">>UNUSED_315<<",
139
+ ">>UNUSED_316<<",
140
+ ">>UNUSED_317<<",
141
+ ">>UNUSED_318<<",
142
+ ">>UNUSED_319<<",
143
+ ">>UNUSED_320<<",
144
+ ">>UNUSED_321<<",
145
+ ">>UNUSED_322<<",
146
+ ">>UNUSED_323<<",
147
+ ">>UNUSED_324<<",
148
+ ">>UNUSED_325<<",
149
+ ">>UNUSED_326<<",
150
+ ">>UNUSED_327<<",
151
+ ">>UNUSED_328<<",
152
+ ">>UNUSED_329<<",
153
+ ">>UNUSED_330<<",
154
+ ">>UNUSED_331<<",
155
+ ">>UNUSED_332<<",
156
+ ">>UNUSED_333<<",
157
+ ">>UNUSED_334<<",
158
+ ">>UNUSED_335<<",
159
+ ">>UNUSED_336<<",
160
+ ">>UNUSED_337<<",
161
+ ">>UNUSED_338<<",
162
+ ">>UNUSED_339<<",
163
+ ">>UNUSED_340<<",
164
+ ">>UNUSED_341<<",
165
+ ">>UNUSED_342<<",
166
+ ">>UNUSED_343<<",
167
+ ">>UNUSED_344<<",
168
+ ">>UNUSED_345<<",
169
+ ">>UNUSED_346<<",
170
+ ">>UNUSED_347<<",
171
+ ">>UNUSED_348<<",
172
+ ">>UNUSED_349<<",
173
+ ">>UNUSED_350<<",
174
+ ">>UNUSED_351<<",
175
+ ">>UNUSED_352<<",
176
+ ">>UNUSED_353<<",
177
+ ">>UNUSED_354<<",
178
+ ">>UNUSED_355<<",
179
+ ">>UNUSED_356<<",
180
+ ">>UNUSED_357<<",
181
+ ">>UNUSED_358<<",
182
+ ">>UNUSED_359<<",
183
+ ">>UNUSED_360<<",
184
+ ">>UNUSED_361<<",
185
+ ">>UNUSED_362<<",
186
+ ">>UNUSED_363<<",
187
+ ">>UNUSED_364<<",
188
+ ">>UNUSED_365<<",
189
+ ">>UNUSED_366<<",
190
+ ">>UNUSED_367<<",
191
+ ">>UNUSED_368<<",
192
+ ">>UNUSED_369<<",
193
+ ">>UNUSED_370<<",
194
+ ">>UNUSED_371<<",
195
+ ">>UNUSED_372<<",
196
+ ">>UNUSED_373<<",
197
+ ">>UNUSED_374<<",
198
+ ">>UNUSED_375<<",
199
+ ">>UNUSED_376<<",
200
+ ">>UNUSED_377<<",
201
+ ">>UNUSED_378<<",
202
+ ">>UNUSED_379<<",
203
+ ">>UNUSED_380<<",
204
+ ">>UNUSED_381<<",
205
+ ">>UNUSED_382<<",
206
+ ">>UNUSED_383<<",
207
+ ">>UNUSED_384<<",
208
+ ">>UNUSED_385<<",
209
+ ">>UNUSED_386<<",
210
+ ">>UNUSED_387<<",
211
+ ">>UNUSED_388<<",
212
+ ">>UNUSED_389<<",
213
+ ">>UNUSED_390<<",
214
+ ">>UNUSED_391<<",
215
+ ">>UNUSED_392<<",
216
+ ">>UNUSED_393<<",
217
+ ">>UNUSED_394<<",
218
+ ">>UNUSED_395<<",
219
+ ">>UNUSED_396<<",
220
+ ">>UNUSED_397<<",
221
+ ">>UNUSED_398<<",
222
+ ">>UNUSED_399<<",
223
+ ">>UNUSED_400<<",
224
+ ">>UNUSED_401<<",
225
+ ">>UNUSED_402<<",
226
+ ">>UNUSED_403<<",
227
+ ">>UNUSED_404<<",
228
+ ">>UNUSED_405<<",
229
+ ">>UNUSED_406<<",
230
+ ">>UNUSED_407<<",
231
+ ">>UNUSED_408<<",
232
+ ">>UNUSED_409<<",
233
+ ">>UNUSED_410<<",
234
+ ">>UNUSED_411<<",
235
+ ">>UNUSED_412<<",
236
+ ">>UNUSED_413<<",
237
+ ">>UNUSED_414<<",
238
+ ">>UNUSED_415<<",
239
+ ">>UNUSED_416<<",
240
+ ">>UNUSED_417<<",
241
+ ">>UNUSED_418<<",
242
+ ">>UNUSED_419<<",
243
+ ">>UNUSED_420<<",
244
+ ">>UNUSED_421<<",
245
+ ">>UNUSED_422<<",
246
+ ">>UNUSED_423<<",
247
+ ">>UNUSED_424<<",
248
+ ">>UNUSED_425<<",
249
+ ">>UNUSED_426<<",
250
+ ">>UNUSED_427<<",
251
+ ">>UNUSED_428<<",
252
+ ">>UNUSED_429<<",
253
+ ">>UNUSED_430<<",
254
+ ">>UNUSED_431<<",
255
+ ">>UNUSED_432<<",
256
+ ">>UNUSED_433<<",
257
+ ">>UNUSED_434<<",
258
+ ">>UNUSED_435<<",
259
+ ">>UNUSED_436<<",
260
+ ">>UNUSED_437<<",
261
+ ">>UNUSED_438<<",
262
+ ">>UNUSED_439<<",
263
+ ">>UNUSED_440<<",
264
+ ">>UNUSED_441<<",
265
+ ">>UNUSED_442<<",
266
+ ">>UNUSED_443<<",
267
+ ">>UNUSED_444<<",
268
+ ">>UNUSED_445<<",
269
+ ">>UNUSED_446<<",
270
+ ">>UNUSED_447<<",
271
+ ">>UNUSED_448<<",
272
+ ">>UNUSED_449<<",
273
+ ">>UNUSED_450<<",
274
+ ">>UNUSED_451<<",
275
+ ">>UNUSED_452<<",
276
+ ">>UNUSED_453<<",
277
+ ">>UNUSED_454<<",
278
+ ">>UNUSED_455<<",
279
+ ">>UNUSED_456<<",
280
+ ">>UNUSED_457<<",
281
+ ">>UNUSED_458<<",
282
+ ">>UNUSED_459<<",
283
+ ">>UNUSED_460<<",
284
+ ">>UNUSED_461<<",
285
+ ">>UNUSED_462<<",
286
+ ">>UNUSED_463<<",
287
+ ">>UNUSED_464<<",
288
+ ">>UNUSED_465<<",
289
+ ">>UNUSED_466<<",
290
+ ">>UNUSED_467<<",
291
+ ">>UNUSED_468<<",
292
+ ">>UNUSED_469<<",
293
+ ">>UNUSED_470<<",
294
+ ">>UNUSED_471<<",
295
+ ">>UNUSED_472<<",
296
+ ">>UNUSED_473<<",
297
+ ">>UNUSED_474<<",
298
+ ">>UNUSED_475<<",
299
+ ">>UNUSED_476<<",
300
+ ">>UNUSED_477<<",
301
+ ">>UNUSED_478<<",
302
+ ">>UNUSED_479<<",
303
+ ">>UNUSED_480<<",
304
+ ">>UNUSED_481<<",
305
+ ">>UNUSED_482<<",
306
+ ">>UNUSED_483<<",
307
+ ">>UNUSED_484<<",
308
+ ">>UNUSED_485<<",
309
+ ">>UNUSED_486<<",
310
+ ">>UNUSED_487<<",
311
+ ">>UNUSED_488<<",
312
+ ">>UNUSED_489<<",
313
+ ">>UNUSED_490<<",
314
+ ">>UNUSED_491<<",
315
+ ">>UNUSED_492<<",
316
+ ">>UNUSED_493<<",
317
+ ">>UNUSED_494<<",
318
+ ">>UNUSED_495<<",
319
+ ">>UNUSED_496<<",
320
+ ">>UNUSED_497<<",
321
+ ">>UNUSED_498<<",
322
+ ">>UNUSED_499<<",
323
+ ">>UNUSED_500<<",
324
+ ">>UNUSED_501<<",
325
+ ">>UNUSED_502<<",
326
+ ">>UNUSED_503<<",
327
+ ">>UNUSED_504<<",
328
+ ">>UNUSED_505<<",
329
+ ">>UNUSED_506<<",
330
+ ">>UNUSED_507<<",
331
+ ">>UNUSED_508<<",
332
+ ">>UNUSED_509<<",
333
+ ">>UNUSED_510<<",
334
+ ">>UNUSED_511<<"
335
+ ],
336
+ "caption_token": "<|CAPTION|>",
337
+ "coord_token": "<|coord|>",
338
+ "det_token": "<|DET|>",
339
+ "detailed_caption_token": "<|DETAILED_CAPTION|>",
340
+ "diffusion_query_token": "<|diffusion_query|>",
341
+ "end_of_diffusion_query_token": "<|end_of_diffusion_query|>",
342
+ "end_of_image_token": "<|end_of_image|>",
343
+ "end_of_query_token": "<|end_of_query|>",
344
+ "end_of_turn_token": "<|end_of_turn|>",
345
+ "end_of_video_token": "<|end_of_video|>",
346
+ "eos_token": "<|end_of_text|>",
347
+ "frame_sep_token": "<|frame_sep|>",
348
+ "image_cls_token": "<|image_cls|>",
349
+ "image_mask_token": "<|image_mask_token|>",
350
+ "image_reg_1_token": "<|image_reg_1|>",
351
+ "image_reg_2_token": "<|image_reg_2|>",
352
+ "image_reg_3_token": "<|image_reg_3|>",
353
+ "image_reg_4_token": "<|image_reg_4|>",
354
+ "image_reg_5_token": "<|image_reg_5|>",
355
+ "image_reg_6_token": "<|image_reg_6|>",
356
+ "image_reg_7_token": "<|image_reg_7|>",
357
+ "image_reg_8_token": "<|image_reg_8|>",
358
+ "image_row_sep_token": "<|image_row_sep|>",
359
+ "image_token": "<|image|>",
360
+ "object_token": "<|object|>",
361
+ "ocr_doc_parser_token": "<|OCR_DOC_PARSER|>",
362
+ "ocr_grounding_token": "<|OCR_GROUNDING|>",
363
+ "ocr_plain_token": "<|OCR_PLAIN|>",
364
+ "pad_token": "<|pad|>",
365
+ "perceive_token": "<|perceive|>",
366
+ "point_ref_seg_token": "<|POINT_REF_SEG|>",
367
+ "pointing_token": "<|POINTING|>",
368
+ "presence_token": "<|presence|>",
369
+ "ref_seg_token": "<|REF_SEG|>",
370
+ "seg_token": "<|seg|>",
371
+ "semantic_seg_task_token": "<|SEMANTIC_SEG_TASK|>",
372
+ "semantic_seg_token": "<|semantic_seg|>",
373
+ "size_token": "<|size|>",
374
+ "start_of_diffusion_query_token": "<|start_of_diffusion_query|>",
375
+ "start_of_image_token": "<|start_of_image|>",
376
+ "start_of_query_token": "<|start_of_query|>",
377
+ "start_of_turn_token": "<|start_of_turn|>",
378
+ "start_of_video_token": "<|start_of_video|>",
379
+ "task_sep_token": "<|task_sep|>"
380
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,102 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "absence_token": "<|absence|>",
3
+ "backend": "tokenizers",
4
+ "caption_token": "<|CAPTION|>",
5
+ "clean_up_tokenization_spaces": true,
6
+ "coord_token": "<|coord|>",
7
+ "det_token": "<|DET|>",
8
+ "detailed_caption_token": "<|DETAILED_CAPTION|>",
9
+ "diffusion_query_token": "<|diffusion_query|>",
10
+ "end_of_diffusion_query_token": "<|end_of_diffusion_query|>",
11
+ "end_of_image_token": "<|end_of_image|>",
12
+ "end_of_query_token": "<|end_of_query|>",
13
+ "end_of_turn_token": "<|end_of_turn|>",
14
+ "end_of_video_token": "<|end_of_video|>",
15
+ "eos_token": "<|end_of_text|>",
16
+ "frame_sep_token": "<|frame_sep|>",
17
+ "image_cls_token": "<|image_cls|>",
18
+ "image_mask_token": "<|image_mask_token|>",
19
+ "image_reg_1_token": "<|image_reg_1|>",
20
+ "image_reg_2_token": "<|image_reg_2|>",
21
+ "image_reg_3_token": "<|image_reg_3|>",
22
+ "image_reg_4_token": "<|image_reg_4|>",
23
+ "image_reg_5_token": "<|image_reg_5|>",
24
+ "image_reg_6_token": "<|image_reg_6|>",
25
+ "image_reg_7_token": "<|image_reg_7|>",
26
+ "image_reg_8_token": "<|image_reg_8|>",
27
+ "image_row_sep_token": "<|image_row_sep|>",
28
+ "image_token": "<|image|>",
29
+ "is_local": true,
30
+ "model_input_names": [
31
+ "input_ids",
32
+ "attention_mask"
33
+ ],
34
+ "model_max_length": 1000000000000000019884624838656,
35
+ "model_specific_special_tokens": {
36
+ "absence_token": "<|absence|>",
37
+ "caption_token": "<|CAPTION|>",
38
+ "coord_token": "<|coord|>",
39
+ "det_token": "<|DET|>",
40
+ "detailed_caption_token": "<|DETAILED_CAPTION|>",
41
+ "diffusion_query_token": "<|diffusion_query|>",
42
+ "end_of_diffusion_query_token": "<|end_of_diffusion_query|>",
43
+ "end_of_image_token": "<|end_of_image|>",
44
+ "end_of_query_token": "<|end_of_query|>",
45
+ "end_of_turn_token": "<|end_of_turn|>",
46
+ "end_of_video_token": "<|end_of_video|>",
47
+ "frame_sep_token": "<|frame_sep|>",
48
+ "image_cls_token": "<|image_cls|>",
49
+ "image_mask_token": "<|image_mask_token|>",
50
+ "image_reg_1_token": "<|image_reg_1|>",
51
+ "image_reg_2_token": "<|image_reg_2|>",
52
+ "image_reg_3_token": "<|image_reg_3|>",
53
+ "image_reg_4_token": "<|image_reg_4|>",
54
+ "image_reg_5_token": "<|image_reg_5|>",
55
+ "image_reg_6_token": "<|image_reg_6|>",
56
+ "image_reg_7_token": "<|image_reg_7|>",
57
+ "image_reg_8_token": "<|image_reg_8|>",
58
+ "image_row_sep_token": "<|image_row_sep|>",
59
+ "image_token": "<|image|>",
60
+ "object_token": "<|object|>",
61
+ "ocr_doc_parser_token": "<|OCR_DOC_PARSER|>",
62
+ "ocr_grounding_token": "<|OCR_GROUNDING|>",
63
+ "ocr_plain_token": "<|OCR_PLAIN|>",
64
+ "pad_token": "<|pad|>",
65
+ "perceive_token": "<|perceive|>",
66
+ "point_ref_seg_token": "<|POINT_REF_SEG|>",
67
+ "pointing_token": "<|POINTING|>",
68
+ "presence_token": "<|presence|>",
69
+ "ref_seg_token": "<|REF_SEG|>",
70
+ "seg_token": "<|seg|>",
71
+ "semantic_seg_task_token": "<|SEMANTIC_SEG_TASK|>",
72
+ "semantic_seg_token": "<|semantic_seg|>",
73
+ "size_token": "<|size|>",
74
+ "start_of_diffusion_query_token": "<|start_of_diffusion_query|>",
75
+ "start_of_image_token": "<|start_of_image|>",
76
+ "start_of_query_token": "<|start_of_query|>",
77
+ "start_of_turn_token": "<|start_of_turn|>",
78
+ "start_of_video_token": "<|start_of_video|>",
79
+ "task_sep_token": "<|task_sep|>"
80
+ },
81
+ "object_token": "<|object|>",
82
+ "ocr_doc_parser_token": "<|OCR_DOC_PARSER|>",
83
+ "ocr_grounding_token": "<|OCR_GROUNDING|>",
84
+ "ocr_plain_token": "<|OCR_PLAIN|>",
85
+ "pad_token": "<|pad|>",
86
+ "perceive_token": "<|perceive|>",
87
+ "point_ref_seg_token": "<|POINT_REF_SEG|>",
88
+ "pointing_token": "<|POINTING|>",
89
+ "presence_token": "<|presence|>",
90
+ "ref_seg_token": "<|REF_SEG|>",
91
+ "seg_token": "<|seg|>",
92
+ "semantic_seg_task_token": "<|SEMANTIC_SEG_TASK|>",
93
+ "semantic_seg_token": "<|semantic_seg|>",
94
+ "size_token": "<|size|>",
95
+ "start_of_diffusion_query_token": "<|start_of_diffusion_query|>",
96
+ "start_of_image_token": "<|start_of_image|>",
97
+ "start_of_query_token": "<|start_of_query|>",
98
+ "start_of_turn_token": "<|start_of_turn|>",
99
+ "start_of_video_token": "<|start_of_video|>",
100
+ "task_sep_token": "<|task_sep|>",
101
+ "tokenizer_class": "TokenizersBackend"
102
+ }