Tokens per Image Parameter?

#51

by buckeye17 - opened 11 days ago

The model card mentions that the model supports several values for the number of tokens each image is converted to, but it does not say how to set this value with the Transformers package. Does anyone know how to do this?

buckeye17

10 days ago

After reading the API documentation for Gemma 4 and some guessing, I think I found the answer to my question. Instead of the example line processor = AutoProcessor.from_pretrained(MODEL_ID), use processor = AutoProcessor.from_pretrained(MODEL_ID, max_soft_tokens=1120). The accepted values for this argument are 70, 140, 280, 560 and 1120.
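In case it helps anyone else, here is a minimal sketch of how I would guard that argument before passing it on. The helper name processor_kwargs and the ALLOWED_SOFT_TOKENS constant are my own, not part of Transformers; the list of values is just the one above, and I have not checked how the library itself reacts to other values.

```python
# Hypothetical helper (not part of Transformers): checks the image token
# budget against the values listed in this thread before it is passed on.
ALLOWED_SOFT_TOKENS = (70, 140, 280, 560, 1120)


def processor_kwargs(max_soft_tokens: int) -> dict:
    """Return keyword arguments for the processor, or raise on an unlisted value."""
    if max_soft_tokens not in ALLOWED_SOFT_TOKENS:
        raise ValueError(
            f"max_soft_tokens must be one of {ALLOWED_SOFT_TOKENS}, "
            f"got {max_soft_tokens}"
        )
    return {"max_soft_tokens": max_soft_tokens}


# Intended usage, assuming the from_pretrained call described above works
# for your model (MODEL_ID is whatever checkpoint you are loading):
#   from transformers import AutoProcessor
#   processor = AutoProcessor.from_pretrained(MODEL_ID, **processor_kwargs(560))
```

The only point of the wrapper is to fail early with a readable message instead of discovering a bad value later in the pipeline.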

jbarth-ubhd

10 days ago

•

edited 10 days ago

Did it this way:
vqa_pipe.image_processor.max_soft_tokens = 1120 # https://ai.google.dev/gemma/docs/capabilities/vision/image#variable_resolution_token_budget
