Tokens per Image Parameter?

#51
by buckeye17 - opened

The model card says the model supports several token budgets for converting an image into tokens, but it does not explain how to set this value with the Transformers package. Does anyone know how to do this?

After reading the Gemma 4 API documentation and doing some guessing, I think I found the answer to my question. Instead of the example line processor = AutoProcessor.from_pretrained(MODEL_ID), use processor = AutoProcessor.from_pretrained(MODEL_ID, max_soft_tokens=1120). The accepted values for this argument are 70, 140, 280, 560, and 1120.
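A minimal sketch of the approach above, assuming `max_soft_tokens` is passed straight through `AutoProcessor.from_pretrained` as described in the post. The helper names `validate_max_soft_tokens` and `load_processor` are hypothetical, and the list of allowed budgets is copied from this thread rather than checked against the library, so re-check it against the current Gemma 4 docs.

```python
# Token budgets quoted in this thread; treat the list as an assumption
# to re-check against the current Gemma 4 documentation.
ALLOWED_MAX_SOFT_TOKENS = (70, 140, 280, 560, 1120)


def validate_max_soft_tokens(value: int) -> int:
    """Return value unchanged if it is one of the listed budgets, else raise ValueError."""
    if value not in ALLOWED_MAX_SOFT_TOKENS:
        raise ValueError(
            f"max_soft_tokens={value!r} is not one of {ALLOWED_MAX_SOFT_TOKENS}"
        )
    return value


def load_processor(model_id: str, max_soft_tokens: int):
    """Hypothetical helper: load a processor with a per-image token budget.

    Passing max_soft_tokens through from_pretrained follows the post above;
    it is not independently verified here.
    """
    # Imported lazily so the validation above works without transformers installed.
    from transformers import AutoProcessor

    return AutoProcessor.from_pretrained(
        model_id, max_soft_tokens=validate_max_soft_tokens(max_soft_tokens)
    )
```

Usage would be processor = load_processor(MODEL_ID, 1120), with MODEL_ID being the checkpoint from the model card. Validating up front turns a typo such as 1024 into an immediate error instead of a silently ignored or rejected kwarg.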

I did it this way:
vqa_pipe.image_processor.max_soft_tokens = 1120 # https://ai.google.dev/gemma/docs/capabilities/vision/image#variable_resolution_token_budget
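The one-liner above changes the budget on a pipeline that has already been built, without reloading it. A hedged sketch of that idea follows. The helper name `set_image_token_budget` is hypothetical, and also updating `pipe.processor.image_processor` is an assumption: some pipelines may preprocess through their processor rather than `pipe.image_processor`, and whether either attribute is read at preprocessing time depends on the pipeline class and the transformers version.

```python
# Budgets quoted in this thread; an assumption to re-check against the docs.
ALLOWED_MAX_SOFT_TOKENS = (70, 140, 280, 560, 1120)


def set_image_token_budget(pipe, max_soft_tokens: int) -> None:
    """Hypothetical helper: set max_soft_tokens on an existing pipeline.

    Mirrors vqa_pipe.image_processor.max_soft_tokens = 1120 from the post.
    """
    if max_soft_tokens not in ALLOWED_MAX_SOFT_TOKENS:
        raise ValueError(f"max_soft_tokens must be one of {ALLOWED_MAX_SOFT_TOKENS}")

    # The attribute used in the post.
    targets = [getattr(pipe, "image_processor", None)]

    # Assumption: a pipeline may instead preprocess through pipe.processor,
    # which wraps its own image processor, so update that one too if present.
    processor = getattr(pipe, "processor", None)
    targets.append(getattr(processor, "image_processor", None))

    for target in targets:
        if target is not None:
            target.max_soft_tokens = max_soft_tokens
```

With the pipeline from the post, this would be set_image_token_budget(vqa_pipe, 1120). Setting the value after construction is handy for comparing budgets on the same loaded model.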
