LiteRT-LM

Several issues with the web version

#1
by willopcbeta - opened

Is the web version a multimodal model?

If multi-modal models are not currently supported, is there any plan to add them in the future?
The code I use in actual operation is the same as for Gemma 3n E2B/E4B, yet errors still occur when using Gemma 4 E2B/E4B.

Image:

llm_inference_engine.cc:264] INVALID_ARGUMENT: Calculator::Open() for node "LlmVisionInferenceCalculator" failed: Image models could not be created.
=== Source Location Trace: ===
third_party/odml/infra/genai/inference/ml_drift/llm/llm_gpu_runner_manager.cc:509
third_party/odml/infra/genai/inference/calculators/llm_vision_inference_calculator.cc:276
third_party/mediapipe/framework/calculator_node.cc:603

Audio:

Uncaught (in promise) Error: Failed to create session: INVALID_ARGUMENT: Audio options should not be null when loading audio runner.
=== Source Location Trace: ===
third_party/odml/infra/genai/inference/ml_drift/llm/llm_gpu_runner_manager.cc:570
third_party/odml/infra/genai/inference/llm_engine.cc:980
third_party/odml/infra/genai/inference/llm_engine.cc:2681

Short input sentences or brief texts often result in no response or an incorrect translation.

Example:
Input text:
"Translate the following from en text into concise zh-Hant: "The code used in actual operation is the same as that of Gemma 3n E2B/E4B." Provide only the translated text. Do not include any additional explanations, commentary, or greetings."


Output:
First attempt: Output Text: "" (an empty response was returned).
Second attempt: Output Text: "The code used in operation is the same as that of Gemma 3n E2B/E4B." (the input was echoed back instead of being translated).
The correct result should be the zh-Hant translation of the sentence.
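A simple client-side mitigation for the empty responses is to retry the request until a non-empty reply comes back. This is only a sketch: `translate` here is a hypothetical async wrapper around the actual engine call, not part of any real API.

```javascript
// Sketch: retry a translation request when the model returns an empty string.
// `translate` is a hypothetical async function wrapping the real engine call.
async function translateWithRetry(translate, text, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const output = (await translate(text)).trim();
    if (output.length > 0) return output; // accept the first non-empty reply
  }
  throw new Error(`Empty response after ${maxAttempts} attempts`);
}
```

This does not address the echoed-input case, only the empty-string case.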

Each request reports "Completed", but repeating the translation several times eventually results in an error:
genai_bundle.js:1 Uncaught (in promise) Error: Cannot process because LLM inference engine is currently loading or processing.
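That error suggests a new request is being issued while the previous one is still running. One way to avoid it is to serialize all calls through a promise chain. The sketch below assumes an engine object with a `generateResponse(prompt)` method (as in the MediaPipe-style web LLM APIs); the wrapper class and its names are my own, not part of the library.

```javascript
// Minimal sketch: serialize requests so a new one never starts while the
// engine is still generating. `engine` is a stand-in for the real LLM
// inference engine object; only the queueing logic is the point here.
class SerializedEngine {
  constructor(engine) {
    this.engine = engine;
    this.queue = Promise.resolve(); // tail of the pending-work chain
  }

  // Each call waits for all previously queued calls to finish first.
  generate(prompt) {
    const result = this.queue.then(() => this.engine.generateResponse(prompt));
    // Keep the chain alive even if one request fails.
    this.queue = result.catch(() => {});
    return result;
  }
}
```

With this wrapper, callers can fire requests without checking engine state first; they simply resolve in order.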

LiteRT Community (FKA TFLite) org

Our web conversions of the Gemma 4 models are text-only at the moment. We do not yet have a time estimate for Gemma 4 multimodality support on web, but it's definitely something we're looking into.

Also, the prompt formatting has changed a lot from Gemma 3 (and includes several new features, like function calling and thinking), so make sure to update that as well for any Gemma 4 prompts: https://ai.google.dev/gemma/docs/core/prompt-formatting-gemma4.
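For reference, a prompt-builder along these lines is what needs updating. The `<start_of_turn>`/`<end_of_turn>` markers shown are the Gemma 3 convention; since the Gemma 4 format has changed and adds new control tokens (function calling, thinking), the exact tokens should be taken from the linked guide rather than from this sketch.

```javascript
// Sketch of a chat-formatted prompt builder. The turn markers below are the
// Gemma 3 convention; verify the exact Gemma 4 control tokens against the
// prompt-formatting guide before using this with Gemma 4 models.
function formatGemmaPrompt(userMessage) {
  return (
    "<start_of_turn>user\n" +
    userMessage +
    "<end_of_turn>\n" +
    "<start_of_turn>model\n"
  );
}
```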

Thank you for your reply amidst your busy schedule. Modifying the Gemma-4 prompt indeed resolved the text conversation issues, and I look forward to the future web release of multimodal models.

In actual testing on a Pixel 8 Pro, I successfully ran Gemma 4 E2B/E4B together with transformers.js and unsloth whisper-small (q4/q4f16) as a workaround for the audio problem; memory usage was significantly lower than with Gemma 3n.

This response was translated from zh-Hant to English using Gemma 4 E4B.
Actual working website: https://willo83417.github.io/Gemini-AI-Translator/
