# Gemma 4 E2B IT q4f16_1 MLC (Experimental)
Experimental text-first MLC/WebLLM packaging of google/gemma-4-E2B-it in q4f16_1 for browser-local WebGPU and MLC-LLM runtimes.
This repository is a custom build from a local mlc-llm / TVM fork. It is not an official mlc-ai release, and runtime validation is still ongoing.
## Model Details
- Base model: google/gemma-4-E2B-it
- Quantization: q4f16_1
- Runtime target: webgpu
- Model type: gemma4
- Conversation template: gemma_instruction
- Packaged context window: 4096
- Sliding window size: 512
- Prefill chunk size: 1024
- Parameter shards: 42
- Quantized parameter size: 2.453 GB
- Total artifact size: 2.676 GB
- WebGPU model library: 8.7 MB
- Release manifest: release-manifest.json
q4f16_1 is used here as the default browser-focused MLC target: 4-bit weights with FP16 runtime dtype.
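These settings typically surface in `mlc-chat-config.json`. A sketch of the relevant fields, assuming the usual MLC config schema (field names are illustrative, not copied from this repository's file):

```json
{
  "model_type": "gemma4",
  "quantization": "q4f16_1",
  "conv_template": "gemma_instruction",
  "context_window_size": 4096,
  "sliding_window_size": 512,
  "prefill_chunk_size": 1024
}
```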
## Usage

### Chat

```shell
mlc_llm chat HF://welcoma/gemma-4-E2B-it-q4f16_1-MLC
```
### WebLLM Integration

```javascript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

const repo = "https://huggingface.co/welcoma/gemma-4-E2B-it-q4f16_1-MLC";

const appConfig = {
  model_list: [
    {
      model: repo,
      model_id: "gemma-4-E2B-it-q4f16_1-MLC",
      model_lib: `${repo}/resolve/main/libs/gemma-4-E2B-it-q4f16_1-MLC-webgpu.wasm`,
      required_features: ["shader-f16"],
    },
  ],
};

const engine = await CreateMLCEngine("gemma-4-E2B-it-q4f16_1-MLC", {
  appConfig,
});
```
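When registering several quantizations or mirror repos, it can help to derive the model record programmatically instead of hand-writing each URL. A minimal sketch; the `buildModelRecord` helper is hypothetical, not part of the WebLLM API, and assumes the `.wasm` library follows this repository's `libs/<model_id>-webgpu.wasm` layout:

```javascript
// Hypothetical helper: builds a WebLLM model record from a Hugging Face repo URL.
// Assumes the compiled library lives at libs/<modelId>-webgpu.wasm in the repo.
function buildModelRecord(repo, modelId) {
  return {
    model: repo,
    model_id: modelId,
    model_lib: `${repo}/resolve/main/libs/${modelId}-webgpu.wasm`,
    required_features: ["shader-f16"],
  };
}

const record = buildModelRecord(
  "https://huggingface.co/welcoma/gemma-4-E2B-it-q4f16_1-MLC",
  "gemma-4-E2B-it-q4f16_1-MLC"
);
console.log(record.model_lib);
```

The resulting record can be dropped directly into `model_list` above.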
## Files

- `mlc-chat-config.json`: MLC runtime configuration
- `tokenizer.json`: fast tokenizer data
- `tokenizer_config.json`: tokenizer settings
- `params_shard_*.bin`: quantized parameter shards
- `tensor-cache.json`: tensor metadata cache
- `release-manifest.json`: file inventory with SHA-256 hashes
- `libs/gemma-4-E2B-it-q4f16_1-MLC-webgpu.wasm`: compiled WebGPU model library
## Limitations
- This package is experimental. Successful artifact conversion does not yet imply fully stable browser runtime behavior on every device.
- This package is text-first. It does not expose Gemma 4 image or audio features in WebLLM.
- The packaged context window is 4096, even though the base model advertises a much larger native context length.
- This build currently depends on a Gemma 4 patchset on top of mlc-llm; upstream support is still being worked on.
- Browser use still requires WebGPU and shader-f16 support.
- Built-in WebLLM registration and mlc-ai namespace publication are intentionally deferred until runtime stability is demonstrated.
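The WebGPU and shader-f16 requirement can be probed before attempting to load the model. A sketch below; `navigator.gpu.requestAdapter()` and the adapter's `features` set are real WebGPU APIs, while the `supportsF16` helper itself is illustrative:

```javascript
// Returns true when an adapter-like object reports shader-f16 support.
// In the browser, pass the result of navigator.gpu.requestAdapter().
function supportsF16(adapter) {
  return Boolean(adapter) && adapter.features.has("shader-f16");
}

// Browser usage (requires a WebGPU-capable context):
//   const adapter = await navigator.gpu.requestAdapter();
//   if (!supportsF16(adapter)) { /* warn the user or fall back */ }
```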
## License
This repository follows the same license and usage terms as the base model, google/gemma-4-E2B-it.