Gemma 4 E2B LiteRT-LM 128k + MTP

Experimental .litertlm bundle for Gemma 4 E2B derived from the LiteRT community artifact and patched for:

  • max_num_tokens: 131072
  • MTP (multi-token prediction) / speculative decoding support retained
  • native LiteRT-LM runtime compatibility

Status

What is verified on host 100.96.1.7:

  • short-prompt inference works on CPU
  • speculative decoding works
  • LiteRT-LM logs show target_number=131072
  • long-context prefill starts correctly on prompts longer than 32k tokens, instead of falling back to the 32k limit
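
The log check above can be reproduced with a plain grep over captured runtime output. This is a minimal sketch: the capture file name litert_lm.log is an assumption, not a path the runtime creates itself.

```shell
# Count occurrences of the 128k context line in captured LiteRT-LM output.
# litert_lm.log is an assumed capture file (e.g. redirected stderr).
grep -c "target_number=131072" litert_lm.log
```

A count of at least 1 indicates the runtime picked up the patched 131072-token limit rather than a 32k default.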

What is not claimed yet:

  • full production qualification
  • GPU qualification
  • parity with all Gemma 4 sizes
  • Qwen qualification

Artifact

  • file: model.litertlm
  • sha256: 274e5c461e754cbd05423bab734e7765d9757443fb5591edb7aba6f9f186550a
  • size_bytes: 2584805376
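
Before loading the bundle, the fields above can be checked against a local download. A minimal sketch using only the Python standard library; the function name verify_artifact is illustrative:

```python
import hashlib

# Expected values copied from the artifact listing above.
EXPECTED_SHA256 = "274e5c461e754cbd05423bab734e7765d9757443fb5591edb7aba6f9f186550a"
EXPECTED_SIZE = 2584805376  # bytes

def verify_artifact(path: str) -> bool:
    """Stream the file and compare its SHA-256 digest and byte size."""
    h = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            h.update(chunk)
            size += len(chunk)
    return h.hexdigest() == EXPECTED_SHA256 and size == EXPECTED_SIZE

# verify_artifact("model.litertlm") should return True for an intact download.
```

Streaming in chunks keeps memory flat for the ~2.4 GiB file; a mismatch on either field indicates a truncated or corrupted download.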

Notes

This is an experimental artifact upload for CTOX integration work. Use with caution.
