# Gemma 4 E2B LiteRT-LM 128k + MTP
Experimental `.litertlm` bundle for Gemma 4 E2B, derived from the LiteRT community artifact and patched for:

- max_num_tokens: 131072
- MTP / speculative decoding support retained
- native LiteRT-LM runtime compatibility
## Status
What is verified on host 100.96.1.7:

- short-prompt inference works on CPU
- speculative decoding works
- LiteRT-LM logs show `target_number=131072`
- long-context prefill above 32k starts correctly without falling back to 32k
What is not claimed yet:
- full production qualification
- GPU qualification
- parity with all Gemma4 sizes
- Qwen qualification
## Artifact
- file: model.litertlm
- sha256: 274e5c461e754cbd05423bab734e7765d9757443fb5591edb7aba6f9f186550a
- size_bytes: 2584805376
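Before integrating the bundle, the download can be checked against the hash and size above. A minimal verification sketch (the path `model.litertlm` is assumed to be the bundle in the current working directory):

```python
import hashlib
import os
import sys

EXPECTED_SHA256 = "274e5c461e754cbd05423bab734e7765d9757443fb5591edb7aba6f9f186550a"
EXPECTED_SIZE_BYTES = 2584805376

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so the ~2.4 GiB bundle is never fully in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "model.litertlm"
    # Cheap size check first, then the full digest.
    assert os.path.getsize(path) == EXPECTED_SIZE_BYTES, "size mismatch"
    assert sha256_of(path) == EXPECTED_SHA256, "sha256 mismatch"
    print("model.litertlm OK")
```

The size check runs first because it fails fast on a truncated download, before spending time hashing gigabytes.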
## Notes
This is an experimental artifact upload for CTOX integration work. Use with caution.