mrgnw/gemma-4-e2b-svelte5

Gemma 4 E2B (5.1B total, 2.3B active MoE) fine-tuned for Svelte 5 component generation. Writes correct Svelte 5 syntax — $state, $derived, $props, onclick, {#snippet} — without falling back to Svelte 4 patterns.
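To make the target syntax concrete, this is the kind of runes-based component the model is tuned to produce (an illustrative hand-written snippet, not actual model output):

```svelte
<script>
  // Svelte 5 runes instead of `let` + `$:` reactivity
  let count = $state(0);
  let doubled = $derived(count * 2);
</script>

<!-- Svelte 5 event attribute (onclick), not Svelte 4's on:click -->
<button onclick={() => count++}>
  {count} doubled is {doubled}
</button>
```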

125 tok/s on M4 Pro, 128K native context, 2.7 GB RAM (MLX 4-bit).

Proof of concept — created with the help of an LLM and trained against SvelteBench (9 tasks). Performance outside those benchmarks is not guaranteed until broader training data and evaluation exist.

Formats

| File | Format | Size | Use case |
|------|--------|------|----------|
| model.safetensors.* | MLX 4-bit | 2.5 GB | Apple Silicon native (fastest) |
| gemma-4-e2b-svelte5-Q4_K_M.gguf | GGUF Q4_K_M | 3.2 GB | LM Studio, llama.cpp, Ollama |
| gemma-4-e2b-svelte5-bf16.gguf | GGUF bf16 | 8.7 GB | Full-precision GGUF |

Use with MLX (Apple Silicon)

pip install mlx-lm

from mlx_lm import load, generate

model, tokenizer = load("mrgnw/gemma-4-e2b-svelte5")

messages = [{"role": "user", "content": "Build a searchable data table"}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False,
    enable_thinking=False,  # critical — prevents infinite thinking loops
)

response = generate(model, tokenizer, prompt=prompt, max_tokens=1024, verbose=True)

API server (OpenAI-compatible):

mlx_lm.server \
  --model mrgnw/gemma-4-e2b-svelte5 \
  --port 8199 \
  --chat-template-args '{"enable_thinking":false}'
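Once the server is running, any OpenAI-compatible client can talk to it. A minimal standard-library sketch (the port matches the command above; the endpoint path and request shape follow the OpenAI chat-completions convention that mlx_lm.server implements):

```python
import json
import urllib.request

# Request body for the OpenAI-compatible /v1/chat/completions endpoint.
payload = {
    "model": "mrgnw/gemma-4-e2b-svelte5",
    "messages": [{"role": "user", "content": "Build a searchable data table"}],
    "max_tokens": 1024,
}

req = urllib.request.Request(
    "http://localhost:8199/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# With the server running, send the request:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```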

Use with GGUF (LM Studio, llama.cpp, Ollama)

Download gemma-4-e2b-svelte5-Q4_K_M.gguf and load it in LM Studio or any llama.cpp-compatible tool.
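For Ollama specifically, a local import goes through a Modelfile. A minimal config sketch (the model name `gemma-svelte5` is an arbitrary local alias):

```
# Modelfile — point Ollama at the downloaded GGUF
FROM ./gemma-4-e2b-svelte5-Q4_K_M.gguf
```

Then `ollama create gemma-svelte5 -f Modelfile` followed by `ollama run gemma-svelte5` starts an interactive session.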

Limitations

  • Trained on 9 SvelteBench tasks (2,880 cleaned samples) — component generation only
  • 2.3B active parameters — writes components from prompts, doesn't architect apps
  • Must use enable_thinking=False in MLX chat template or model enters infinite reasoning loops
  • Best paired with a larger model for architecture + this model for component generation

Training

LoRA on mlx-community/gemma-4-e2b-it-4bit using mlx-lm 0.31.2. Rank 64, 32 layers, 3000 iterations, ~65 min on M4 Pro. Val loss 0.027.

Data cleaning was more important than LoRA rank — converted all on:click handlers to onclick, stripped <svelte:options runes={true} />, and removed other Svelte 4 patterns from the training data.
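The cleaning pass described above can be sketched as a small normalizer (a hypothetical helper, not the actual training script; the two regexes cover the patterns named in the paragraph):

```python
import re

def normalize_svelte5(source: str) -> str:
    """Rewrite common Svelte 4 patterns into their Svelte 5 equivalents."""
    # on:click / on:input / ... -> onclick / oninput / ...
    source = re.sub(r"\bon:(\w+)", r"on\1", source)
    # Drop the now-redundant runes opt-in tag.
    source = re.sub(r"<svelte:options\s+runes=\{true\}\s*/>\s*", "", source)
    return source

print(normalize_svelte5("<button on:click={inc}>+</button>"))
# -> <button onclick={inc}>+</button>
```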
