# Gemma 3n E2B Swahili Reasoning LoRA - GGUF
GGUF-quantized builds of `lyimo/gemma3n-E2B-swahili-reasoning-lora` for efficient inference on mobile devices and consumer hardware.
## Available Quantizations
| File | Quant | Size | Description |
|---|---|---|---|
| gemma3n-E2B-swahili-reasoning-Q4_K_M.gguf | Q4_K_M | 2.6 GB | Recommended for mobile - best balance of quality and size |
| gemma3n-E2B-swahili-reasoning-Q8_0.gguf | Q8_0 | 4.5 GB | Higher quality, larger size |
## Usage
### With llama.cpp

```sh
./llama-cli -m gemma3n-E2B-swahili-reasoning-Q4_K_M.gguf -p "Your prompt here"
```
### With Ollama

```sh
# Create a Modelfile pointing at the local GGUF file
echo 'FROM ./gemma3n-E2B-swahili-reasoning-Q4_K_M.gguf' > Modelfile

# Register the model with Ollama, then chat with it
ollama create gemma3n-swahili -f Modelfile
ollama run gemma3n-swahili
```
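When passing raw prompts with `llama-cli -p`, Gemma-family instruction-tuned models generally perform best when the prompt is wrapped in the Gemma chat template. As a minimal sketch (assuming this fine-tune keeps the standard Gemma turn markers; verify against the tokenizer's chat template in the GGUF metadata), a small helper:

```python
def format_gemma_prompt(user_message: str) -> str:
    """Wrap a user message in the standard Gemma chat template.

    Assumption: this fine-tune uses the stock Gemma turn markers.
    Check the chat template embedded in the GGUF if output degrades.
    """
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

# Example with a Swahili prompt ("Explain why the sky is blue.")
prompt = format_gemma_prompt("Eleza kwa nini anga ni bluu.")
print(prompt)
```

Note that recent llama.cpp builds in conversation mode (`-cnv`) and Ollama both apply the chat template stored in the model's metadata automatically, so manual wrapping is only needed for raw single-shot prompts.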
## Model Details
- Architecture: Gemma3n (6B parameters, E2B variant)
- Base Model: unsloth/gemma-3n-e2b-it-unsloth-bnb-4bit
- Fine-tuned for: Swahili reasoning tasks
- Context Length: 32,768 tokens
- Vocab Size: 262,400
## Quantization Details
- Converted from the original safetensors weights using llama.cpp
- Q4_K_M uses mixed 4-bit quantization (4.91 BPW) - ideal for mobile deployment
- Q8_0 uses 8-bit quantization (8.50 BPW) - higher quality for desktop use
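As a sanity check on the figures above, a GGUF file's size is roughly parameter count × bits-per-weight / 8. Working backwards from the listed sizes (treating GB as 10^9 bytes, which is an assumption about how the sizes were rounded), both files imply about the same underlying weight count:

```python
def implied_params(file_bytes: float, bpw: float) -> float:
    """Estimate weight count from file size and average bits-per-weight.

    Ignores GGUF metadata overhead, so this is only a rough estimate.
    """
    return file_bytes * 8 / bpw

q4 = implied_params(2.6e9, 4.91)  # Q4_K_M: ~4.24B weights
q8 = implied_params(4.5e9, 8.50)  # Q8_0:   ~4.24B weights
print(f"Q4_K_M implies {q4 / 1e9:.2f}B weights, Q8_0 implies {q8 / 1e9:.2f}B weights")
```

The two quantizations agree with each other, which is a useful consistency check; the gap versus the nominal 6B figure is plausibly down to size rounding, mixed per-tensor precision, and how Gemma 3n's parameters are counted, so treat these as rough numbers.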