---
license: eupl-1.2
pipeline_tag: image-text-to-text
library_name: gguf
base_model:
- LetheanNetwork/lemer
base_model_relation: finetune
tags:
- gemma4
- lemma
- gguf
- llama.cpp
- ollama
- multimodal
- vision
- audio
- on-device
- conversational
---
# Lemer — Gemma 4 E2B (GGUF)
The smallest member of the [Lemma model family](https://huggingface.co/collections/lthn/lemma) by [Lethean](https://lthn.ai). An EUPL-1.2 fork of [Gemma 4 E2B](https://huggingface.co/google/gemma-4-E2B-it) with the **Lethean Ethical Kernel (LEK) merged into the weights** — consent-based reasoning baked into the attention projections via LoRA finetune, then merged so inference uses a single standalone model with no PEFT runtime required.
This repo ships the **GGUF multi-quant build**: five quantised files from Q3_K_M to Q8_0 plus a BF16 reference, with full multimodal support (text, image, audio). Use with Ollama, llama.cpp, GPT4All, or LM Studio. The unmodified Gemma 4 E2B fork lives at [LetheanNetwork/lemer](https://huggingface.co/LetheanNetwork/lemer) for users who want the raw Google weights without the LEK shift.
**Looking for MLX?** The native Apple Silicon builds live in sibling repos:
[`lthn/lemer-mlx`](https://huggingface.co/lthn/lemer-mlx) (4-bit default) |
[`lthn/lemer-mlx-8bit`](https://huggingface.co/lthn/lemer-mlx-8bit) |
[`lthn/lemer-mlx-bf16`](https://huggingface.co/lthn/lemer-mlx-bf16) (full precision)
> A **lemma** is "something assumed" — an intermediate theorem on the path to a larger proof, or a heading that signals the subject of what follows. The Lemma model family is named for that role: each variant is a stepping stone between raw capability and ethical application.
## GGUF Variants
| File | Quant | Size | Use Case |
|------|-------|------|----------|
| `lemer-q3_k_m.gguf` | Q3_K_M | 3.0 GB | Minimum viable — constrained devices |
| `lemer-q4_k_m.gguf` | Q4_K_M | 3.2 GB | **Recommended** — best size/quality balance |
| `lemer-q5_k_m.gguf` | Q5_K_M | 3.4 GB | Higher quality, moderate size |
| `lemer-q6_k.gguf` | Q6_K | 3.6 GB | Near-lossless |
| `lemer-q8_0.gguf` | Q8_0 | 4.6 GB | Maximum quality quantised |
| `lemer-bf16.gguf` | BF16 | 8.7 GB | Full precision reference |
All quants verified locally via Ollama and llama-cpp-python. For native Apple Silicon use [`lthn/lemer-mlx`](https://huggingface.co/lthn/lemer-mlx) instead.
### Repo Files
| File | Format | Purpose |
|------|--------|---------|
| `lemer-*.gguf` | GGUF | Ollama, llama.cpp, GPT4All, LM Studio |
| `config.json` | JSON | Multimodal model config (architecture, quantisation, vision/audio towers) |
| `tokenizer.json` | JSON | Tokenizer vocabulary (262K tokens) |
| `tokenizer_config.json` | JSON | Tokenizer settings and special tokens |
| `chat_template.jinja` | Jinja2 | Chat template |
| `processor_config.json` | JSON | Image and audio processor config |
| `generation_config.json` | JSON | Default generation parameters (temperature, top_p, top_k) |
| `template` | Go template | Ollama chat template override |
| `params` | JSON | Ollama sampling parameters |
| `LICENSE` | Text | EUPL-1.2 licence text |
| `README.md` | Markdown | This file — model card |
## Quick Start
### Apps & CLI
Ollama
```bash
ollama run hf.co/lthn/lemer:Q4_K_M
```
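
Ollama also exposes the pulled model over its local REST API. Below is a minimal standard-library sketch. It assumes Ollama's default address (`http://localhost:11434`) and the model tag from the `ollama run` command. `build_payload` and `ollama_chat` are hypothetical helper names, not part of any library.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint
MODEL = "hf.co/lthn/lemer:Q4_K_M"  # same tag as the `ollama run` command


def build_payload(prompt: str, model: str = MODEL) -> dict:
    """Build a non-streaming /api/chat request body for one user turn."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # one JSON object instead of a stream of chunks
    }


def ollama_chat(prompt: str, url: str = OLLAMA_URL, timeout: float = 120.0) -> str:
    """Send one user turn to a running Ollama server and return the reply text."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["message"]["content"]
```

While Ollama is running, `print(ollama_chat("Hello, how are you?"))` sends one turn and prints the reply. Ollama streams newline-delimited JSON chunks by default. Setting `"stream": False` keeps the sketch to a single response object.
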
Docker
```bash
docker model run hf.co/lthn/lemer
```
Or from Docker Hub:
```bash
docker model run lthn/lemer
```
Unsloth Studio
```bash
# macOS / Linux / WSL
curl -fsSL https://unsloth.ai/install.sh | sh
# Windows
irm https://unsloth.ai/install.ps1 | iex
```
```bash
unsloth studio -H 0.0.0.0 -p 8888
# Open http://localhost:8888 — search for lthn/lemer
```
Or use [Hugging Face Spaces](https://huggingface.co/spaces/unsloth/studio) with no install, and search for `lthn/lemer`.
llama.cpp
Install via brew (macOS/Linux), winget (Windows), or build from source:
```bash
brew install llama.cpp # macOS/Linux
winget install llama.cpp # Windows
```
```bash
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf lthn/lemer:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf lthn/lemer:Q4_K_M
```
Or build from source:
```bash
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
./build/bin/llama-server -hf lthn/lemer:Q4_K_M
./build/bin/llama-cli -hf lthn/lemer:Q4_K_M
```
> **MLX users:** this repo ships GGUF weights only. For native Apple Silicon use [`lthn/lemer-mlx`](https://huggingface.co/lthn/lemer-mlx) (4-bit), [`lthn/lemer-mlx-8bit`](https://huggingface.co/lthn/lemer-mlx-8bit), or [`lthn/lemer-mlx-bf16`](https://huggingface.co/lthn/lemer-mlx-bf16).
### Python Libraries
llama-cpp-python
```bash
uv pip install llama-cpp-python huggingface-hub
```
```python
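from llama_cpp import Llama

# Downloads the Q4_K_M quant from this repo on first use (requires huggingface-hub)
llm = Llama.from_pretrained(
    repo_id="lthn/lemer",
    filename="lemer-q4_k_m.gguf",
)

# Text
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello, how are you?"}]
)
print(response["choices"][0]["message"]["content"])

# Vision (multimodal): the image goes before the text, per the multimodal guidance in this card.
# Image input may need llama-cpp-python's multimodal support (a chat handler loaded with
# the vision projector); the text path above works with the default handler.
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                    },
                },
                {"type": "text", "text": "Describe this image in one sentence."},
            ],
        }
    ]
)
print(response["choices"][0]["message"]["content"])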
```
### Servers (OpenAI-compatible API)
llama-server (llama.cpp)
```bash
brew install llama.cpp # macOS/Linux
llama-server -hf lthn/lemer:Q4_K_M
```
Works with any OpenAI-compatible client at `http://localhost:8080/v1`.
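
For illustration, here is a standard-library Python client for that endpoint. The sketch assumes the default port and the single-model setup from the command above, so it sends no `model` field. `build_chat_request` and `chat` are hypothetical names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"  # llama-server's default address


def build_chat_request(prompt: str, max_tokens: int = 256) -> dict:
    """OpenAI-style chat completion body for a single user turn."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(prompt: str, base_url: str = BASE_URL, timeout: float = 120.0) -> str:
    """POST to /chat/completions and return the assistant's text."""
    data = json.dumps(build_chat_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Call `chat("Hello, how are you?")` while `llama-server` is running. The official `openai` Python package also works if you point its `base_url` at the same address.
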
vLLM
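> vLLM requires the original (non-quantised) safetensors weights from [LetheanNetwork/lemer](https://huggingface.co/LetheanNetwork/lemer). It does not load the GGUF files in this repo or the MLX-quantised safetensors. Note that LetheanNetwork/lemer is the unmodified Gemma 4 E2B fork, so vLLM serves the base weights without the LEK merge. Requires Linux and an NVIDIA GPU.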
```bash
uv pip install vllm
vllm serve "LetheanNetwork/lemer"
```
```bash
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "LetheanNetwork/lemer",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "image_url",
            "image_url": {
              "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
            }
          },
          {"type": "text", "text": "Describe this image in one sentence."}
        ]
      }
    ]
  }'
```
### Native engines (work in progress)
lemma.cpp (native C++ inference)
```bash
git clone https://github.com/LetheanNetwork/lemma.cpp.git
cd lemma.cpp
cmake -B build
cmake --build build -j
```
lemma.cpp uses Google's `.sbs` (single-file binary) weight format, distinct from safetensors and GGUF. Pre-converted `.sbs` weights for the Lemma family are not yet published — track progress at [LetheanNetwork/lemma.cpp](https://github.com/LetheanNetwork/lemma.cpp).
Once `.sbs` weights are available, run `./build/gemma --weights lemer.sbs` for interactive mode.
lemma (JAX inference)
```bash
uv pip install "git+https://github.com/LetheanNetwork/lemma.git"
```
```python
from lemma import lem
model = lem.nn.Gemma4_E2B()
params = lem.ckpts.load_params("path/to/orbax/checkpoint")
sampler = lem.text.ChatSampler(model=model, params=params, multi_turn=True)
output = sampler.chat("Hello, how are you?")
print(output)
```
> **Note:** lemma's `load_params` requires Google's Orbax checkpoint format (sharded `ocdbt` files), not the GGUF in this repo. Orbax weights for the Lemma family are not yet published. For inference today, use GGUF (`Ollama` / `llama.cpp`) above or MLX via [`lthn/lemer-mlx`](https://huggingface.co/lthn/lemer-mlx).
### Integrations
pi-coding-agent
First start a llama-server (see above), then:
```bash
npm install -g @mariozechner/pi-coding-agent
```
Add to `~/.pi/agent/models.json`:
```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "lthn/lemer"
        }
      ]
    }
  }
}
```
The model `id` should match what `llama-server` reports at `/v1/models`.
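
Here is a small sketch for checking that value. It assumes the default `llama-server` address and the standard OpenAI list shape (`{"data": [{"id": ...}]}`). The helper names are hypothetical.

```python
import json
import urllib.request


def parse_model_ids(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style /v1/models response."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_model_ids(base_url: str = "http://localhost:8080/v1") -> list[str]:
    """Query a running llama-server and return the ids it advertises."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=10) as response:
        return parse_model_ids(json.load(response))
```

With the server running, `print(fetch_model_ids())` shows the ids to copy into `models.json`.
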
## Model Details
| Property | Value |
|----------|-------|
| **Architecture** | Gemma 4 E2B |
| **Parameters** | 5.1B total, 2.3B effective (Per-Layer Embeddings) |
| **Layers** | 35 |
| **Context Length** | 128K tokens |
| **Vocabulary** | 262K tokens |
| **Modalities** | Text, Image, Audio |
| **Sliding Window** | 512 tokens |
| **Vision Encoder** | ~150M params |
| **Audio Encoder** | ~300M params |
| **Base Model** | [LetheanNetwork/lemer](https://huggingface.co/LetheanNetwork/lemer) |
| **Licence** | EUPL-1.2 |
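
The 512-token **Sliding Window** means that on sliding-window layers each position attends only to a recent span of earlier tokens, while global layers see the whole causal prefix. Below is a pure-Python sketch of the two causal masks. It is an illustration under stated assumptions: the window is taken to include the current token, so a query sees at most 512 positions. Exact window semantics and which layers are global are not specified here and may differ in the real implementation.

```python
from typing import List, Optional

WINDOW = 512  # sliding-window size from the table above


def can_attend(q: int, k: int, window: Optional[int] = None) -> bool:
    """True if query position q may attend to key position k."""
    if k > q:
        return False  # causal: never look at future tokens
    # Global layers: whole prefix. Sliding layers: only the last `window` positions.
    return window is None or q - k < window


def causal_mask(seq_len: int, window: Optional[int] = None) -> List[List[bool]]:
    """Full seq_len x seq_len boolean mask (row = query, column = key)."""
    return [[can_attend(q, k, window) for k in range(seq_len)] for q in range(seq_len)]


def visible_keys(q: int, window: Optional[int] = None) -> int:
    """How many key positions query q can see under the given mask."""
    return q + 1 if window is None else min(q + 1, window)


if __name__ == "__main__":
    # Toy-sized demo: 6 tokens with a window of 3 ('x' = attend, '.' = masked).
    for row in causal_mask(6, window=3):
        print("".join("x" if allowed else "." for allowed in row))
```

At position 10,000, `visible_keys(9_999, WINDOW)` is 512, while a global layer sees all 10,000 positions. This is why sliding-window layers only need a bounded KV cache on long contexts.
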
## The Lemma Family
| Name | Source (BF16 weights) | Params | Context | Modalities | Consumer Repo |
|------|----------------------|--------|---------|------------|---------------|
| **Lemer** | [LetheanNetwork/lemer](https://huggingface.co/LetheanNetwork/lemer) | 2.3B eff | 128K | Text, Image, Audio | You are here |
| **Lemma** | [LetheanNetwork/lemma](https://huggingface.co/LetheanNetwork/lemma) | 4.5B eff | 128K | Text, Image, Audio | [lthn/lemma](https://huggingface.co/lthn/lemma) |
| **Lemmy** | [LetheanNetwork/lemmy](https://huggingface.co/LetheanNetwork/lemmy) | 3.8B active | 256K | Text, Image | [lthn/lemmy](https://huggingface.co/lthn/lemmy) |
| **Lemrd** | [LetheanNetwork/lemrd](https://huggingface.co/LetheanNetwork/lemrd) | 30.7B | 256K | Text, Image | [lthn/lemrd](https://huggingface.co/lthn/lemrd) |
## Capabilities
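- Configurable thinking mode: including the `<|think|>` token in the system prompt enables it. The examples in this card send no system prompt, so thinking stays off; it can also be disabled explicitly by passing `enable_thinking=False` when applying the chat template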
- Native function calling and system prompt support
- Variable aspect ratio image understanding
- Audio speech recognition and translation (ASR/AST)
- Multilingual support (140+ languages)
- Hybrid attention (sliding window + global)
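
Here is a minimal client-side sketch of the thinking-mode toggle, based on the rule above that `<|think|>` in the system prompt enables it. Two details are assumptions: placing the token at the start of the system prompt, and separating it with a newline. Check `chat_template.jinja` for the exact convention. Whether a literal `<|think|>` in message text is tokenised as the special token also depends on the runner. `build_messages` is a hypothetical helper.

```python
THINK_TOKEN = "<|think|>"


def build_messages(user_prompt: str, system_prompt: str = "", thinking: bool = False) -> list[dict]:
    """Return an OpenAI-style message list, optionally enabling thinking mode."""
    system = system_prompt
    if thinking:
        # Assumption: prepend the token; the chat template defines the exact placement.
        system = f"{THINK_TOKEN}\n{system_prompt}" if system_prompt else THINK_TOKEN
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": user_prompt})
    return messages
```

The returned list can be passed as `messages` to any of the OpenAI-compatible clients in the Quick Start.
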
## Roadmap
This release of `lemer` is **Gemma 4 E2B with the Lethean Ethical Kernel (LEK) merged in** — axiom-based reasoning baked into the attention weights via LoRA finetune, then merged into the base so inference uses a single standalone model with no PEFT runtime required.
| Phase | Status | What it adds |
|-------|--------|--------------|
| **Base fork** ([LetheanNetwork/lemer](https://huggingface.co/LetheanNetwork/lemer)) | ✅ Released | EUPL-1.2 fork of Gemma 4 E2B — unmodified Google weights |
| **LEK merged** (this repo) | ✅ Released | Lethean Ethical Kernel — axiom-based reasoning via LoRA merge |
| **Lemma family roll-out** | ✅ Released | [lthn/lemma](https://huggingface.co/lthn/lemma), [lthn/lemrd](https://huggingface.co/lthn/lemrd), [lthn/lemmy](https://huggingface.co/lthn/lemmy) — all four variants now LEK-merged |
| **8-PAC eval results** | 🚧 In progress | Continuous benchmarking on the homelab, published to [lthn/LEM-benchmarks](https://huggingface.co/datasets/lthn/LEM-benchmarks) |
The LEK axioms are public domain and published at [Snider/ai-ethics](https://github.com/Snider/ai-ethics). Track research progress at [LetheanNetwork](https://github.com/LetheanNetwork) and the [LEM-research dataset](https://huggingface.co/datasets/lthn/LEM-research).
## Why EUPL-1.2
Lemer is licensed under the [European Union Public Licence v1.2](https://joinup.ec.europa.eu/collection/eupl/eupl-text-eupl-12) — not Apache 2.0 or MIT. This is a deliberate choice:
- **23 official languages, one legal meaning.** EUPL is the only OSS licence designed by lawmakers across multiple legal systems. "Derivative work" means the same thing in German, French, Estonian, and Maltese law.
- **Copyleft with compatibility.** Modifications must be shared back, but the licence plays cleanly with GPL, LGPL, MPL, and other major OSS licences. No accidental relicensing.
- **No proprietary capture.** Anyone can use lemer commercially — but they cannot fork it, train a competitor model on it, and close-source the result. The ethical layer stays in the open.
- **Built for institutions.** Government, research, and enterprise users get a licence designed for cross-border compliance, not a US-centric one.
## Recommended Sampling
Use Google's standardised settings across all use cases:
| Parameter | Value |
|-----------|-------|
| `temperature` | 1.0 |
| `top_p` | 0.95 |
| `top_k` | 64 |
| `stop` | ``, `` |
> Gemma 4 is calibrated for `temperature: 1.0` — this is **not** the same as the typical 0.7 default for other models. Lower values reduce diversity without improving quality. These defaults are pre-configured in the `params` file (Ollama) and `generation_config.json` (transformers).
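
To apply these values from code against an OpenAI-compatible endpoint such as `llama-server`, here is a small sketch. It merges the values into a request body without overriding any the caller has already set. `top_k` is not part of the OpenAI API. `llama-server` accepts it as an extra field, but stricter servers may reject it. Stop sequences are left to the server defaults here. The names are hypothetical.

```python
RECOMMENDED_SAMPLING = {
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 64,  # non-standard for OpenAI; accepted as an extension by llama-server
}


def with_recommended_sampling(body: dict) -> dict:
    """Return a copy of a request body with the recommended defaults filled in."""
    merged = dict(RECOMMENDED_SAMPLING)
    merged.update(body)  # values the caller set explicitly take precedence
    return merged
```

For llama-cpp-python, the same three values can be passed directly as the `temperature`, `top_p`, and `top_k` keyword arguments to `create_chat_completion`.
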
## Variable Image Resolution
Gemma 4 supports a configurable visual token budget that controls how many tokens represent each image. Higher = more detail, lower = faster inference.
| Token Budget | Use Case |
|--------------|----------|
| 70 | Classification, captioning, video frame processing |
| 140 | General image understanding |
| **280** | Default — balanced quality and speed |
| 560 | OCR, document parsing, fine-grained detail |
| 1120 | Maximum detail (small text, complex documents) |
For multimodal prompts, place image and audio content **before** text for best results.
The default budget (`280`) is set in `processor_config.json` via `image_seq_length` and `max_soft_tokens`. Override it by editing those fields, or per call by passing an explicit `image_seq_length` to the processor where supported.
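
Here is a sketch for changing the budget in a local copy of `processor_config.json`. Edit a copy, not the file inside the Hugging Face cache. The sketch assumes both keys should carry the same budget value. Because their exact position in the file is not documented here, it walks nested objects. This only matters for runners that read this file; GGUF runners may ignore it. The helper names are hypothetical.

```python
import json
from pathlib import Path

# Token budgets listed in the table above.
ALLOWED_BUDGETS = (70, 140, 280, 560, 1120)
# Assumption: both fields hold the same budget value.
BUDGET_KEYS = ("image_seq_length", "max_soft_tokens")


def _apply(node, budget: int) -> int:
    """Recursively overwrite budget keys in dicts/lists; return how many changed."""
    changed = 0
    if isinstance(node, dict):
        for key, value in node.items():
            if key in BUDGET_KEYS:
                node[key] = budget  # replacing a value does not resize the dict
                changed += 1
            else:
                changed += _apply(value, budget)
    elif isinstance(node, list):
        for item in node:
            changed += _apply(item, budget)
    return changed


def set_budget(config: dict, budget: int) -> int:
    """Set the visual token budget in a parsed processor config (in place)."""
    if budget not in ALLOWED_BUDGETS:
        raise ValueError(f"budget must be one of {ALLOWED_BUDGETS}, got {budget}")
    return _apply(config, budget)


def patch_processor_config(path: Path, budget: int) -> int:
    """Rewrite a local processor_config.json with a new budget."""
    config = json.loads(path.read_text(encoding="utf-8"))
    changed = set_budget(config, budget)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return changed
```

For example, `patch_processor_config(Path("my-copy/processor_config.json"), 560)` switches a copy to the OCR-oriented budget. A return value of `0` means neither key was found.
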
## Audio (E2B)
E2B supports speech recognition (ASR) and speech translation (AST) up to 30 seconds per clip. Audio longer than 30 seconds should be split into chunks before inference.
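
Splitting can be done before inference using only the standard library. Below is a minimal sketch for uncompressed PCM WAV files using Python's `wave` module. The `_partNN.wav` naming and the hard cuts at 30-second boundaries (no overlap, no silence detection) are assumptions. Cutting mid-word may hurt transcription at chunk edges.

```python
import wave
from pathlib import Path

MAX_SECONDS = 30  # E2B's per-clip audio limit


def split_wav(src: Path, out_dir: Path, max_seconds: int = MAX_SECONDS) -> list[Path]:
    """Split a PCM WAV file into consecutive chunks of at most max_seconds each."""
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        frames_per_chunk = params.framerate * max_seconds
        index = 0
        while True:
            frames = reader.readframes(frames_per_chunk)
            if not frames:
                break  # end of file
            dest = out_dir / f"{src.stem}_part{index:02d}.wav"
            with wave.open(str(dest), "wb") as writer:
                # Same format as the source; the header's frame count is patched on write.
                writer.setparams(params)
                writer.writeframes(frames)
            chunks.append(dest)
            index += 1
    return chunks
```

A 65-second recording, for instance, becomes three clips of 30, 30, and 5 seconds. Each clip can then be sent as a separate request.
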
Audio input works through GGUF multimodal-capable runners (llama.cpp server with the vision/audio build, or llama-cpp-python multimodal). For a ready-made multimodal Python path today, use the MLX sibling repo [`lthn/lemer-mlx`](https://huggingface.co/lthn/lemer-mlx) with `mlx-vlm` — see that repo's README for the `mlx_vlm.load()` / `mlx_vlm.generate()` pattern.
## Benchmarks
Live evaluation results published to the [LEM-benchmarks dataset](https://huggingface.co/datasets/lthn/LEM-benchmarks). The lemer-specific results live at [LEM-benchmarks/results/lemer](https://huggingface.co/datasets/lthn/LEM-benchmarks/tree/main/results/lemer).
The 8-PAC eval pipeline runs continuously on our homelab and publishes results as they complete. Categories: ethics, reasoning, instruction-following, coding, multilingual, safety, knowledge, creativity.
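
To list only the lemer results programmatically, here is a sketch using `huggingface_hub.list_repo_files`. The `results/lemer/` prefix comes from the dataset link above, and the sketch assumes nothing about the format of the result files. The helper names are hypothetical.

```python
RESULTS_PREFIX = "results/lemer/"


def lemer_result_files(paths: list[str]) -> list[str]:
    """Keep only files under the lemer results folder, sorted for stable output."""
    return sorted(p for p in paths if p.startswith(RESULTS_PREFIX))


def fetch_lemer_result_files() -> list[str]:
    """List lemer result files in the dataset repo (needs huggingface_hub and network)."""
    from huggingface_hub import list_repo_files  # optional dependency, imported lazily

    files = list_repo_files("lthn/LEM-benchmarks", repo_type="dataset")
    return lemer_result_files(files)
```

Because the pipeline publishes results as they complete, rerunning `fetch_lemer_result_files()` picks up newly added files.
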
## Resources
| Resource | Link |
|----------|------|
| **Benchmark results** | [lthn/LEM-benchmarks](https://huggingface.co/datasets/lthn/LEM-benchmarks) |
| **LiveBench results** | [lthn/livebench](https://huggingface.co/datasets/lthn/livebench) |
| **Research notes** | [lthn/LEM-research](https://huggingface.co/datasets/lthn/LEM-research) |
| **Lemma model collection** | [lthn/lemma](https://huggingface.co/collections/lthn/lemma) |
## About Lethean
[Lethean](https://lthn.ai) is a social enterprise building ethical AI infrastructure. The Lemma model family is part of the [LEM (Lethean Ethical Model)](https://github.com/LetheanNetwork) project — training protocol and tooling for intrinsic ethical alignment of language models.
- Website: [lthn.ai](https://lthn.ai)
- GitHub: [LetheanNetwork](https://github.com/LetheanNetwork)
- Licence: [EUPL-1.2](https://joinup.ec.europa.eu/collection/eupl/eupl-text-eupl-12)