Instructions to use Keyven/german-ocr-3 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Keyven/german-ocr-3 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Keyven/german-ocr-3")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("Keyven/german-ocr-3", dtype="auto")
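
The pipeline example above uses a generic placeholder image and question. Below is a minimal sketch of the same call shaped for document extraction. It assumes the Transformers checkpoint responds to the German prompt used in the model card's Ollama quickstart. `build_ocr_messages` and `run_ocr` are hypothetical helper names, not part of any library.

```python
from typing import Any

# Prompt copied from the model card's Ollama quickstart. Whether it is the
# best prompt for the Transformers checkpoint is an assumption.
OCR_PROMPT = "Extrahiere die Rechnung im Bild als JSON."


def build_ocr_messages(image_url: str, prompt: str = OCR_PROMPT) -> list[dict[str, Any]]:
    """Build a single-turn message list in the format used by the pipeline example."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_url},
                {"type": "text", "text": prompt},
            ],
        }
    ]


def run_ocr(image_url: str) -> Any:
    """Run the pipeline on one image (downloads the model on first use)."""
    from transformers import pipeline  # imported lazily: heavy dependency

    pipe = pipeline("image-text-to-text", model="Keyven/german-ocr-3")
    return pipe(text=build_ocr_messages(image_url))
```

Calling `run_ocr("https://example.com/invoice.png")` with a URL of your own would return the pipeline's raw output. Keeping message construction in its own function makes the request shape easy to check without loading the model.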

llama-cpp-python

How to use Keyven/german-ocr-3 with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="Keyven/german-ocr-3",
	filename="german-ocr-3-Q4_K_M.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": [
				{
					"type": "text",
					"text": "Describe this image in one sentence."
				},
				{
					"type": "image_url",
					"image_url": {
						"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
					}
				}
			]
		}
	]
)

Local Apps

llama.cpp

How to use Keyven/german-ocr-3 with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Keyven/german-ocr-3:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Keyven/german-ocr-3:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Keyven/german-ocr-3:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Keyven/german-ocr-3:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Keyven/german-ocr-3:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf Keyven/german-ocr-3:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Keyven/german-ocr-3:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Keyven/german-ocr-3:Q4_K_M

Use Docker

docker model run hf.co/Keyven/german-ocr-3:Q4_K_M

vLLM

How to use Keyven/german-ocr-3 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Keyven/german-ocr-3"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Keyven/german-ocr-3",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
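
The curl call above references a public image URL. For a local scan, a common approach with OpenAI-compatible servers is to inline the file as a base64 data URL. Below is a minimal sketch using only the standard library. It assumes the server from the previous step is running on port 8000 and accepts data URLs in `image_url`. `build_chat_payload` and `post_chat` are hypothetical helper names.

```python
import base64
import json
import urllib.request


def build_chat_payload(image_bytes: bytes, prompt: str,
                       model: str = "Keyven/german-ocr-3",
                       mime_type: str = "image/png") -> dict:
    """Build an OpenAI-style chat request with the image inlined as a data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
                    },
                ],
            }
        ],
    }


def post_chat(payload: dict, base_url: str = "http://localhost:8000/v1") -> str:
    """POST the payload to a running server and return the first reply text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With a server running, `post_chat(build_chat_payload(open("meine_rechnung.png", "rb").read(), "Extrahiere die Rechnung im Bild als JSON."))` would return the model's reply as a string. The same payload should work against the SGLang endpoint by passing `base_url="http://localhost:30000/v1"`.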

Use Docker

docker model run hf.co/Keyven/german-ocr-3:Q4_K_M

SGLang

How to use Keyven/german-ocr-3 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Keyven/german-ocr-3" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Keyven/german-ocr-3",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Keyven/german-ocr-3" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Keyven/german-ocr-3",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Ollama
How to use Keyven/german-ocr-3 with Ollama:
```
ollama run hf.co/Keyven/german-ocr-3:Q4_K_M
```
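
The model can also be reached from Python through Ollama's local HTTP API once it has been pulled. Below is a minimal sketch, assuming Ollama is listening on its default port 11434 and that the model replies with a bare JSON object. `build_generate_request`, `parse_json_reply`, and `extract_document` are hypothetical helper names.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "hf.co/Keyven/german-ocr-3:Q4_K_M"


def build_generate_request(image_bytes: bytes, prompt: str, model: str = MODEL) -> dict:
    """Build a non-streaming /api/generate request with one base64-encoded image."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "format": "json",  # ask Ollama to constrain the reply to JSON
        "stream": False,
    }


def parse_json_reply(text: str) -> dict:
    """Parse the model reply, raising ValueError unless it is a JSON object."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply is valid JSON but not an object")
    return data


def extract_document(image_path: str,
                     prompt: str = "Extrahiere die Rechnung im Bild als JSON.") -> dict:
    """Send one image file to the local Ollama server and return the parsed JSON."""
    with open(image_path, "rb") as handle:
        payload = build_generate_request(handle.read(), prompt)
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return parse_json_reply(body["response"])
```

Parsing the reply explicitly, rather than trusting it, gives callers a single place to catch malformed output before it reaches downstream code.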

Unsloth Studio

How to use Keyven/german-ocr-3 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Keyven/german-ocr-3 to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Keyven/german-ocr-3 to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Keyven/german-ocr-3 to start chatting

Pi

How to use Keyven/german-ocr-3 with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Keyven/german-ocr-3:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "german-ocr-3"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Docker Model Runner
How to use Keyven/german-ocr-3 with Docker Model Runner:
```
docker model run hf.co/Keyven/german-ocr-3:Q4_K_M
```

Lemonade

How to use Keyven/german-ocr-3 with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull Keyven/german-ocr-3:Q4_K_M

Run and chat with the model

lemonade run user.german-ocr-3-Q4_K_M

List all available models

lemonade list

Keyven committed 18 days ago

Commit c6ab04e (verified) · 1 parent: 5002c50

polish: stats hero, absolute image URLs, improved frontmatter (datasets, base_model, langs, new_version)

Files changed (1)

README.md +107 -42

@@ -38,59 +38,85 @@ new_version: Keyven/german-ocr-3
 ---
 <p align="center">
-  <img src="https://app.german-ocr.de/icon.png" alt="German-OCR-3" width="128" height="128" />
 </p>
 <h1 align="center">German-OCR-3</h1>
-<p align="center"><strong>German Vision OCR. Compact. Local. Open Source.</strong></p>
 <p align="center">
-  <a href="https://german-ocr.de"><img alt="Site" src="https://img.shields.io/badge/site-german--ocr.de-3B82F6?style=flat-square"/></a>
-  <a href="https://ollama.com/Keyvan/german-ocr-3"><img alt="Ollama" src="https://img.shields.io/badge/Ollama-Keyvan%2Fgerman--ocr--3-orange?style=flat-square"/></a>
-  <a href="https://github.com/Keyvanhardani/German-OCR-3-Dev"><img alt="GitHub" src="https://img.shields.io/badge/GitHub-source-black?style=flat-square"/></a>
-  <a href="LICENSE"><img alt="License: Apache 2.0" src="https://img.shields.io/badge/License-Apache_2.0-green?style=flat-square"/></a>
 </p>
 ---
-## What is German-OCR-3?
-**German-OCR-3** is a compact, fast **vision OCR distribution for German business documents** (invoices, letters, forms, receipts, official notices) that runs fully locally. The image goes in and **strictly validated JSON conforming to our German extraction schema** comes out, with no cloud requirement and no vendor lock-in.
-Two editions, both Apache-2.0, both under 3 GB:
-| Edition | Size (Ollama) | Target hardware | Strength |
-|---|---:|---|---|
-| **`Keyvan/german-ocr-nano`** | **1.0 GB** | CPU / Edge / Mobile | "runs anywhere" |
-| **`Keyvan/german-ocr-3` ⭐** | **2.7 GB** | 4–6 GB VRAM | recommended default edition |
-⭐ In our field test on **200+ real, anonymized German invoices**: **100 % valid JSON · 95 % sender correctly identified · 0 % hallucination**.
-> **Fine-tuned adapter** for German business-document extraction. Apache 2.0.
-## Training and evaluation datasets
-* [`neuralabs/german-synth-ocr`](https://huggingface.co/datasets/neuralabs/german-synth-ocr): 4,500+ German OCR samples (synthetic, Apache-2.0)
-* `Aoschu/German_invoices_dataset_for_donut`: 129 real German invoices (Donut format)
-* **In-house synthetic German invoice set**: 100 invoices with golden JSON, deterministically generated (`eval/fixtures/synth_de_invoices/`)
-* **Anonymized DACH field test**: real invoices from various DACH vendors, internal, not in the repo (GDPR)
 ---
-## Size comparison
-![Model sizes](charts/01_size_vs_competitors.png)
-`german-ocr-3` (2.7 GB) is **6× smaller** than a typical 7B OCR VLM and runs on an **8 GB gaming GPU** or on the CPU of an ordinary laptop.
-![Field test](charts/02_ionos_validity.png)
-![Latency](charts/04_latency.png)
 ---
-## Quickstart
 ### Ollama (recommended, one line)
@@ -99,7 +125,8 @@ ollama pull Keyvan/german-ocr-3
 ollama run Keyvan/german-ocr-3 "Extrahiere die Rechnung im Bild als JSON." ./meine_rechnung.png
 ```
-Expected result (real output):
 ```json
 {
@@ -131,6 +158,8 @@ Erwartetes Ergebnis (echter Output):
 }
 ```
 ### Python (via Ollama HTTP API)
 ```python
@@ -166,40 +195,66 @@ llama-cli -m ./german-ocr-3/german-ocr-3-Q4_K_M.gguf \
 ---
-## Target audiences
-* **Solo builders & indies** who want to extract German documents locally, without cloud OCR costs.
-* **DACH SMEs** with strict privacy requirements that want to host locally or on-prem.
-* **Agencies & studios** that want an open-source foundation under their own pipeline.
-For those who want it **managed** and with larger models:
 > 🌐 **[german-ocr.de](https://german-ocr.de)**: hosted German OCR API with premium models, higher accuracy, and no hardware of your own. Data stays in the EU.
-## License
-**Apache License 2.0** for the entire German-OCR-3 distribution (Modelfiles, system prompt, schemas, docs, GGUFs).
-## Credit & Attribution
 German-OCR-3 builds on the excellent work of the **Qwen team at Alibaba Group**. The underlying vision-language architecture comes from the **Qwen 3.5 Small Series**, released under the Apache License 2.0. Without the open research and the clean release of the Qwen weights, this project would not be possible.
-* **Qwen 3.5** — https://huggingface.co/Qwen · https://qwen.ai
-* **Apache License 2.0** (Weights) — © 2025–2026 Qwen Team, Alibaba Group
-* **Qwen2.5-VL Technical Report** — arXiv:2502.13923
 Full attribution text in [`NOTICE`](NOTICE).
-## Citation
-If you use German-OCR-3 in research or production, please cite both our distribution and the underlying Qwen work:
 ```bibtex
 @misc{german_ocr_3_2026,
   title  = {German-OCR-3: A compact German document-OCR distribution},
   author = {Hardani, Keyvan},
   year   = {2026},
-  url    = {https://github.com/Keyvanhardani/German-OCR}
 }
 @misc{qwen35_2026,
@@ -216,4 +271,14 @@ Wenn du German-OCR-3 in Forschung oder Produktion verwendest, zitiere bitte beid
   journal = {arXiv preprint arXiv:2502.13923},
   year    = {2025}
 }
-```

 ---
 <p align="center">
+  <img src="https://app.german-ocr.de/icon.png" alt="German-OCR-3" width="140" height="140" />
 </p>
 <h1 align="center">German-OCR-3</h1>
+<p align="center"><strong>German Vision OCR. Compact. Local. Open Source.</strong><br/>
+<sub>From a German document image → strictly validated JSON. Up and running locally in under 60 seconds.</sub></p>
 <p align="center">
+  <a href="https://german-ocr.de"><img alt="Site" src="https://img.shields.io/badge/site-german--ocr.de-3B82F6?style=flat-square&labelColor=0B1220"/></a>
+  <a href="https://ollama.com/Keyvan/german-ocr-3"><img alt="Ollama" src="https://img.shields.io/badge/Ollama-Keyvan%2Fgerman--ocr--3-F59E0B?style=flat-square&labelColor=0B1220"/></a>
+  <a href="https://github.com/Keyvanhardani/German-OCR-3-Dev"><img alt="GitHub" src="https://img.shields.io/badge/GitHub-source-181717?style=flat-square&labelColor=0B1220"/></a>
+  <a href="#license"><img alt="License: Apache 2.0" src="https://img.shields.io/badge/License-Apache_2.0-22C55E?style=flat-square&labelColor=0B1220"/></a>
+  <img alt="Language" src="https://img.shields.io/badge/lang-Deutsch-3B82F6?style=flat-square&labelColor=0B1220"/>
+  <img alt="Hallucination" src="https://img.shields.io/badge/Halluzination-0%25-22C55E?style=flat-square&labelColor=0B1220"/>
 </p>
 ---
+## ⚡ At a glance
+<table align="center">
+  <tr>
+    <td align="center" width="180"><h2>100 %</h2><sub>Valid JSON</sub></td>
+    <td align="center" width="180"><h2>95 %</h2><sub>Sender correct</sub></td>
+    <td align="center" width="180"><h2>0 %</h2><sub>Hallucination</sub></td>
+    <td align="center" width="180"><h2>5.0 s</h2><sub>Latency / doc</sub></td>
+  </tr>
+</table>
+<p align="center"><sub>On <strong>200+ real, anonymized German invoices</strong> (Default edition, 2.7 GB)</sub></p>
+---
+## What is German-OCR-3?
+**German-OCR-3** is a compact, fast **vision OCR distribution for German business documents** (invoices, letters, forms, receipts, official notices) that runs fully locally. The image goes in and **strictly validated JSON** conforming to our German extraction schema comes out. No cloud requirement, no vendor lock-in.
+Two editions, both Apache 2.0, both under 3 GB:
+| Edition | Ollama | Size | Target hardware | Strength |
+|---|---|---:|---|---|
+| **Nano** | `Keyvan/german-ocr-nano` | **1.0 GB** | CPU · Edge · Mobile | "runs anywhere" |
+| **Default** ⭐ | `Keyvan/german-ocr-3` | **2.7 GB** | 4–6 GB VRAM | best field recognition |
+> **Fine-tuned adapter** for German business-document extraction. Apache 2.0.
 ---
+## 📊 Field test: 200+ real German invoices (anonymized)
+<p align="center">
+  <img src="https://huggingface.co/Keyven/german-ocr-3/resolve/main/charts/02_ionos_validity.png" alt="Field test" width="820"/>
+</p>
+| Edition | Valid JSON | Sender correct | **Hallucination** | Latency |
+|---|---:|---:|---:|---:|
+| `Keyvan/german-ocr-nano` | 84 % | 79 % | **0 %** | 6.6 s |
+| **`Keyvan/german-ocr-3`** ⭐ | **100 %** | **95 %** | **0 %** | **5.0 s** |
+**No "Mustermann" defaults.** German-OCR-3 reads the real company, customer address, products, and amounts instead of guessing.
+---
+## 📐 Size comparison
+<p align="center">
+  <img src="https://huggingface.co/Keyven/german-ocr-3/resolve/main/charts/01_size_vs_competitors.png" alt="Model sizes" width="820"/>
+</p>
+`german-ocr-3` (2.7 GB) is **6× smaller** than a typical 7B OCR VLM. It runs on an **8 GB gaming GPU** or on the CPU of an ordinary laptop.
+<p align="center">
+  <img src="https://huggingface.co/Keyven/german-ocr-3/resolve/main/charts/04_latency.png" alt="Latency" width="620"/>
+</p>
 ---
+## 🚀 Quickstart
 ### Ollama (recommended, one line)
 ollama run Keyvan/german-ocr-3 "Extrahiere die Rechnung im Bild als JSON." ./meine_rechnung.png
 ```
+<details>
+<summary><b>Sample output (anonymized, from the field test)</b> (click to expand)</summary>
 ```json
 {
 }
 ```
+</details>
 ### Python (via Ollama HTTP API)
 ```python
 ---
+## 📚 Training and evaluation datasets
+| Dataset | Size | Type |
+|---|---|---|
+| [`neuralabs/german-synth-ocr`](https://huggingface.co/datasets/neuralabs/german-synth-ocr) | 4,500+ | German OCR samples (synthetic, Apache-2.0) |
+| [`Aoschu/German_invoices_dataset_for_donut`](https://huggingface.co/datasets/Aoschu/German_invoices_dataset_for_donut) | 129 | Real German invoices (Donut format) |
+| In-house synthetic German invoice set | 100 | Invoices with golden JSON, deterministically generated |
+| Anonymized DACH field test | 200+ | Real invoices from various DACH vendors (internal, GDPR) |
+---
+## 🎯 Target audiences
+- **Solo builders & indies**: extract German documents locally, without cloud OCR costs.
+- **DACH SMEs with strict privacy requirements**: host locally or on-prem.
+- **Agencies & studios**: an open-source foundation under their own pipeline.
+For those who want it **managed** and with even larger models:
 > 🌐 **[german-ocr.de](https://german-ocr.de)**: hosted German OCR API with premium models, higher accuracy, and no hardware of your own. Data stays in the EU.
+---
+## ⚠️ Limitations
+- Optimized for **German** documents; no guarantees for other languages.
+- Best quality on clear, high-resolution scans and photos.
+- Handwritten documents: limited support only.
+- For critical workflows (accounting, legal): **always keep a human in the loop**.
+---
+## 🙏 Credit & Attribution
 German-OCR-3 builds on the excellent work of the **Qwen team at Alibaba Group**. The underlying vision-language architecture comes from the **Qwen 3.5 Small Series**, released under the Apache License 2.0. Without the open research and the clean release of the Qwen weights, this project would not be possible.
+- **Qwen 3.5** — https://huggingface.co/Qwen · https://qwen.ai
+- **Apache License 2.0** (Weights) — © 2025–2026 Qwen Team, Alibaba Group
+- **Qwen2.5-VL Technical Report** — [arXiv:2502.13923](https://arxiv.org/abs/2502.13923)
 Full attribution text in [`NOTICE`](NOTICE).
+---
+## <a id="license"></a>📄 License
+**Apache License 2.0** for the entire German-OCR-3 distribution (Modelfiles, system prompt, schemas, docs, GGUFs).
+---
+## 📑 Citation
+If you use German-OCR-3 in research or production, please cite **both** our distribution and the underlying Qwen work:
 ```bibtex
 @misc{german_ocr_3_2026,
   title  = {German-OCR-3: A compact German document-OCR distribution},
   author = {Hardani, Keyvan},
   year   = {2026},
+  url    = {https://github.com/Keyvanhardani/German-OCR-3-Dev}
 }
 @misc{qwen35_2026,
   journal = {arXiv preprint arXiv:2502.13923},
   year    = {2025}
 }
+```
+---
+## 👤 Author
+**Keyvan Hardani**
+· Website: [keyvan.ai](https://keyvan.ai)
+· LinkedIn: [linkedin.com/in/keyvanhardani](https://linkedin.com/in/keyvanhardani)
+· GitHub: [@Keyvanhardani](https://github.com/Keyvanhardani)
+· Hosted Premium: [german-ocr.de](https://german-ocr.de)