I'm unemployed, I have a gaming GPU, and I just published a German LLM.
qwen3-0.6b-german - fine-tuned Qwen3-0.6B in ~40h on an RTX 4070 Ti, using the exact same instruct datasets as the LLäMmlein paper (ACL 2025).
HellaSwag-DE: 0.3111 → 0.3193 ✅
ARC-DE: 0.2352 → 0.2575 ✅
MMLU-DE: 0.3600 → 0.2475 🔻 (alignment tax, a known trade-off)
Instruction fine-tuning trades some factual breadth for better reasoning and format following. The model is more useful, even if not better on every metric.
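For the curious, the setup looked roughly like this. A minimal LoRA SFT sketch with transformers + peft + trl; the dataset id, rank, and other hyperparameters below are illustrative placeholders, not the values from the actual run (those are in the published training script):

```python
# Minimal LoRA SFT sketch. Dataset id, rank, and other hyperparameters
# are placeholders, not the actual run's values.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Placeholder: swap in the German instruct datasets from the LLäMmlein paper.
dataset = load_dataset("some-org/german-instruct", split="train")

peft_config = LoraConfig(
    r=16,                 # assumed LoRA rank
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="Qwen/Qwen3-0.6B",  # base model, loaded by name
    train_dataset=dataset,
    peft_config=peft_config,
    args=SFTConfig(output_dir="qwen3-0.6b-german", num_train_epochs=1),
)
trainer.train()
```

Training only the low-rank adapter matrices is what makes a run like this fit on a single 12 GB gaming card.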
Weights, LoRA adapter, full training script, and logs are all public:
philipp-zettl/qwen3-0.6b-german
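If you want to try it, here's a quick inference sketch, assuming the repo follows the standard transformers chat-template layout:

```python
# Quick inference sketch against the published repo.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "philipp-zettl/qwen3-0.6b-german"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "Erkläre kurz, was ein LoRA-Adapter ist."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```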
It ain't much, but it's honest work.