Klovis-144M — French Language Model
A 144M-parameter French language model fully designed, implemented, and trained by Eric Houzelle. Every component — architecture, training pipeline, and inference engine — was written in PyTorch without relying on any pre-trained weights or third-party model code.
Klovis demonstrates that a single engineer can deliver a complete, modern Transformer with state-of-the-art architectural components, trained end-to-end on a single NVIDIA L40S GPU for a total compute budget of approximately €50.
Key Facts
| Parameters | 144M |
| Architecture | Decoder-only Transformer |
| Language | French |
| Tokenizer | CamemBERT (camembert-base, 32k vocab) |
| Context window | 256 tokens |
| Chat format | ChatML |
| Training hardware | 1× NVIDIA L40S |
| Total training cost | ~€50 |
| Pre-training data | ~26M French texts (Wikipedia FR + FineWeb-2) |
| SFT data | 6 curated French conversational datasets |
| SFT epochs | 15 |
| License | Apache 2.0 |
| Author | Eric Houzelle |
Quick Start
Installation
pip install transformers torch safetensors sentencepiece
Text Generation
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "Klovis-ai/Klovis-144M-french"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
prompt = "La France est un pays"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
inputs["input_ids"],
max_new_tokens=100,
temperature=0.7,
top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Conversational Mode (ChatML)
The model was fine-tuned with ChatML formatting for assistant-style interactions:
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "Klovis-ai/Klovis-144M-french"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
prompt = (
"<|system|>\n"
"Tu es un assistant utile et concis. Réponds en français.<|end|>\n"
"<|user|>\n"
"Quelle est la capitale de la France ?<|end|>\n"
"<|assistant|>\n"
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
inputs["input_ids"],
max_new_tokens=150,
temperature=0.7,
top_p=0.9,
do_sample=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
Special Tokens
| Token | Role |
|---|---|
<|system|> |
Start of system message |
<|user|> |
Start of user message |
<|assistant|> |
Start of assistant response |
<|end|> |
End of turn |
Architecture
Klovis implements a decoder-only Transformer using the same building blocks found in LLaMA, Mistral, and Gemma — scaled down to a compact 144M-parameter footprint:
| Component | Detail |
|---|---|
| Embedding dim | 768 |
| Transformer layers | 14 |
| Query heads | 12 |
| KV heads (GQA) | 4 |
| FFN hidden dim | 3072 |
| FFN activation | SwiGLU |
| Normalization | RMSNorm (pre-norm) |
| Position encoding | RoPE (Rotary Position Embedding) |
| Weight tying | Input embeddings ↔ output projection |
Grouped-Query Attention (GQA): 12 query heads share 4 KV heads, reducing KV-cache memory by 3× while preserving attention capacity.
Advanced Feature: Recurrent-Depth Transformer (RDT)
The codebase also implements an experimental Recurrent-Depth Transformer mode (inspired by OpenMythos/Parcae, Prairie et al. 2026), where a single Transformer block is applied iteratively:
Input → [Prelude Layers] → [Shared Block × T steps] → [Coda Layers] → Output
RDT components:
- LTI Injection — Linear Time-Invariant state coupling with guaranteed spectral stability
- Adaptive Computation Time (ACT) — learned per-position halting for dynamic compute allocation
- Depth LoRA — low-rank adapters per recurrent step for step-wise specialization
Training
Phase 1 — Pre-training
| Data | ~26M French texts |
| Sources | CATIE-AQ/wikipedia_fr_2022, HuggingFaceFW/fineweb-2 (fra_Latn) |
| Optimizer | AdamW (β₁=0.9, β₂=0.95, weight decay 0.1) |
| Scheduler | Linear warmup (2000 steps) → cosine decay |
| Precision | Mixed precision (AMP + GradScaler) |
| Effective batch size | 128 (32 × 4 gradient accumulation) |
| Progressive training | Block size 256 → 512 at step 2000 |
| Compilation | torch.compile |
| Label smoothing | 0.02 |
Phase 2 — Supervised Fine-Tuning (SFT)
The pre-trained model was fine-tuned on 6 curated French conversational datasets across 15 epochs, with prompt masking so that only assistant tokens contribute to the loss.
| Epochs | 15 |
| Learning rate | 2e-5 |
| Effective batch size | 128 (32 × 4 gradient accumulation) |
| Max grad norm | 1.0 |
| Chat format | ChatML |
| Loss | Cross-entropy on assistant tokens only |
SFT Datasets
| Dataset | Format | Details |
|---|---|---|
| Houzeric/everyday-conversations | Flat (user/assistant) | Custom-built |
| CATIE-AQ/facebook-community-alignment-dataset_french_conversation | Multi-turn conversations | Community alignment |
| angeluriot/french_instruct | Multi-turn conversations | 40k samples, filtered |
| jpacifico/French-Alpaca-dataset-Instruct-55K | Alpaca format | 20k samples |
| Houzeric/french-prompts-and-questions | Flat (prompt/answer) | 15k samples, filtered |
| Houzeric/physics-FR-qa-dataset | Flat (question/answer) | 5k samples, scientific QA |
What to Expect
Klovis is a technical demonstration — showing that a single engineer can design, train, and deploy a modern Transformer on a single GPU for under €50.
With 144M parameters, the model is capable of:
- Generating grammatically correct French text
- Following the ChatML conversational format
- Producing coherent responses on simple topics
Known limitations:
- Factual responses are frequently incorrect or fabricated (hallucinations)
- Logical reasoning is limited
- Responses can be repetitive or drift off-topic
- Context limited to 256 tokens
- French only
This model is a demonstration of what a single developer can achieve with a modern architecture at small scale. It is not intended to replace larger models for production use.
Technical Details
Implementation Highlights
- Custom implementation: every component (attention, RoPE, RMSNorm, SwiGLU, GQA, training loop, generation) is implemented in PyTorch — no external model code
- Hugging Face compatible: inherits from
PreTrainedModelandGenerationMixin, works withAutoModelForCausalLM - KV-cache inference: supports incremental decoding for efficient generation
- Multiple weight-sharing modes: standard, shared FFN, full sharing, and Recurrent-Depth
- Streaming chat: interactive CLI with real-time token-by-token output
- Monitoring: integrated with Trackio for live training dashboards
Source Code
The full source code is available at: github.com/eric-houzelle/mini-gpt
License
This model is released under the Apache 2.0 License.
Designed and trained by Eric Houzelle.
- Downloads last month
- 416