TF3 Student: Distilled Romanian Language Model

A compact 22.9M-parameter Romanian language model distilled from the TF3-50M teacher using logit-based knowledge distillation. Part of the TinyFabulist research project.

How to Use (Docker Model Runner)

docker model run hf.co/klusai/tf3-26m-student

Model Details

Property           Value
Parameters         22.9M (26.45M with untied embeddings)
Architecture       LLaMA-style decoder-only Transformer
Hidden size        384
Attention heads    6 (head dim 64)
Layers             6
MLP intermediate   1,024
Vocab size         32,000 (Unigram, Romanian-specific)
Context length     2,048 tokens
Tied embeddings    Yes
Training           Knowledge distillation from klusai/tf3-50m-base
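The tied-embedding figure can be roughly cross-checked from the table above. A minimal sketch, assuming standard LLaMA accounting (no biases, RMSNorm weight vectors, gated three-projection MLP); the exact breakdown is an assumption, not taken from the model card:

```python
# Hypothetical parameter count for a LLaMA-style decoder matching the table.
vocab, hidden, layers, inter = 32_000, 384, 6, 1_024

embed = vocab * hidden       # token embeddings (lm_head tied, so counted once)
attn  = 4 * hidden * hidden  # q, k, v, o projections per layer
mlp   = 3 * hidden * inter   # gate, up, down projections per layer
norms = 2 * hidden           # two RMSNorm weight vectors per layer

total = embed + layers * (attn + mlp + norms) + hidden  # + final RMSNorm
print(f"{total / 1e6:.1f}M parameters")  # → 22.9M parameters
```

The result agrees with the 22.9M tied-embedding figure; the 26.45M untied figure presumably follows a different accounting and is not reproduced here.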

Training

  • Method: Logit-based knowledge distillation (KL + CE loss, alpha=0.009)
  • Teacher: klusai/tf3-50m-base (51.65M params, frozen)
  • Data: klusai/ds-tf2-en-ro-15k (15k Romanian fables)
  • Temperature: T=1.0
  • Epochs: 3
  • Learning rate: 3e-4 (cosine schedule, 50-step warmup)
  • Hardware: Apple M3 Ultra (96GB unified memory)
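The distillation objective above can be sketched in a few lines. This is a minimal NumPy illustration assuming the common convention loss = alpha * CE + (1 - alpha) * T^2 * KL(teacher || student); the exact weighting and reduction used in training are not specified by the card:

```python
import numpy as np

def softmax(z, T=1.0):
    """Numerically stable softmax over the last axis at temperature T."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, target_ids,
                      alpha=0.009, T=1.0):
    """Hypothetical KL + CE distillation loss; shapes: [n, vocab] and [n]."""
    n = student_logits.shape[0]
    # Cross-entropy between student predictions and gold tokens
    p_s = softmax(student_logits)
    ce = -np.log(p_s[np.arange(n), target_ids]).mean()
    # KL(teacher || student) at temperature T, scaled by T^2 as is standard
    p_t = softmax(teacher_logits, T)
    log_p_s = np.log(softmax(student_logits, T))
    kl = (p_t * (np.log(p_t) - log_p_s)).sum(axis=-1).mean() * T * T
    return alpha * ce + (1 - alpha) * kl
```

With alpha=0.009 the objective is dominated by the KL term, i.e. the student is trained almost entirely to match the frozen teacher's output distribution rather than the gold labels.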

Intended Use

This model is a research artifact demonstrating knowledge distillation for compact Romanian language models trained on synthetic moral microfiction. It is designed for:

  • Research on compact language model compression
  • Romanian text generation in the fable/moral story domain
  • Downstream fine-tuning for Romanian NLP tasks

Not intended for: Production text generation, factual question answering, or safety-critical applications.

Limitations

  • Domain-restricted to moral microfiction (fables)
  • Trained exclusively on synthetic data
  • May exhibit repetitive patterns and simplified phrasing compared to the teacher
  • Gender agreement errors may occur in generated text

Citation

@article{nadas2026tf3,
  title={TF3-RO-50M: Training Compact Romanian Language Models from Scratch on Synthetic Moral Microfiction},
  author={Nada\c{s}, Mihai Dan and Dio\c{s}an, Laura and Tomescu, Andreea and Pi\c{s}coran, Andrei},
  journal={arXiv preprint arXiv:2601.10410},
  year={2026}
}

Related Models and Datasets

Artifact                   Description
klusai/tf3-50m-base        Teacher model (51.65M)
klusai/tf3-50m-sft         SFT-tuned teacher
klusai/tf3-bert            NER model for entity coherence evaluation
klusai/ds-tf2-en-ro-3m     3M bilingual fable corpus
klusai/ds-tf2-en-ro-15k    15k curated subset for distillation/SFT