# TF3 Student: Distilled Romanian Language Model

A compact **22.9M-parameter** Romanian language model distilled from the [TF3-50M teacher](https://huggingface.co/klusai/tf3-50m-base) using logit-based knowledge distillation. Part of the [TinyFabulist](https://arxiv.org/abs/2601.10410) research project.

## Model Details

| Property | Value |
|----------|-------|
| Parameters | 22.9M (26.45M with untied embeddings) |
| Architecture | LLaMA-style decoder-only Transformer |
| Hidden size | 384 |
| Attention heads | 6 (head dim 64) |
| Layers | 6 |
| MLP intermediate | 1,024 |
| Vocab size | 32,000 (Unigram, Romanian-specific) |
| Context length | 2,048 tokens |
| Tied embeddings | Yes |
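As a sanity check, the tied parameter count in the table can be reproduced from the other configuration values. The sketch below assumes a standard LLaMA-style block (full multi-head attention with q/k/v/o projections, a SwiGLU MLP with gate/up/down projections, and RMSNorm); the exact internals of TF3 Student may differ slightly.

```python
# Estimate TF3 Student's parameter count from the config in the table above.
VOCAB = 32_000        # vocab size
HIDDEN = 384          # hidden size
LAYERS = 6            # transformer layers
INTERMEDIATE = 1_024  # MLP intermediate size

embedding = VOCAB * HIDDEN            # token embeddings (tied with the LM head)
attention = 4 * HIDDEN * HIDDEN       # q, k, v, o projections per layer
mlp = 3 * HIDDEN * INTERMEDIATE       # gate, up, down projections per layer
norms = 2 * HIDDEN                    # two RMSNorms per layer
per_layer = attention + mlp + norms

total_tied = embedding + LAYERS * per_layer + HIDDEN  # + final RMSNorm
print(f"{total_tied / 1e6:.1f}M parameters")          # ~22.9M, matching the table
```

With tied embeddings the LM head adds no extra weights, which is why the tied total stays at roughly the size of the embedding table plus six transformer blocks.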