# TF3 Student: Distilled Romanian Language Model

A compact **22.9M-parameter** Romanian language model distilled from the [TF3-50M teacher](https://huggingface.co/klusai/tf3-50m-base) using logit-based knowledge distillation. Part of the [TinyFabulist](https://arxiv.org/abs/2601.10410) research project.

## Model Details

| Property | Value |
|----------|-------|
| Parameters | 22.9M (26.45M with untied embeddings) |
| Architecture | LLaMA-style decoder-only Transformer |
| Hidden size | 384 |
| Attention heads | 6 (head dim 64) |
| Layers | 6 |
| MLP intermediate | 1,024 |
| Vocab size | 32,000 (Unigram, Romanian-specific) |
| Context length | 2,048 tokens |
| Tied embeddings | Yes |
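As a sanity check, the tied parameter count in the table can be reproduced from the other configuration values. The sketch below assumes a standard LLaMA-style block (full multi-head attention with q/k/v/o projections, a SwiGLU MLP with gate/up/down projections, and RMSNorm); the exact internals of TF3 Student may differ slightly.

```python
# Estimate TF3 Student's parameter count from the config in the table above.
VOCAB = 32_000        # vocab size
HIDDEN = 384          # hidden size
LAYERS = 6            # transformer layers
INTERMEDIATE = 1_024  # MLP intermediate size

embedding = VOCAB * HIDDEN            # token embeddings (tied with the LM head)
attention = 4 * HIDDEN * HIDDEN       # q, k, v, o projections per layer
mlp = 3 * HIDDEN * INTERMEDIATE       # gate, up, down projections per layer
norms = 2 * HIDDEN                    # two RMSNorms per layer
per_layer = attention + mlp + norms

total_tied = embedding + LAYERS * per_layer + HIDDEN  # + final RMSNorm
print(f"{total_tied / 1e6:.1f}M parameters")          # ~22.9M, matching the table
```

With tied embeddings the LM head adds no extra weights, which is why the tied total stays at roughly the size of the embedding table plus six transformer blocks.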