Update README.md
README.md
CHANGED
|
@@ -24,7 +24,20 @@ tags:
|
|
| 24 |
|
| 25 |
**DistillSupra-0.2M** is an ultra-compact causal language model with approximately **0.2 million parameters**, produced by knowledge distillation from [Supra-Mini-v4-2M](https://huggingface.co/SupraLabs/Supra-Mini-v4-2M).
|
| 26 |
|
| 27 |
-
It was trained 500 steps for 30 minutes on a GTX 750 Ti 4GB using generated text from the teacher.
|
| 28 |
|
| 29 |
## Some outputs:
|
| 30 |
|
|
@@ -38,4 +51,12 @@ Output: The human brain is capable ofs in an more that in a new can is the this
|
|
| 38 |
|
| 39 |
Prompt : The most important principle in science is
|
| 40 |
--------------------------------------------------
|
| 41 |
-
The most important principle in science is a is a this are not for that the to of be digels-LC. to the in a the to, on to,
|
| 24 |
|
| 25 |
**DistillSupra-0.2M** is an ultra-compact causal language model with approximately **0.2 million parameters**, produced by knowledge distillation from [Supra-Mini-v4-2M](https://huggingface.co/SupraLabs/Supra-Mini-v4-2M).
|
| 26 |
|
| 27 |
+
It was trained for 500 steps (1 epoch) in 30 minutes on a GTX 750 Ti 4GB, using text generated by the teacher.
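Training a student on text generated by the teacher is often called sequence-level distillation. The README gives no training code, so the following is only a toy sketch of the idea: the hypothetical `TEACHER` bigram table stands in for the real teacher model, and the "student" is a bigram model fitted by counting rather than a neural network trained by gradient descent.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy "teacher": fixed next-token probabilities per previous token.
# This is an illustration only, not the actual Supra-Mini-v4-2M model.
TEACHER = {
    "a": {"a": 0.1, "b": 0.6, "c": 0.3},
    "b": {"a": 0.5, "b": 0.2, "c": 0.3},
    "c": {"a": 0.7, "b": 0.2, "c": 0.1},
}


def generate(teacher, length, rng, start="a"):
    """Sample a token sequence from the teacher, one token at a time."""
    tokens = [start]
    for _ in range(length - 1):
        dist = teacher[tokens[-1]]
        next_token = rng.choices(list(dist), weights=list(dist.values()))[0]
        tokens.append(next_token)
    return tokens


def fit_student(corpus):
    """Fit a bigram student by maximum likelihood on teacher-generated text."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        counts[prev][nxt] += 1
    return {
        prev: {tok: n / sum(c.values()) for tok, n in c.items()}
        for prev, c in counts.items()
    }


rng = random.Random(0)                     # fixed seed for repeatability
corpus = generate(TEACHER, 30_000, rng)    # step 1: the teacher writes the dataset
student = fit_student(corpus)              # step 2: the student learns from it only

# Largest difference between teacher and student next-token probabilities.
max_gap = max(
    abs(student[prev].get(tok, 0.0) - prob)
    for prev, dist in TEACHER.items()
    for tok, prob in dist.items()
)
print(f"largest teacher/student probability gap: {max_gap:.3f}")
```

The student never sees the teacher's probabilities directly, only its samples, which is the property this sketch is meant to show.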
|
| 28 |
+
|
| 29 |
+
The model was compressed **10x**! That's crazy!
|
| 30 |
+
|
| 31 |
+
## Architecture
|
| 32 |
+
|
| 33 |
+
| Parameter           | Teacher | Student |
|---------------------|---------|---------|
| hidden_size         | 64      | 48      |
| intermediate_size   | 128     | 96      |
| num_hidden_layers   | 5       | 4       |
| num_attention_heads | 8       | 6       |
| vocab_size          | 4096    | 4096    |
| Parameters          | ~468k   | ~289k   |
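The parameter totals in the table can be roughly reproduced from the other rows. The README does not name the architecture, so this sketch assumes a LLaMA-style decoder: tied input/output embeddings, bias-free attention with four `hidden_size` x `hidden_size` projections, a gated MLP with three projections, and two RMSNorm weight vectors per layer plus a final one. The `Config` class and `count_parameters` function are illustrative names, not part of the model's code.

```python
from dataclasses import dataclass


@dataclass
class Config:
    """Hypothetical container for the table's hyperparameters."""
    hidden_size: int
    intermediate_size: int
    num_hidden_layers: int
    vocab_size: int = 4096


def count_parameters(cfg: Config) -> int:
    """Estimate the parameter count under the assumed LLaMA-style layout."""
    h, m = cfg.hidden_size, cfg.intermediate_size
    embedding = cfg.vocab_size * h   # assumed shared with the output head
    attention = 4 * h * h            # q, k, v, o projections, no biases
    mlp = 3 * h * m                  # gate, up, and down projections
    norms = 2 * h                    # two norm weight vectors per layer
    per_layer = attention + mlp + norms
    final_norm = h
    return embedding + cfg.num_hidden_layers * per_layer + final_norm


teacher = Config(hidden_size=64, intermediate_size=128, num_hidden_layers=5)
student = Config(hidden_size=48, intermediate_size=96, num_hidden_layers=4)

print(count_parameters(teacher))  # 467648
print(count_parameters(student))  # 289200
```

Under these assumptions `num_attention_heads` does not change the count, because the heads only split the same projection matrices. In both models most of the parameters sit in the embedding table (`vocab_size * hidden_size`).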
|
| 41 |
|
| 42 |
## Some outputs:
|
| 43 |
|
|
|
|
| 51 |
|
| 52 |
Prompt : The most important principle in science is
|
| 53 |
--------------------------------------------------
|
| 54 |
+
The most important principle in science is a is a this are not for that the to of be digels-LC. to the in a the to, on to,
|
| 55 |
+
|
| 56 |
+
## Why did Supra create this trash?
|
| 57 |
+
|
| 58 |
+
We are currently researching knowledge distillation, and this was the first step! Things will get better!
|
| 59 |
+
|
| 60 |
+
## Final Thought
|
| 61 |
+
|
| 62 |
+
Knowledge distillation is a promising direction for us. We believe that LLMs can be helpful even when they are this small!
|