| --- |
| license: apache-2.0 |
| datasets: |
| - HuggingFaceFW/fineweb-edu |
| language: |
| - en |
| base_model: |
| - SupraLabs/Supra-Mini-v4-2M |
| pipeline_tag: text-generation |
| tags: |
| - supralabs |
| - supra |
| - cpu |
| - gpu |
| - distill |
| - base |
| - sub-1m |
| - axionlab |
| - lh-tech |
| --- |
| |
| ## DistillSupra-0.2M |
| -------------------------------------------------- |
|
|
| **DistillSupra-0.2M** is an ultra-compact causal language model with approximately **0.2 million parameters**, produced by knowledge distillation from [Supra-Mini-v4-2M](https://huggingface.co/SupraLabs/Supra-Mini-v4-2M). |
|
|
| It was trained for 500 steps (1 epoch), which took 30 minutes on a GTX 750 Ti 4GB, using text generated by the teacher. |
|
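As an illustration of what training on teacher-generated text can look like, here is a minimal pure-Python sketch of a next-token cross-entropy loss computed on a token sequence sampled from the teacher. This card does not state the actual loss or training code, so the function names (`log_softmax`, `next_token_loss`) and the choice of plain cross-entropy are assumptions made for illustration only.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def next_token_loss(student_logits, teacher_tokens):
    """Mean cross-entropy of the student on one teacher-generated sequence.

    student_logits[t] holds the student's logits for the token at position
    t + 1, given teacher_tokens[: t + 1] as context (hypothetical layout).
    """
    targets = teacher_tokens[1:]  # the first token has no prediction
    assert len(student_logits) == len(targets)
    total = 0.0
    for logits, target in zip(student_logits, targets):
        total -= log_softmax(logits)[target]
    return total / len(targets)


# Toy example: vocabulary of 4 tokens, student predicts uniformly.
uniform = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
print(round(next_token_loss(uniform, [1, 2, 3]), 4))  # 1.3863, i.e. log(4)
```

A student that has learned nothing scores `log(vocab_size)` on this loss; training pushes the value down by making the teacher's tokens more likely.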
|
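| The model names imply **10x** compression (2M to 0.2M). Going by the parameter counts in the Architecture table (~468k vs. ~289k), the student is about 1.6x smaller than the teacher. |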
|
|
| ## Architecture |
|
|
| | Parameter | Teacher | Student | |
| |---------------------|---------|---------| |
| | hidden_size | 64 | 48 | |
| | intermediate_size | 128 | 96 | |
| | num_hidden_layers | 5 | 4 | |
| | num_attention_heads | 8 | 6 | |
| | vocab_size | 4096 | 4096 | |
| | Parameters | ~468k | ~289k | |
| |
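The parameter counts in the table can be reproduced with a short calculation. The sketch below assumes a bias-free, Llama-style decoder (gated MLP, RMSNorm, input embeddings tied to the output head). This card does not state the architecture family, so that layout and the helper name `llama_style_param_count` are assumptions; they are used here only because they match the listed figures.

```python
def llama_style_param_count(hidden_size, intermediate_size,
                            num_hidden_layers, vocab_size):
    """Count weights for an assumed bias-free, Llama-style decoder
    with tied input/output embeddings."""
    embedding = vocab_size * hidden_size        # shared with the output head
    attention = 4 * hidden_size * hidden_size   # q, k, v, o projections
    mlp = 3 * hidden_size * intermediate_size   # gate, up, down projections
    norms = 2 * hidden_size                     # two RMSNorm weights per layer
    final_norm = hidden_size
    per_layer = attention + mlp + norms
    return embedding + num_hidden_layers * per_layer + final_norm


teacher = llama_style_param_count(64, 128, 5, 4096)
student = llama_style_param_count(48, 96, 4, 4096)
print(teacher, student, round(teacher / student, 2))  # 467648 289200 1.62
```

`num_attention_heads` does not appear in the formula because, with as many key/value heads as query heads, splitting the projections into heads does not change the weight count. Under these assumptions, the 4096 x 48 embedding table alone accounts for 196,608 of the student's weights.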
| ## Sample outputs |
| |
| Prompt: Throughout history, great civilizations |
| -------------------------------------------------- |
| Output: Throughout history, great civilizations to in, a be polrain for is with more the the be the for. of be of on (I.er The b M.A-R and or have that not is and the is this they, can for to to. is of a a, to ofs the for and the a. in the is to as of is that an that of and you the which is, the, for in be a are by’ of. and to a m |
| |
| Prompt: The human brain is capable of |
| -------------------------------------------------- |
| Output: The human brain is capable ofs in an more that in a new can is the this the a of the pS, the a to the other in not it... and with a to that be are of to for in of of ass. The be of the,.F-s be the of dLal. ins of be and of Sin: and or that a one that to and a a bFed, asRal., the, is a and as |
| |
| Prompt: The most important principle in science is |
| -------------------------------------------------- |
| Output: The most important principle in science is a is a this are not for that the to of be digels-LC. to the in a the to, on to, |
| |
| ## Why did Supra create this trash? |
| |
| We are currently researching knowledge distillation, and this was the first step. Things will get better! |
| |
| ## Final Thought |
| |
| Knowledge distillation looks promising to us. We believe that LLMs can be helpful even when they are this small! |
| |