---
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb-edu
language:
- en
base_model:
- SupraLabs/Supra-Mini-v4-2M
pipeline_tag: text-generation
tags:
- supralabs
- supra
- cpu
- gpu
- distill
- base
- sub-1m
- axionlab
- lh-tech
---
## DistillSupra-0.2M
--------------------------------------------------
**DistillSupra-0.2M** is an ultra-compact causal language model with approximately **0.2 million parameters**, produced by knowledge distillation from [Supra-Mini-v4-2M](https://huggingface.co/SupraLabs/Supra-Mini-v4-2M).
It was trained for 500 steps (1 epoch), which took about 30 minutes on a GTX 750 Ti 4GB, using text generated by the teacher.
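Going by the nominal sizes in the model names (2M to 0.2M), that is a **10x** compression! That's crazy! The parameter counts in the Architecture table below (~468k vs. ~289k) put the ratio closer to 1.6x.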
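
Training a student on text sampled from the teacher can be illustrated with a toy example. The sketch below is not the training script used for this model (that script is not part of this card). It assumes a hypothetical 3-token bigram "teacher" (`TEACHER`) in place of Supra-Mini-v4-2M and a count-based "student", purely to show the idea: sample text from the teacher, then fit the student to that text by minimizing next-token cross-entropy.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy "teacher": a fixed bigram model over a 3-token vocabulary.
# It only stands in for the real teacher for illustration.
TEACHER = {
    "a": {"a": 0.1, "b": 0.6, "c": 0.3},
    "b": {"a": 0.5, "b": 0.2, "c": 0.3},
    "c": {"a": 0.4, "b": 0.4, "c": 0.2},
}


def sample_from_teacher(rng, length, start="a"):
    """Generate one token sequence by sampling the teacher step by step."""
    seq = [start]
    for _ in range(length - 1):
        dist = TEACHER[seq[-1]]
        tokens = list(dist)
        weights = [dist[t] for t in tokens]
        seq.append(rng.choices(tokens, weights=weights, k=1)[0])
    return seq


def fit_student(sequences):
    """Fit a bigram student by counting transitions in the teacher's text.

    Normalized counts are the maximum-likelihood estimate, i.e. the bigram
    table that minimizes next-token cross-entropy on these sequences.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {tok: c / sum(ctr.values()) for tok, c in ctr.items()}
        for prev, ctr in counts.items()
    }


rng = random.Random(0)  # fixed seed so the run is repeatable
corpus = [sample_from_teacher(rng, length=50) for _ in range(500)]
student = fit_student(corpus)

for prev in TEACHER:
    print(prev, {tok: round(p, 2) for tok, p in sorted(student[prev].items())})
```

With enough sampled text, the student's transition probabilities approach the teacher's. A neural student follows the same recipe, except that the cross-entropy is minimized by gradient descent instead of by counting.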
## Architecture
| Parameter | Teacher | Student |
|---------------------|---------|---------|
| hidden_size | 64 | 48 |
| intermediate_size | 128 | 96 |
| num_hidden_layers | 5 | 4 |
| num_attention_heads | 8 | 6 |
| vocab_size | 4096 | 4096 |
| Parameters | ~468k | ~289k |
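
The totals in the table can be reproduced with a quick back-of-the-envelope count. The sketch below assumes a Llama-style decoder (bias-free q/k/v/o projections, a gated MLP with three weight matrices, two RMSNorm vectors per layer, one final norm) and input/output embeddings that share weights. None of this is stated in this card, and the function name `llama_style_param_count` is made up for illustration. The number of attention heads does not change the count in this layout.

```python
def llama_style_param_count(hidden, intermediate, layers, vocab, tie_embeddings=True):
    """Rough parameter count for an assumed Llama-style decoder."""
    attention = 4 * hidden * hidden      # q, k, v, o projections, no biases
    mlp = 3 * hidden * intermediate      # gate, up, down projections
    norms = 2 * hidden                   # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms

    embeddings = vocab * hidden          # token embedding matrix
    if not tie_embeddings:
        embeddings *= 2                  # separate output (lm_head) matrix

    final_norm = hidden
    return layers * per_layer + embeddings + final_norm


# Values taken from the table above.
teacher = llama_style_param_count(hidden=64, intermediate=128, layers=5, vocab=4096)
student = llama_style_param_count(hidden=48, intermediate=96, layers=4, vocab=4096)

print(teacher, student, round(teacher / student, 2))  # 467648 289200 1.62
```

Under these assumptions the embedding matrix dominates both models: 4096 × 48 = 196,608 of the student's parameters sit in the embeddings, which is why shrinking the layers alone gives a modest reduction.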
## Sample outputs
Prompt : Throughout history, great civilizations
--------------------------------------------------
Output: Throughout history, great civilizations to in, a be polrain for is with more the the be the for. of be of on (I.er The b M.A-R and or have that not is and the is this they, can for to to. is of a a, to ofs the for and the a. in the is to as of is that an that of and you the which is, the, for in be a are by’ of. and to a m
Prompt : The human brain is capable of
--------------------------------------------------
Output: The human brain is capable ofs in an more that in a new can is the this the a of the pS, the a to the other in not it... and with a to that be are of to for in of of ass. The be of the,.F-s be the of dLal. ins of be and of Sin: and or that a one that to and a a bFed, asRal., the, is a and as
Prompt : The most important principle in science is
--------------------------------------------------
Output: The most important principle in science is a is a this are not for that the to of be digels-LC. to the in a the to, on to,
## Why did Supra create this trash?
We are currently researching knowledge distillation, and this was the first step! Things will get better!
## Final Thought
Knowledge distillation is a promising direction for us. We believe that LLMs can be helpful even at such a small size!