---
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb-edu
language:
- en
base_model:
- SupraLabs/Supra-Mini-v4-2M
pipeline_tag: text-generation
tags:
- supralabs
- supra
- cpu
- gpu
- distill
- base
- sub-1m
- axionlab
- lh-tech
---

## DistillSupra-0.2M

**DistillSupra-0.2M** is an ultra-compact causal language model, nominally **0.2 million parameters** (~289k according to the table below), produced by knowledge distillation from [Supra-Mini-v4-2M](https://huggingface.co/SupraLabs/Supra-Mini-v4-2M). It was trained for 500 steps (1 epoch), which took about 30 minutes on a GTX 750 Ti 4GB, using text generated by the teacher.

By name, that's a **10x** compression (2M to 0.2M)! Note that the parameter counts in the table below (~468k to ~289k) work out to roughly 1.6x.

## Architecture

| Parameter             | Teacher | Student |
|-----------------------|---------|---------|
| `hidden_size`         | 64      | 48      |
| `intermediate_size`   | 128     | 96      |
| `num_hidden_layers`   | 5       | 4       |
| `num_attention_heads` | 8       | 6       |
| `vocab_size`          | 4096    | 4096    |
| Parameters            | ~468k   | ~289k   |

## Sample outputs

**Prompt:** Throughout history, great civilizations

**Output:** Throughout history, great civilizations to in, a be polrain for is with more the the be the for. of be of on (I.er The b M.A-R and or have that not is and the is this they, can for to to. is of a a, to ofs the for and the a. in the is to as of is that an that of and you the which is, the, for in be a are by’ of. and to a m

**Prompt:** The human brain is capable of

**Output:** The human brain is capable ofs in an more that in a new can is the this the a of the pS, the a to the other in not it... and with a to that be are of to for in of of ass. The be of the,.F-s be the of dLal.
ins of be and of Sin: and or that a one that to and a a bFed, asRal., the, is a and as

**Prompt:** The most important principle in science is

**Output:** The most important principle in science is a is a this are not for that the to of be digels-LC. to the in a the to, on to,

## Why did Supra create this trash?

We are currently researching knowledge distillation, and this was the first step. Things will get better!

## Final Thought

Knowledge distillation is a promising direction for us. We believe that LLMs can be helpful even when they are this small!
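
## Parameter count sketch

The parameter counts in the Architecture table can be sanity-checked from the listed hyperparameters. The sketch below is a rough estimate, not the authors' code. It assumes a Llama-style decoder, which this card does not confirm: tied input/output embeddings, four bias-free `hidden_size` x `hidden_size` attention projections per layer, a gated MLP with three weight matrices, and two RMSNorm weight vectors per layer plus a final norm. The names `Config` and `count_parameters` are hypothetical helpers introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Hyperparameters copied from the Architecture table."""
    hidden_size: int
    intermediate_size: int
    num_hidden_layers: int
    vocab_size: int


def count_parameters(cfg: Config) -> int:
    """Estimate parameters for an ASSUMED Llama-style decoder.

    Assumptions (not stated in the model card): tied embeddings,
    no biases, gated MLP, RMSNorm. With standard multi-head attention,
    num_attention_heads does not change the count, so it is omitted.
    """
    h, i = cfg.hidden_size, cfg.intermediate_size
    embeddings = cfg.vocab_size * h   # shared with the output head (assumed tied)
    attention = 4 * h * h             # q, k, v, o projections
    mlp = 3 * h * i                   # gate, up, down projections
    norms = 2 * h                     # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms
    final_norm = h
    return embeddings + cfg.num_hidden_layers * per_layer + final_norm


teacher = Config(hidden_size=64, intermediate_size=128,
                 num_hidden_layers=5, vocab_size=4096)
student = Config(hidden_size=48, intermediate_size=96,
                 num_hidden_layers=4, vocab_size=4096)

teacher_params = count_parameters(teacher)
student_params = count_parameters(student)

print(f"teacher: {teacher_params:,}")
print(f"student: {student_params:,}")
print(f"ratio:   {teacher_params / student_params:.2f}x")
```

Under these assumptions the estimate comes to 467,648 parameters for the teacher and 289,200 for the student, in line with the ~468k and ~289k in the table, for a ratio of about 1.62x. Most of the student's parameters (4096 x 48 = 196,608) sit in the embedding matrix, so at this scale shrinking the layers alone has a limited effect on total size.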