---
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb-edu
language:
- en
base_model:
- SupraLabs/Supra-Mini-v4-2M
pipeline_tag: text-generation
tags:
- supralabs
- supra
- cpu
- gpu
- distill
- base
- sub-1m
- axionlab
- lh-tech
---

## DistillSupra-0.2M
--------------------------------------------------

**DistillSupra-0.2M** is an ultra-compact causal language model with approximately **0.2 million parameters**, produced by knowledge distillation from [Supra-Mini-v4-2M](https://huggingface.co/SupraLabs/Supra-Mini-v4-2M).

It was trained for 500 steps (1 epoch), which took 30 minutes on a GTX 750 Ti 4GB, using text generated by the teacher.

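Going by the nominal sizes in the model names (2M teacher, 0.2M student), that is a **10x** compression! By the parameter counts in the Architecture table, the ratio is closer to 1.6x.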

## Architecture

| Setting             | Teacher | Student |
|---------------------|---------|---------|
| hidden_size         | 64      | 48      |
| intermediate_size   | 128     | 96      |
| num_hidden_layers   | 5       | 4       |
| num_attention_heads | 8       | 6       |
| vocab_size          | 4096    | 4096    |
| Total parameters    | ~468k   | ~289k   |
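
The totals in the table can be roughly reproduced with a little arithmetic. The sketch below assumes a Llama-style decoder: bias-free attention and MLP projections, a gated MLP with three weight matrices, two RMSNorm layers per block plus a final norm, and tied input/output embeddings. None of these details are stated in this card, and the helper name `llama_style_param_count` is made up for illustration, so treat it as a sanity check rather than the exact model definition.

```python
def llama_style_param_count(hidden_size, intermediate_size, num_layers,
                            vocab_size, tie_embeddings=True):
    """Estimate parameters for an ASSUMED Llama-style decoder (no biases)."""
    embed = vocab_size * hidden_size            # token embedding matrix
    attn = 4 * hidden_size * hidden_size        # q, k, v, o projections
    mlp = 3 * hidden_size * intermediate_size   # gate, up, down projections
    norms = 2 * hidden_size                     # two RMSNorm weights per block
    per_layer = attn + mlp + norms
    final_norm = hidden_size
    # With tied embeddings the output head reuses the embedding matrix.
    lm_head = 0 if tie_embeddings else vocab_size * hidden_size
    return embed + num_layers * per_layer + final_norm + lm_head


# Values taken from the Architecture table.
teacher = llama_style_param_count(64, 128, 5, 4096)
student = llama_style_param_count(48, 96, 4, 4096)

print(f"teacher: {teacher:,}")
print(f"student: {student:,}")
print(f"ratio:   {teacher / student:.2f}x")
```

Under these assumptions the embedding matrix alone (4096 x 48 = 196,608 weights) makes up about two thirds of the student, which is why shrinking `hidden_size` and the layer count has a limited effect on the total while `vocab_size` stays at 4096.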

## Sample outputs

Prompt : Throughout history, great civilizations
--------------------------------------------------
Output: Throughout history, great civilizations to in, a be polrain for is with more the the be the for. of be of on (I.er The b M.A-R and or have that not is and the is this they, can for to to. is of a a, to ofs the for and the a. in the is to as of is that an that of and you the which is, the, for in be a are by’ of. and to a m

Prompt : The human brain is capable of
--------------------------------------------------
Output: The human brain is capable ofs in an more that in a new can is the this the a of the pS, the a to the other in not it... and with a to that be are of to for in of of ass. The be of the,.F-s be the of dLal. ins of be and of Sin: and or that a one that to and a a bFed, asRal., the, is a and as

Prompt : The most important principle in science is
--------------------------------------------------
Output: The most important principle in science is a is a this are not for that the to of be digels-LC. to the in a the to, on to,

## Why did Supra create this trash?

We are currently researching knowledge distillation, and this was the first step! Things will get better!

## Final Thought

Knowledge distillation is a promising direction for us. We believe that LLMs can be helpful even when they are this small!