---
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb-edu
language:
- en
pipeline_tag: text-generation
library_name: transformers
tags:
- small
- cpu
- supra
- v3
- tiny
- mini
- open
- open-source
new_version: SupraLabs/Supra-Mini-v4-2M
---

# 🦅 Supra Mini v3 0.5M
Supra Mini **v3** 0.5M is a tiny base model trained for 2 epochs on **1 billion** tokens of FineWeb-Edu. It is the **third version** of our Supra Mini series.

## Model Config

- Parameters: 467,648 (0.5M)
- Architecture: Llama
- Vocab size (custom BPE tokenizer): 4096
- Hidden Size: 64
- Intermediate Size: 128
- Hidden Layers: 5
- Attention Heads: 8
- Max Position Embeddings: 512
- Learning rate: 5e-4
- Weight Decay: 0.01
- Trained in bfloat16
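
The parameter count can be reproduced from the values in this list. The sketch below is plain arithmetic, not the authors' code, and it assumes a standard Llama layout that the card does not spell out: tied input/output embeddings, no bias terms, as many key/value heads as attention heads, and one RMSNorm weight vector of size `hidden_size` per norm. Under those assumptions the total matches the reported figure.

```python
# Parameter-count sketch for the configuration listed above.
# Assumptions (not stated in the model card): tied embeddings, no biases,
# num_key_value_heads == num_attention_heads, one weight vector per RMSNorm.
VOCAB_SIZE = 4096
HIDDEN_SIZE = 64
INTERMEDIATE_SIZE = 128
NUM_LAYERS = 5


def llama_param_count(vocab, hidden, intermediate, layers):
    embedding = vocab * hidden        # token embeddings (shared with the LM head)
    attention = 4 * hidden * hidden   # q, k, v and o projections
    mlp = 3 * hidden * intermediate   # gate, up and down projections
    norms = 2 * hidden                # two RMSNorm layers per block
    final_norm = hidden               # norm applied before the LM head
    return embedding + layers * (attention + mlp + norms) + final_norm


total = llama_param_count(VOCAB_SIZE, HIDDEN_SIZE, INTERMEDIATE_SIZE, NUM_LAYERS)
print(f"{total:,}")  # 467,648
```

Under these assumptions, 262,144 of the parameters (more than half) sit in the embedding table.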

## Final Loss
After 2 epochs, the model reached a final training loss of **4.872**.
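
For intuition, a cross-entropy loss can be converted to perplexity. The snippet below assumes the reported loss is the mean per-token cross-entropy in nats (the usual PyTorch convention), which the card does not state. It also computes the loss of a uniform guess over the 4096-token vocabulary as a point of comparison.

```python
import math

FINAL_TRAIN_LOSS = 4.872  # reported above
VOCAB_SIZE = 4096         # from the model config

# Assumption: the loss is the mean cross-entropy per token, measured in nats.
perplexity = math.exp(FINAL_TRAIN_LOSS)

# Loss of a model that assigns equal probability to every token.
uniform_loss = math.log(VOCAB_SIZE)

print(f"perplexity: {perplexity:.1f}")
print(f"uniform-guess loss: {uniform_loss:.3f}")
```

If the assumption holds, the training perplexity is roughly 130, against 4096 for a uniform guess.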

## Benchmarks

All benchmarks were run with `lm-eval`.

| Task          | Value        | Random baseline |
| :------------ | :----------: | --------------: |
| ARC-Easy      | 0.2727       | 0.25 (25%)      |
| WikiText      | 4.4881       | -               |
| BLiMP         | 0.5526       | 0.5 (50%)       |

![Benchmarks Table](https://cdn-uploads.huggingface.co/production/uploads/697f2832c2c5e4daa93cece7/mV3IN7fL4boXkeLKcW2nl.png)
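
A possible way to re-run these benchmarks from Python is sketched below. It is not the authors' evaluation script: the task names (`arc_easy`, `wikitext`, `blimp`), batch size, and device are assumptions, and `build_eval_kwargs` and `run_benchmarks` are hypothetical helpers. The card does not state the exact settings, so scores may differ slightly.

```python
# Sketch of re-running the benchmarks with lm-eval (pip install lm-eval).
# Assumptions: task names, batch size and device are illustrative only.
MODEL_ID = "SupraLabs/Supra-Mini-v3-0.5M"
TASKS = ["arc_easy", "wikitext", "blimp"]


def build_eval_kwargs(model_id=MODEL_ID, tasks=TASKS, device="cpu"):
    """Collect the keyword arguments passed to lm_eval.simple_evaluate."""
    return {
        "model": "hf",                           # Hugging Face Transformers backend
        "model_args": f"pretrained={model_id}",
        "tasks": list(tasks),
        "batch_size": 8,
        "device": device,
    }


def run_benchmarks():
    """Download the model and evaluate it; needs network access."""
    import lm_eval  # imported lazily so the helper above works without it

    results = lm_eval.simple_evaluate(**build_eval_kwargs())
    return results["results"]  # per-task metric dictionaries
```

Calling `run_benchmarks()` returns one metrics dictionary per task. A model of this size should evaluate on CPU in reasonable time.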

## Examples
**Prompt:** "Artificial intelligence is "<br>
**Output:** *"Artificial intelligence is ignored by the Sunnette-Caiti.
Prequent and Burkin, A.S. Cougalia (Mean)
Researchers in the Northern Museum of Treaty (Figure 1). The first thing that has a drop or an ambitions to the timeline, which is used for more than a large amount of cosmic moisture (Bargeus), the vital bones are not a few weeks ago, but there was no longer in the same time where he had a hour. It’s also known as the “the person” because they were unhab"*
<br><br>
**Prompt:** "The main concept of physics is "<br>
**Output:** *"The main concept of physics is escape, and that the other important to be unable to be used.
"It’s a new way for how it can be used in this article on what you are going to make sure why we know when they have to do with your own words like the CBDS?”
What is not as you think about ourselves by the BCO (RI)?
Now: I want to get up to the classroom, if you need to seek them! You will find us or ready to learn about the students to understand what the child has to create a lot of skills and writing. If you can ask the questions of these kids, you may use you to work out from myself"*
<br><br>
**Prompt:** "Once upon a time, "<br>
**Output:** *"Once upon a time,  ignificantly was to be the first of his father.
The Helburg and he had been seen in the Morsey's songs of God’s mastery, but it is noted by the Surtaine who was the most important parties that he did not have herself with him or he could be an obvious way to do something.
"Intarias, I were only a good thing to love this, but we can see what you are going from the "consin” (handing for me) and then he will be used in our owner; there is no reason to get a timber, but it would be very hard"*

## Usage
To run the model with Hugging Face Transformers, use the following code:
```python3
import torch
from transformers import pipeline

print("[*] Loading Supra Mini v3 0.5M model from Hugging Face Hub...")
pipe = pipeline(
    "text-generation",
    model="SupraLabs/Supra-Mini-v3-0.5M",
    device_map="auto",  # requires the `accelerate` package
    # Half precision on GPU, full precision on CPU
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
)

# Fall back to the EOS token for padding if the tokenizer defines no pad token
eos_token_id = pipe.tokenizer.eos_token_id
pad_token_id = pipe.tokenizer.pad_token_id
if pad_token_id is None:
    pad_token_id = eos_token_id


def generate_text(prompt, max_new_tokens=150):
    """Sample a continuation and return the prompt plus the generated text."""
    result = pipe(
        prompt,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.5,
        top_k=25,
        top_p=0.9,
        repetition_penalty=1.2,
        pad_token_id=pad_token_id,
        eos_token_id=eos_token_id,
    )
    return result[0]["generated_text"]


test_prompt = "The importance of education is"
print(f"\nPrompt: {test_prompt}")
print("-" * 30)
print("\nOutput:\n" + generate_text(test_prompt))
```

## Training guide
We trained Supra Mini v3 0.5M for 2 epochs on a single NVIDIA RTX 5060 Ti 16GB in about 1 hour.<br>
The full training code is available in this repo: `train_tokenizer.py` (trains the custom BPE tokenizer with a vocab size of 4096), `train.py` (trains the model), and `inference.py` (tests the model).<br>
The model was trained on the first 1 billion tokens of Sample-10BT from FineWeb-Edu using streaming tokenization.
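
The streaming setup described above could look roughly like the sketch below. It is not the repository's `train.py`: `take_token_budget` and `stream_fineweb_edu` are hypothetical helpers, and the dataset config name `sample-10BT` and the `text` column are assumptions based on the public FineWeb-Edu dataset.

```python
from typing import Callable, Iterable, Iterator, List

TOKEN_BUDGET = 1_000_000_000  # "first 1 billion tokens"


def take_token_budget(
    texts: Iterable[str],
    encode: Callable[[str], List[int]],
    budget: int,
) -> Iterator[List[int]]:
    """Tokenize documents lazily and stop once `budget` tokens were yielded."""
    remaining = budget
    for text in texts:
        if remaining <= 0:
            break
        ids = encode(text)[:remaining]  # trim the last document to fit
        remaining -= len(ids)
        if ids:
            yield ids


def stream_fineweb_edu() -> Iterator[str]:
    """Stream raw documents without downloading the whole dataset."""
    from datasets import load_dataset  # lazy import; needs network access

    dataset = load_dataset(
        "HuggingFaceFW/fineweb-edu",
        name="sample-10BT",  # assumed config name for the 10BT sample
        split="train",
        streaming=True,
    )
    for row in dataset:
        yield row["text"]
```

With the model's tokenizer loaded as `tokenizer`, `take_token_budget(stream_fineweb_edu(), tokenizer.encode, TOKEN_BUDGET)` would yield lists of token IDs until the budget is reached, so the full dataset never has to be stored on disk.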