---
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb-edu
language:
- en
pipeline_tag: text-generation
library_name: transformers
tags:
- small
- cpu
- supra
- tiny
- mini
- open
- open-source
metrics: null
model-index:
- name: WikiText
  results:
  - task:
      type: text-generation
    dataset:
      name: wikipedia
      type: wikitext
    metrics:
    - name: WikiText
      type: WikiText
      value: 25.1691
    source:
      name: Self-eval
      url: https://huggingface.co/SupraLabs/Supra-Mini-0.1M
- name: Blimp
  results:
  - task:
      type: text-generation
    dataset:
      name: Blimp
      type: BLiMP
    metrics:
    - name: BLIMP
      type: BLIMP
      value: 0.5177
    source:
      name: Self-eval
      url: https://huggingface.co/SupraLabs/Supra-Mini-0.1M
new_version: SupraLabs/Supra-Mini-v4-2M
---

# 🦅 Supra Mini 0.1M
Supra Mini 0.1M is a very small base model (yes, very small) trained for 2 epochs on 500 million tokens of FineWeb-Edu to show how well a tiny model can perform on world knowledge.

## Model Config

- Parameters: 117,648 (0.1M)
- Architecture: Llama
- Vocab size (custom BPE tokenizer): 250
- Hidden Size: 48
- Intermediate Size: 96
- Hidden Layers: 4
- Attention Heads: 4
- Max Position Embeddings: 256
- Learning rate: 6e-4
- Weight Decay: 0.01
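
As a rough sanity check on the parameter count, the sketch below tallies the weights of a Llama-style decoder from the sizes listed above. It assumes bias-free projections, standard multi-head attention, RMSNorm weights only, and untied input/output embeddings; none of these are stated in this card, and `llama_param_count` is a hypothetical helper, not part of this repo.

```python
def llama_param_count(vocab, hidden, intermediate, layers, tie_embeddings=False):
    """Count weights in a Llama-style decoder (no biases, RMSNorm only)."""
    attention = 4 * hidden * hidden    # q, k, v, o projections (assumes no grouped-query attention)
    mlp = 3 * hidden * intermediate    # gate, up, down projections
    norms = 2 * hidden                 # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms
    embeddings = vocab * hidden
    if not tie_embeddings:
        embeddings *= 2                # separate output head (assumption)
    return layers * per_layer + hidden + embeddings  # `hidden` = final norm


count_250 = llama_param_count(vocab=250, hidden=48, intermediate=96, layers=4)
count_261 = llama_param_count(vocab=261, hidden=48, intermediate=96, layers=4)
print(count_250)  # 116592
print(count_261)  # 117648
```

Under these assumptions a 250-entry vocabulary gives 116,592 parameters, 1,056 short of the reported 117,648. A vocabulary of 261 entries reproduces the reported number exactly, which may point to a few special tokens on top of the 250 BPE tokens, though the card does not confirm this.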

## Final Loss
After 2 epochs, the model reached a final training loss of **1.819**.
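
For intuition, a cross-entropy loss can be converted to a per-token perplexity by exponentiating it. The sketch below assumes the reported loss is the mean cross-entropy in nats per token (the usual convention in PyTorch training loops, but not stated here), and compares it with a uniform guess over a 250-token vocabulary.

```python
import math

final_train_loss = 1.819  # reported above; assumed to be nats per token

# Perplexity is exp(mean cross-entropy).
perplexity = math.exp(final_train_loss)
print(f"train perplexity per token: {perplexity:.2f}")  # 6.17

# Reference point: the loss of a uniform guess over 250 tokens.
uniform_loss = math.log(250)
print(f"uniform-guess loss: {uniform_loss:.3f}")  # 5.521
```

This per-token figure is measured on the model's own small vocabulary and training data, so it is likely not directly comparable to the Wikitext value in the benchmark table.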

## Benchmarks

All benchmarks were executed using `lm-eval`.

| Task          | Value        | Random level |
| :------------ | :----------: | -----------: |
| Arc_Easy      | 0.2639       | 0.25 (25%)   |
| Wikitext      | 25.1691      | -            |
| BLiMP         | 0.5177       | 0.5 (50%)    |

## Examples
**Prompt:** "Artificial intelligence is "<br>
**Output:** "Artificial intelligence is power by the leading the community, the book of the bring and in the made to the production of the back of an installing and consider in the several c"
<br><br>
**Prompt:** "The main concept of physics is "<br>
**Output:** "The main concept of physics is a struggle of the development of the company of the solution of the work of the first can be some of the supply a part of the state of the management,"
<br><br>
**Prompt:** "Once upon a time, "<br>
**Output:** "Once upon a time, so that he survey which is a self-described by the series of the surgery of the really a policy of the process of the southern of the material the stu"

## Usage
To try the model, run the following code with Hugging Face Transformers:
```python3
import torch
from transformers import pipeline

# Note: device_map="auto" requires the `accelerate` package to be installed.
print("[*] Loading model from Hugging Face Hub...")
pipe = pipeline(
    "text-generation",
    model="SupraLabs/Supra-Mini-0.1M",
    device_map="auto",
    # Half precision on GPU, full precision on CPU.
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
)


def generate_text(prompt, max_new_tokens=50):
    """Sample a continuation of `prompt` and return the prompt plus the new text."""
    result = pipe(
        prompt,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.5,
        top_k=25,
        top_p=0.9,
        repetition_penalty=1.2,
        pad_token_id=pipe.tokenizer.pad_token_id,
        eos_token_id=pipe.tokenizer.eos_token_id,
    )
    # The pipeline returns a list with one dict per generated sequence.
    return result[0]["generated_text"]


test_prompt = "The importance of education is"
print(f"\nPrompt: {test_prompt}")
print("-" * 30)
print("\nOutput:\n" + generate_text(test_prompt))
```

## Use cases

1. Educational research
2. Deployment, testing, or fine-tuning in edge environments
3. Or, more simply, for fun

## Limitations

1. Cannot reason, chat, or code
2. Incoherent more often than not
3. Output is mostly not factual

## Training guide
We trained Supra Mini 0.1M for 2 epochs on a single T4 GPU in about 45 minutes.<br>
The full training code is in this repo: `run.sh` (runs the complete pipeline), `train_tokenizer.py` (trains the custom BPE tokenizer with a vocab size of 250), `train.py` (trains the model), and `inference.py` (tests the model).<br>
The model was trained on the first 500 million tokens of the Sample-10BT subset of FineWeb-Edu using streaming tokenization.
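
Taking "the first N tokens" from a streamed dataset can be sketched as follows. This is a minimal illustration, not the code in `train.py`: `take_token_budget`, the toy stream, and the byte-level encoder are hypothetical stand-ins for the real dataset stream and the custom BPE tokenizer.

```python
from typing import Callable, Iterable, Iterator, List


def take_token_budget(
    texts: Iterable[str],
    encode: Callable[[str], List[int]],
    budget: int,
) -> Iterator[List[int]]:
    """Yield tokenized documents from a stream until `budget` tokens are emitted."""
    used = 0
    for text in texts:
        if used >= budget:
            break  # budget reached: stop pulling from the stream
        # Truncate the last document so the total hits the budget exactly.
        ids = encode(text)[: budget - used]
        if ids:
            used += len(ids)
            yield ids


# Toy stand-ins (assumptions): a tiny in-memory "stream" and a byte-level encoder.
# A real stream could come from something like
# datasets.load_dataset("HuggingFaceFW/fineweb-edu", name="sample-10BT",
#                       split="train", streaming=True)
toy_stream = iter(["hello world", "tiny models", "are fun"])


def toy_encode(text: str) -> List[int]:
    return list(text.encode("utf-8"))


chunks = list(take_token_budget(toy_stream, toy_encode, budget=15))
total_tokens = sum(len(chunk) for chunk in chunks)
print(total_tokens)  # 15
```

Because documents are tokenized one at a time as they arrive, the full dataset never has to be downloaded or held in memory; only the documents needed to fill the budget are read.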

## Overtraining
Yes, this model is heavily overtrained! It saw about 212x more data than the Chinchilla-optimal budget of roughly 20 tokens per parameter: we used about 4,250 tokens per parameter.
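
The arithmetic behind these figures, using the parameter count and token count reported in this card and the 20-tokens-per-parameter rule of thumb:

```python
params = 117_648            # reported parameter count
train_tokens = 500_000_000  # reported size of the training set
chinchilla_ratio = 20       # rule-of-thumb tokens per parameter

tokens_per_param = train_tokens / params
overtraining_factor = tokens_per_param / chinchilla_ratio

print(f"tokens per parameter: {tokens_per_param:.0f}")      # 4250
print(f"overtraining factor: {overtraining_factor:.1f}x")   # 212.5x
```

This counts each training token once. If the two epochs are counted as separate passes, the number of tokens seen per parameter doubles.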

## Final thoughts
As the newly founded organization **SupraLabs**, we are proud to introduce our first tiny LLM, which proves that our pipeline works.<br>
More models will be released soon...