Text Generation
Transformers
Safetensors
English
llama
small
cpu
supra
v5
tiny
mini
open
open-source
text-generation-inference
Instructions to use SupraLabs/Supra-Mini-v5-8M with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use SupraLabs/Supra-Mini-v5-8M with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="SupraLabs/Supra-Mini-v5-8M")

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("SupraLabs/Supra-Mini-v5-8M")
    model = AutoModelForCausalLM.from_pretrained("SupraLabs/Supra-Mini-v5-8M")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use SupraLabs/Supra-Mini-v5-8M with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "SupraLabs/Supra-Mini-v5-8M"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "SupraLabs/Supra-Mini-v5-8M",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
      }'
Use Docker
docker model run hf.co/SupraLabs/Supra-Mini-v5-8M
- SGLang
How to use SupraLabs/Supra-Mini-v5-8M with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
      --model-path "SupraLabs/Supra-Mini-v5-8M" \
      --host 0.0.0.0 \
      --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "SupraLabs/Supra-Mini-v5-8M",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
      }'
Use Docker images
    docker run --gpus all \
      --shm-size 32g \
      -p 30000:30000 \
      -v ~/.cache/huggingface:/root/.cache/huggingface \
      --env "HF_TOKEN=<secret>" \
      --ipc=host \
      lmsysorg/sglang:latest \
      python3 -m sglang.launch_server \
        --model-path "SupraLabs/Supra-Mini-v5-8M" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "SupraLabs/Supra-Mini-v5-8M",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
      }'
- Docker Model Runner
How to use SupraLabs/Supra-Mini-v5-8M with Docker Model Runner:
docker model run hf.co/SupraLabs/Supra-Mini-v5-8M
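The curl calls above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are hypothetical, the default base URL assumes the vLLM server from the example above (use port 30000 for SGLang), and the response is assumed to follow the OpenAI completions shape, with the generated text under `choices[0].text`.

```python
import json
import urllib.request

MODEL_ID = "SupraLabs/Supra-Mini-v5-8M"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build the same POST request that the curl examples send."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text.

    Assumes a server is already running and that the reply uses the
    OpenAI completions format.
    """
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]

# Example call (requires a running server):
# print(complete("Once upon a time,", base_url="http://localhost:30000"))
```

Keeping the request construction separate from the network call makes the payload easy to inspect without a server running.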
| [*] Loading libraries... | |
| [*] Loading tokenizer... | |
| [*] Preparing 5,000,000,000 tokens (streaming, memmap-backed)... | |
| [=] Reusing existing token file: ./tokens.bin | |
| [+] Dataset ready: 4,882,812 chunks of 1024 tokens | |
| [*] Setting up model... | |
| [*] Model parameters: 7,867,584 | |
| [*] Defining training arguments... | |
| [transformers] warmup_ratio is deprecated and will be removed in v5.2. Use `warmup_steps` instead. | |
| [*] Starting training... | |
| 0%| | 0/9538 [00:00<?, ?it/s] | |
| W0515 19:05:07.214000 41242 torch/_inductor/utils.py:1731] [0/0] Not enough SMs to use max_autotune_gemm mode | |
| {'loss': '9.401', 'grad_norm': '1.098', 'learning_rate': '4.151e-05', 'epoch': '0.02097'} | |
| {'loss': '8.45', 'grad_norm': '1.012', 'learning_rate': '8.344e-05', 'epoch': '0.04194'} | |
| {'loss': '7.463', 'grad_norm': '1.007', 'learning_rate': '0.0001254', 'epoch': '0.06291'} | |
| {'loss': '6.763', 'grad_norm': '1.135', 'learning_rate': '0.0001673', 'epoch': '0.08389'} | |
| {'loss': '6.296', 'grad_norm': '0.8444', 'learning_rate': '0.0002', 'epoch': '0.1049'} | |
| Writing model shards: 100%|███████████████████████| 1/1 [00:00<00:00, 45.64it/s] | |
| {'loss': '5.963', 'grad_norm': '1.129', 'learning_rate': '0.0001999', 'epoch': '0.1258'} | |
| {'loss': '5.732', 'grad_norm': '1.432', 'learning_rate': '0.0001997', 'epoch': '0.1468'} | |
| {'loss': '5.555', 'grad_norm': '1.714', 'learning_rate': '0.0001994', 'epoch': '0.1678'} | |
| {'loss': '5.407', 'grad_norm': '1.082', 'learning_rate': '0.0001989', 'epoch': '0.1887'} | |
| {'loss': '5.281', 'grad_norm': '1.087', 'learning_rate': '0.0001984', 'epoch': '0.2097'} | |
| Writing model shards: 100%|███████████████████████| 1/1 [00:00<00:00, 51.79it/s] | |
| {'loss': '5.176', 'grad_norm': '1.031', 'learning_rate': '0.0001977', 'epoch': '0.2307'} | |
| {'loss': '5.081', 'grad_norm': '1.037', 'learning_rate': '0.0001969', 'epoch': '0.2517'} | |
| {'loss': '4.997', 'grad_norm': '1.259', 'learning_rate': '0.000196', 'epoch': '0.2726'} | |
| {'loss': '4.919', 'grad_norm': '1.149', 'learning_rate': '0.0001949', 'epoch': '0.2936'} | |
| {'loss': '4.848', 'grad_norm': '1.25', 'learning_rate': '0.0001938', 'epoch': '0.3146'} | |
| Writing model shards: 100%|███████████████████████| 1/1 [00:00<00:00, 53.17it/s] | |
| {'loss': '4.782', 'grad_norm': '1.465', 'learning_rate': '0.0001925', 'epoch': '0.3355'} | |
| {'loss': '4.717', 'grad_norm': '1.792', 'learning_rate': '0.0001912', 'epoch': '0.3565'} | |
| {'loss': '4.656', 'grad_norm': '1.379', 'learning_rate': '0.0001897', 'epoch': '0.3775'} | |
| {'loss': '4.598', 'grad_norm': '1.669', 'learning_rate': '0.0001881', 'epoch': '0.3985'} | |
| {'loss': '4.55', 'grad_norm': '1.305', 'learning_rate': '0.0001864', 'epoch': '0.4194'} | |
| Writing model shards: 100%|███████████████████████| 1/1 [00:00<00:00, 53.48it/s] | |
| {'loss': '4.508', 'grad_norm': '1.443', 'learning_rate': '0.0001846', 'epoch': '0.4404'} | |
| {'loss': '4.47', 'grad_norm': '1.677', 'learning_rate': '0.0001827', 'epoch': '0.4614'} | |
| {'loss': '4.437', 'grad_norm': '1.25', 'learning_rate': '0.0001807', 'epoch': '0.4823'} | |
| {'loss': '4.403', 'grad_norm': '1.595', 'learning_rate': '0.0001786', 'epoch': '0.5033'} | |
| {'loss': '4.375', 'grad_norm': '1.593', 'learning_rate': '0.0001764', 'epoch': '0.5243'} | |
| Writing model shards: 100%|███████████████████████| 1/1 [00:00<00:00, 53.04it/s] | |
| {'loss': '4.35', 'grad_norm': '1.411', 'learning_rate': '0.0001741', 'epoch': '0.5453'} | |
| {'loss': '4.328', 'grad_norm': '2.014', 'learning_rate': '0.0001718', 'epoch': '0.5662'} | |
| {'loss': '4.303', 'grad_norm': '1.523', 'learning_rate': '0.0001693', 'epoch': '0.5872'} | |
| {'loss': '4.285', 'grad_norm': '1.343', 'learning_rate': '0.0001668', 'epoch': '0.6082'} | |
| {'loss': '4.264', 'grad_norm': '1.376', 'learning_rate': '0.0001641', 'epoch': '0.6291'} | |
| Writing model shards: 100%|███████████████████████| 1/1 [00:00<00:00, 53.00it/s] | |
| {'loss': '4.247', 'grad_norm': '1.62', 'learning_rate': '0.0001614', 'epoch': '0.6501'} | |
| {'loss': '4.232', 'grad_norm': '1.243', 'learning_rate': '0.0001587', 'epoch': '0.6711'} | |
| {'loss': '4.214', 'grad_norm': '1.226', 'learning_rate': '0.0001558', 'epoch': '0.6921'} | |
| {'loss': '4.199', 'grad_norm': '1.243', 'learning_rate': '0.0001529', 'epoch': '0.713'} | |
| {'loss': '4.186', 'grad_norm': '1.813', 'learning_rate': '0.0001499', 'epoch': '0.734'} | |
| Writing model shards: 100%|███████████████████████| 1/1 [00:00<00:00, 52.44it/s] | |
| {'loss': '4.171', 'grad_norm': '1.488', 'learning_rate': '0.0001469', 'epoch': '0.755'} | |
| {'loss': '4.158', 'grad_norm': '1.535', 'learning_rate': '0.0001438', 'epoch': '0.7759'} | |
| {'loss': '4.148', 'grad_norm': '1.208', 'learning_rate': '0.0001407', 'epoch': '0.7969'} | |
| {'loss': '4.136', 'grad_norm': '1.366', 'learning_rate': '0.0001375', 'epoch': '0.8179'} | |
| {'loss': '4.125', 'grad_norm': '1.289', 'learning_rate': '0.0001343', 'epoch': '0.8389'} | |
| Writing model shards: 100%|███████████████████████| 1/1 [00:00<00:00, 52.97it/s] | |
| {'loss': '4.115', 'grad_norm': '1.306', 'learning_rate': '0.000131', 'epoch': '0.8598'} | |
| {'loss': '4.104', 'grad_norm': '1.121', 'learning_rate': '0.0001277', 'epoch': '0.8808'} | |
| {'loss': '4.096', 'grad_norm': '1.608', 'learning_rate': '0.0001243', 'epoch': '0.9018'} | |
| {'loss': '4.089', 'grad_norm': '1.192', 'learning_rate': '0.0001209', 'epoch': '0.9227'} | |
| {'loss': '4.078', 'grad_norm': '1.291', 'learning_rate': '0.0001175', 'epoch': '0.9437'} | |
| Writing model shards: 100%|███████████████████████| 1/1 [00:00<00:00, 53.15it/s] | |
| {'loss': '4.073', 'grad_norm': '1.054', 'learning_rate': '0.0001141', 'epoch': '0.9647'} | |
| {'loss': '4.066', 'grad_norm': '1.141', 'learning_rate': '0.0001107', 'epoch': '0.9857'} | |
| {'loss': '4.057', 'grad_norm': '1.703', 'learning_rate': '0.0001072', 'epoch': '1.007'} | |
| {'loss': '4.051', 'grad_norm': '1.104', 'learning_rate': '0.0001038', 'epoch': '1.027'} | |
| {'loss': '4.042', 'grad_norm': '1.058', 'learning_rate': '0.0001003', 'epoch': '1.048'} | |
| Writing model shards: 100%|███████████████████████| 1/1 [00:00<00:00, 22.14it/s] | |
| {'loss': '4.038', 'grad_norm': '1.095', 'learning_rate': '9.683e-05', 'epoch': '1.069'} | |
| {'loss': '4.032', 'grad_norm': '1.074', 'learning_rate': '9.337e-05', 'epoch': '1.09'} | |
| {'loss': '4.027', 'grad_norm': '1.18', 'learning_rate': '8.991e-05', 'epoch': '1.111'} | |
| {'loss': '4.02', 'grad_norm': '1.193', 'learning_rate': '8.647e-05', 'epoch': '1.132'} | |
| {'loss': '4.015', 'grad_norm': '1.291', 'learning_rate': '8.304e-05', 'epoch': '1.153'} | |
| Writing model shards: 100%|███████████████████████| 1/1 [00:00<00:00, 59.08it/s] | |
| {'loss': '4.012', 'grad_norm': '1.045', 'learning_rate': '7.964e-05', 'epoch': '1.174'} | |
| {'loss': '4.008', 'grad_norm': '1.35', 'learning_rate': '7.625e-05', 'epoch': '1.195'} | |
| {'loss': '4.002', 'grad_norm': '1.086', 'learning_rate': '7.29e-05', 'epoch': '1.216'} | |
| {'loss': '3.999', 'grad_norm': '0.8626', 'learning_rate': '6.958e-05', 'epoch': '1.237'} | |
| {'loss': '3.992', 'grad_norm': '1.381', 'learning_rate': '6.63e-05', 'epoch': '1.258'} | |
| Writing model shards: 100%|███████████████████████| 1/1 [00:00<00:00, 59.27it/s] | |
| {'loss': '3.99', 'grad_norm': '1.229', 'learning_rate': '6.305e-05', 'epoch': '1.279'} | |
| {'loss': '3.987', 'grad_norm': '0.8244', 'learning_rate': '5.985e-05', 'epoch': '1.3'} | |
| {'loss': '3.982', 'grad_norm': '0.9264', 'learning_rate': '5.67e-05', 'epoch': '1.321'} | |
| {'loss': '3.982', 'grad_norm': '1.037', 'learning_rate': '5.36e-05', 'epoch': '1.342'} | |
| {'loss': '3.976', 'grad_norm': '0.9665', 'learning_rate': '5.056e-05', 'epoch': '1.363'} | |
| Writing model shards: 100%|███████████████████████| 1/1 [00:00<00:00, 60.27it/s] | |
| {'loss': '3.975', 'grad_norm': '0.8869', 'learning_rate': '4.758e-05', 'epoch': '1.384'} | |
| {'loss': '3.971', 'grad_norm': '0.7576', 'learning_rate': '4.466e-05', 'epoch': '1.405'} | |
| {'loss': '3.968', 'grad_norm': '0.8313', 'learning_rate': '4.18e-05', 'epoch': '1.426'} | |
| {'loss': '3.965', 'grad_norm': '0.7926', 'learning_rate': '3.902e-05', 'epoch': '1.447'} | |
| {'loss': '3.963', 'grad_norm': '0.9134', 'learning_rate': '3.631e-05', 'epoch': '1.468'} | |
| Writing model shards: 100%|███████████████████████| 1/1 [00:00<00:00, 58.90it/s] | |
| {'loss': '3.963', 'grad_norm': '0.7194', 'learning_rate': '3.367e-05', 'epoch': '1.489'} | |
| {'loss': '3.96', 'grad_norm': '0.6361', 'learning_rate': '3.112e-05', 'epoch': '1.51'} | |
| {'loss': '3.957', 'grad_norm': '0.927', 'learning_rate': '2.865e-05', 'epoch': '1.531'} | |
| {'loss': '3.957', 'grad_norm': '0.6016', 'learning_rate': '2.626e-05', 'epoch': '1.552'} | |
| {'loss': '3.954', 'grad_norm': '0.6197', 'learning_rate': '2.397e-05', 'epoch': '1.573'} | |
| Writing model shards: 100%|███████████████████████| 1/1 [00:00<00:00, 59.64it/s] | |
| {'loss': '3.952', 'grad_norm': '0.577', 'learning_rate': '2.176e-05', 'epoch': '1.594'} | |
| {'loss': '3.948', 'grad_norm': '0.5791', 'learning_rate': '1.965e-05', 'epoch': '1.615'} | |
| {'loss': '3.951', 'grad_norm': '0.5636', 'learning_rate': '1.763e-05', 'epoch': '1.636'} | |
| {'loss': '3.949', 'grad_norm': '0.5653', 'learning_rate': '1.572e-05', 'epoch': '1.657'} | |
| {'loss': '3.948', 'grad_norm': '0.5782', 'learning_rate': '1.39e-05', 'epoch': '1.678'} | |
| Writing model shards: 100%|███████████████████████| 1/1 [00:00<00:00, 59.44it/s] | |
| {'loss': '3.946', 'grad_norm': '0.4793', 'learning_rate': '1.219e-05', 'epoch': '1.699'} | |
| {'loss': '3.946', 'grad_norm': '0.4931', 'learning_rate': '1.058e-05', 'epoch': '1.72'} | |
| {'loss': '3.945', 'grad_norm': '0.5097', 'learning_rate': '9.086e-06', 'epoch': '1.741'} | |
| {'loss': '3.945', 'grad_norm': '0.5356', 'learning_rate': '7.697e-06', 'epoch': '1.761'} | |
| {'loss': '3.945', 'grad_norm': '0.5223', 'learning_rate': '6.419e-06', 'epoch': '1.782'} | |
| Writing model shards: 100%|███████████████████████| 1/1 [00:00<00:00, 59.33it/s] | |
| {'loss': '3.944', 'grad_norm': '0.4148', 'learning_rate': '5.253e-06', 'epoch': '1.803'} | |
| {'loss': '3.943', 'grad_norm': '0.4304', 'learning_rate': '4.201e-06', 'epoch': '1.824'} | |
| {'loss': '3.943', 'grad_norm': '0.4192', 'learning_rate': '3.265e-06', 'epoch': '1.845'} | |
| {'loss': '3.942', 'grad_norm': '0.4074', 'learning_rate': '2.444e-06', 'epoch': '1.866'} | |
| {'loss': '3.941', 'grad_norm': '0.44', 'learning_rate': '1.741e-06', 'epoch': '1.887'} | |
| Writing model shards: 100%|███████████████████████| 1/1 [00:00<00:00, 20.57it/s] | |
| {'loss': '3.942', 'grad_norm': '0.4061', 'learning_rate': '1.156e-06', 'epoch': '1.908'} | |
| {'loss': '3.941', 'grad_norm': '0.3939', 'learning_rate': '6.899e-07', 'epoch': '1.929'} | |
| {'loss': '3.942', 'grad_norm': '0.3792', 'learning_rate': '3.431e-07', 'epoch': '1.95'} | |
| {'loss': '3.944', 'grad_norm': '0.3599', 'learning_rate': '1.161e-07', 'epoch': '1.971'} | |
| {'loss': '3.942', 'grad_norm': '0.3671', 'learning_rate': '9.142e-09', 'epoch': '1.992'} | |
| Writing model shards: 100%|███████████████████████| 1/1 [00:00<00:00, 59.80it/s] | |
| Writing model shards: 100%|███████████████████████| 1/1 [00:00<00:00, 59.16it/s] | |
| {'train_runtime': '3.949e+04', 'train_samples_per_second': '247.3', 'train_steps_per_second': '0.242', 'train_loss': '4.414', 'epoch': '2'} | |
| 100%|████████████████████████████████████| 9538/9538 [10:58:05<00:00, 4.14s/it] | |
| Writing model shards: 100%|███████████████████████| 1/1 [00:00<00:00, 57.38it/s] | |
| [*] Training finished. |
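
The `learning_rate` column in the log rises to 0.0002 and then decays smoothly toward zero, which is the shape of a linear warmup followed by cosine decay. The sketch below reproduces that shape under assumptions inferred from the log rather than stated in it: a warmup of 477 steps (about 5% of the 9538 total steps), a peak of 2e-4, one log row every 100 steps, and each row reporting the rate at step index `100 * row - 1`. The function name `lr_at` is hypothetical and this is not the training script itself.

```python
import math

PEAK_LR = 2e-4        # highest learning rate that appears in the log
TOTAL_STEPS = 9538    # total optimizer steps, from the progress bar
WARMUP_STEPS = 477    # assumption: roughly 5% of TOTAL_STEPS, inferred from the log


def lr_at(step, peak=PEAK_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Linear warmup from 0 to `peak`, then cosine decay to 0 at `total`."""
    if step < warmup:
        return peak * step / warmup
    progress = (step - warmup) / (total - warmup)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))


# Compare the sketch with a few logged values.
# Assumption: log row k reports the rate at step index 100 * k - 1.
for row, logged in [(1, 4.151e-05), (2, 8.344e-05), (6, 1.999e-04), (95, 9.142e-09)]:
    print(f"row {row:2d}: sketch={lr_at(100 * row - 1):.4g} logged={logged:.4g}")
```

If the printed pairs agree to the logged precision, the schedule assumptions are at least consistent with this run; they remain an inference from the numbers, not a documented configuration.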