Tags: Text Generation, Transformers, Safetensors, English, llama, small, cpu, supra, v5, tiny, mini, open, open-source, text-generation-inference

Instructions to use SupraLabs/Supra-Mini-v5-8M with libraries, inference providers, notebooks, and local apps.

How to use SupraLabs/Supra-Mini-v5-8M with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="SupraLabs/Supra-Mini-v5-8M")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("SupraLabs/Supra-Mini-v5-8M")
model = AutoModelForCausalLM.from_pretrained("SupraLabs/Supra-Mini-v5-8M")
```

How to use SupraLabs/Supra-Mini-v5-8M with vLLM:

Install from pip and serve model:

```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "SupraLabs/Supra-Mini-v5-8M"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "SupraLabs/Supra-Mini-v5-8M",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker:

```bash
docker model run hf.co/SupraLabs/Supra-Mini-v5-8M
```
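
The same OpenAI-compatible request can also be sent from Python. The following is a minimal sketch using only the standard library (`urllib.request`, `json`). The helper names `build_completion_request` and `complete` are hypothetical. The default base URL assumes the vLLM server on `localhost:8000`; pass `http://localhost:30000` for the SGLang server. The sketch also assumes the response follows the OpenAI completions format, with the generated text in `choices[0]["text"]`.

```python
import json
import urllib.request

# Assumption: vLLM's default port; the SGLang example uses port 30000 instead.
DEFAULT_BASE_URL = "http://localhost:8000"
MODEL_ID = "SupraLabs/Supra-Mini-v5-8M"


def build_completion_request(prompt: str,
                             base_url: str = DEFAULT_BASE_URL,
                             max_tokens: int = 512,
                             temperature: float = 0.5) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, base_url: str = DEFAULT_BASE_URL) -> str:
    """Send the request to a running server and return the first completion's text."""
    request = build_completion_request(prompt, base_url=base_url)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    # Assumed OpenAI-style response shape: {"choices": [{"text": ...}, ...]}
    return body["choices"][0]["text"]
```

Once a server is running, `print(complete("Once upon a time,"))` sends the same payload as the curl command.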

How to use SupraLabs/Supra-Mini-v5-8M with SGLang:

Install from pip and serve model:

```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "SupraLabs/Supra-Mini-v5-8M" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "SupraLabs/Supra-Mini-v5-8M",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images:

```bash
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "SupraLabs/Supra-Mini-v5-8M" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "SupraLabs/Supra-Mini-v5-8M",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

How to use SupraLabs/Supra-Mini-v5-8M with Docker Model Runner:

```bash
docker model run hf.co/SupraLabs/Supra-Mini-v5-8M
```

Update README.md

# 🦅 Supra Mini v5 8M

Supra Mini **v5** 8M is a very small model trained on **5 billion** tokens of Fineweb-Edu for 2 epochs as the **fifth version** of our Supra Mini series. Compared with earlier Supra Mini versions, it improves across all benchmarks, thanks to its larger size and training budget.
## Model Config
- Trained in bfloat16
## Final Loss
This model reached a final cross-entropy loss of **4.414** on the training set.
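
For intuition, a mean per-token cross-entropy loss L corresponds to a perplexity of exp(L). The following snippet converts the reported loss. It assumes the value is the mean per-token loss in nats, which is the natural logarithm used by default in PyTorch's `torch.nn.functional.cross_entropy`. The README does not state this, so treat the result (about 82.6) as an estimate. Because the loss was measured on the training set, it says little about held-out perplexity.

```python
import math


def perplexity_from_loss(mean_cross_entropy_nats: float) -> float:
    """Convert a mean per-token cross-entropy loss (natural log) to perplexity."""
    return math.exp(mean_cross_entropy_nats)


# Assumption: 4.414 is a mean per-token loss in nats, as reported above.
final_loss = 4.414
print(f"Approximate training-set perplexity: {perplexity_from_loss(final_loss):.1f}")
```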
## Benchmarks
\- "You have no idea where I'm so, but if my son has any thought or understanding of his name, he will be able to understand him by saying something more than one day. This way they can tell us when you've got me at home. That is why I want your child to know which words are most important for them: "If you get this language from another person, then you'll find yourself in the same place as you read it," says Mike McNamara, who was born with an English friend, Jennifer Batharinee, who had been diagnosed with dementia during her lifetime. He said he would learn how to say things such as "a lot of things," and "you don't really need to do anything else." It may seem simple because he didn't feel good before he went out and asked whether he could make sense of it. But he wanted to take advantage of the fact"*
## Usage
To use our model, just run this code:
```python3
from transformers import pipeline
import torch

print("Loading Supra Mini v5 8M model from Hugging Face...")
pipe = pipeline(
    "text-generation",
    model="SupraLabs/Supra-Mini-v5-8M",
    # ...
print("-" * 30)
print("\nOutput:\n" + generate_text(test_prompt))
```
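
The usage script above is shown only in part, and the definition of `generate_text` is not included. As a minimal sketch (not the authors' code), a hypothetical `generate_text` helper could wrap the Transformers text-generation pipeline as follows. Unlike the README's one-argument call, it takes the pipeline explicitly so it can be exercised without downloading the model. The defaults `max_new_tokens=100` and `temperature=0.7` are illustrative choices, not values from the README.

```python
from typing import Any, Callable


def generate_text(pipe: Callable[..., Any], prompt: str,
                  max_new_tokens: int = 100, temperature: float = 0.7) -> str:
    """Hypothetical helper: sample a continuation of `prompt` from a
    text-generation pipeline and return only the newly generated text."""
    outputs = pipe(
        prompt,
        max_new_tokens=max_new_tokens,  # cap on generated tokens (prompt excluded)
        do_sample=True,                 # sample instead of greedy decoding
        temperature=temperature,        # illustrative value, not from the README
        return_full_text=False,         # drop the prompt from the returned string
    )
    # The text-generation pipeline returns a list of dicts keyed by "generated_text".
    return outputs[0]["generated_text"]
```

With the README's `pipe` and `test_prompt`, the call becomes `generate_text(pipe, test_prompt)`.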

## Use cases

1. Educational research
2. Deployment, testing, or fine-tuning in edge environments
3. Or, more simply, for fun

## Limitations

1. Cannot reason, chat, or write code
2. Incoherent more often than not
3. Mostly non-factual

## Training guide
We trained Supra Mini v5 8M on a single NVIDIA RTX 5060 Ti 16GB in ~11 hours for 2 epochs.<br>
The full training code can be found in this repo as `train_tokenizer.py` (trains a custom BPE tokenizer with a vocabulary size of 16384), `train.py` (trains the model), and `inference.py` (tests the model).<br>
The model was trained on the first 5 billion tokens of Sample-10BT from Fineweb-Edu.
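
A rough throughput figure follows from these numbers. Two epochs over 5 billion tokens amount to 10 billion training tokens in about 11 hours, or roughly 250,000 tokens per second on the single RTX 5060 Ti. The snippet assumes that each epoch covers all 5 billion tokens and that the ~11 hours is training time only. The constant and function names are illustrative.

```python
# Back-of-the-envelope throughput from the training guide's figures.
# Assumptions: each of the 2 epochs covers all 5 billion tokens, and the
# "~11 hours" is pure training time (tokenizer training excluded).
TOKENS_PER_EPOCH = 5_000_000_000
EPOCHS = 2
TRAINING_HOURS = 11  # approximate, as stated in the README


def tokens_per_second(tokens_per_epoch: int, epochs: int, hours: float) -> float:
    """Average number of training tokens processed per second."""
    return tokens_per_epoch * epochs / (hours * 3600)


rate = tokens_per_second(TOKENS_PER_EPOCH, EPOCHS, TRAINING_HOURS)
print(f"Total tokens seen: {TOKENS_PER_EPOCH * EPOCHS:,}")
print(f"Average throughput: ~{rate:,.0f} tokens/s")
```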
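
`train_tokenizer.py` is not reproduced here. To show what training a BPE tokenizer involves, the following is a toy, character-level sketch with hypothetical `train_bpe` and `encode_word` helpers. It uses whitespace pre-tokenization and has no byte fallback or special tokens. The trainer repeatedly merges the most frequent adjacent symbol pair until the vocabulary reaches the target size. The real script targets 16384 entries, and a tokenizer of that size would typically be trained with an optimized library such as Hugging Face `tokenizers`. This sketch is not the authors' implementation.

```python
from __future__ import annotations

from collections import Counter


def train_bpe(corpus: list[str], vocab_size: int) -> tuple[list[tuple[str, str]], set[str]]:
    """Toy BPE trainer: learn merges until the vocabulary has `vocab_size` symbols."""
    # Whitespace pre-tokenization: count how often each word occurs.
    word_freqs = Counter(word for text in corpus for word in text.split())
    # Start from single characters.
    splits = {word: tuple(word) for word in word_freqs}
    vocab = {ch for word in word_freqs for ch in word}
    merges: list[tuple[str, str]] = []

    while len(vocab) < vocab_size:
        # Count adjacent symbol pairs, weighted by word frequency.
        pair_counts: Counter = Counter()
        for word, freq in word_freqs.items():
            symbols = splits[word]
            for a, b in zip(symbols, symbols[1:]):
                pair_counts[(a, b)] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        best = max(pair_counts, key=pair_counts.get)
        merged = best[0] + best[1]
        merges.append(best)
        vocab.add(merged)
        # Apply the new merge left to right in every word's segmentation.
        for word, symbols in splits.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            splits[word] = tuple(out)
    return merges, vocab


def encode_word(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Segment a word by replaying the learned merges in training order."""
    symbols = list(word)
    for a, b in merges:
        i = 0
        while i < len(symbols) - 1:
            if symbols[i] == a and symbols[i + 1] == b:
                symbols[i:i + 2] = [a + b]
            else:
                i += 1
    return symbols
```

For example, `train_bpe(["hug hug hug pug pun bun"], vocab_size=9)` learns the merges `("u", "g")`, `("h", "ug")`, and `("u", "n")`, after which `encode_word("hugs", merges)` returns `["hug", "s"]`.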