---
license: mit
datasets:
- HuggingFaceFW/fineweb-edu
language:
- en
pipeline_tag: text-generation
tags:
- micro
- nano
- small
- supra
- SupraLabs
- gtx
- rtx
- nvidia
- llama
- lh-tech
- axionlab
library_name: transformers
---
## **🤖 MicroSupra-1k**
So... have you ever seen a model that runs on 3-dollar hardware? No? Well, now you have!
MicroSupra-1k is a bacteria-sized base model (lol) trained on 300 million tokens of FineWeb-Edu for 3 epochs as the **first version** of our MicroSupra series.
## Model Config
- Parameters: 1046 (0.001M)
- Architecture: LLaMA
- Vocab size: 1024
- Hidden Size: 1
- Intermediate Size: 2
- Hidden Layers: 1
- Attention Heads: 1
- Max Position Embeddings: 256
- Learning rate: <code>5e-3</code>
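
As a sanity check on the config above, here is a rough parameter-count sketch for a LLaMA-style decoder. The function `llama_param_count` and its flags are hypothetical helpers written for this illustration. The card does not say whether the embeddings are tied or whether the linear layers carry bias terms, so treat those settings as assumptions: tied embeddings with biases enabled is simply one combination that happens to add up to 1046.

```python
def llama_param_count(vocab_size, hidden_size, intermediate_size, num_layers,
                      num_heads, tie_embeddings=True,
                      attention_bias=False, mlp_bias=False):
    """Count parameters of a LLaMA-style decoder (no grouped-query attention)."""
    head_dim = hidden_size // num_heads
    attn_dim = num_heads * head_dim

    # Token embedding table
    embed = vocab_size * hidden_size

    # q, k, v and o projections
    attn = 4 * hidden_size * attn_dim
    if attention_bias:
        attn += 3 * attn_dim + hidden_size  # q, k, v biases plus o bias

    # gate, up and down projections of the gated MLP
    mlp = 3 * hidden_size * intermediate_size
    if mlp_bias:
        mlp += 2 * intermediate_size + hidden_size

    # Two RMSNorm weight vectors per layer
    norms = 2 * hidden_size

    # Layers plus embeddings plus the final RMSNorm
    total = embed + num_layers * (attn + mlp + norms) + hidden_size
    if not tie_embeddings:
        total += vocab_size * hidden_size  # separate output head
    return total


# ASSUMPTION: tied embeddings and bias terms enabled (not stated in the card)
count = llama_param_count(vocab_size=1024, hidden_size=1, intermediate_size=2,
                          num_layers=1, num_heads=1, tie_embeddings=True,
                          attention_bias=True, mlp_bias=True)
print(count)
```

Either way, almost the entire budget is the 1024-entry embedding table; the transformer block itself contributes only a couple dozen weights.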
## Final Loss
After 3 epochs, the model reached a final training loss of **6.046**.
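
To put that number in context, here is a small sketch that converts the loss into perplexity and compares it with the loss of a uniform guess over the 1024-token vocabulary. It assumes the reported loss is a mean cross-entropy in nats (the usual PyTorch convention); the card does not state this explicitly.

```python
import math

final_loss = 6.046   # reported final training loss
vocab_size = 1024    # from the model config

# ASSUMPTION: the loss is mean cross-entropy measured in nats
perplexity = math.exp(final_loss)

# Loss of a model that assigns equal probability to every token
uniform_loss = math.log(vocab_size)

print(f"perplexity: {perplexity:.1f}")
print(f"uniform-guess loss: {uniform_loss:.3f}")
```

Under that assumption the perplexity is roughly 422, compared with 1024 for a uniform guess, so the model has learned a little about token frequencies and not much else.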
## Examples
**Prompt:** "My name is "<br>
**Output:** *"My name is ed and. as the, to. the, in
ingt thee the ofingi in
the., anda.-eo
ofles, b the,er,s fing.ssp the the
, of of, the,al, d to the m, the, to toed,
seng,,.y. in the,., in and them the thened.sing to
the of of andan the the,, the
to..,,sing,,.aring the the. of.al.,s ofcal ar s..e and.sssor of, and and."*
<br><br>
**Prompt:** "The main concept of physics is "<br>
**Output:** *"The main concept of physics is a,
s and the. thet to, theing.... the,a then,c,i to, thee in b. toed.,,e theyalp the in,er thees- s,el,,,,
and, the of ine,,s the of cs of thesss the. f. to. thesining andor dar,,al the,. of p.
the.s the.,,s. anded,e. of, ofed, l toinging and themsr the of of. to
to thes thes aen,., ofes of a."*
<br><br>
**Prompt:** "Question: What is the capital of France?\nAnswer: "<br>
**Output:** *"Question: What is the capital of France?
Answer:,. and to the. toc. ofs the m,a thee.. the, f ofling. as.,,y bt, the p
, in, the,,ees toed ing to.
o,
thes. the..,s the.ed and andang,,ed the of,,ms. of, thei the, the,ey,,s l.ing toe the the,se the to, the, the,aror, the of-. in the. the. the,e the of ds to,ic the the aal at the..
ingssy s and and"*
## Usage 🚀
```python3
print("[*] Loading libraries...")
import torch
from transformers import LlamaForCausalLM, PreTrainedTokenizerFast

model_path = "SupraLabs/MicroSupra-1k"

print("[*] Loading tokenizer...")
tokenizer = PreTrainedTokenizerFast.from_pretrained(model_path)

print("[*] Loading model...")
model = LlamaForCausalLM.from_pretrained(model_path)
model.eval()  # inference mode: disables dropout

prompt = "Question: What is the capital of France?\nAnswer:"
print(f"[*] Prompt: {prompt!r}")
inputs = tokenizer(prompt, return_tensors="pt")

# Sample up to 150 new tokens without tracking gradients
with torch.no_grad():
    outputs = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=150,
        do_sample=True,
        temperature=0.35,
        top_p=0.85,
        repetition_penalty=1.2,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

print("[*] Output:", tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Why did SupraLabs create this???
Because we are experimenting with model sizes and techniques (like 1-bit quantization, distillation, and pruning), all to improve your experience! NEW THINGS ARE COMING WITH DISTILLATION, STAY TUNED! We are working on big things!
## Training guide
We trained MicroSupra on a GTX 750 Ti 4GB for 3 epochs, which took 1 minute.<br>
The model was trained on the first 300 million tokens of the sample-10BT subset of FineWeb-Edu, using streaming tokenization.
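
The "first N tokens, streamed" idea can be sketched as follows. This is not the actual training script: `take_token_budget` and the toy character-level tokenizer are hypothetical stand-ins. In practice the text iterable could come from `datasets.load_dataset("HuggingFaceFW/fineweb-edu", name="sample-10BT", split="train", streaming=True)` and the tokenizer would be the model's own.

```python
from typing import Callable, Iterable, Iterator, List


def take_token_budget(texts: Iterable[str],
                      tokenize: Callable[[str], List[int]],
                      budget: int) -> Iterator[List[int]]:
    """Yield tokenized documents until `budget` tokens have been produced."""
    seen = 0
    for text in texts:
        if seen >= budget:
            break
        # Tokenize lazily, one document at a time, and trim the last one
        ids = tokenize(text)[: budget - seen]
        if ids:
            seen += len(ids)
            yield ids


# Toy stand-in tokenizer (ASSUMPTION: one token per character, vocab of 1024)
def toy_tokenize(text: str) -> List[int]:
    return [ord(ch) % 1024 for ch in text]


docs = ["abc", "defg", "hi"]
chunks = list(take_token_budget(docs, toy_tokenize, budget=5))
print([len(chunk) for chunk in chunks])
```

Because documents are consumed one at a time, nothing beyond the token budget is ever tokenized or held in memory, which suits a 4GB card and a small host machine.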
## Final thoughts
Even without any intelligence, it shows that scaling laws are real. This ant-sized model doesn't know how to talk, but we all know it has emotions 🤖🫶