Instructions to use SupraLabs/MicroSupra-1k with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use SupraLabs/MicroSupra-1k with Transformers:
```python3
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="SupraLabs/MicroSupra-1k")
```
```python3
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("SupraLabs/MicroSupra-1k")
model = AutoModelForCausalLM.from_pretrained("SupraLabs/MicroSupra-1k")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use SupraLabs/MicroSupra-1k with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "SupraLabs/MicroSupra-1k"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "SupraLabs/MicroSupra-1k",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
Use Docker
```bash
docker model run hf.co/SupraLabs/MicroSupra-1k
```
- SGLang
How to use SupraLabs/MicroSupra-1k with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "SupraLabs/MicroSupra-1k" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "SupraLabs/MicroSupra-1k",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
Use Docker images
```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "SupraLabs/MicroSupra-1k" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "SupraLabs/MicroSupra-1k",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
- Docker Model Runner
How to use SupraLabs/MicroSupra-1k with Docker Model Runner:
```bash
docker model run hf.co/SupraLabs/MicroSupra-1k
```
---
license: mit
datasets:
- HuggingFaceFW/fineweb-edu
language:
- en
pipeline_tag: text-generation
tags:
- micro
- nano
- small
- supra
- SupraLabs
- gtx
- rtx
- nvidia
- llama
- lh-tech
- axionlab
library_name: transformers
---
## **🤖 MicroSupra-1k**
So... have you ever seen a model that runs on 3-dollar hardware? No? Well, now you have!
MicroSupra-1k is a bacteria-sized base model (lol) trained on 300 million tokens of FineWeb-Edu for 3 epochs. It is the **first version** of our MicroSupra series.
## Model Config
- Parameters: 1046 (0.001M)
- Architecture: LLaMA
- Vocab size: 1024
- Hidden Size: 1
- Intermediate Size: 2
- Hidden Layers: 1
- Attention Heads: 1
- Max Position Embeddings: 256
- Learning rate: <code>5e-3</code>
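
As a sanity check on the parameter count, here is a rough sketch that counts the weights of a LLaMA-style decoder from the config values above. The card does not say whether the embeddings are tied or whether the projections carry bias terms, so `tied_embeddings`, `attention_bias` and `mlp_bias` are assumptions, and the function name is hypothetical. It also assumes the number of key/value heads equals the number of attention heads. Under those assumptions, tied embeddings plus biases is one combination that reproduces the reported 1046.

```python
def llama_param_count(vocab, hidden, intermediate, layers, heads,
                      tied_embeddings=True, attention_bias=True, mlp_bias=True):
    """Count parameters of a LLaMA-style decoder (hypothetical helper).

    Assumes num_key_value_heads == heads; the flag defaults are guesses,
    not values taken from the model card.
    """
    head_dim = hidden // heads
    proj = heads * head_dim

    embed = vocab * hidden                      # token embedding table
    attn = 4 * hidden * proj                    # q, k, v, o weight matrices
    if attention_bias:
        attn += 3 * proj + hidden               # q, k, v biases + o bias
    mlp = 3 * hidden * intermediate             # gate, up, down weight matrices
    if mlp_bias:
        mlp += 2 * intermediate + hidden        # gate, up biases + down bias
    norms = 2 * hidden                          # two RMSNorm weights per layer

    total = embed + layers * (attn + mlp + norms)
    total += hidden                             # final RMSNorm
    if not tied_embeddings:
        total += vocab * hidden                 # separate lm_head matrix
    return total


config = dict(vocab=1024, hidden=1, intermediate=2, layers=1, heads=1)
with_biases = llama_param_count(**config)
without_biases = llama_param_count(**config, attention_bias=False, mlp_bias=False)
print(with_biases)     # 1046
print(without_biases)  # 1037
```

Either way, almost all of the parameters (1024 of them) sit in the embedding table; the transformer block itself has only a couple dozen weights.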
## Final Loss
After 3 epochs, the model reached a final training loss of **6.046**.
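
To put that number in context, the short sketch below compares it with the loss of a model that guesses uniformly over the 1024-token vocabulary. It assumes the reported loss is the mean per-token cross-entropy in nats (the usual convention for causal LM training in Transformers, but not stated in the card).

```python
import math

vocab_size = 1024   # from the model config
final_loss = 6.046  # reported final training loss

# A model that assigns equal probability to every token has loss ln(vocab_size).
uniform_loss = math.log(vocab_size)

# Perplexity is exp(loss): the "effective" number of equally likely next tokens.
perplexity = math.exp(final_loss)

print(f"uniform-guess loss: {uniform_loss:.3f}")  # 6.931
print(f"perplexity at final loss: {perplexity:.1f}")
```

Under that assumption the model is somewhat better than random guessing: it narrows 1024 candidates down to an effective choice among roughly 420.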
## Examples
**Prompt:** "My name is "<br>
**Output:** *"My name is ed and. as the, to. the, in
ingt thee the ofingi in
the., anda.-eo
ofles, b the,er,s fing.ssp the the
, of of, the,al, d to the m, the, to toed,
seng,,.y. in the,., in and them the thened.sing to
the of of andan the the,, the
to..,,sing,,.aring the the. of.al.,s ofcal ar s..e and.sssor of, and and."*
<br><br>
**Prompt:** "The main concept of physics is "<br>
**Output:** *"The main concept of physics is a,
s and the. thet to, theing.... the,a then,c,i to, thee in b. toed.,,e theyalp the in,er thees- s,el,,,,
and, the of ine,,s the of cs of thesss the. f. to. thesining andor dar,,al the,. of p.
the.s the.,,s. anded,e. of, ofed, l toinging and themsr the of of. to
to thes thes aen,., ofes of a."*
<br><br>
**Prompt:** "Question: What is the capital of France?\nAnswer: "<br>
**Output:** *"Question: What is the capital of France?
Answer:,. and to the. toc. ofs the m,a thee.. the, f ofling. as.,,y bt, the p
, in, the,,ees toed ing to.
o,
thes. the..,s the.ed and andang,,ed the of,,ms. of, thei the, the,ey,,s l.ing toe the the,se the to, the, the,aror, the of-. in the. the. the,e the of ds to,ic the the aal at the..
ingssy s and and"*
## Usage 🚀
```python3
print("[*] Loading libraries...")
import torch
from transformers import LlamaForCausalLM, PreTrainedTokenizerFast

model_path = "SupraLabs/MicroSupra-1k"

print("[*] Loading tokenizer...")
tokenizer = PreTrainedTokenizerFast.from_pretrained(model_path)

print("[*] Loading model...")
model = LlamaForCausalLM.from_pretrained(model_path)
model.eval()  # inference mode: disables dropout

prompt = "Question: What is the capital of France?\nAnswer:"
print(f"[*] Prompt: {prompt!r}")
inputs = tokenizer(prompt, return_tensors="pt")

# Sample a continuation without tracking gradients
with torch.no_grad():
    outputs = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=150,
        do_sample=True,
        temperature=0.35,
        top_p=0.85,
        repetition_penalty=1.2,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

print("[*] Output:", tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Why did SupraLabs create this???
Because we are experimenting with model sizes and techniques (like 1-bit quantization, distillation, and pruning), all to improve your experience! NEW THINGS ARE COMING WITH DISTILLATION, STAY TUNED! We are working on big things!
## Training guide
We trained MicroSupra on a GTX 750 Ti 4GB for 3 epochs in 1 minute.<br>
The model was trained on the first 300 million tokens of the Sample-10BT subset of FineWeb-Edu, using streaming tokenization.
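
The "first N tokens, tokenized while streaming" idea can be sketched as a small generator that stops once a token budget is reached. This is a minimal illustration, not the training script actually used: `take_token_budget` and `toy_tokenize` are hypothetical names, and the toy tokenizer stands in for the real one so the example runs without downloads.

```python
from typing import Callable, Iterable, Iterator, List


def take_token_budget(
    texts: Iterable[str],
    tokenize: Callable[[str], List[int]],
    budget: int,
) -> Iterator[List[int]]:
    """Yield tokenized documents until `budget` tokens have been emitted.

    Documents are consumed lazily, so the source can be a streamed dataset.
    The last document is truncated so the total never exceeds the budget.
    """
    used = 0
    for text in texts:
        if used >= budget:
            break
        ids = tokenize(text)[: budget - used]
        if ids:
            used += len(ids)
            yield ids


def toy_tokenize(text: str) -> List[int]:
    # Hypothetical stand-in tokenizer: one id per whitespace-separated word.
    return [len(word) for word in text.split()]


docs = ["a bb ccc", "dddd eeeee", "f gg hhh iiii"]
chunks = list(take_token_budget(docs, toy_tokenize, budget=6))
print(chunks)  # [[1, 2, 3], [4, 5], [1]]
```

With real data, `texts` could be something like `(row["text"] for row in load_dataset("HuggingFaceFW/fineweb-edu", name="sample-10BT", split="train", streaming=True))` from the `datasets` library, `tokenize` could be `lambda t: tokenizer(t)["input_ids"]`, and `budget` would be 300 million.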
## Final thoughts
Even without any intelligence, it shows that scaling laws are real. This ant-sized model doesn't know how to talk, but we all know it has emotions 🤖🫶