Instructions to use SupraLabs/Supra-Mini-v4-2M with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use SupraLabs/Supra-Mini-v4-2M with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="SupraLabs/Supra-Mini-v4-2M")
print(pipe("Once upon a time,", max_new_tokens=50)[0]["generated_text"])

# Or load the tokenizer and model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("SupraLabs/Supra-Mini-v4-2M")
model = AutoModelForCausalLM.from_pretrained("SupraLabs/Supra-Mini-v4-2M")

Local Apps

vLLM

How to use SupraLabs/Supra-Mini-v4-2M with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "SupraLabs/Supra-Mini-v4-2M"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "SupraLabs/Supra-Mini-v4-2M",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
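The same OpenAI-compatible request can be sent from Python. The sketch below uses only the standard library and mirrors the JSON body of the curl call; the helper names `build_completion_request`, `extract_text` and `complete` are hypothetical, and it assumes the server answers in the usual OpenAI completions shape with the text in `choices[0]["text"]`.

```python
import json
import urllib.request

def build_completion_request(base_url, model, prompt, max_tokens=512, temperature=0.5):
    """Build a POST request for an OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def extract_text(response_body):
    """OpenAI-style completion responses carry the generated text in choices[0]["text"]."""
    return response_body["choices"][0]["text"]

def complete(base_url, model, prompt, **kwargs):
    """Send the request to a running server and return the generated text."""
    req = build_completion_request(base_url, model, prompt, **kwargs)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return extract_text(json.load(resp))
```

With the vLLM server running, `complete("http://localhost:8000", "SupraLabs/Supra-Mini-v4-2M", "Once upon a time,")` should return the continuation; pointing `base_url` at `http://localhost:30000` targets the SGLang server from the next section instead.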

Use Docker

docker model run hf.co/SupraLabs/Supra-Mini-v4-2M

SGLang

How to use SupraLabs/Supra-Mini-v4-2M with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "SupraLabs/Supra-Mini-v4-2M" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "SupraLabs/Supra-Mini-v4-2M",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "SupraLabs/Supra-Mini-v4-2M" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "SupraLabs/Supra-Mini-v4-2M",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use SupraLabs/Supra-Mini-v4-2M with Docker Model Runner:
```
docker model run hf.co/SupraLabs/Supra-Mini-v4-2M
```

Commit 208972d (verified), committed by LH-Tech-AI 9 days ago
Parent: d999f97

Upload 9 files

Files changed (9)

benchmarks.md +79 -0
config.json +32 -0
generation_config.json +10 -0
inference.py +33 -0
model.safetensors +3 -0
tokenizer.json +0 -0
tokenizer_config.json +9 -0
train.py +185 -0
training_args.bin +3 -0

benchmarks.md ADDED

	@@ -0,0 +1,79 @@

+|                           Tasks                            |Version|Filter|n-shot|    Metric     |   |  Value  |   |Stderr|
+|------------------------------------------------------------|------:|------|-----:|---------------|---|--------:|---|------|
+|arc_easy                                                    |      1|none  |     0|acc            |↑  |   0.2727|±  |0.0091|
+|                                                            |       |none  |     0|acc_norm       |↑  |   0.2816|±  |0.0092|
+|blimp                                                       |      2|none  |      |acc            |↑  |   0.5526|±  |0.0017|
+| - blimp_adjunct_island                                     |      1|none  |     0|acc            |↑  |   0.7330|±  |0.0140|
+| - blimp_anaphor_gender_agreement                           |      1|none  |     0|acc            |↑  |   0.3820|±  |0.0154|
+| - blimp_anaphor_number_agreement                           |      1|none  |     0|acc            |↑  |   0.5030|±  |0.0158|
+| - blimp_animate_subject_passive                            |      1|none  |     0|acc            |↑  |   0.5520|±  |0.0157|
+| - blimp_animate_subject_trans                              |      1|none  |     0|acc            |↑  |   0.7250|±  |0.0141|
+| - blimp_causative                                          |      1|none  |     0|acc            |↑  |   0.5010|±  |0.0158|
+| - blimp_complex_NP_island                                  |      1|none  |     0|acc            |↑  |   0.5640|±  |0.0157|
+| - blimp_coordinate_structure_constraint_complex_left_branch|      1|none  |     0|acc            |↑  |   0.0840|±  |0.0088|
+| - blimp_coordinate_structure_constraint_object_extraction  |      1|none  |     0|acc            |↑  |   0.4930|±  |0.0158|
+| - blimp_determiner_noun_agreement_1                        |      1|none  |     0|acc            |↑  |   0.7000|±  |0.0145|
+| - blimp_determiner_noun_agreement_2                        |      1|none  |     0|acc            |↑  |   0.7070|±  |0.0144|
+| - blimp_determiner_noun_agreement_irregular_1              |      1|none  |     0|acc            |↑  |   0.5500|±  |0.0157|
+| - blimp_determiner_noun_agreement_irregular_2              |      1|none  |     0|acc            |↑  |   0.7110|±  |0.0143|
+| - blimp_determiner_noun_agreement_with_adj_2               |      1|none  |     0|acc            |↑  |   0.6170|±  |0.0154|
+| - blimp_determiner_noun_agreement_with_adj_irregular_1     |      1|none  |     0|acc            |↑  |   0.5010|±  |0.0158|
+| - blimp_determiner_noun_agreement_with_adj_irregular_2     |      1|none  |     0|acc            |↑  |   0.6180|±  |0.0154|
+| - blimp_determiner_noun_agreement_with_adjective_1         |      1|none  |     0|acc            |↑  |   0.6380|±  |0.0152|
+| - blimp_distractor_agreement_relational_noun               |      1|none  |     0|acc            |↑  |   0.3050|±  |0.0146|
+| - blimp_distractor_agreement_relative_clause               |      1|none  |     0|acc            |↑  |   0.2710|±  |0.0141|
+| - blimp_drop_argument                                      |      1|none  |     0|acc            |↑  |   0.6970|±  |0.0145|
+| - blimp_ellipsis_n_bar_1                                   |      1|none  |     0|acc            |↑  |   0.2640|±  |0.0139|
+| - blimp_ellipsis_n_bar_2                                   |      1|none  |     0|acc            |↑  |   0.4140|±  |0.0156|
+| - blimp_existential_there_object_raising                   |      1|none  |     0|acc            |↑  |   0.7440|±  |0.0138|
+| - blimp_existential_there_quantifiers_1                    |      1|none  |     0|acc            |↑  |   0.9030|±  |0.0094|
+| - blimp_existential_there_quantifiers_2                    |      1|none  |     0|acc            |↑  |   0.1200|±  |0.0103|
+| - blimp_existential_there_subject_raising                  |      1|none  |     0|acc            |↑  |   0.6530|±  |0.0151|
+| - blimp_expletive_it_object_raising                        |      1|none  |     0|acc            |↑  |   0.6850|±  |0.0147|
+| - blimp_inchoative                                         |      1|none  |     0|acc            |↑  |   0.4090|±  |0.0156|
+| - blimp_intransitive                                       |      1|none  |     0|acc            |↑  |   0.5600|±  |0.0157|
+| - blimp_irregular_past_participle_adjectives               |      1|none  |     0|acc            |↑  |   0.7220|±  |0.0142|
+| - blimp_irregular_past_participle_verbs                    |      1|none  |     0|acc            |↑  |   0.6330|±  |0.0152|
+| - blimp_irregular_plural_subject_verb_agreement_1          |      1|none  |     0|acc            |↑  |   0.6140|±  |0.0154|
+| - blimp_irregular_plural_subject_verb_agreement_2          |      1|none  |     0|acc            |↑  |   0.7250|±  |0.0141|
+| - blimp_left_branch_island_echo_question                   |      1|none  |     0|acc            |↑  |   0.6450|±  |0.0151|
+| - blimp_left_branch_island_simple_question                 |      1|none  |     0|acc            |↑  |   0.1690|±  |0.0119|
+| - blimp_matrix_question_npi_licensor_present               |      1|none  |     0|acc            |↑  |   0.0020|±  |0.0014|
+| - blimp_npi_present_1                                      |      1|none  |     0|acc            |↑  |   0.3860|±  |0.0154|
+| - blimp_npi_present_2                                      |      1|none  |     0|acc            |↑  |   0.3810|±  |0.0154|
+| - blimp_only_npi_licensor_present                          |      1|none  |     0|acc            |↑  |   0.6120|±  |0.0154|
+| - blimp_only_npi_scope                                     |      1|none  |     0|acc            |↑  |   0.4280|±  |0.0157|
+| - blimp_passive_1                                          |      1|none  |     0|acc            |↑  |   0.6450|±  |0.0151|
+| - blimp_passive_2                                          |      1|none  |     0|acc            |↑  |   0.6410|±  |0.0152|
+| - blimp_principle_A_c_command                              |      1|none  |     0|acc            |↑  |   0.6910|±  |0.0146|
+| - blimp_principle_A_case_1                                 |      1|none  |     0|acc            |↑  |   1.0000|±  |     0|
+| - blimp_principle_A_case_2                                 |      1|none  |     0|acc            |↑  |   0.5190|±  |0.0158|
+| - blimp_principle_A_domain_1                               |      1|none  |     0|acc            |↑  |   0.9810|±  |0.0043|
+| - blimp_principle_A_domain_2                               |      1|none  |     0|acc            |↑  |   0.5570|±  |0.0157|
+| - blimp_principle_A_domain_3                               |      1|none  |     0|acc            |↑  |   0.4680|±  |0.0158|
+| - blimp_principle_A_reconstruction                         |      1|none  |     0|acc            |↑  |   0.2410|±  |0.0135|
+| - blimp_regular_plural_subject_verb_agreement_1            |      1|none  |     0|acc            |↑  |   0.7200|±  |0.0142|
+| - blimp_regular_plural_subject_verb_agreement_2            |      1|none  |     0|acc            |↑  |   0.6030|±  |0.0155|
+| - blimp_sentential_negation_npi_licensor_present           |      1|none  |     0|acc            |↑  |   1.0000|±  |     0|
+| - blimp_sentential_negation_npi_scope                      |      1|none  |     0|acc            |↑  |   0.4990|±  |0.0158|
+| - blimp_sentential_subject_island                          |      1|none  |     0|acc            |↑  |   0.3440|±  |0.0150|
+| - blimp_superlative_quantifiers_1                          |      1|none  |     0|acc            |↑  |   0.5400|±  |0.0158|
+| - blimp_superlative_quantifiers_2                          |      1|none  |     0|acc            |↑  |   0.1780|±  |0.0121|
+| - blimp_tough_vs_raising_1                                 |      1|none  |     0|acc            |↑  |   0.4330|±  |0.0157|
+| - blimp_tough_vs_raising_2                                 |      1|none  |     0|acc            |↑  |   0.5950|±  |0.0155|
+| - blimp_transitive                                         |      1|none  |     0|acc            |↑  |   0.6260|±  |0.0153|
+| - blimp_wh_island                                          |      1|none  |     0|acc            |↑  |   0.4180|±  |0.0156|
+| - blimp_wh_questions_object_gap                            |      1|none  |     0|acc            |↑  |   0.5430|±  |0.0158|
+| - blimp_wh_questions_subject_gap                           |      1|none  |     0|acc            |↑  |   0.9160|±  |0.0088|
+| - blimp_wh_questions_subject_gap_long_distance             |      1|none  |     0|acc            |↑  |   0.9410|±  |0.0075|
+| - blimp_wh_vs_that_no_gap                                  |      1|none  |     0|acc            |↑  |   0.9800|±  |0.0044|
+| - blimp_wh_vs_that_no_gap_long_distance                    |      1|none  |     0|acc            |↑  |   0.9820|±  |0.0042|
+| - blimp_wh_vs_that_with_gap                                |      1|none  |     0|acc            |↑  |   0.0280|±  |0.0052|
+| - blimp_wh_vs_that_with_gap_long_distance                  |      1|none  |     0|acc            |↑  |   0.0150|±  |0.0038|
+|wikitext                                                    |      2|none  |     0|bits_per_byte  |↓  |   2.1661|±  |   N/A|
+|                                                            |       |none  |     0|byte_perplexity|↓  |   4.4881|±  |   N/A|
+|                                                            |       |none  |     0|word_perplexity|↓  |3068.2023|±  |   N/A|
+|Groups|Version|Filter|n-shot|Metric|   |Value |   |Stderr|
+|------|------:|------|------|------|---|-----:|---|-----:|
+|blimp |      2|none  |      |acc   |↑  |0.5526|±  |0.0017|
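Several numbers in this table are tied together. In lm-evaluation-harness, `bits_per_byte` is the log2 of `byte_perplexity`, and `word_perplexity` normalizes the same total log-likelihood by word count instead of byte count. The ± column on accuracy rows looks like the ordinary standard error of a mean of 0/1 outcomes. A small sketch to check those relationships, assuming n = 1,000 examples per BLiMP paradigm and n = 2,376 ARC-Easy test questions (dataset sizes, not stated in this file):

```python
import math

def byte_perplexity_from_bpb(bits_per_byte):
    """bits_per_byte is log2 of the per-byte perplexity."""
    return 2.0 ** bits_per_byte

def bytes_per_word(word_perplexity, byte_perplexity):
    """Both perplexities are exp(total_nll / count), so the ratio of their logs is bytes / words."""
    return math.log(word_perplexity) / math.log(byte_perplexity)

def binary_stderr(acc, n):
    """Sample standard error of the mean of n 0/1 outcomes with mean acc."""
    return math.sqrt(acc * (1.0 - acc) / (n - 1))

print(byte_perplexity_from_bpb(2.1661))         # compare with byte_perplexity in the table
print(bytes_per_word(3068.2023, 4.4881))        # implied average bytes per wikitext word
print(binary_stderr(0.7330, 1000))              # blimp_adjunct_island
print(binary_stderr(0.2727, 2376))              # arc_easy
```

Plugging in the table's values, 2^2.1661 reproduces the 4.4881 byte perplexity, the two perplexities imply roughly 5.3 bytes per word, and the formula gives about 0.0140 and 0.0091, matching the Stderr column for blimp_adjunct_island and arc_easy.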

config.json ADDED

	@@ -0,0 +1,32 @@

+{
+  "architectures": [
+    "LlamaForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 0,
+  "dtype": "float32",
+  "eos_token_id": 2,
+  "head_dim": 8,
+  "hidden_act": "silu",
+  "hidden_size": 64,
+  "initializer_range": 0.02,
+  "intermediate_size": 128,
+  "max_position_embeddings": 512,
+  "mlp_bias": false,
+  "model_type": "llama",
+  "num_attention_heads": 8,
+  "num_hidden_layers": 5,
+  "num_key_value_heads": 8,
+  "pad_token_id": 1,
+  "pretraining_tp": 1,
+  "rms_norm_eps": 1e-06,
+  "rope_parameters": {
+    "rope_theta": 10000.0,
+    "rope_type": "default"
+  },
+  "tie_word_embeddings": true,
+  "transformers_version": "5.8.1",
+  "use_cache": false,
+  "vocab_size": 4096
+}
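The parameter count follows directly from these config values. A sketch assuming the standard `LlamaForCausalLM` layout with no biases (`attention_bias` and `mlp_bias` are false here): q/k/v/o projections, a gate/up/down MLP, two RMSNorm weight vectors per layer, one final norm, and an output head tied to the embedding matrix. The helper name `llama_param_count` is hypothetical.

```python
def llama_param_count(cfg):
    """Count parameters of a bias-free LlamaForCausalLM from its config values."""
    h = cfg["hidden_size"]
    head_dim = cfg["head_dim"]
    q_out = cfg["num_attention_heads"] * head_dim
    kv_out = cfg["num_key_value_heads"] * head_dim
    attn = h * q_out + 2 * h * kv_out + q_out * h         # q, k, v, o projections
    mlp = 3 * h * cfg["intermediate_size"]                 # gate, up, down projections
    norms = 2 * h                                          # input and post-attention RMSNorm weights
    embed = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else embed   # a tied head reuses the embedding matrix
    final_norm = h
    return embed + cfg["num_hidden_layers"] * (attn + mlp + norms) + final_norm + lm_head

config = {
    "hidden_size": 64, "head_dim": 8, "intermediate_size": 128,
    "num_attention_heads": 8, "num_key_value_heads": 8, "num_hidden_layers": 5,
    "vocab_size": 4096, "tie_word_embeddings": True,
}
n_params = llama_param_count(config)
fp32_bytes = 4 * n_params  # config.json stores "dtype": "float32"
print(f"{n_params:,} parameters, {fp32_bytes:,} bytes of float32 weights")
```

This gives 467,648 parameters, more than half of them in the 4096 × 64 embedding matrix. That is closer to the "0.5M" in the training and inference script names than to the "2M" in the repository name. At 4 bytes each the weights take 1,870,592 bytes, just under the 1,875,544-byte `model.safetensors` pointer size below; the few KB of difference is presumably the safetensors header.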

generation_config.json ADDED

	@@ -0,0 +1,10 @@

+{
+  "_from_model_config": true,
+  "bos_token_id": 0,
+  "eos_token_id": 2,
+  "output_attentions": false,
+  "output_hidden_states": false,
+  "pad_token_id": 1,
+  "transformers_version": "5.8.1",
+  "use_cache": true
+}

inference.py ADDED

	@@ -0,0 +1,33 @@

+print("[*] Loading libraries...")
+import torch
+from transformers import LlamaForCausalLM, PreTrainedTokenizerFast
+model_path = "./Supra-Mini-v3-0.5M-FINAL"
+print("[*] Loading tokenizer...")
+tokenizer = PreTrainedTokenizerFast.from_pretrained(model_path)
+print("[*] Loading model...")
+model = LlamaForCausalLM.from_pretrained(model_path)
+model.eval()
+prompt = "The main concept of physics is "
+print(f"[*] Prompt: {prompt!r}")
+inputs = tokenizer(prompt, return_tensors="pt")
+with torch.no_grad():
+    outputs = model.generate(
+        input_ids=inputs["input_ids"],
+        attention_mask=inputs["attention_mask"],
+        max_new_tokens=150,
+        do_sample=True,
+        temperature=0.5,
+        top_p=0.9,
+        top_k=25,
+        repetition_penalty=1.3,
+        pad_token_id=tokenizer.pad_token_id,
+        eos_token_id=tokenizer.eos_token_id,
+    )
+print("[*] Output:", tokenizer.decode(outputs[0], skip_special_tokens=True))
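`inference.py` samples with four knobs at once: `repetition_penalty=1.3`, `temperature=0.5`, `top_k=25` and `top_p=0.9`. The NumPy sketch below approximates how such settings reshape one step's next-token distribution (penalty, then temperature, then top-k, then top-p, which is roughly the order Hugging Face `generate` applies them). It is an illustration, not the library's implementation, and its boundary handling may differ slightly; the function names are hypothetical.

```python
import numpy as np

def apply_repetition_penalty(logits, prev_ids, penalty):
    """Damp tokens already seen: divide positive logits, multiply negative ones."""
    out = logits.copy()
    for t in set(prev_ids):
        out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
    return out

def next_token_probs(logits, prev_ids, temperature=0.5, top_k=25, top_p=0.9,
                     repetition_penalty=1.3):
    """Turn raw logits into a sampling distribution (defaults copied from inference.py)."""
    x = apply_repetition_penalty(np.asarray(logits, dtype=np.float64), prev_ids,
                                 repetition_penalty)
    x = x / temperature                                # temperature < 1 sharpens the distribution
    kth = np.sort(x)[-min(top_k, x.size)]              # k-th largest logit
    x = np.where(x < kth, -np.inf, x)                  # top-k: drop everything below it
    p = np.exp(x - x.max())
    p /= p.sum()                                       # softmax over the surviving tokens
    order = np.argsort(-p)                             # most probable first
    n_keep = int(np.searchsorted(np.cumsum(p[order]), top_p)) + 1
    keep = np.zeros(p.size, dtype=bool)
    keep[order[:n_keep]] = True                        # top-p: smallest prefix with mass >= top_p
    p = np.where(keep, p, 0.0)
    return p / p.sum()

def sample_next(probs, rng):
    """Draw one token id from the filtered distribution."""
    return int(rng.choice(probs.size, p=probs))

probs = next_token_probs([2.0, 1.0, 0.5, -1.0], prev_ids=[0], top_k=2)
print(probs, sample_next(probs, np.random.default_rng(0)))
```

With a 4,096-token vocabulary, `top_k=25` already discards over 99% of candidates before `top_p` is applied, and the low temperature concentrates most of the mass on the first few of those 25.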

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c9e652fea30d654bd4e0e523b1056857c43745adb3288af1f35986282768cfc0
+size 1875544
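What the repository stores here is a Git LFS pointer: the real weights are identified by their SHA-256 (`oid`) and byte size. A standard-library sketch for checking a downloaded `model.safetensors` against such a pointer; the helper names are hypothetical, and the file path is whatever your download produced.

```python
import hashlib

def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo, "oid": digest,
            "size": int(fields["size"])}

def verify_lfs_object(path, pointer):
    """Return True if the file at path matches the pointer's size and hash."""
    h = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
            h.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and h.hexdigest() == pointer["oid"]

POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:c9e652fea30d654bd4e0e523b1056857c43745adb3288af1f35986282768cfc0
size 1875544
"""
print(parse_lfs_pointer(POINTER))
```

`verify_lfs_object("model.safetensors", parse_lfs_pointer(POINTER))` should return True for an intact copy of this commit's weights.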

tokenizer.json ADDED

The diff for this file is too large to render.

tokenizer_config.json ADDED

	@@ -0,0 +1,9 @@

+{
+  "backend": "tokenizers",
+  "bos_token": "<s>",
+  "eos_token": "</s>",
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "<pad>",
+  "tokenizer_class": "TokenizersBackend",
+  "unk_token": "<unk>"
+}
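The huge `model_max_length` is not a real limit: it equals `int(1e30)`, the sentinel transformers appears to write when no maximum length was configured. The model itself only has `max_position_embeddings: 512` (config.json above), so inputs should be capped at 512 tokens. A small sketch; the helper names are hypothetical.

```python
VERY_LARGE_INTEGER = int(1e30)  # "no limit set" sentinel; equals the value in tokenizer_config.json

def max_length_is_unset(tokenizer_max_length):
    """True when the tokenizer carries the sentinel instead of a real limit."""
    return tokenizer_max_length >= VERY_LARGE_INTEGER

def effective_max_length(tokenizer_max_length, max_position_embeddings=512):
    """Cap the tokenizer's limit at what the model's position embeddings allow."""
    return min(tokenizer_max_length, max_position_embeddings)

print(effective_max_length(1000000000000000019884624838656))
```

In practice this means calling the tokenizer with `truncation=True, max_length=512` (or the value above) rather than relying on `model_max_length`.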

train.py ADDED

	@@ -0,0 +1,185 @@

+"""
+© SupraLabs 2026 - Official pretraining code for Supra Mini v3 0.5M
+"""
+import os
+os.environ["PYTORCH_ALLOC_CONF"] = "expandable_segments:True"
+os.environ["CUDA_VISIBLE_DEVICES"] = "0"
+print("[*] Loading libraries...")
+import torch
+import math
+import numpy as np
+from datasets import load_dataset
+from tokenizers import ByteLevelBPETokenizer
+from transformers import (
+    LlamaConfig,
+    LlamaForCausalLM,
+    PreTrainedTokenizerFast,
+    Trainer,
+    TrainingArguments,
+)
+from torch.utils.data import Dataset
+from tqdm import tqdm
+print("[*] Loading tokenizer...")
+fast_tokenizer = ByteLevelBPETokenizer(
+    "./custom_llama_tokenizer-vocab.json",
+    "./custom_llama_tokenizer-merges.txt"
+)
+tokenizer = PreTrainedTokenizerFast(
+    tokenizer_object=fast_tokenizer,
+    bos_token="<s>",
+    eos_token="</s>",
+    unk_token="<unk>",
+    pad_token="<pad>",
+)
+TOKEN_BIN     = "./tokens.bin"
+TARGET_TOKENS = 1_000_000_000
+SEQ_LEN       = 512
+BATCH_TEXTS   = 1000
+FLUSH_EVERY   = 1_000_000
+def build_token_bin(fast_tokenizer, path=TOKEN_BIN, target_tokens=TARGET_TOKENS):
+    if os.path.exists(path) and os.path.getsize(path) >= target_tokens * 2:
+        print(f"[=] Reusing existing token file: {path}")
+        return
+    print(f"[*] Streaming + tokenizing {target_tokens:,} tokens → {path}")
+    mm = np.memmap(path, dtype=np.uint16, mode="w+", shape=(target_tokens,))
+    dataset = load_dataset(
+        "HuggingFaceFW/fineweb-edu", "sample-10BT",
+        split="train", streaming=True
+    )
+    written = 0
+    buf = []
+    texts = []
+    pbar = tqdm(total=target_tokens, desc="[*] Gathering tokens", unit="tok")
+    def flush_buf():
+        nonlocal written, buf
+        if not buf:
+            return False
+        n = min(len(buf), target_tokens - written)
+        mm[written:written + n] = np.asarray(buf[:n], dtype=np.uint16)
+        written += n
+        pbar.update(n)
+        del buf[:n]
+        return written >= target_tokens
+    for example in dataset:
+        texts.append(example["text"])
+        if len(texts) >= BATCH_TEXTS:
+            encs = fast_tokenizer.encode_batch(texts)
+            texts.clear()
+            for e in encs:
+                buf.extend(e.ids)
+            if len(buf) >= FLUSH_EVERY:
+                if flush_buf():
+                    break
+    if written < target_tokens and texts:
+        encs = fast_tokenizer.encode_batch(texts)
+        for e in encs:
+            buf.extend(e.ids)
+    if written < target_tokens:
+        flush_buf()
+    pbar.close()
+    mm.flush()
+    del mm
+    print(f"[+] Wrote {written:,} tokens to {path} "
+          f"({os.path.getsize(path)/1e6:.1f} MB)")
+class MemmapDataset(Dataset):
+    def __init__(self, path, total_tokens, seq_len=SEQ_LEN):
+        self.path     = path
+        self.seq_len  = seq_len
+        self.n_chunks = total_tokens // seq_len
+        self._data    = None  # lazy open (Multiprocessing-safe)
+    @property
+    def data(self):
+        if self._data is None:
+            self._data = np.memmap(
+                self.path, dtype=np.uint16, mode="r",
+                shape=(self.n_chunks * self.seq_len,)
+            )
+        return self._data
+    def __len__(self):
+        return self.n_chunks
+    def __getitem__(self, idx):
+        s   = idx * self.seq_len
+        arr = np.asarray(self.data[s:s + self.seq_len], dtype=np.int64)
+        ids = torch.from_numpy(arr)
+        return {"input_ids": ids, "labels": ids.clone()}
+def collate_fn(batch):
+    input_ids = torch.stack([b["input_ids"] for b in batch])
+    labels    = torch.stack([b["labels"]    for b in batch])
+    return {"input_ids": input_ids, "labels": labels}
+print(f"[*] Preparing {TARGET_TOKENS:,} tokens (streaming, memmap-backed)...")
+build_token_bin(fast_tokenizer, TOKEN_BIN, TARGET_TOKENS)
+dataset = MemmapDataset(TOKEN_BIN, TARGET_TOKENS, seq_len=SEQ_LEN)
+print(f"[+] Dataset ready: {len(dataset):,} chunks of {SEQ_LEN} tokens")
+print("[*] Setting up model...")
+config = LlamaConfig(
+    vocab_size=len(tokenizer.get_vocab()),
+    hidden_size=64,
+    intermediate_size=128,
+    num_hidden_layers=5,
+    num_attention_heads=8,
+    num_key_value_heads=8,
+    max_position_embeddings=512,
+    tie_word_embeddings=True,
+    pad_token_id=tokenizer.pad_token_id,
+    bos_token_id=tokenizer.bos_token_id,
+    eos_token_id=tokenizer.eos_token_id,
+)
+model = LlamaForCausalLM(config)
+print(f"[*] Model parameters: {model.num_parameters():,}")
+print("[*] Defining training arguments...")
+training_args = TrainingArguments(
+    output_dir="./Supra-Mini-v3-0.5M",
+    num_train_epochs=2,
+    per_device_train_batch_size=256,
+    gradient_accumulation_steps=4,
+    save_steps=500,
+    save_total_limit=2,
+    logging_steps=100,
+    weight_decay=0.01,
+    fp16=False,
+    bf16=True,
+    push_to_hub=False,
+    report_to="none",
+    dataloader_num_workers=os.cpu_count() // 2,
+    dataloader_pin_memory=True,
+    learning_rate=5e-4,
+    lr_scheduler_type="cosine",
+    warmup_ratio=0.02,
+)
+trainer = Trainer(
+    model=model,
+    args=training_args,
+    train_dataset=dataset,
+    data_collator=collate_fn,
+)
+print("[*] Starting training...")
+trainer.train()
+trainer.save_model("./Supra-Mini-v3-0.5M-FINAL")
+tokenizer.save_pretrained("./Supra-Mini-v3-0.5M-FINAL")
+print("[*] Training finished.")
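`build_token_bin` and `MemmapDataset` stream token ids into one flat `uint16` file and later slice it into fixed-length training sequences. `uint16` suffices because the vocabulary has 4,096 entries (the type holds ids up to 65,535), so 1B tokens take about 2 GB, which is exactly the `target_tokens * 2` size check. A minimal standalone sketch of the same write/read pattern, with small hand-written id lists standing in for tokenizer output; the function names are hypothetical, not from `train.py`.

```python
import numpy as np

def write_token_memmap(path, token_lists, target_tokens):
    """Append token ids into a fixed-size uint16 memmap until target_tokens are written."""
    mm = np.memmap(path, dtype=np.uint16, mode="w+", shape=(target_tokens,))
    written = 0
    for ids in token_lists:
        n = min(len(ids), target_tokens - written)
        mm[written:written + n] = np.asarray(ids[:n], dtype=np.uint16)
        written += n
        if written >= target_tokens:
            break
    mm.flush()
    del mm  # release the mapping
    return written

def read_chunk(path, total_tokens, seq_len, idx):
    """Return training sequence idx as int64, like MemmapDataset.__getitem__."""
    n_chunks = total_tokens // seq_len
    data = np.memmap(path, dtype=np.uint16, mode="r", shape=(n_chunks * seq_len,))
    start = idx * seq_len
    return np.asarray(data[start:start + seq_len], dtype=np.int64)

n = write_token_memmap("tok_test.bin", [[1, 2, 3], [4, 5, 6, 7], [8, 9]], target_tokens=8)
print(n, read_chunk("tok_test.bin", total_tokens=8, seq_len=4, idx=1))
```

One consequence of this design: `mode="w+"` allocates the full-size file up front, so a run interrupted partway through tokenization can leave a file that already passes the `getsize >= target_tokens * 2` reuse check while its tail is still zeros.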

training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6af52f41b76534782ca382fbeefc7f4a63a5a810768f8d1e40a0c6e2bd67d6dd
+size 5265
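`training_args.bin` is the serialized `TrainingArguments` object that Trainer saves with `torch.save`. Those arguments, together with the constants at the top of `train.py`, fix the training budget. A back-of-the-envelope sketch, assuming a single device (`train.py` sets `CUDA_VISIBLE_DEVICES="0"`) and that each epoch contributes `floor(batches / gradient_accumulation_steps)` optimizer steps. Trainer's handling of leftover batches has varied between transformers versions, so treat the step counts as approximate. The function name is hypothetical.

```python
import math

def training_budget(target_tokens, seq_len, per_device_batch, grad_accum, epochs, n_devices=1):
    """Estimate chunks, tokens per optimizer step, and step counts for a Trainer run."""
    chunks = target_tokens // seq_len
    seqs_per_step = per_device_batch * grad_accum * n_devices
    batches_per_epoch = math.ceil(chunks / (per_device_batch * n_devices))
    steps_per_epoch = batches_per_epoch // grad_accum
    return {
        "chunks": chunks,
        "tokens_per_step": seqs_per_step * seq_len,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": steps_per_epoch * epochs,
        "tokens_seen": chunks * seq_len * epochs,
    }

# Values from train.py: TARGET_TOKENS, SEQ_LEN, per_device_train_batch_size,
# gradient_accumulation_steps, num_train_epochs
print(training_budget(1_000_000_000, 512, 256, 4, 2))
```

Under these assumptions the run covers 1,953,125 sequences per epoch at 524,288 tokens per optimizer step, about 1,907 steps per epoch and 3,814 in total, for 2B tokens seen over the two epochs.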