SupraLabs/Supra-50M-Base

LH-Tech-AI committed 3 days ago

Commit e3d50ed · verified · 1 parent: 03cfb64

Update README.md

Files changed (1)

README.md +8 -8

@@ -21,7 +21,7 @@ tags:
 # 🦅 Supra-50M
-**Supra-50M** is a compact 50M-parameter causal language model built by SupraLabs, trained from scratch using a Llama-style architecture on 20 billion tokens of high-quality educational web text. Despite being significantly smaller than comparable open models, it achieves competitive or superior results on several key benchmarks. It is our first Supra Scalling Up plan model.
+**Supra-50M** is a compact 50M-parameter causal language model built by SupraLabs, trained from scratch using a Llama-style architecture on 20 billion tokens of high-quality educational web text. Despite being significantly smaller than comparable open models, it achieves competitive or superior results on several key benchmarks.
 ---
@@ -34,13 +34,13 @@ Supra-50M outperforms much larger models — GPT-2 Small (124M), SmolLM-135M, an
 ### Benchmark Table
 | Benchmark | Supra-50M *(ours)* | GPT-2 (124M) | SmolLM-135M | OpenELM-270M |
-|---|---|---|---|---|
-| **Parameters** | **50M** | 124M *(2.5× larger)* | 135M *(2.7× larger)* | 270M *(5.4× larger)* |
-| BLiMP (linguistics) | **76.3%** | ~63.0% | ~75.2% | ~68.0% |
-| SciQ (science) | **77.2%** | ~52.0% | ~74.5% | ~61.0% |
-| ARC-Easy (knowledge) | 52.2% | ~42.0% | **~55.0%** | ~46.0% |
-| PIQA (logic) | 62.2% | ~61.0% | **~63.3%** | ~60.5% |
-| HellaSwag (context) | 31.8% | ~31.0% | **~34.0%** | ~28.0% |
+| :--- | :--- | :--- | :--- | :--- |
+| **Parameters** | **50M** | 124M *(2.5×)* | 135M *(2.7×)* | 270M *(5.4×)* |
+| **BLiMP** (linguistics) | **76.3%** | 63.0% | **69.8%** | *(n/a)* |
+| **SciQ** (science) | 77.2% | 53.2% | 73.4% | **84.70%** |
+| **ARC-Easy** (knowledge) | 52.2% | 42.0% | 49.2% | **45.08%** |
+| **PIQA** (logic) | 62.2% | 63.0% | 67.3% | **69.75%** |
+| **HellaSwag** (context) | 31.8% | 29.5% | 42.0% | **46.71%** |
 ---
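
The size multipliers in the Parameters row follow from dividing each model's parameter count by Supra-50M's 50M. A minimal Python sketch of that check, where the `PARAMS_M` dictionary and the `size_ratio` helper are illustrative names and not part of the repository:

```python
# Parameter counts in millions, copied from the Parameters row of the table.
# PARAMS_M and size_ratio are hypothetical names used only for this sketch.
PARAMS_M = {
    "Supra-50M": 50,
    "GPT-2": 124,
    "SmolLM-135M": 135,
    "OpenELM-270M": 270,
}


def size_ratio(model: str, baseline: str = "Supra-50M") -> float:
    """Return how many times larger `model` is than `baseline`, to one decimal."""
    return round(PARAMS_M[model] / PARAMS_M[baseline], 1)


if __name__ == "__main__":
    for name in PARAMS_M:
        if name != "Supra-50M":
            print(f"{name}: {size_ratio(name)}x")
```

Rounded to one decimal place, the ratios match the 2.5×, 2.7×, and 5.4× figures in the table.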