<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Supra Mini v4 2M | SupraLabs Blog</title>
    <style>
        :root {
            --bg: #0f0f0f;
            --surface: #1a1a1a;
            --border: #333;
            --text: #e0e0e0;
            --accent: #536bfe;
            --muted: #888;
            --font-mono: 'JetBrains Mono', 'Fira Code', monospace;
        }
        * { margin: 0; padding: 0; box-sizing: border-box; }
        body {
            background-color: var(--bg);
            color: var(--text);
            font-family: 'Inter', -apple-system, sans-serif;
            line-height: 1.6;
            padding: 2rem;
        }
        code, pre, .mono { font-family: var(--font-mono); }
        .container { max-width: 900px; margin: 0 auto; }

        header {
            border-bottom: 2px solid var(--border);
            padding-bottom: 2rem;
            margin-bottom: 3rem;
            display: flex;
            justify-content: space-between;
            align-items: flex-end;
        }
        .logo-area h1 {
            font-size: 1.2rem;
            text-transform: uppercase;
            letter-spacing: 2px;
            color: var(--accent);
            line-height: 1;
            display: flex;
            align-items: center;
            gap: 10px;
        }
        .logo-area a { text-decoration: none; color: inherit; }
        .logo-area {
            display: flex;
            align-items: center;
            gap: 10px;
            font-weight: bold;
            font-size: 1.2rem;
        }
        nav a {
            color: var(--text);
            text-decoration: none;
            margin-left: 1.5rem;
            font-size: 0.9rem;
            border-bottom: 1px solid transparent;
        }
        nav a:hover { border-bottom: 1px solid var(--accent); }

        .post-header { margin-bottom: 3rem; }
        .post-header h2 {
            font-size: 3rem;
            line-height: 1.1;
            margin-bottom: 1rem;
            font-weight: 800;
        }
        .post-meta {
            font-family: var(--font-mono);
            color: var(--accent);
            font-size: 0.9rem;
            margin-bottom: 2rem;
        }
        .post-content {
            background: var(--surface);
            border: 1px solid var(--border);
            padding: 3rem;
            margin-bottom: 4rem;
        }
        .post-content h2 {
            font-size: 1.8rem;
            margin: 2.5rem 0 1rem 0;
            color: var(--accent);
        }
        .post-content h2:first-child { margin-top: 0; }
        .post-content p {
            margin-bottom: 1.5rem;
            font-size: 1.1rem;
            color: var(--text);
        }
        .post-content ul {
            margin-bottom: 1.5rem;
            padding-left: 1.5rem;
        }
        .post-content li { margin-bottom: 0.5rem; font-size: 1.1rem; }
        .post-content strong { color: #fff; }

        .post-content code {
            background: #111;
            border: 1px solid var(--border);
            padding: 2px 6px;
            border-radius: 3px;
            font-size: 0.95em;
            color: var(--accent);
        }

        /* Code block */
        .code-block {
            background: #111;
            border: 1px solid var(--border);
            padding: 1.5rem;
            margin: 2rem 0;
            overflow-x: auto;
            font-family: var(--font-mono);
            font-size: 0.88rem;
            line-height: 1.7;
            color: #ccc;
        }
        .code-block .comment { color: var(--muted); }
        .code-block .keyword { color: var(--accent); }

        .callout {
            border-left: 3px solid var(--accent);
            background: #111;
            padding: 1rem 1.5rem;
            margin: 2rem 0;
            font-family: var(--font-mono);
            font-size: 0.95rem;
            color: #ccc;
        }
        .callout span {
            display: block;
            color: var(--muted);
            font-size: 0.8rem;
            margin-bottom: 0.4rem;
        }

        /* Output example box */
        .output-example {
            border: 1px solid var(--border);
            background: #111;
            padding: 1.5rem;
            margin: 1.5rem 0;
        }
        .output-example .prompt-label {
            font-family: var(--font-mono);
            font-size: 0.75rem;
            color: var(--accent);
            margin-bottom: 0.4rem;
        }
        .output-example .prompt-text {
            font-weight: 700;
            color: #fff;
            margin-bottom: 1rem;
            font-size: 1rem;
        }
        .output-example .output-label {
            font-family: var(--font-mono);
            font-size: 0.75rem;
            color: var(--muted);
            margin-bottom: 0.4rem;
        }
        .output-example .output-text {
            color: var(--text);
            font-size: 0.95rem;
            font-style: italic;
            line-height: 1.7;
        }

        .table-wrap { overflow-x: auto; margin: 2rem 0; }
        table {
            width: 100%;
            border-collapse: collapse;
            font-family: var(--font-mono);
            font-size: 0.9rem;
        }
        th {
            background: #111;
            color: var(--accent);
            padding: 0.75rem 1rem;
            text-align: left;
            border: 1px solid var(--border);
        }
        td {
            padding: 0.7rem 1rem;
            border: 1px solid var(--border);
            color: var(--text);
        }
        tr:nth-child(even) td { background: #111; }

        .tags { display: flex; gap: 0.5rem; margin-top: 2rem; flex-wrap: wrap; }
        .tag {
            font-family: var(--font-mono);
            font-size: 0.7rem;
            padding: 2px 8px;
            border: 1px solid var(--border);
            border-radius: 4px;
            color: var(--muted);
        }

        footer {
            margin-top: 6rem;
            padding-bottom: 2rem;
            font-size: 0.8rem;
            color: var(--muted);
            text-align: center;
        }

        @media (max-width: 600px) {
            .post-header h2 { font-size: 2rem; }
            .post-content { padding: 1.5rem; }
            header { flex-direction: column; align-items: flex-start; gap: 1rem; }
            nav a { margin-left: 0; margin-right: 1rem; }
        }
    </style>
</head>
<body>

    <div class="container">
        <header>
            <div class="logo-area" style="font-size: 1.5em;">
                <a href="./index.html"><h1><img src="./image.png" style="height: 2em"> SupraLabs_</h1></a>
            </div>
            <nav>
                <a href="./index.html#news">News</a>
                <a href="https://huggingface.co/SupraLabs" target="_blank">HuggingFace</a>
                <a href="./index.html#hardware">Hardware</a>
            </nav>
        </header>

        <article>
            <div class="post-header">
                <div class="post-meta">// 2026-05-15 | Release</div>
                <h2>🦅 Supra Mini v4 2M<br>is here.</h2>
            </div>

            <div class="post-content">

                <p>Today we are releasing <strong>Supra Mini v4 2M</strong>, the fourth version of our Supra Mini series and our biggest leap yet. Trained on <strong>3 billion tokens</strong> of FineWeb-Edu for 2 epochs, v4 pushes our parameter count to 2.6M while keeping the model light enough to run on any CPU.</p>

                <h2>What changed from v3?</h2>
                <p>Look at the numbers: <strong>v4 has ~5.6× more parameters than v3</strong>. We went from 467k to 2.6M parameters. This is not just a bigger model: the entire config was rethought to fit more capacity while keeping the architecture clean and the training fast.</p>

                <div class="callout">
                    <span>// supra mini v4 2m | model config</span>
                    Parameters &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;→ 2,623,104 (~2.6M)<br>
                    Architecture &nbsp;&nbsp;&nbsp;→ Llama<br>
                    Vocab size &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;→ 8,192 (custom BPE)<br>
                    Hidden size &nbsp;&nbsp;&nbsp;&nbsp;→ 128<br>
                    Intermediate &nbsp;&nbsp;→ 512<br>
                    Layers &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;→ 6<br>
                    Attention heads &nbsp;→ 4<br>
                    Context length &nbsp;&nbsp;→ 1,024 tokens<br>
                    Trained in &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;→ bfloat16
                </div>
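
                <p>As a sanity check, the parameter total can be reproduced from the listed dimensions. This is a minimal sketch, not code from the model repo. It assumes a standard Llama block with no bias terms, as many key/value heads as attention heads, and input and output embeddings that share one matrix. None of that is stated in the config, but these are the assumptions under which the arithmetic lands on 2,623,104. The function name <code>llama_param_count</code> is hypothetical.</p>

<pre class="code-block">
```python
def llama_param_count(vocab, hidden, intermediate, layers, tie_embeddings=True):
    """Count weights in a Llama-style decoder (assumed: no biases, kv heads == heads)."""
    embed = vocab * hidden                 # token embedding matrix
    attn = 4 * hidden * hidden             # q, k, v and o projections
    mlp = 3 * hidden * intermediate        # gate, up and down projections
    norms = 2 * hidden                     # two RMSNorm weight vectors per layer
    final_norm = hidden                    # RMSNorm before the output head
    lm_head = 0 if tie_embeddings else vocab * hidden
    return embed + layers * (attn + mlp + norms) + final_norm + lm_head


total = llama_param_count(vocab=8192, hidden=128, intermediate=512, layers=6)
print(f"{total:,}")  # 2,623,104
```
</pre>

                <p>Under these assumptions the embedding table alone accounts for 1,048,576 weights, or about 40% of the model, which is typical when the vocabulary is large relative to the hidden size.</p>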

                <h2>Training setup</h2>
                <p>We trained v4 on a <strong>single NVIDIA RTX 5060 Ti 16GB</strong> in approximately 3 hours for 2 epochs. The dataset is the first 3 billion tokens of Sample-10BT from <strong>FineWeb-Edu</strong>, streamed and tokenized on the fly with our custom BPE tokenizer.</p>
                <p>The final training loss after 2 epochs came in at <strong>4.618</strong>. The full training code (tokenizer, training loop, and inference script) is available directly in the model repo.</p>
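
                <p>To put these figures in perspective, here is a rough back-of-the-envelope sketch. It assumes that both epochs cover the same 3 billion tokens, that "approximately 3 hours" means exactly 3, and that the reported loss is the mean cross-entropy in nats per token. All three are assumptions, so treat the results as order-of-magnitude estimates.</p>

<pre class="code-block">
```python
import math

TOKENS_PER_EPOCH = 3_000_000_000  # first 3B tokens of the sample, from the post
EPOCHS = 2
TRAIN_HOURS = 3.0                 # "approximately 3 hours", so results are rough
FINAL_LOSS = 4.618                # assumed: mean cross-entropy in nats per token

tokens_seen = TOKENS_PER_EPOCH * EPOCHS
tokens_per_second = tokens_seen / (TRAIN_HOURS * 3600)
train_perplexity = math.exp(FINAL_LOSS)  # perplexity = exp(cross-entropy)

print(f"tokens seen:      {tokens_seen:,}")
print(f"throughput:       {tokens_per_second:,.0f} tokens/s")
print(f"train perplexity: {train_perplexity:.1f}")
```
</pre>

                <p>Under those assumptions the run works out to roughly 556k tokens per second and a training perplexity of about 101 per BPE token. That perplexity is measured on the training data with our own tokenizer, so it is not directly comparable to perplexities reported by evaluation tools on other datasets.</p>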

                <h2>Benchmarks</h2>
                <p>We evaluated v4 using <code>lm-eval</code> on three tasks. The random baselines are included so you can judge fairly.</p>

                <div class="table-wrap">
                    <table>
                        <thead>
                            <tr>
                                <th>Task</th>
                                <th>Score</th>
                                <th>Random baseline</th>
                                <th>Delta</th>
                            </tr>
                        </thead>
                        <tbody>
                            <tr><td>ARC_Easy</td><td>0.3152</td><td>0.25 (25%)</td><td>+6.5pp above random</td></tr>
                            <tr><td>Wikitext (PPL)</td><td>3.1652</td><td>n/a</td><td>lower is better</td></tr>
                            <tr><td>BLiMP</td><td>0.607</td><td>0.50 (50%)</td><td>+10.7pp above random</td></tr>
                        </tbody>
                    </table>
                </div>
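
                <p>The Delta column is simply the score minus the chance level, expressed in percentage points. A minimal sketch of that arithmetic, using the scores and baselines from the table (the helper name <code>points_above_chance</code> is hypothetical):</p>

<pre class="code-block">
```python
def points_above_chance(score, chance):
    """Accuracy gain over random guessing, in percentage points."""
    return round((score - chance) * 100, 1)


# (score, chance level) pairs copied from the table above.
results = {"ARC_Easy": (0.3152, 0.25), "BLiMP": (0.607, 0.50)}

for task, (score, chance) in results.items():
    print(f"{task}: +{points_above_chance(score, chance)}pp")
```
</pre>

                <p>Wikitext is left out because perplexity has no random-guess baseline of this kind.</p>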

                <p>A 2.6M-parameter model beating random by over 10 points on BLiMP, a test of grammatical knowledge, is a solid result at this scale. Not GPT-4, obviously, but that is never the point with Supra Mini.</p>

                <h2>Example outputs</h2>
                <p>Here is what v4 generates at <code>temperature=0.5</code>, <code>top_k=25</code>, <code>top_p=0.9</code>:</p>

                <div class="output-example">
                    <div class="prompt-label">// prompt</div>
                    <div class="prompt-text">"Artificial intelligence is "</div>
                    <div class="output-label">// output</div>
                    <div class="output-text">"Artificial intelligence is the idea of the theory that the world has a very high-performance technology, which is also more important to society's lives than people who are being able to find their own knowledge and understanding how it can be used for future generations..."</div>
                </div>

                <div class="output-example">
                    <div class="prompt-label">// prompt</div>
                    <div class="prompt-text">"The main concept of physics is "</div>
                    <div class="output-label">// output</div>
                    <div class="output-text">"The main concept of physics is 'animi-hisi', and therefore the universe's own light. In this case, a theory that is not only used to explain what it can be called 'the universe' or 'two planets, which are exactly about the earth's gravitational energy, but also in reality, we know how much things do..."</div>
                </div>

                <p>The model clearly has a coherent sense of topic: it stays on subject and builds sentences. It hallucinates and drifts (as all base models at this scale do), but the fluency is real.</p>
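
                <p>For readers wondering what the three sampling settings do, the sketch below applies them to a toy distribution in plain Python. It is a simplified illustration under our own assumptions (temperature first, then top-k, then top-p on the renormalized survivors), not the Transformers implementation. The function <code>filtered_distribution</code> and the five-token vocabulary are hypothetical.</p>

<pre class="code-block">
```python
import math


def filtered_distribution(logits, temperature=0.5, top_k=25, top_p=0.9):
    """Return {token_index: probability} after temperature, top-k and top-p."""
    # Temperature: divide logits before softmax; values below 1 sharpen the peak.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]  # subtract max for stability
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-k: keep only the k most probable tokens.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    k_mass = sum(probs[i] for i in ranked)

    # Top-p (nucleus): keep the shortest prefix whose renormalized mass reaches top_p.
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i] / k_mass
        if mass >= top_p:
            break

    # Renormalize the survivors so they sum to 1 before sampling.
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


toy_logits = [2.0, 1.0, 0.5, 0.0, -1.0]  # hypothetical 5-token vocabulary
dist = filtered_distribution(toy_logits)
for token, p in dist.items():
    print(token, round(p, 3))
```
</pre>

                <p>With these toy logits only the two most likely tokens survive the filter, which illustrates why a low temperature combined with top-p keeps the output on topic at the cost of variety.</p>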

                <h2>How to run it</h2>
                <p>Drop this into any Python environment with Transformers installed:</p>

                <div class="code-block">
<span class="keyword">from</span> transformers <span class="keyword">import</span> pipeline<br>
<span class="keyword">import</span> torch<br>
<br>
pipe = pipeline(<br>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="string">"text-generation"</span>,<br>
&nbsp;&nbsp;&nbsp;&nbsp;model=<span class="string">"SupraLabs/Supra-Mini-v4-2M"</span>,<br>
&nbsp;&nbsp;&nbsp;&nbsp;device_map=<span class="string">"auto"</span>,<br>
&nbsp;&nbsp;&nbsp;&nbsp;torch_dtype=torch.float16 <span class="keyword">if</span> torch.cuda.is_available() <span class="keyword">else</span> torch.float32<br>
)<br>
<br>
result = pipe(<br>
&nbsp;&nbsp;&nbsp;&nbsp;<span class="string">"The importance of education is"</span>,<br>
&nbsp;&nbsp;&nbsp;&nbsp;max_new_tokens=150,<br>
&nbsp;&nbsp;&nbsp;&nbsp;do_sample=True,<br>
&nbsp;&nbsp;&nbsp;&nbsp;temperature=0.5,<br>
&nbsp;&nbsp;&nbsp;&nbsp;top_k=25,<br>
&nbsp;&nbsp;&nbsp;&nbsp;top_p=0.9,<br>
&nbsp;&nbsp;&nbsp;&nbsp;repetition_penalty=1.2<br>
)<br>
<span class="keyword">print</span>(result[0][<span class="string">'generated_text'</span>])
                </div>
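
                <p>The snippet loads the weights in float16 on a GPU and float32 on a CPU. As a rough guide to what that costs in memory, the sketch below multiplies the parameter count from the model config by the size of each dtype. It covers the weights only; activations, the KV cache, and framework overhead are not included.</p>

<pre class="code-block">
```python
PARAMS = 2_623_104  # parameter count from the model config

# Storage size of one weight in each dtype, in bytes.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}

weight_bytes = {name: PARAMS * size for name, size in BYTES_PER_PARAM.items()}

for name, size in weight_bytes.items():
    print(f"{name}: {size / 1024**2:.1f} MiB")
```
</pre>

                <p>That comes to about 10 MiB of weights in float32 and about 5 MiB in either 16-bit format.</p>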

                <h2>What's next?</h2>
                <p>v4 is a base model; it is not fine-tuned for instruction following or chat. The next experiments on our roadmap include fine-tuning on instruction datasets, exploring quantization at this new scale, and continuing to push the parameter count while keeping training accessible to everyone with a consumer GPU.</p>
                <p><strong>The model is live on HuggingFace. Go try it.</strong></p>

                <div class="callout">
                    <span>// links</span>
                    Model &nbsp;&nbsp;→ huggingface.co/SupraLabs/Supra-Mini-v4-2M<br>
                    License → Apache 2.0<br>
                    Series &nbsp;→ Supra Mini collection on HuggingFace
                </div>

                <div class="tags">
                    <span class="tag">#release</span>
                    <span class="tag">#supra-mini-v4</span>
                    <span class="tag">#tinyml</span>
                    <span class="tag">#llama</span>
                    <span class="tag">#open-source</span>
                    <span class="tag">#fineweb-edu</span>
                    <span class="tag">#edge-ai</span>
                </div>
            </div>
        </article>

        <footer>
            <p class="mono">&copy; 2026 SupraLabs // Built for the community.</p>
        </footer>
    </div>

</body>
</html>