AxionLab-official committed on
Commit af833ed · verified · 1 Parent(s): a8476f9

Create 1bit-quantization.html

Files changed (1)
  1. 1bit-quantization.html +298 -0
1bit-quantization.html ADDED
@@ -0,0 +1,298 @@
1
+ <!DOCTYPE html>
2
+ <html lang="en">
3
+ <head>
4
+ <meta charset="UTF-8">
5
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
6
+ <title>1-Bit Quantization: A 50/50 Chance for Small Models | SupraLabs Blog</title>
7
+ <style>
8
+ :root {
9
+ --bg: #0f0f0f;
10
+ --surface: #1a1a1a;
11
+ --border: #333;
12
+ --text: #e0e0e0;
13
+ --accent: #536bfe;
14
+ --muted: #888;
15
+ --font-mono: 'JetBrains Mono', 'Fira Code', monospace;
16
+ }
17
+ * { margin: 0; padding: 0; box-sizing: border-box; }
18
+ body {
19
+ background-color: var(--bg);
20
+ color: var(--text);
21
+ font-family: 'Inter', -apple-system, sans-serif;
22
+ line-height: 1.6;
23
+ padding: 2rem;
24
+ }
25
+ code, pre, .mono { font-family: var(--font-mono); }
26
+ .container { max-width: 900px; margin: 0 auto; }
27
+
28
+ /* --- Header --- */
29
+ header {
30
+ border-bottom: 2px solid var(--border);
31
+ padding-bottom: 2rem;
32
+ margin-bottom: 3rem;
33
+ display: flex;
34
+ justify-content: space-between;
35
+ align-items: flex-end;
36
+ }
37
+ .logo-area h1 {
38
+ font-size: 1.2rem;
39
+ text-transform: uppercase;
40
+ letter-spacing: 2px;
41
+ color: var(--accent);
42
+ line-height: 1;
43
+ display: flex;
44
+ align-items: center;
45
+ gap: 10px;
46
+ }
47
+ .logo-area a { text-decoration: none; color: inherit; }
48
+ nav a {
49
+ color: var(--text);
50
+ text-decoration: none;
51
+ margin-left: 1.5rem;
52
+ font-size: 0.9rem;
53
+ border-bottom: 1px solid transparent;
54
+ }
55
+ nav a:hover { border-bottom: 1px solid var(--accent); }
56
+
57
+ /* --- Blog Post Layout --- */
58
+ .post-header { margin-bottom: 3rem; }
59
+ .post-header h2 {
60
+ font-size: 3rem;
61
+ line-height: 1.1;
62
+ margin-bottom: 1rem;
63
+ font-weight: 800;
64
+ }
65
+ .post-meta {
66
+ font-family: var(--font-mono);
67
+ color: var(--accent);
68
+ font-size: 0.9rem;
69
+ margin-bottom: 2rem;
70
+ }
71
+ .post-content {
72
+ background: var(--surface);
73
+ border: 1px solid var(--border);
74
+ padding: 3rem;
75
+ margin-bottom: 4rem;
76
+ }
77
+ .post-content h2 {
78
+ font-size: 1.8rem;
79
+ margin: 2.5rem 0 1rem 0;
80
+ color: var(--accent);
81
+ }
82
+ .post-content h2:first-child { margin-top: 0; }
83
+ .post-content p {
84
+ margin-bottom: 1.5rem;
85
+ font-size: 1.1rem;
86
+ color: var(--text);
87
+ }
88
+ .post-content ul {
89
+ margin-bottom: 1.5rem;
90
+ padding-left: 1.5rem;
91
+ }
92
+ .post-content li { margin-bottom: 0.5rem; font-size: 1.1rem; }
93
+ .post-content strong { color: #fff; }
94
+
95
+ /* --- Inline code --- */
96
+ .post-content code {
97
+ background: #111;
98
+ border: 1px solid var(--border);
99
+ padding: 2px 6px;
100
+ border-radius: 3px;
101
+ font-size: 0.95em;
102
+ color: var(--accent);
103
+ }
104
+
105
+ /* --- Math-style callout box --- */
106
+ .callout {
107
+ border-left: 3px solid var(--accent);
108
+ background: #111;
109
+ padding: 1rem 1.5rem;
110
+ margin: 2rem 0;
111
+ font-family: var(--font-mono);
112
+ font-size: 0.95rem;
113
+ color: #ccc;
114
+ }
115
+ .callout span {
116
+ display: block;
117
+ color: var(--muted);
118
+ font-size: 0.8rem;
119
+ margin-bottom: 0.4rem;
120
+ }
121
+
122
+ /* --- Comparison table --- */
123
+ .table-wrap { overflow-x: auto; margin: 2rem 0; }
124
+ table {
125
+ width: 100%;
126
+ border-collapse: collapse;
127
+ font-family: var(--font-mono);
128
+ font-size: 0.9rem;
129
+ }
130
+ th {
131
+ background: #111;
132
+ color: var(--accent);
133
+ padding: 0.75rem 1rem;
134
+ text-align: left;
135
+ border: 1px solid var(--border);
136
+ }
137
+ td {
138
+ padding: 0.7rem 1rem;
139
+ border: 1px solid var(--border);
140
+ color: var(--text);
141
+ }
142
+ tr:nth-child(even) td { background: #111; }
143
+ .highlight td { color: #fff; font-weight: 600; }
144
+
145
+ /* --- Tags --- */
146
+ .tags { display: flex; gap: 0.5rem; margin-top: 2rem; flex-wrap: wrap; }
147
+ .tag {
148
+ font-family: var(--font-mono);
149
+ font-size: 0.7rem;
150
+ padding: 2px 8px;
151
+ border: 1px solid var(--border);
152
+ border-radius: 4px;
153
+ color: var(--muted);
154
+ }
155
+
156
+ footer {
157
+ margin-top: 6rem;
158
+ padding-bottom: 2rem;
159
+ font-size: 0.8rem;
160
+ color: var(--muted);
161
+ text-align: center;
162
+ }
163
+
164
+ .logo-area {
165
+ display: flex;
166
+ align-items: center;
167
+ gap: 10px;
168
+ font-weight: bold;
169
+ font-size: 1.2rem;
170
+ }
171
+
172
+ @media (max-width: 600px) {
173
+ .post-header h2 { font-size: 2rem; }
174
+ .post-content { padding: 1.5rem; }
175
+ header { flex-direction: column; align-items: flex-start; gap: 1rem; }
176
+ nav a { margin-left: 0; margin-right: 1rem; }
177
+ }
178
+ </style>
179
+ </head>
180
+ <body>
181
+
182
+ <div class="container">
183
+ <header>
184
+ <div class="logo-area" style="font-size: 1.5em;">
185
+ <a href="./index.html"><h1><img src="./image.png" style="height: 2em"> SupraLabs_</h1></a>
186
+ </div>
187
+ <nav>
188
+ <a href="./index.html#news">News</a>
189
+ <a href="https://huggingface.co/SupraLabs" target="_blank">HuggingFace</a>
190
+ <a href="./index.html#hardware">Hardware</a>
191
+ </nav>
192
+ </header>
193
+
194
+ <article>
195
+ <div class="post-header">
196
+ <div class="post-meta">// 2026-05-13 | Research</div>
197
+ <h2>1-Bit Quantization:<br>Shrinking Models to the Bone</h2>
198
+ </div>
199
+
200
+ <div class="post-content">
201
+
202
+ <p>What if each weight in a neural network could only be <strong>−1, 0, or +1</strong>? That is the premise of the ternary ("1.58-bit") flavor of 1-bit quantization, and it is more powerful than it sounds. This post breaks down how it works, why it matters, and where it falls short.</p>
203
+
204
+ <h2>What is Quantization?</h2>
205
+ <p>A standard neural network stores weights as 32-bit or 16-bit floating point numbers. Those floats carry a lot of information, but also a lot of memory cost. <strong>Quantization</strong> is the process of reducing the precision of those numbers to save space and speed up computation.</p>
206
+ <p>Most quantized models deployed today use <strong>8-bit (INT8)</strong> or <strong>4-bit (INT4)</strong> quantization. These methods compress weights into integers while still preserving enough numeric range to keep quality high. 1-bit takes this to the extreme: <strong>every single weight is represented by just one bit.</strong></p>
207
+
208
+ <div class="callout">
209
+ <span>// memory comparison for a 7B model</span>
210
+ FP16 &nbsp;&nbsp;&nbsp;→ ~14 GB<br>
211
+ INT8 &nbsp;&nbsp;&nbsp;→ ~7 GB<br>
212
+ INT4 &nbsp;&nbsp;&nbsp;→ ~3.5 GB<br>
213
+ 1-bit &nbsp;&nbsp;→ ~0.9 GB
214
+ </div>
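<p>The figures in the callout follow from simple arithmetic: parameters times bits per weight, divided by 8 bits per byte. Below is a minimal sketch, assuming exactly 7 billion weights and 1 GB = 10<sup>9</sup> bytes; the helper name <code>weight_memory_gb</code> is our own, and the estimate ignores scale factors, activations, and runtime overhead.</p>

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return num_params * bits_per_weight / 8 / 1e9

PARAMS_7B = 7_000_000_000  # assumed round parameter count for a "7B" model

# Print one line per format, mirroring the callout above.
for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4), ("1-bit", 1)]:
    print(f"{name:6s} ~{weight_memory_gb(PARAMS_7B, bits):.2f} GB")
```

<p>Real checkpoints usually come out somewhat larger, because embeddings, normalization layers, and per-tensor scales are often kept at higher precision.</p>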
215
+
216
+ <h2>How Does 1-Bit Actually Work?</h2>
217
+ <p>Pure binary quantization maps every weight to either <code>+1</code> or <code>−1</code>. The model learns <em>which sign</em> each weight should carry, not its magnitude. During inference, the weight multiplications inside each matrix product reduce to cheap additions and subtractions, so most of the floating-point multiply work disappears.</p>
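<p>To make the sign-only idea concrete, here is a small sketch in plain Python. The function names <code>binarize</code> and <code>binary_dot</code> and the sample values are illustrative assumptions, not taken from any library; a real kernel would pack the signs into bits and process many of them at once.</p>

```python
def binarize(weights):
    """Map each real-valued weight to +1 or -1 by its sign (zero is sent to +1 here)."""
    return [1 if w >= 0 else -1 for w in weights]

def binary_dot(signs, activations):
    """Dot product with +1/-1 weights using only additions and subtractions."""
    total = 0.0
    for s, a in zip(signs, activations):
        if s == 1:
            total += a   # weight +1: add the activation
        else:
            total -= a   # weight -1: subtract the activation
    return total

weights = [0.42, -0.07, -1.3, 0.9]     # illustrative full-precision weights
activations = [2.0, 1.0, 0.5, -1.0]    # illustrative inputs

signs = binarize(weights)
print(signs, binary_dot(signs, activations))
```

<p>Note that the magnitudes 0.07 and 1.3 end up as the same weight, which is exactly the information a binary model gives up.</p>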
218
+ <p>The most important recent work in this space is <strong>BitNet</strong> (Microsoft Research, 2023) and its successor <strong>BitNet b1.58</strong> (2024). BitNet b1.58 uses a ternary scheme: weights are constrained to <code>{−1, 0, +1}</code>. A three-valued weight carries log<sub>2</sub>(3) ≈ 1.58 bits of information, which is where the name comes from. The extra zero value matters because a zero weight contributes nothing to the sum, so its term becomes a complete no-op, which can make inference even faster.</p>
219
+
220
+
221
+ <div class="callout">
222
+ <span>// bitnet b1.58 weight constraint</span>
223
+ W ∈ {−1, 0, +1} &nbsp;(ternary, not strictly binary)<br>
224
+ activations are still quantized to INT8
225
+ </div>
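<p>As we understand the BitNet b1.58 paper, weights are mapped to the ternary set with an "absmean" rule: divide by the mean absolute value of the tensor, round to the nearest integer, and clip to the range −1 to +1. The sketch below applies that rule to a flat list standing in for a weight matrix; the function name <code>absmean_ternary</code>, the <code>eps</code> default, and the sample values are our own assumptions.</p>

```python
def absmean_ternary(weights, eps=1e-8):
    """Scale by the mean absolute value, then round and clip each weight to -1, 0, or +1."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale

weights = [0.9, -0.05, 0.3, -1.2, 0.02, -0.6]   # illustrative values
ternary, scale = absmean_ternary(weights)
print(ternary, round(scale, 3))
```

<p>Weights that are small relative to the average magnitude collapse to zero, while the rest keep only their sign. The returned <code>scale</code> would be kept alongside the ternary values so outputs can be rescaled.</p>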
226
+
227
+ <p>For anyone running models on weak hardware, that is close to a dream scenario.</p>
228
+
229
+ <h2>Training vs Post-Training Quantization</h2>
230
+ <p>There are two fundamentally different approaches here, and the distinction matters a lot.</p>
231
+ <ul>
232
+ <li><strong>Post-Training Quantization (PTQ)</strong>: take a pre-trained FP16 model and quantize it after the fact. Fast and convenient, but quality degrades, especially below 4 bits.</li>
233
+ <li><strong>Quantization-Aware Training (QAT)</strong>: train the model with quantized weights in the loop, in BitNet's case from scratch. The model adapts to its constraints during training. This is how BitNet works, and it is what makes 1-bit viable at all.</li>
234
+ </ul>
235
+ <p>Trying to PTQ a standard model down to 1-bit produces catastrophic quality loss. <strong>1-bit only works if the model is trained to be 1-bit from day one.</strong></p>
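<p>One common way to train through a non-differentiable quantizer is the straight-through estimator (STE): keep full-precision "latent" weights, use their quantized version in the forward pass, and apply the gradient to the latent weights as if the quantizer were the identity. The toy below is a sketch under our own assumptions (a fixed 0.5 threshold, a hand-made dataset, plain SGD on squared error), not BitNet's actual training recipe.</p>

```python
import random

def ternarize(w, threshold=0.5):
    """Forward-pass quantizer: send a latent weight to -1, 0, or +1 (threshold is an assumption)."""
    if w > threshold:
        return 1
    if -w > threshold:
        return -1
    return 0

def train_ste(data, dim, lr=0.05, epochs=200, seed=0):
    """Fit a ternary linear model with SGD, using a straight-through estimator."""
    rng = random.Random(seed)
    latent = [rng.uniform(-0.1, 0.1) for _ in range(dim)]  # full-precision latent weights
    for _ in range(epochs):
        for x, y in data:
            q = [ternarize(w) for w in latent]             # forward pass uses quantized weights
            err = sum(qi * xi for qi, xi in zip(q, x)) - y
            # STE: treat d(q)/d(latent) as 1, so the squared-error gradient
            # with respect to q is applied to the latent weights directly.
            latent = [w - lr * err * xi for w, xi in zip(latent, x)]
    return [ternarize(w) for w in latent]

# Hypothetical data generated by the ternary weights [1, 0, -1].
data = [([1, 0, 0], 1), ([0, 1, 0], 0), ([0, 0, 1], -1), ([1, 1, 1], 0)]
print(train_ste(data, dim=3))
```

<p>The latent weights only exist during training; at inference time just the ternary values are needed.</p>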
236
+
237
+ <h2>The Numbers: How Much Do You Lose?</h2>
238
+ <p>The honest answer: <strong>it depends heavily on model size.</strong> Small models suffer more than large ones. A 125M parameter BitNet model loses noticeably more quality than a 7B BitNet model when compared to their FP16 equivalents.</p>
239
+
240
+ <div class="table-wrap">
241
+ <table>
242
+ <thead>
243
+ <tr>
244
+ <th>Format</th>
245
+ <th>Bits/weight</th>
246
+ <th>Memory (7B)</th>
247
+ <th>Speed</th>
248
+ <th>Quality loss</th>
249
+ </tr>
250
+ </thead>
251
+ <tbody>
252
+ <tr><td>FP16</td><td>16</td><td>~14 GB</td><td>baseline</td><td>none</td></tr>
253
+ <tr><td>INT8</td><td>8</td><td>~7 GB</td><td>1.5–2×</td><td>minimal</td></tr>
254
+ <tr><td>INT4</td><td>4</td><td>~3.5 GB</td><td>2–4×</td><td>low</td></tr>
255
+ <tr class="highlight"><td>1.58-bit</td><td>~1.58</td><td>~1.4 GB</td><td>up to 8×</td><td>moderate*</td></tr>
256
+ </tbody>
257
+ </table>
258
+ </div>
259
+ <p style="font-size:0.85rem; color: var(--muted); margin-top: -1rem;">* at large scale (7B+), quality loss becomes very competitive with INT4.</p>
260
+
261
+ <h2>Why This Matters for Edge and Tiny Models</h2>
262
+ <p>For us at SupraLabs, 1-bit quantization is an interesting reference point. At sub-1M parameters, the scale of Supra Mini, the quality penalty of 1-bit QAT is severe. The model simply does not have enough capacity to absorb the constraint. <strong>At our scale, every bit of precision counts.</strong></p>
263
+ <p>Where 1-bit shines is on large models deployed at the edge: think 7B+ models running on phones, single-board computers, or other embedded devices without a GPU. The memory savings are dramatic and the inference speedup from replacing multiplications with additions is real and measurable.</p>
264
+
265
+ <h2>The Catch</h2>
266
+ <p>1-bit is not a free lunch. The main trade-offs are:</p>
267
+ <ul>
268
+ <li><strong>Requires purpose-built training</strong>: no PTQ shortcut.</li>
269
+ <li><strong>It's a 50/50 chance for small models</strong>: it can just as easily help our model as kill it.</li>
270
+ <li><strong>Small models suffer</strong>: below ~1B parameters, the quality loss is hard to justify.</li>
271
+ <li><strong>Activations still need INT8</strong>: it's not fully binary end-to-end yet.</li>
272
+ </ul>
273
+
274
+ <h2>How Does This Help Us (and You)?</h2>
275
+ <p>At SupraLabs, we are going to try every kind of experiment (quantization, pruning, distillation) to create the best models we can for you!</p>
276
+
277
+ <h2>Final Thought</h2>
278
+ <p>1-bit quantization is a delicate area for small models, but we are going to try everything we can to make it work!</p>
279
+
280
+ <div class="tags">
281
+ <span class="tag">#quantization</span>
282
+ <span class="tag">#1bit</span>
283
+ <span class="tag">#bitnet</span>
284
+ <span class="tag">#tinyml</span>
285
+ <span class="tag">#research</span>
286
+ <span class="tag">#edge-ai</span>
287
+ <span class="tag">#open-source</span>
288
+ </div>
289
+ </div>
290
+ </article>
291
+
292
+ <footer>
293
+ <p class="mono">&copy; 2026 SupraLabs // Built for the community.</p>
294
+ </footer>
295
+ </div>
296
+
297
+ </body>
298
+ </html>