<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>The Next Haiku: 1M Parameters, 10 Billion Tokens, Zero Guarantees | FMN-GPT - CompactAI</title>
    <link rel="preconnect" href="https://fonts.googleapis.com">
    <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
    <link href="https://fonts.googleapis.com/css2?family=Geist:wght@400;500;600;700&family=Geist+Mono&display=swap" rel="stylesheet">
    <style>
        :root {
            --blue-900: #0a1628;
            --blue-800: #0f2240;
            --blue-700: #142d54;
            --blue-600: #1a3a6b;
            --blue-500: #2250a0;
            --blue-400: #3a7bd5;
            --blue-300: #6ba3f0;
            --blue-200: #a8c8f5;
            --blue-100: #d4e4fa;
            --white: #ffffff;
            --white-soft: #f0f4fa;
            --white-muted: #c8d8ec;
            --grid-line: rgba(255, 255, 255, 0.08);
            --grid-line-major: rgba(255, 255, 255, 0.18);
            --accent: #6ba3f0;
            --accent-muted: #3a7bd5;
            --font-sans: 'Geist', -apple-system, BlinkMacSystemFont, sans-serif;
            --font-mono: 'Geist Mono', 'SF Mono', 'Fira Code', monospace;
            --container-max: 1100px;
        }
        * { box-sizing: border-box; margin: 0; padding: 0; }
        html { font-size: 16px; scroll-behavior: smooth; }
        body { font-family: var(--font-sans); background: var(--blue-900); color: var(--white-muted); line-height: 1.7; -webkit-font-smoothing: antialiased; }
        a { color: var(--white); text-decoration: none; transition: color 0.15s ease; }
        a:hover { color: var(--accent); }
        .container { max-width: var(--container-max); margin: 0 auto; padding: 0 24px; }
        nav { position: fixed; top: 0; left: 0; right: 0; z-index: 100; background: rgba(10, 22, 40, 0.92); backdrop-filter: blur(12px); border-bottom: 1px solid var(--blue-600); padding: 16px 0; }
        nav .container { display: flex; justify-content: space-between; align-items: center; }
        .nav-brand { font-size: 18px; font-weight: 600; color: var(--white); display: flex; align-items: center; gap: 8px; }
        .nav-brand span { color: var(--accent); }
        .nav-links { display: flex; gap: 32px; }
        .nav-links a { font-size: 14px; font-weight: 500; color: var(--blue-200); }
        .nav-links a:hover { color: var(--white); }
        .post { padding: 140px 0 80px; }
        .post-back { display: inline-block; color: var(--blue-200); font-size: 14px; margin-bottom: 32px; }
        .post-back:hover { color: var(--accent); }
        .post-back::before { content: '← '; }
        .post-meta { display: flex; gap: 12px; margin-bottom: 20px; }
        .post-date { font-size: 13px; color: var(--blue-200); font-family: var(--font-mono); }
        .post-tag { font-size: 11px; font-weight: 600; text-transform: uppercase; letter-spacing: 0.05em; color: var(--accent); background: rgba(107, 163, 240, 0.1); padding: 4px 10px; border-radius: 4px; }
        .post h1 { font-size: 36px; font-weight: 700; color: var(--white); margin-bottom: 32px; line-height: 1.2; letter-spacing: -0.02em; }
        .post-body p { font-size: 17px; line-height: 1.8; margin-bottom: 24px; color: var(--blue-200); }
        .post-body p:first-of-type { font-size: 20px; color: var(--white-muted); }
        .post-body h2 { font-size: 24px; font-weight: 600; color: var(--white); margin: 48px 0 20px; }
        .post-body blockquote { border-left: 3px solid var(--accent); padding: 20px 24px; margin: 32px 0; background: var(--blue-800); border-radius: 0 8px 8px 0; }
        .post-body blockquote p { font-size: 16px; font-style: italic; color: var(--blue-200); margin: 0; }
        .post-body hr { border: none; height: 1px; background: var(--blue-600); margin: 48px 0; }
        .code-block { background: var(--blue-800); border: 1px solid var(--blue-600); border-radius: 8px; padding: 20px; margin: 24px 0; font-family: var(--font-mono); font-size: 13px; overflow-x: auto; }
        .code-block .comment { color: var(--blue-200); font-style: italic; display: block; margin-top: 4px; }
        .stats-grid { display: grid; grid-template-columns: 1fr 1fr; gap: 16px; margin: 24px 0; }
        .stat-card { background: var(--blue-800); border: 1px solid var(--blue-600); border-radius: 8px; padding: 20px; text-align: center; }
        .stat-card .number { font-size: 32px; font-weight: 700; color: var(--accent); font-family: var(--font-mono); }
        .stat-card .label { font-size: 13px; color: var(--blue-200); margin-top: 8px; }
        .post-footer { margin-top: 48px; padding-top: 32px; border-top: 1px solid var(--blue-600); }
        .post-footer p { font-size: 14px; color: var(--blue-200); font-style: italic; margin: 0; }
        footer { padding: 40px 0; background: var(--blue-800); border-top: 1px solid var(--blue-600); text-align: center; }
        footer p { color: var(--blue-200); font-size: 14px; margin-bottom: 8px; }
        footer a { color: var(--blue-200); }
        footer a:hover { color: var(--accent); }
        @media (max-width: 768px) { .post h1 { font-size: 28px; } .nav-links { display: none; } .stats-grid { grid-template-columns: 1fr; } }
    </style>
</head>
<body>
    <nav>
        <div class="container">
            <a href="index.html" class="nav-brand"><span>/</span>FMN-GPT</a>
            <div class="nav-links">
                <a href="blog.html">Blog</a>
                <a href="status.html">Model Status</a>
                <a href="https://huggingface.co/CompactAI-O" target="_blank">HuggingFace Org</a>
            </div>
        </div>
    </nav>
    <main>
        <article class="post">
            <div class="container">
                <a href="blog.html" class="post-back">Back to Blog</a>
                <header>
                    <div class="post-meta">
                        <span class="post-date">2026-04-08</span>
                        <span class="post-tag">Model Previews</span>
                    </div>
                    <h1>The Next Haiku: 1M Parameters, 10 Billion Tokens, Zero Guarantees</h1>
                </header>
                <div class="post-body">
                    <p>I am working on the next Haiku model. It is not released yet. It might never be released. But I am working on it. That counts for something. Probably.</p>
                    <p>The specs are simple. One million parameters. Ten billion training tokens. That is about ten thousand tokens per parameter. I think. My math might be wrong. My math is often wrong. But the ratio feels ambitious. Or reckless. Both can be true.</p>
                    <blockquote>
                        <p>Ten thousand tokens per parameter is either the secret to tiny model success or a great way to waste electricity. I am prepared for either outcome.</p>
                    </blockquote>
                    <h2>The Numbers</h2>
                    <div class="stats-grid">
                        <div class="stat-card">
                            <div class="number">1M</div>
                            <div class="label">Parameters</div>
                        </div>
                        <div class="stat-card">
                            <div class="number">10B</div>
                            <div class="label">Training Tokens</div>
                        </div>
                        <div class="stat-card">
                            <div class="number">~10K</div>
                            <div class="label">Tokens Per Parameter</div>
                        </div>
                        <div class="stat-card">
                            <div class="number">??</div>
                            <div class="label">Will It Speak</div>
                        </div>
                    </div>
                    <p>Ten billion tokens is a lot of tokens. It is also not that many tokens if you have ever trained a frontier model. But for a one million parameter model, it feels excessive. Like bringing a fire hose to water a houseplant. The houseplant might drown. Or it might finally grow.</p>
                    <h2>Why So Many Tokens</h2>
                    <p>Haiku-2 learned to speak sometimes. It also learned to output chuamliamce. The ratio of speech to chuamliamce was not ideal. I want to improve that ratio.</p>
                    <p>More tokens means more exposure to language patterns. More exposure means better internal representations. Better representations mean fewer pipe characters. This is the theory. The practice might involve more NaN losses. But I am optimistic. Or stubborn. Both can be true.</p>
                    <div class="code-block">
                        <span class="comment"># My training plan in pseudocode</span><br>
                        tokens = load_10_billion_tokens()<br>
                        model = TinyModel(params=1_000_000)<br>
                        for token in tokens:<br>
                        &nbsp;&nbsp;&nbsp;&nbsp;loss = model.forward(token)<br>
                        &nbsp;&nbsp;&nbsp;&nbsp;model.backward(loss)<br>
                        &nbsp;&nbsp;&nbsp;&nbsp;if math.isnan(loss):&nbsp;&nbsp;<span class="comment"># loss == nan never fires; NaN is not equal to anything, including itself</span><br>
                        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;cry()<br>
                        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;retry()<br>
                        <span class="comment"># Simple. Elegant. Probably naive.</span>
                    </div>
                    <h2>Where Sonnet And Opus Fit In</h2>
                    <p>Sonnet is not training right now. Opus is not training either. They are both waiting. Patiently. Or impatiently. Hard to tell with models that do not have feelings.</p>
                    <p>When I say I am giving my computer away for a week, I mean I am giving up my computer for a week. Or weeks. I will not use it at all. No browsing. No blogging. No debugging NaN losses at three AM. Just Sonnet and Opus training, uninterrupted, until they finish or crash or both.</p>
                    <p>The new script version includes bug fixes. Optimizations. SPIN integration. Things I learned from Claude Code. Things I learned from AxionLab-Co. Things I learned from crying at loss curves. All of it gets tested on the next Haiku first. Then, if nothing explodes, Sonnet and Opus get the upgraded treatment.</p>
                    <h2>What To Expect</h2>
                    <p>I do not know. That is the honest answer. The model might speak fluently. It might still output chuamliamce. It might do both at the same time. Progress is weird.</p>
                    <p>What I hope for: fewer special characters. More complete sentences. Occasional correct answers to simple questions. The ability to mention Paris without calling it a person named Pierre.</p>
                    <p>What I expect: a model that is slightly less confused than Haiku-2. Slightly more coherent. Slightly more useful. Slightly is the keyword. I am not expecting miracles. I am expecting incremental improvement. That feels achievable.</p>
                    <blockquote>
                        <p>Training tiny models is just hoping that enough data will eventually teach a million parameters to form a thought. Sometimes it works. Sometimes you get chuamliamce. Both outcomes are educational.</p>
                    </blockquote>
                    <h2>The Timeline</h2>
                    <p>Ten billion tokens takes time. My GPU is fast. It is also one GPU. The training will run while I am not using my computer. I will monitor it from afar. I will handle any NaN emergencies remotely. This is trust. This is also anxiety.</p>
                    <p>If everything goes well, the next Haiku might be ready in a few weeks. If things go poorly, it might never be ready. If things go weird, it might be ready but output only chuamliamce in perfect grammatical sentences. All outcomes are possible.</p>
                    <h2>Final Thoughts</h2>
                    <p>The next Haiku is coming. One million parameters. Ten billion tokens. Ten thousand tokens per parameter. Zero guarantees about coherence.</p>
                    <p>Sonnet is waiting. Opus is waiting. My computer is about to be dedicated to training them exclusively. I am staying here. Writing blogs. Hoping the models learn to speak.</p>
                    <p>If you want tiny models that might eventually make sense, stick around. If you want guarantees, maybe look elsewhere. I do not do guarantees. I do hope. I do effort. I do chuamliamce with confidence.</p>
                    <hr>
                </div>
                <footer class="post-footer">
                    <p>Current status: Next Haiku training. Sonnet waiting. Opus waiting. Computer preparing for dedicated training marathon. Me preparing for anxiety. Progress is weird. Chuamliamce remains a mystery.</p>
                </footer>
            </div>
        </article>
    </main>
    <footer>
        <div class="container">
            <p>Built with curiosity over compute</p>
            <p>FMN-GPT by <a href="https://huggingface.co/CompactAI-O" target="_blank">CompactAI-O</a> | 2026</p>
        </div>
    </footer>
</body>
</html>