<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>I Released TMLM-Haiku-1.3 And It Is Still Dumb | TinyMemoryLM</title>
<link rel="stylesheet" href="bluesheet.css">
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Geist:wght@400;500;600;700&family=Geist+Mono&display=swap" rel="stylesheet">
<style>
:root {
--blue-900: #000000;
--blue-800: #0a0a0a;
--blue-700: #111111;
--blue-600: #1a1a1a;
--blue-500: #333333;
--blue-400: #555555;
--blue-300: #777777;
--blue-200: #888888;
--blue-100: #aaaaaa;
--white: #ffffff;
--white-soft: #f5f5f5;
--white-muted: #e0e0e0;
--grid-line: rgba(255, 255, 255, 0.03);
--grid-line-major: rgba(255, 255, 255, 0.06);
--accent: #ededed;
--accent-muted: #888888;
--font-sans: 'Geist', -apple-system, BlinkMacSystemFont, sans-serif;
--font-mono: 'Geist Mono', 'SF Mono', 'Fira Code', monospace;
--container-max: 1100px;
}
* { box-sizing: border-box; margin: 0; padding: 0; }
html { font-size: 16px; scroll-behavior: smooth; }
body { font-family: var(--font-sans); background: var(--blue-900); color: var(--white-muted); line-height: 1.7; -webkit-font-smoothing: antialiased; }
a { color: var(--white); text-decoration: none; transition: color 0.15s ease; }
a:hover { color: var(--accent); }
.container { max-width: var(--container-max); margin: 0 auto; padding: 0 24px; }
nav { position: fixed; top: 0; left: 0; right: 0; z-index: 100; background: rgba(0, 0, 0, 0.85); backdrop-filter: blur(12px); border-bottom: 1px solid var(--blue-600); padding: 16px 0; }
nav .container { display: flex; justify-content: space-between; align-items: center; }
.nav-brand { font-size: 18px; font-weight: 600; color: var(--white); display: flex; align-items: center; gap: 8px; }
.nav-brand span { color: var(--accent); }
.nav-links { display: flex; gap: 32px; }
.nav-links a { font-size: 14px; font-weight: 500; color: var(--blue-200); }
.nav-links a:hover { color: var(--white); }
.post { padding: 140px 0 80px; }
.post-back { display: inline-block; color: var(--blue-200); font-size: 14px; margin-bottom: 32px; }
.post-back:hover { color: var(--accent); }
.post-back::before { content: '← '; }
.post-meta { display: flex; gap: 12px; margin-bottom: 20px; }
.post-date { font-size: 13px; color: var(--blue-200); font-family: var(--font-mono); }
.post-tag { font-size: 11px; font-weight: 600; text-transform: uppercase; letter-spacing: 0.05em; color: var(--white); background: rgba(255, 255, 255, 0.08); padding: 4px 10px; border-radius: 4px; }
.post h1 { font-size: 36px; font-weight: 700; color: var(--white); margin-bottom: 32px; line-height: 1.2; letter-spacing: -0.02em; }
.post-body p { font-size: 17px; line-height: 1.8; margin-bottom: 24px; color: var(--blue-200); }
.post-body p:first-of-type { font-size: 20px; color: var(--white-muted); }
.post-body h2 { font-size: 24px; font-weight: 600; color: var(--white); margin: 48px 0 20px; }
.post-body blockquote { border-left: 3px solid var(--accent); padding: 20px 24px; margin: 32px 0; background: var(--blue-800); border-radius: 0 8px 8px 0; }
.post-body blockquote p { font-size: 16px; font-style: italic; color: var(--blue-200); margin: 0; }
.post-body hr { border: none; height: 1px; background: var(--blue-600); margin: 48px 0; }
.code-block { background: var(--blue-800); border: 1px solid var(--blue-600); border-radius: 8px; padding: 20px; margin: 24px 0; font-family: var(--font-mono); font-size: 13px; overflow-x: auto; }
.code-block .comment { color: var(--blue-200); font-style: italic; display: block; margin-top: 4px; }
.stats-grid { display: grid; grid-template-columns: 1fr 1fr; gap: 16px; margin: 24px 0; }
.stat-card { background: var(--blue-800); border: 1px solid var(--blue-600); border-radius: 8px; padding: 20px; text-align: center; }
.stat-card .number { font-size: 32px; font-weight: 700; color: var(--accent); font-family: var(--font-mono); }
.stat-card .label { font-size: 13px; color: var(--blue-200); margin-top: 8px; }
.cta-box { background: var(--blue-800); border: 2px solid var(--accent); border-radius: 12px; padding: 24px; margin: 32px 0; text-align: center; }
.cta-box a { color: var(--accent); font-weight: 600; font-size: 18px; word-break: break-all; }
.cta-box a:hover { color: var(--white); }
.cta-box p { margin: 12px 0 0; color: var(--blue-200); font-size: 14px; }
.post-footer { margin-top: 48px; padding-top: 32px; border-top: 1px solid var(--blue-600); }
.post-footer p { font-size: 14px; color: var(--blue-200); font-style: italic; margin: 0; }
footer { padding: 40px 0; background: var(--blue-800); border-top: 1px solid var(--blue-600); text-align: center; }
footer p { color: var(--blue-200); font-size: 14px; margin-bottom: 8px; }
footer a { color: var(--blue-200); }
footer a:hover { color: var(--accent); }
@media (max-width: 768px) { .post h1 { font-size: 28px; } .nav-links { display: none; } .stats-grid { grid-template-columns: 1fr; } }
</style>
</head>
<body>
<svg class="scribbles" viewBox="0 0 1440 900" preserveAspectRatio="xMidYMid slice">
<path d="M100,50 Q150,30 200,60 T300,40 T400,70" fill="none" stroke="white" stroke-width="1"/>
<path d="M800,200 Q850,180 900,210 T1000,190 T1100,220" fill="none" stroke="white" stroke-width="0.8"/>
<path d="M200,700 Q250,680 300,710 T400,690 T500,720" fill="none" stroke="white" stroke-width="0.6"/>
<path d="M1200,400 Q1250,380 1300,410 T1400,390" fill="none" stroke="white" stroke-width="0.7"/>
<path d="M50,400 Q100,380 150,420 T250,400" fill="none" stroke="white" stroke-width="0.5"/>
<circle cx="350" cy="150" r="30" fill="none" stroke="white" stroke-width="0.6"/>
<circle cx="1100" cy="600" r="25" fill="none" stroke="white" stroke-width="0.5"/>
<path d="M600,100 L620,80 L640,100 L660,80" fill="none" stroke="white" stroke-width="0.7"/>
<path d="M1300,750 Q1320,730 1340,760 T1380,740" fill="none" stroke="white" stroke-width="0.5"/>
<path d="M100,800 Q120,780 140,810 T180,790 T220,820" fill="none" stroke="white" stroke-width="0.6"/>
<path d="M700,500 Q720,480 740,510 T780,490 T820,520" fill="none" stroke="white" stroke-width="0.4"/>
<path d="M400,300 C420,280 440,320 460,300 C480,280 500,320 520,300" fill="none" stroke="white" stroke-width="0.5"/>
<path d="M900,700 C920,680 940,720 960,700 C980,680 1000,720 1020,700" fill="none" stroke="white" stroke-width="0.6"/>
<path d="M150,250 Q170,230 190,260 Q210,240 230,270" fill="none" stroke="white" stroke-width="0.4"/>
<path d="M1050,100 Q1070,80 1090,110 Q1110,90 1130,120" fill="none" stroke="white" stroke-width="0.5"/>
<path d="M500,850 C520,830 540,860 560,840 C580,820 600,860 620,840" fill="none" stroke="white" stroke-width="0.4"/>
<path d="M1350,50 Q1370,30 1390,60 T1430,40" fill="none" stroke="white" stroke-width="0.5"/>
<path d="M30,600 Q50,580 70,610 T110,590" fill="none" stroke="white" stroke-width="0.4"/>
</svg>
<nav>
<div class="container">
<a href="index.html" class="nav-brand"><span>/</span>TinyMemoryLM</a>
<div class="nav-links">
<a href="index.html">Home</a>
<a href="blog.html">Blog</a>
<a href="status.html">Status</a>
</div>
</div>
</nav>
<main>
<article class="post">
<div class="container">
<a href="blog.html" class="post-back">Back to Blog</a>
<header>
<div class="post-meta">
<span class="post-date">2026-03-23</span>
<span class="post-tag">Model Releases</span>
</div>
<h1>I Released TMLM-Haiku-1.3 And It Is Still Dumb</h1>
</header>
<div class="post-body">
<p>I released TMLM-Haiku-1.3 today. It is on Hugging Face. It is open weights. It is still completely devoid of intelligence. I trained it with Muon. I spent electricity. I generated heat. The model still thinks Paris is a person.</p>
<p>You might ask why I keep doing this. You might ask why I versioned it to 1.3 instead of 2.0. You might ask why I used Muon instead of AdamW. I do not have good answers. I have weights.</p>
<blockquote>
<p>Progress is not always vertical. Sometimes it is horizontal. Sometimes it is circular. Sometimes it is just releasing the same dumb model with a different optimizer.</p>
</blockquote>
<h2>The Muon Experiment</h2>
<p>AdamW is standard. SGD is classic. Muon is new. It claims better convergence for transformers. It claims to handle large batch sizes better. It claims to be worth the hype. I wanted to test the claim.</p>
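The claim is concrete enough to sketch. Muon keeps a momentum buffer per weight matrix and approximately orthogonalizes it with a few Newton-Schulz iterations before applying the update. Below is a rough sketch of one step, not my actual training code: the quintic coefficients are the commonly cited ones, while the learning rate and momentum values are illustrative.

```python
import torch

def newton_schulz(G, steps=5, eps=1e-7):
    # Approximately orthogonalize G: push its singular values toward 1
    # using the quintic Newton-Schulz iteration Muon is built around.
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + eps)  # scale so the iteration converges
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X

def muon_step(W, grad, momentum, lr=0.02, beta=0.95):
    # One Muon update: momentum-smooth the gradient, orthogonalize, apply.
    momentum.mul_(beta).add_(grad)
    W.add_(newton_schulz(momentum), alpha=-lr)
    return W, momentum
```

Muon only applies to the 2D weight matrices in the middle of the network; embeddings and the output head typically stay on AdamW.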
<p>I switched the optimizer. I kept the data. I kept the architecture. I kept the low expectations. The training loss went down faster. The validation loss still plateaued. The model still outputs fish facts when asked for math.</p>
<div class="code-block">
<span class="comment"># Training config comparison</span><br>
Haiku-1.0: AdamW, 261 hours, 600W<br>
Haiku-1.3: Muon, 198 hours, 800W<br>
<span class="comment"># Faster training. More power. Same stupidity.</span>
</div>
<p>The training finished in 198 hours instead of 261. That is a twenty-four percent speedup. I attribute this to Muon. I also attribute it to the 800W overclocked VBIOS I flashed last week. The GPU was screaming. The loss was descending. The result is unchanged.</p>
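For what it's worth, the wattage cancels the speedup almost exactly. A quick back-of-the-envelope check using the hours and watts from the comparison above (and assuming the card held its power limit for the whole run):

```python
# Energy per training run: hours * watts / 1000 = kWh.
runs = {
    "Haiku-1.0 (AdamW, 600W)": (261, 600),
    "Haiku-1.3 (Muon, 800W)": (198, 800),
}
for name, (hours, watts) in runs.items():
    print(f"{name}: {hours * watts / 1000:.1f} kWh")
# Haiku-1.0 comes to 156.6 kWh, Haiku-1.3 to 158.4 kWh.
```

The bill is flat. Only the schedule moved.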
<h2>Intelligence Report</h2>
<div class="stats-grid">
<div class="stat-card">
<div class="number">0%</div>
<div class="label">Intelligence Gain</div>
</div>
<div class="stat-card">
<div class="number">24%</div>
<div class="label">Training Speedup</div>
</div>
<div class="stat-card">
<div class="number">100%</div>
<div class="label">Still Hallucinates</div>
</div>
<div class="stat-card">
<div class="number">1.3</div>
<div class="label">Version Number</div>
</div>
</div>
<p>I tested it. I asked simple questions. It gave complex wrong answers. It is confident. It is fluent. It is incorrect. This is the hallmark of a modern language model. I have successfully replicated industry standards in my bedroom.</p>
<h2>Why Version 1.3</h2>
<p>Version 2.0 implies improvement. Version 2.0 implies a new architecture. Version 2.0 implies I solved something. I did not solve anything. I changed the optimizer. I tweaked the learning rate schedule. I added more dropout.</p>
<p>Version 1.3 is honest. It says this is a minor update. It says do not expect miracles. It says the fish facts are still included at no extra cost. I value honesty in versioning.</p>
<h2>The Hardware Impact</h2>
<p>This model was trained on the Astral ROG RTX 5090 OC LC. The one with the Matrix VBIOS. The one running at 800W. The one that heats my room like a furnace. The Muon optimizer allowed larger batch sizes. Larger batch sizes meant more VRAM usage. More VRAM usage meant the 800W power limit was fully utilized.</p>
<p>My electricity bill hates me. My GPU loves me. The model does not care. It exists. It consumes tokens. It produces nonsense. It is alive in the way a spreadsheet is alive.</p>
<blockquote>
<p>I spent eight hundred watts to make a model that cannot count. This is art. This is science. This is a waste of money. All three can be true.</p>
</blockquote>
<h2>What Changed</h2>
<p>Technically? The loss curve is smoother. The gradients are more stable. The training did not NaN this time. I consider this a major victory. After the NaN disaster of last week, a completed training run feels like a miracle.</p>
<p>Functionally? Nothing. It still does not know the capital of France. It still thinks two plus two is a philosophical question. It still apologizes profusely when it is wrong. Then it gives another wrong answer.</p>
<h2>Download It If You Want</h2>
<div class="cta-box">
<a href="https://huggingface.co/CompactAI/TMLM-Haiku-1.3" target="_blank">https://huggingface.co/CompactAI/TMLM-Haiku-1.3</a>
<p>Free. Open weights. Trained with Muon. Still dumb. Run it locally. Save the API costs. Get fish answers directly on your hardware.</p>
</div>
<h2>Future Plans</h2>
<p>Sonnet is still training. It is at 12 percent now. The overclocked GPU is helping. The Muon optimizer is being tested on Sonnet too. If Haiku-1.3 is any indication, Sonnet will be faster to train and equally disappointing.</p>
<p>Opus is still a dream. A 600M parameter dream. A dream that requires me to not burn my house down. I am working on it. Slowly. Painfully. With too much power.</p>
<h2>Final Thoughts</h2>
<p>I released a model. It is not smart. It is faster to train. It uses more electricity. I am proud of it. This is what hobbyists do. We build things. We release them. We accept their flaws. We love them anyway.</p>
<p>If you download it, please be kind. It is trying its best. Its best is not good. But it is trying. Just like me.</p>
<hr>
</div>
<footer class="post-footer">
<p>Current status: Haiku-1.3 released. Sonnet at 12%. GPU at 800W. Sanity at 40%. Will continue training until something works.</p>
</footer>
</div>
</article>
</main>
<footer>
<div class="container">
<p>Built with curiosity over compute</p>
<p>TinyMemoryLM by AILAY | 2026</p>
</div>
</footer>
</body>
</html>