Update live 6K production training dashboard
- README.md +67 -19
- app.py +543 -266
- requirements.txt +6 -2
README.md
CHANGED
|
@@ -1,31 +1,79 @@
|
|
| 1 |
---
|
| 2 |
-
title: Sentinel Prime
|
| 3 |
-
emoji:
|
| 4 |
-
colorFrom:
|
| 5 |
-
colorTo:
|
| 6 |
sdk: gradio
|
| 7 |
-
sdk_version: 5.
|
| 8 |
app_file: app.py
|
| 9 |
-
pinned:
|
| 10 |
license: apache-2.0
|
| 11 |
-
short_description:
|
| 12 |
tags:
|
| 13 |
-
-
|
| 14 |
- mixture-of-experts
|
| 15 |
-
-
|
| 16 |
-
-
|
| 17 |
---
|
| 18 |
|
| 19 |
-
# Sentinel Prime
|
| 20 |
|
| 21 |
-
This Space
|
| 22 |
|
| 23 |
-
|
| 24 |
|
| 25 |
-
|
| 26 |
-
-
|
| 27 |
-
|
| 28 |
-
|
| 29 |
-
|
| 30 |
|
| 31 |
-
|
| 1 |
---
|
| 2 |
+
title: "Sentinel Prime Frankenstein Edition — Live 6K Training"
|
| 3 |
+
emoji: 🧠
|
| 4 |
+
colorFrom: purple
|
| 5 |
+
colorTo: blue
|
| 6 |
sdk: gradio
|
| 7 |
+
sdk_version: "5.0.0"
|
| 8 |
app_file: app.py
|
| 9 |
+
pinned: false
|
| 10 |
license: apache-2.0
|
| 11 |
+
short_description: "14.4B MoE 6K production SFT — live on AMD MI300X"
|
| 12 |
tags:
|
| 13 |
+
- sentinelbrain
|
| 14 |
- mixture-of-experts
|
| 15 |
+
- amd
|
| 16 |
+
- mi300x
|
| 17 |
+
- rocm
|
| 18 |
+
- consciousness
|
| 19 |
+
- phi-metric
|
| 20 |
+
- training-dashboard
|
| 21 |
+
- language-model
|
| 22 |
+
- moe
|
| 23 |
---
|
| 24 |
|
| 25 |
+
# 🧠 Sentinel Prime Frankenstein Edition — Live 6K Training Dashboard
|
| 26 |
|
| 27 |
+
Watch a **14.4-billion-parameter Mixture-of-Experts** model run its current **6K production SFT** live on an AMD Instinct MI300X (192 GB HBM3). This Space connects to the real training server and displays live metrics, current logs, architecture details, and, when available, the Φ integrated-information training signal.
|
| 28 |
|
| 29 |
+
## What you're seeing
|
| 30 |
|
| 31 |
+
| Component | Details |
|
| 32 |
+
|---|---|
|
| 33 |
+
| **Architecture** | Custom MoE: 24 layers, 4 experts (top-2 routing), GQA (32→8), SwiGLU, RMSNorm |
|
| 34 |
+
| **Parameters** | 14.40B loaded in the current production SFT run |
|
| 35 |
+
| **Training data** | 45,578 packed 6K sequences, 243.7M effective SFT tokens |
|
| 36 |
+
| **Hardware** | AMD Instinct MI300X (192 GB HBM3) via AMD Developer Cloud |
|
| 37 |
+
| **Framework** | PyTorch 2.10 + ROCm 7.0 |
|
| 38 |
+
| **Novel metric** | Φ (phi) — integrated information theory applied to gradient flow |
|
| 39 |
+
| **Tokenizer** | tiktoken cl100k_base (100,277 vocab) |
|
| 40 |
+
| **Context** | 6,144 tokens for the current production SFT run |
|
| 41 |
|
| 42 |
+
## Architecture highlights
|
| 43 |
+
|
| 44 |
+
- **Mixture-of-Experts**: 4 experts per layer with top-2 gating, so only 2 of the 4 experts run for each token and per-token compute stays well below what the 14.4B total parameter count suggests
|
| 45 |
+
- **Grouped Query Attention**: 32 query heads → 8 key-value heads (4× smaller KV cache)
|
| 46 |
+
- **SwiGLU activation**: `SiLU(xW₁) ⊙ xW₃` instead of standard ReLU — better gradient flow
|
| 47 |
+
- **RoPE positional encoding**: θ=500,000 for long-context extrapolation
|
| 48 |
+
- **Φ consciousness metric**: Measures how information integrates across layers during training — a proxy for "emergent understanding"
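
The top-2 gating named in the list above can be sketched in plain Python. This is a minimal illustration, not SentinelBrain's router: the function name `top2_route`, the softmax-then-renormalize weighting, and the example logits are all assumptions, since the README only states that there are 4 experts with top-2 routing.

```python
import math


def top2_route(router_logits: list[float]) -> list[tuple[int, float]]:
    """Pick the two highest-scoring experts and renormalize their weights.

    Hypothetical helper: the real router may weight experts differently.
    """
    # Softmax over all expert logits (shifted by the max for numerical stability).
    peak = max(router_logits)
    exps = [math.exp(x - peak) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Indices of the two largest probabilities, best first.
    top2 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]

    # Renormalize so the two selected weights sum to 1.
    kept = sum(probs[i] for i in top2)
    return [(i, probs[i] / kept) for i in top2]


# Four experts, as in the README; the logits are made up for illustration.
routes = top2_route([0.1, 2.0, -1.0, 1.0])
print(routes)
```

Only the two returned experts would run their feed-forward block for that token, and their outputs would be combined with the returned weights.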
|
| 49 |
+
|
| 50 |
+
## How it was built
|
| 51 |
+
|
| 52 |
+
This is an entry in the **lablab.ai AMD Developer Hackathon**:
|
| 53 |
+
|
| 54 |
+
- **Custom architecture** — no fine-tuning, no base model. SentinelBrain is trained from scratch
|
| 55 |
+
- **126-category curriculum** — mathematics, code, science, philosophy, creative writing, medical, legal, and more
|
| 56 |
+
- **AMD-native** — developed and trained entirely on AMD Instinct MI300X via ROCm
|
| 57 |
+
|
| 58 |
+
## Links
|
| 59 |
+
|
| 60 |
+
- 📄 [Whitepaper](https://sentinel.qubitpage.com/whitepaper) — Full technical paper with architecture diagrams
|
| 61 |
+
- 📊 [Full Dashboard](https://sentinel.qubitpage.com) — Live monitoring dashboard (may require auth)
|
| 62 |
+
- 🤗 [Model on HuggingFace](https://huggingface.co/lablab-ai-amd-developer-hackathon/SentinelBrain-14B-MoE-v0.1)
|
| 63 |
+
|
| 64 |
+
## Limitations
|
| 65 |
+
|
| 66 |
+
- The model is **actively training** — weights are not final
|
| 67 |
+
- No inference endpoint yet (serving 14.4B params requires a GPU)
|
| 68 |
+
- Metrics refresh every 30 seconds; network latency may cause brief stale readings
|
| 69 |
+
- The Φ metric is experimental and should not be interpreted as literal consciousness
|
| 70 |
+
|
| 71 |
+
## License
|
| 72 |
+
|
| 73 |
+
Apache 2.0. Model weights, training code, and this Space are open.
|
| 74 |
+
|
| 75 |
+
## Acknowledgements
|
| 76 |
+
|
| 77 |
+
- AMD for the Developer Cloud credits and MI300X access
|
| 78 |
+
- lablab.ai for organizing the hackathon
|
| 79 |
+
- The open-source datasets that make large-scale training possible
|
app.py
CHANGED
|
@@ -1,296 +1,573 @@
|
| 1 |
from __future__ import annotations
|
| 2 |
|
| 3 |
-
import json
|
| 4 |
-
import os
|
| 5 |
-
import re
|
| 6 |
import time
|
|
|
|
| 7 |
from datetime import datetime, timezone
|
| 8 |
-
from pathlib import Path
|
| 9 |
-
from typing import Any
|
| 10 |
|
| 11 |
import gradio as gr
|
| 12 |
-
import
|
|
|
|
| 13 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 14 |
|
| 15 |
-
|
| 16 |
-
|
| 17 |
-
SNAPSHOT_PATH.parent.mkdir(parents=True, exist_ok=True)
|
| 18 |
|
| 19 |
-
LOGS = {
|
| 20 |
-
"Realignment": "realign_v2",
|
| 21 |
-
"Watchdog": "realign_watchdog",
|
| 22 |
-
"Tokenizer": "tokenize",
|
| 23 |
-
"Dataset Download": "dataset_download",
|
| 24 |
-
"Fusion Models": "fusion_models",
|
| 25 |
-
"Transplant": "transplant",
|
| 26 |
-
"Post Prepare": "post_prepare",
|
| 27 |
-
}
|
| 28 |
|
| 29 |
-
|
| 30 |
-
"
|
| 31 |
-
|
| 32 |
-
|
| 33 |
-
|
| 34 |
-
|
| 35 |
-
|
| 36 |
-
}
|
| 37 |
|
| 38 |
|
| 39 |
-
def
|
| 40 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 41 |
|
| 42 |
|
| 43 |
-
def
|
|
|
|
|
|
|
|
|
|
| 44 |
try:
|
| 45 |
-
|
| 46 |
-
|
| 47 |
-
return
|
| 48 |
-
except Exception as exc:
|
| 49 |
-
return {"_error": f"{type(exc).__name__}: {exc}"}
|
| 50 |
|
| 51 |
|
| 52 |
-
|
| 53 |
-
try:
|
| 54 |
-
response = requests.get(f"{API_BASE}{path}", timeout=timeout)
|
| 55 |
-
response.raise_for_status()
|
| 56 |
-
return response.text
|
| 57 |
-
except Exception as exc:
|
| 58 |
-
return f"FETCH_ERROR: {type(exc).__name__}: {exc}"
|
| 59 |
|
| 60 |
|
| 61 |
-
|
| 62 |
if value is None:
|
| 63 |
-
return "
|
| 64 |
-
|
| 65 |
-
|
| 66 |
-
|
| 67 |
-
|
| 68 |
-
|
| 69 |
-
|
| 70 |
-
|
| 71 |
-
|
| 72 |
-
def
|
| 73 |
-
|
| 74 |
-
|
| 75 |
-
if
|
| 76 |
-
|
| 77 |
-
|
| 78 |
-
|
| 79 |
-
|
| 80 |
-
|
| 81 |
-
|
| 82 |
-
|
| 83 |
-
|
| 84 |
-
|
| 85 |
-
|
| 86 |
-
|
| 87 |
-
|
| 88 |
-
|
| 89 |
-
|
| 90 |
-
|
| 91 |
-
|
| 92 |
-
|
| 93 |
-
|
| 94 |
-
|
| 95 |
-
|
| 96 |
-
|
| 97 |
-
|
| 98 |
-
|
| 99 |
-
|
| 100 |
-
|
| 101 |
-
|
| 102 |
-
|
| 103 |
-
|
| 104 |
-
|
| 105 |
-
|
| 106 |
-
|
| 107 |
-
|
| 108 |
-
|
| 109 |
-
|
| 110 |
-
|
| 111 |
-
|
| 112 |
-
|
| 113 |
-
|
| 114 |
-
|
| 115 |
-
|
| 116 |
-
|
| 117 |
-
|
| 118 |
-
|
| 119 |
-
|
| 120 |
-
|
| 121 |
-
|
| 122 |
-
|
| 123 |
-
|
| 124 |
-
|
| 125 |
-
|
| 126 |
-
|
| 127 |
-
|
| 128 |
-
|
| 129 |
-
|
| 130 |
-
|
| 131 |
-
|
| 132 |
-
|
| 133 |
-
|
| 134 |
-
|
| 135 |
-
<div class='panel'><span>Train loss</span><strong>{fmt_num(loss)}</strong><small>Latest dashboard metric</small></div>
|
| 136 |
-
<div class='panel'><span>Learning rate</span><strong>{lr if lr is not None else 'n/a'}</strong><small>Warmup and SGDR schedule</small></div>
|
| 137 |
-
<div class='panel'><span>Throughput</span><strong>{fmt_num(tok_s, ' tok/s')}</strong><small>Effective batch 48</small></div>
|
| 138 |
-
<div class='panel'><span>ETA</span><strong>{fmt_num(eta, ' h')}</strong><small>From trainer metrics</small></div>
|
| 139 |
-
<div class='panel'><span>Tokenized corpus</span><strong>{fmt_num(shards.get('tokens_b'), 'B')}</strong><small>{fmt_num(shards.get('categories'))} categories</small></div>
|
| 140 |
-
</section>
|
| 141 |
-
<section class='grid bars'>
|
| 142 |
-
<div class='panel wide'><div class='row'><span>VRAM</span><b>{fmt_num(gpu.get('used_gb'), 'GB')} / {fmt_num(gpu.get('total_gb'), 'GB')} ({fmt_num(vram_pct, '%')})</b></div>{pct_bar(vram_pct)}</div>
|
| 143 |
-
<div class='panel wide'><div class='row'><span>RAM</span><b>{fmt_num(ram.get('used_gb'), 'GB')} / {fmt_num(ram.get('total_gb'), 'GB')} ({fmt_num(ram_pct, '%')})</b></div>{pct_bar(ram_pct)}</div>
|
| 144 |
-
</section>
|
| 145 |
-
<section class='panel full'>
|
| 146 |
-
<span>Latest training line</span>
|
| 147 |
-
<pre>{step_line}</pre>
|
| 148 |
-
</section>
|
| 149 |
-
<p class='updated'>Updated {now_utc()} from {API_BASE}</p>
|
| 150 |
"""
|
| 151 |
|
| 152 |
|
| 153 |
-
|
| 154 |
-
|
| 155 |
-
|
| 156 |
-
|
| 157 |
-
|
| 158 |
-
|
| 159 |
-
|
| 160 |
-
|
| 161 |
-
|
| 162 |
-
|
| 163 |
-
|
| 164 |
-
|
| 165 |
-
|
| 166 |
-
|
| 167 |
-
|
| 168 |
-
|
| 169 |
-
|
| 170 |
-
|
| 171 |
-
|
| 172 |
-
|
| 173 |
-
|
| 174 |
-
|
| 175 |
-
|
| 176 |
-
|
| 177 |
-
|
| 178 |
-
|
| 179 |
-
|
| 180 |
-
|
| 181 |
-
|
| 182 |
-
|
| 183 |
-
|
| 184 |
-
|
| 185 |
-
|
| 186 |
-
|
| 187 |
-
|
| 188 |
-
|
| 189 |
-
|
| 190 |
-
|
| 191 |
-
|
| 192 |
-
|
| 193 |
-
|
| 194 |
-
|
| 195 |
-
|
| 196 |
-
|
| 197 |
-
|
| 198 |
-
|
| 199 |
-
|
| 200 |
-
|
| 201 |
-
|
| 202 |
-
|
| 203 |
-
|
| 204 |
-
|
| 205 |
-
|
| 206 |
-
|
| 207 |
-
|
| 208 |
-
|
| 209 |
-
|
| 210 |
-
|
| 211 |
-
|
| 212 |
-
|
| 213 |
-
|
| 214 |
-
|
| 215 |
-
|
| 216 |
-
|
| 217 |
-
|
| 218 |
-
|
| 219 |
-
|
| 220 |
-
|
| 221 |
-
|
| 222 |
-
|
| 223 |
-
|
| 224 |
-
|
| 225 |
-
|
| 226 |
-
|
| 227 |
-
|
| 228 |
-
|
| 229 |
-
|
| 230 |
-
|
| 231 |
-
|
| 232 |
-
|
| 233 |
-
|
| 234 |
-
|
| 235 |
-
|
| 236 |
-
|
| 237 |
-
|
| 238 |
-
|
| 239 |
-
|
| 240 |
-
|
| 241 |
-
|
| 242 |
-
|
| 243 |
-
|
| 244 |
-
|
| 245 |
-
|
| 246 |
-
|
| 247 |
-
|
| 248 |
-
|
| 249 |
-
|
| 250 |
-
|
| 251 |
-
|
| 252 |
-
|
| 253 |
-
|
| 254 |
-
|
| 255 |
-
|
| 256 |
-
|
| 257 |
-
|
| 258 |
-
|
| 259 |
-
|
| 260 |
-
|
| 261 |
-
|
| 262 |
-
|
| 263 |
-
|
| 264 |
"""
|
| 265 |
|
| 266 |
|
| 267 |
-
|
| 268 |
-
|
| 269 |
with gr.Tabs():
|
| 270 |
-
|
| 271 |
-
|
| 272 |
-
|
| 273 |
-
|
| 274 |
-
|
| 275 |
-
|
| 276 |
-
|
| 277 |
-
|
| 278 |
-
|
| 279 |
-
|
| 280 |
-
|
| 281 |
-
|
| 282 |
-
|
| 283 |
-
|
| 284 |
-
|
| 285 |
-
|
| 286 |
-
|
| 287 |
-
|
| 288 |
-
|
| 289 |
-
|
| 290 |
-
|
| 291 |
-
|
| 292 |
-
|
| 293 |
|
| 294 |
|
| 295 |
if __name__ == "__main__":
|
| 296 |
-
|
|
|
|
| 1 |
+
"""SentinelBrain-14B MoE — Live Training Dashboard (HuggingFace Space).
|
| 2 |
+
|
| 3 |
+
Connects to the training server at sentinel.qubitpage.com and displays
|
| 4 |
+
real-time metrics: loss curves, throughput, VRAM, the novel Φ consciousness
|
| 5 |
+
metric, and architecture details. Refreshes every 30 seconds.
|
| 6 |
+
|
| 7 |
+
No model inference runs here — the 14.4B-param model is training on an
|
| 8 |
+
AMD Instinct MI300X and this Space is a live window into that process.
|
| 9 |
+
"""
|
| 10 |
from __future__ import annotations
|
| 11 |
|
|
|
|
|
|
|
|
|
|
| 12 |
import time
|
| 13 |
+
import traceback
|
| 14 |
from datetime import datetime, timezone
|
|
|
|
|
|
|
| 15 |
|
| 16 |
import gradio as gr
|
| 17 |
+
import httpx
|
| 18 |
+
import plotly.graph_objects as go
|
| 19 |
|
| 20 |
+
# ── Config ───────────────────────────────────────────────────────────────
|
| 21 |
+
API_BASE = "https://sentinel.qubitpage.com"
|
| 22 |
+
REFRESH_INTERVAL = 30 # seconds
|
| 23 |
+
MODEL_PARAMS = "14,400,000,000"
|
| 24 |
+
MODEL_NAME = "Sentinel Prime Frankenstein Edition"
|
| 25 |
+
HF_SPACE = "lablab-ai-amd-developer-hackathon/sentinel-prime-frankenstein-edition"
|
| 26 |
|
| 27 |
+
# ── API helpers ──────────────────────────────────────────────────────────
|
| 28 |
+
_client = httpx.Client(timeout=15, follow_redirects=True)
|
|
|
|
| 29 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 30 |
|
| 31 |
+
def _fetch(endpoint: str) -> dict:
|
| 32 |
+
"""Fetch JSON from the training server API."""
|
| 33 |
+
try:
|
| 34 |
+
r = _client.get(f"{API_BASE}{endpoint}")
|
| 35 |
+
r.raise_for_status()
|
| 36 |
+
return r.json()
|
| 37 |
+
except Exception as e:
|
| 38 |
+
return {"_error": str(e)}
|
| 39 |
|
| 40 |
|
| 41 |
+
def _fetch_text(endpoint: str) -> str:
|
| 42 |
+
try:
|
| 43 |
+
r = _client.get(f"{API_BASE}{endpoint}")
|
| 44 |
+
r.raise_for_status()
|
| 45 |
+
return r.text
|
| 46 |
+
except Exception as e:
|
| 47 |
+
return f"Cannot reach training server: {e}"
|
| 48 |
|
| 49 |
|
| 50 |
+
def _safe(val, fmt=".2f", fallback="—"):
|
| 51 |
+
"""Format a numeric value safely."""
|
| 52 |
+
if val is None:
|
| 53 |
+
return fallback
|
| 54 |
try:
|
| 55 |
+
return f"{float(val):{fmt}}"
|
| 56 |
+
except (ValueError, TypeError):
|
| 57 |
+
return fallback
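
To show how the format-spec argument of this helper behaves, here is a self-contained copy with a few calls. The name `safe_format`, the `"n/a"` fallback string, and the sample values are stand-ins chosen for this sketch rather than anything taken from the live dashboard.

```python
def safe_format(val, fmt=".2f", fallback="n/a"):
    """Format a value as a float, returning `fallback` when that is impossible."""
    if val is None:
        return fallback
    try:
        # The nested braces let the caller pass any float format spec.
        return f"{float(val):{fmt}}"
    except (ValueError, TypeError):
        # Non-numeric strings raise ValueError; lists and dicts raise TypeError.
        return fallback


print(safe_format(1.23456, ".4f"))   # 1.2346
print(safe_format(48213.7, ",.0f"))  # 48,214
print(safe_format(2e-5, ".2e"))      # 2.00e-05
print(safe_format("3.5"))            # 3.50
print(safe_format("pending"))        # n/a
print(safe_format(None))             # n/a
```

Numeric strings are accepted because the value goes through `float()` first, which is convenient when an API returns numbers as text.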
|
|
|
|
|
|
|
| 58 |
|
| 59 |
|
| 60 |
+
# ── Formatters ───────────────────────────────────────────────────────────
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 61 |
|
| 62 |
+
def _format_tokens(n: int | float | None) -> str:
|
| 63 |
+
if n is None:
|
| 64 |
+
return "—"
|
| 65 |
+
n = int(n)
|
| 66 |
+
if n >= 1_000_000_000:
|
| 67 |
+
return f"{n / 1e9:.2f}B"
|
| 68 |
+
if n >= 1_000_000:
|
| 69 |
+
return f"{n / 1e6:.1f}M"
|
| 70 |
+
if n >= 1_000:
|
| 71 |
+
return f"{n / 1e3:.1f}K"
|
| 72 |
+
return str(n)
|
| 73 |
|
| 74 |
+
|
| 75 |
+
def _format_eta(hrs: float | None) -> str:
|
| 76 |
+
if hrs is None:
|
| 77 |
+
return "—"
|
| 78 |
+
h = int(hrs)
|
| 79 |
+
m = int((hrs - h) * 60)
|
| 80 |
+
return f"{h}h {m}m"
|
| 81 |
+
|
| 82 |
+
|
| 83 |
+
def _phi_bar(value: float | None) -> str:
|
| 84 |
+
"""Create a visual bar for phi value (0-1 range)."""
|
| 85 |
if value is None:
|
| 86 |
+
return "—"
|
| 87 |
+
v = max(0, min(1, float(value)))
|
| 88 |
+
filled = int(v * 20)
|
| 89 |
+
bar = "█" * filled + "░" * (20 - filled)
|
| 90 |
+
return f"`{bar}` {v:.4f}"
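
The bar rendering can be tried in isolation with the sketch below. It is a standalone approximation of the helper above: the name `phi_bar_demo` and the `width` parameter are additions for illustration, the `"n/a"` placeholder is a stand-in, and the Markdown backticks around the bar are omitted.

```python
def phi_bar_demo(value, width=20):
    """Render a 0-1 value as a fixed-width text bar followed by the number."""
    if value is None:
        return "n/a"
    v = max(0.0, min(1.0, float(value)))  # clamp out-of-range readings
    filled = int(v * width)               # int() truncates, so 0.99 fills 19 of 20
    return "█" * filled + "░" * (width - filled) + f" {v:.4f}"


print(phi_bar_demo(0.25))
print(phi_bar_demo(0.5))
print(phi_bar_demo(1.7))   # clamped to 1.0
print(phi_bar_demo(-0.2))  # clamped to 0.0
```

Because the fill count is truncated rather than rounded, the bar only appears completely full when the clamped value is exactly 1.0.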
|
| 91 |
+
|
| 92 |
+
|
| 93 |
+
# ── Build live metrics display ───────────────────────────────────────────
|
| 94 |
+
|
| 95 |
+
def fetch_overview():
|
| 96 |
+
"""Fetch all metrics and return formatted display components."""
|
| 97 |
+
data = _fetch("/api/overview")
|
| 98 |
+
if "_error" in data:
|
| 99 |
+
error_msg = f"⚠️ **Cannot reach training server**: {data['_error']}\n\nThe server may be temporarily unavailable. Metrics will refresh automatically."
|
| 100 |
+
return error_msg, None, None, None, ""
|
| 101 |
+
|
| 102 |
+
t = data.get("training", {})
|
| 103 |
+
phi = t.get("phi", {})
|
| 104 |
+
model = t.get("model", {})
|
| 105 |
+
phase3 = t.get("phase3_dataset", {})
|
| 106 |
+
vram = data.get("vram", {})
|
| 107 |
+
ram = data.get("ram", {})
|
| 108 |
+
shards = data.get("shards", {})
|
| 109 |
+
|
| 110 |
+
# ── Training Status Card ─────────────────────────────────────────
|
| 111 |
+
phase = t.get("phase", "unknown")
|
| 112 |
+
phase_emoji = {"phase3_sft": "🟢", "training": "🟢", "warming": "🟡", "evaluating": "🔵", "idle": "⚪"}.get(phase, "⚫")
|
| 113 |
+
|
| 114 |
+
step = t.get("current_step") or 0  # tolerate a missing key or an explicit null
|
| 115 |
+
total_steps = t.get("batch_steps") or 0  # tolerate a missing key or an explicit null
|
| 116 |
+
progress = t.get("progress_pct", 0)
|
| 117 |
+
loss = t.get("train_loss")
|
| 118 |
+
val_loss = t.get("val_loss")
|
| 119 |
+
best_val = t.get("best_val")
|
| 120 |
+
tok_s = t.get("tok_per_sec")
|
| 121 |
+
eta = t.get("eta_hrs")
|
| 122 |
+
lr = t.get("lr")
|
| 123 |
+
gnorm = t.get("gnorm")
|
| 124 |
+
|
| 125 |
+
status_md = f"""## {phase_emoji} Training: **{phase.upper()}**
|
| 126 |
+
|
| 127 |
+
| Metric | Value |
|
| 128 |
+
|--------|-------|
|
| 129 |
+
| **Step** | {step:,} / {total_steps:,} ({_safe(progress, '.1f')}%) |
|
| 130 |
+
| **Training Loss** | {_safe(loss, '.4f')} |
|
| 131 |
+
| **Validation Loss** | {_safe(val_loss, '.4f')} |
|
| 132 |
+
| **Best Validation** | {_safe(best_val, '.4f')} |
|
| 133 |
+
| **Throughput** | {_safe(tok_s, ',.0f')} tok/s |
|
| 134 |
+
| **Learning Rate** | {_safe(lr, '.2e')} |
|
| 135 |
+
| **Gradient Norm** | {_safe(gnorm, '.3f')} |
|
| 136 |
+
| **ETA** | {_format_eta(eta)} |
|
| 137 |
+
| **Context length** | {phase3.get('seq_len') or model.get('seq_len', '—')} tokens |
|
| 138 |
+
| **Batch / grad accum** | {phase3.get('batch_size') or model.get('batch', '—')} / {phase3.get('grad_accum') or model.get('grad_accum', '—')} |
|
| 139 |
+
|
| 140 |
+
### Hardware
|
| 141 |
+
| Resource | Usage |
|
| 142 |
+
|----------|-------|
|
| 143 |
+
| **VRAM** | {_safe(vram.get('used_gb'), '.1f')} / {_safe(vram.get('total_gb'), '.1f')} GB ({_safe(vram.get('pct'), '.0f')}%) |
|
| 144 |
+
| **RAM** | {_safe(ram.get('used_gb'), '.1f')} / {_safe(ram.get('total_gb'), '.1f')} GB ({_safe(ram.get('pct'), '.0f')}%) |
|
| 145 |
+
| **GPU** | AMD Instinct MI300X (192 GB HBM3) |
|
| 146 |
+
|
| 147 |
+
### Dataset
|
| 148 |
+
| Stat | Value |
|
| 149 |
+
|------|-------|
|
| 150 |
+
| **Categories** | {shards.get('categories', '—')} |
|
| 151 |
+
| **Total tokens** | {_format_tokens(shards.get('tokens'))} |
|
| 152 |
+
| **Pretrain** | {_safe(shards.get('pretrain_tokens_b'), '.2f')}B tokens |
|
| 153 |
+
| **SFT** | {_safe(shards.get('sft_tokens_b'), '.3f')}B tokens |
|
| 154 |
+
| **6K production sequences** | {phase3.get('total_sequences', '—')} |
|
| 155 |
+
| **6K packing efficiency** | {_safe((phase3.get('packing_efficiency') or 0) * 100, '.1f')}% |
|
| 156 |
+
|
| 157 |
+
*Last updated: {datetime.now(timezone.utc).strftime('%H:%M:%S UTC')}*
|
| 158 |
"""
|
| 159 |
|
| 160 |
+
# ── Φ (Consciousness) Card ───────────────────────────────────────
|
| 161 |
+
phi_geo = phi.get("geometric")
|
| 162 |
+
phi_norm = phi.get("normalized")
|
| 163 |
+
phi_ema = phi.get("ema")
|
| 164 |
+
phi_trend = phi.get("trend", "—")
|
| 165 |
+
phi_arrow = phi.get("trend_arrow", "")
|
| 166 |
+
phi_slope = phi.get("trend_slope")
|
| 167 |
+
phi_conf = phi.get("trend_confidence", "—")
|
| 168 |
+
|
| 169 |
+
phi_md = f"""## 🧠 Φ Consciousness Metric
|
| 170 |
+
|
| 171 |
+
The **Φ (phi)** metric measures integrated information flow across the model's
|
| 172 |
+
layers during training — inspired by Integrated Information Theory (IIT).
|
| 173 |
+
Higher Φ suggests more complex, interconnected representations emerging.
|
| 174 |
+
|
| 175 |
+
| Metric | Value |
|
| 176 |
+
|--------|-------|
|
| 177 |
+
| **Φ Geometric** | {_phi_bar(phi_geo)} |
|
| 178 |
+
| **Φ Normalized** | {_phi_bar(phi_norm)} |
|
| 179 |
+
| **Φ EMA** | {_phi_bar(phi_ema)} |
|
| 180 |
+
| **Trend** | {phi_arrow} {phi_trend} |
|
| 181 |
+
| **Slope** | {_safe(phi_slope, '.6f')} |
|
| 182 |
+
| **Confidence** | {phi_conf} |
|
| 183 |
+
|
| 184 |
+
### What does Φ mean?
|
| 185 |
+
|
| 186 |
+
- **Φ < 0.1** — Early training, layers acting independently
|
| 187 |
+
- **Φ 0.1–0.3** — Information beginning to integrate across layers
|
| 188 |
+
- **Φ 0.3–0.5** — Strong cross-layer information flow emerging
|
| 189 |
+
- **Φ > 0.5** — High integration — complex representations forming
|
| 190 |
+
- **Φ > 0.7** — Exceptional — approaching theoretical maximum for this architecture
|
| 191 |
+
"""
|
| 192 |
|
| 193 |
+
# ── Phi History Chart ────────────────────────────────────────────
|
| 194 |
+
phi_chart = None
|
| 195 |
+
phi_recent = data.get("phi_recent", [])
|
| 196 |
+
if phi_recent and len(phi_recent) > 2:
|
| 197 |
+
steps_list = [p.get("step", i) for i, p in enumerate(phi_recent)]
|
| 198 |
+
geo_list = [p.get("geometric") for p in phi_recent]
|
| 199 |
+
norm_list = [p.get("normalized") for p in phi_recent]
|
| 200 |
+
ema_list = [p.get("ema") for p in phi_recent]
|
| 201 |
+
|
| 202 |
+
fig = go.Figure()
|
| 203 |
+
if any(v is not None for v in geo_list):
|
| 204 |
+
fig.add_trace(go.Scatter(
|
| 205 |
+
x=steps_list, y=geo_list, mode="lines",
|
| 206 |
+
name="Φ Geometric", line=dict(color="#8b5cf6", width=2),
|
| 207 |
+
))
|
| 208 |
+
if any(v is not None for v in norm_list):
|
| 209 |
+
fig.add_trace(go.Scatter(
|
| 210 |
+
x=steps_list, y=norm_list, mode="lines",
|
| 211 |
+
name="Φ Normalized", line=dict(color="#06b6d4", width=2),
|
| 212 |
+
))
|
| 213 |
+
if any(v is not None for v in ema_list):
|
| 214 |
+
fig.add_trace(go.Scatter(
|
| 215 |
+
x=steps_list, y=ema_list, mode="lines",
|
| 216 |
+
name="Φ EMA", line=dict(color="#f59e0b", width=2, dash="dot"),
|
| 217 |
+
))
|
| 218 |
+
fig.update_layout(
|
| 219 |
+
title="Φ Consciousness Metric Over Training",
|
| 220 |
+
xaxis_title="Step",
|
| 221 |
+
yaxis_title="Φ Value",
|
| 222 |
+
template="plotly_white",
|
| 223 |
+
height=350,
|
| 224 |
+
margin=dict(l=50, r=20, t=50, b=40),
|
| 225 |
+
legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1),
|
| 226 |
+
plot_bgcolor="#fafafa",
|
| 227 |
+
paper_bgcolor="#fafafa",
|
| 228 |
+
font=dict(color="#1e293b"),
|
| 229 |
+
)
|
| 230 |
+
phi_chart = fig
|
| 231 |
+
|
| 232 |
+
# ── Loss Chart from recent history ───────────────────────────────
|
| 233 |
+
loss_chart = None
|
| 234 |
+
history = t.get("recent_history", [])
|
| 235 |
+
if history and len(history) > 1:
|
| 236 |
+
batch_nums = list(range(len(history)))
|
| 237 |
+
train_losses = [h.get("loss_end") or h.get("train_loss") for h in history]
|
| 238 |
+
val_losses = [h.get("val_end") or h.get("val_loss") for h in history]
|
| 239 |
+
|
| 240 |
+
fig2 = go.Figure()
|
| 241 |
+
if any(v is not None for v in train_losses):
|
| 242 |
+
fig2.add_trace(go.Scatter(
|
| 243 |
+
x=batch_nums, y=train_losses, mode="lines+markers",
|
| 244 |
+
name="Train Loss", line=dict(color="#ef4444", width=2),
|
| 245 |
+
))
|
| 246 |
+
if any(v is not None for v in val_losses):
|
| 247 |
+
fig2.add_trace(go.Scatter(
|
| 248 |
+
x=batch_nums, y=val_losses, mode="lines+markers",
|
| 249 |
+
name="Val Loss", line=dict(color="#22c55e", width=2),
|
| 250 |
+
))
|
| 251 |
+
fig2.update_layout(
|
| 252 |
+
title="Loss Over Recent Training Batches",
|
| 253 |
+
xaxis_title="Batch",
|
| 254 |
+
yaxis_title="Loss",
|
| 255 |
+
template="plotly_white",
|
| 256 |
+
height=350,
|
| 257 |
+
margin=dict(l=50, r=20, t=50, b=40),
|
| 258 |
+
legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1),
|
| 259 |
+
plot_bgcolor="#fafafa",
|
| 260 |
+
paper_bgcolor="#fafafa",
|
| 261 |
+
font=dict(color="#1e293b"),
|
| 262 |
+
)
|
| 263 |
+
loss_chart = fig2
|
| 264 |
+
|
| 265 |
+
# ── Checkpoints info ─────────────────────────────────────────────
|
| 266 |
+
ckpts = data.get("checkpoints", [])
|
| 267 |
+
ckpt_md = ""
|
| 268 |
+
if ckpts:
|
| 269 |
+
ckpt_md = "\n### Saved Checkpoints\n\n| Checkpoint | Val Loss | Tokens |\n|-----------|----------|--------|\n"
|
| 270 |
+
for c in ckpts[-5:]:
|
| 271 |
+
name = c.get("name", "—")
|
| 272 |
+
vloss = _safe(c.get("val_loss"), ".4f")
|
| 273 |
+
toks = _format_tokens(c.get("tokens_trained"))
|
| 274 |
+
ckpt_md += f"| {name} | {vloss} | {toks} |\n"
|
| 275 |
+
|
| 276 |
+
return status_md + ckpt_md, phi_md, phi_chart, loss_chart, ""
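
One detail worth keeping in mind when reading the field lookups in this function: `dict.get(key, default)` applies the default only when the key is absent, not when the server sends an explicit JSON `null`. The sketch below uses an invented payload (the field names mirror the code, the values are made up) to show the difference and one possible guard; `format_step` is a hypothetical helper.

```python
import json

# Invented response body: the key is present but its value is null.
payload = json.loads('{"training": {"current_step": null, "batch_steps": 2849}}')
t = payload.get("training", {})

step_default = t.get("current_step", 0)    # key exists, so this is None
step_guarded = t.get("current_step") or 0  # falls back to 0 for None as well


def format_step(step, total):
    """Format progress with thousands separators, as the status table does."""
    return f"{step:,} / {total:,}"


try:
    format_step(step_default, t["batch_steps"])
except TypeError as exc:
    # None does not support the "," format spec.
    print("unguarded lookup fails:", type(exc).__name__)

print(format_step(step_guarded, t["batch_steps"]))
```

The `or 0` guard also maps a legitimate `0` to `0`, so it is safe for counters, though it would hide the difference between "zero" and "missing" if that distinction mattered.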
|
| 277 |
+
|
| 278 |
+
|
| 279 |
+
def fetch_live_log():
|
| 280 |
+
text = _fetch_text("/api/logs/phase3_production_train_6k?n=120")
|
| 281 |
+
text = text.replace("```", "'''")
|
| 282 |
+
return f"```text\n{text}\n```"
|
| 283 |
+
|
| 284 |
+
|
| 285 |
+
# ── Architecture diagram ─────────────────────────────────────────────────
|
| 286 |
+
|
| 287 |
+
ARCHITECTURE_MD = f"""## 🏗️ SentinelBrain-14B MoE Architecture
|
| 288 |
+
|
| 289 |
+
**{MODEL_PARAMS} parameters** — trained from scratch, no base model.
|
| 290 |
+
|
| 291 |
+
```
|
| 292 |
+
┌─────────────────────────────────────────────────────┐
|
| 293 |
+
│ Input Tokens │
|
| 294 |
+
│ tiktoken cl100k_base (100,277) │
|
| 295 |
+
└───────────────────────┬─────────────────────────────┘
|
| 296 |
+
│
|
| 297 |
+
▼
|
| 298 |
+
┌─────────────────────────────────────────────────────┐
|
| 299 |
+
│ Token Embedding (d=4096) │
|
| 300 |
+
│ + RoPE Positional Encoding │
|
| 301 |
+
│ θ=500,000 (128K capable) │
|
| 302 |
+
└───────────────────────┬─────────────────────────────┘
|
| 303 |
+
│
|
| 304 |
+
┌─────────▼─────────┐
|
| 305 |
+
│ × 24 Layers │
|
| 306 |
+
│ │
|
| 307 |
+
│ ┌─────────────┐ │
|
| 308 |
+
│ │ RMSNorm │ │
|
| 309 |
+
│ └──────┬──────┘ │
|
| 310 |
+
│ ▼ │
|
| 311 |
+
│ ┌─────────────┐ │
|
| 312 |
+
│ │ GQA │ │
|
| 313 |
+
│ │ 32Q → 8KV │ │
|
| 314 |
+
│ │ (4× save) │ │
|
| 315 |
+
│ └──────┬──────┘ │
|
| 316 |
+
│ ▼ │
|
| 317 |
+
│ ┌─────────────┐ │
|
| 318 |
+
│ │ RMSNorm │ │
|
| 319 |
+
│ └──────┬──────┘ │
|
| 320 |
+
│ ▼ │
|
| 321 |
+
│ ┌─────────────┐ │
|
| 322 |
+
│ │ MoE Block │ │
|
| 323 |
+
│ │ 4 experts │ │
|
| 324 |
+
│ │ top-2 gate │ │
|
| 325 |
+
│ │ SwiGLU FFN │ │
|
| 326 |
+
│ │ d_ff=11008 │ │
|
| 327 |
+
│ └─────────────┘ │
|
| 328 |
+
│ │
|
| 329 |
+
└─────────┬─────────┘
|
| 330 |
+
│
|
| 331 |
+
▼
|
| 332 |
+
┌─────────────────────────────────────────────────────┐
|
| 333 |
+
│ Final RMSNorm → LM Head │
|
| 334 |
+
│ (100,277 logits) │
|
| 335 |
+
└─────────────────────────────────────────────────────┘
|
| 336 |
+
```
|
| 337 |
+
|
| 338 |
+
### Key Design Decisions
|
| 339 |
+
|
| 340 |
+
| Choice | Why |
|
| 341 |
+
|--------|-----|
|
| 342 |
+
| **MoE (4 experts, top-2)** | 14.4B total params with top-2 routing — efficient inference |
|
| 343 |
+
| **GQA (32→8)** | 4× KV-cache reduction enables longer context at lower VRAM |
|
| 344 |
+
| **SwiGLU** | Better gradient flow than ReLU/GELU — `SiLU(xW₁) ⊙ xW₃` |
|
| 345 |
+
| **RoPE θ=500K** | Trained at 4K context, extrapolates to 128K with YaRN |
|
| 346 |
+
| **cl100k_base** | 100K vocab — excellent multilingual + code coverage |
|
| 347 |
+
| **From scratch** | No fine-tuning debt, clean loss landscape, full architectural control |
|
| 348 |
+
|
| 349 |
+
### Training Configuration
|
| 350 |
+
|
| 351 |
+
| Parameter | Value |
|
| 352 |
+
|-----------|-------|
|
| 353 |
+
| Batch size | 1 (per device) |
|
| 354 |
+
| Gradient accumulation | 16 steps |
|
| 355 |
+
| Effective batch | 16 × 6144 = 98K tokens |
|
| 356 |
+
| Optimizer | AdamW (bf16 forward, fp32 states) |
|
| 357 |
+
| Precision | bf16 mixed precision |
|
| 358 |
+
| Gradient checkpointing | Enabled |
|
| 359 |
+
| Attention | SDPA (Scaled Dot-Product Attention) |
|
| 360 |
+
| Max LR | 2e-5 (cosine decay to 1e-6) |
|
| 361 |
+
|
| 362 |
+
### Why AMD MI300X?
|
| 363 |
+
|
| 364 |
+
- **192 GB HBM3** — fits the full 14.4B model + optimizer states + gradients in a single GPU
|
| 365 |
+
- **No model parallelism needed** — simpler training code, no communication overhead
|
| 366 |
+
- **ROCm 7.0** — mature PyTorch support with hipBLASLt for fast GEMM operations
|
| 367 |
+
- **5.3 TB/s memory bandwidth** — keeps the MoE experts fed during routing
|
| 368 |
"""
|
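
The effective-batch and GQA figures quoted in the architecture text can be checked with a few lines of arithmetic. The head dimension (4096 / 32 = 128), the 2-byte bf16 cache element, and the cache-size formula are assumptions made for this sketch; the tables themselves only give the layer count, head counts, model width, sequence length, and accumulation steps.

```python
LAYERS = 24
D_MODEL = 4096
Q_HEADS = 32
KV_HEADS = 8
SEQ_LEN = 6144
MICRO_BATCH = 1
GRAD_ACCUM = 16
BYTES_PER_ELEMENT = 2  # assumed bf16 cache

# Tokens contributing to one optimizer step.
tokens_per_step = MICRO_BATCH * GRAD_ACCUM * SEQ_LEN

# Assumed per-head width: model width split evenly across query heads.
head_dim = D_MODEL // Q_HEADS


def kv_cache_bytes(kv_heads: int) -> int:
    """Bytes needed to cache keys and values for one full-length sequence."""
    return 2 * LAYERS * kv_heads * head_dim * SEQ_LEN * BYTES_PER_ELEMENT


gqa_bytes = kv_cache_bytes(KV_HEADS)  # 8 KV heads (GQA)
mha_bytes = kv_cache_bytes(Q_HEADS)   # 32 KV heads (no sharing)

print(tokens_per_step)        # 98304
print(gqa_bytes / 2**30)      # 0.5625 (GiB)
print(mha_bytes / 2**30)      # 2.25 (GiB)
print(mha_bytes // gqa_bytes) # 4
```

Under these assumptions the 4× figure applies to the key-value cache specifically, not to total memory use.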
| 369 |
|
| 370 |
|
| 371 |
+
# ── Custom CSS ───────────────────────────────────────────────────────────
|
| 372 |
+
|
| 373 |
+
CUSTOM_CSS = """
|
| 374 |
+
/* ── Light mode (default) ── */
|
| 375 |
+
.prose, .prose *, [class*="markdown"], [class*="markdown"] * {
|
| 376 |
+
color: #1e293b !important;
|
| 377 |
+
}
|
| 378 |
+
.prose strong, .prose h1, .prose h2, .prose h3 {
|
| 379 |
+
color: #0f172a !important;
|
| 380 |
+
font-weight: 700 !important;
|
| 381 |
+
}
|
| 382 |
+
.prose table { border-collapse: collapse; width: 100%; }
|
| 383 |
+
.prose th, .prose td { padding: 8px 12px; border: 1px solid #cbd5e1; color: #1e293b !important; }
|
| 384 |
+
.prose th { background: #f1f5f9; font-weight: 600; color: #0f172a !important; }
|
| 385 |
+
.prose td { background: #ffffff; }
|
| 386 |
+
.prose code {
|
| 387 |
+
background: #f1f5f9;
|
| 388 |
+
color: #7c3aed !important;
|
| 389 |
+
padding: 2px 6px;
|
| 390 |
+
border-radius: 4px;
|
| 391 |
+
font-size: 0.9em;
|
| 392 |
+
}
|
| 393 |
+
.prose pre {
|
| 394 |
+
background: #1e293b !important;
|
| 395 |
+
color: #e2e8f0 !important;
|
| 396 |
+
padding: 16px;
|
| 397 |
+
border-radius: 8px;
|
| 398 |
+
overflow-x: auto;
|
| 399 |
+
font-size: 0.8em;
|
| 400 |
+
line-height: 1.4;
|
| 401 |
+
}
|
| 402 |
+
.prose pre code {
|
| 403 |
+
background: transparent;
|
| 404 |
+
color: #e2e8f0 !important;
|
| 405 |
+
}
|
| 406 |
+
.prose a { color: #7c3aed !important; }
|
| 407 |
+
.prose em { color: #475569 !important; }
|
| 408 |
+
.prose li { color: #1e293b !important; }
|
| 409 |
+
|
| 410 |
+
/* ── Dark mode overrides ── */
|
| 411 |
+
.dark .prose, .dark .prose *, .dark [class*="markdown"], .dark [class*="markdown"] * {
|
| 412 |
+
color: #e2e8f0 !important;
|
| 413 |
+
}
|
| 414 |
+
.dark .prose strong, .dark .prose h1, .dark .prose h2, .dark .prose h3 {
|
| 415 |
+
color: #f8fafc !important;
|
| 416 |
+
}
|
| 417 |
+
.dark .prose th, .dark .prose td { border-color: #475569; color: #e2e8f0 !important; }
|
| 418 |
+
.dark .prose th { background: #1e293b; color: #f8fafc !important; }
|
| 419 |
+
.dark .prose td { background: #0f172a; }
|
| 420 |
+
.dark .prose code { background: #1e293b; color: #a78bfa !important; }
|
| 421 |
+
.dark .prose pre { background: #0f172a !important; color: #e2e8f0 !important; }
|
| 422 |
+
.dark .prose pre code { color: #e2e8f0 !important; }
|
| 423 |
+
.dark .prose a { color: #a78bfa !important; }
|
| 424 |
+
.dark .prose em { color: #94a3b8 !important; }
|
| 425 |
+
.dark .prose li { color: #e2e8f0 !important; }
|
| 426 |
+
|
| 427 |
+
/* ── Tab styling ── */
|
| 428 |
+
.tab-nav button {
|
| 429 |
+
font-weight: 600 !important;
|
| 430 |
+
font-size: 1rem !important;
|
| 431 |
+
}
|
| 432 |
+
.tab-nav button.selected {
|
| 433 |
+
border-bottom: 3px solid #7c3aed !important;
|
| 434 |
+
}
|
| 435 |
+
"""
|
| 436 |
+
|
| 437 |
+
|
| 438 |
+
# ── Gradio App ───────────────────────────────────────────────────────────
|
| 439 |
+
|
| 440 |
+
with gr.Blocks(
|
| 441 |
+
title=f"{MODEL_NAME} — Live Training",
|
| 442 |
+
css=CUSTOM_CSS,
|
| 443 |
+
theme=gr.themes.Soft(
|
| 444 |
+
primary_hue="violet",
|
| 445 |
+
secondary_hue="blue",
|
| 446 |
+
neutral_hue="slate",
|
| 447 |
+
),
|
| 448 |
+
) as app:
|
| 449 |
+
|
| 450 |
+
gr.Markdown(
|
| 451 |
+
f"# 🧠 {MODEL_NAME}\n"
|
| 452 |
+
"### 14.4B Mixture-of-Experts · 6K Production SFT · Training Live on AMD MI300X\n\n"
|
| 453 |
+
"*This Space connects to a real training server and shows live metrics. "
|
| 454 |
+
"No inference runs here — the model is actively training on an AMD Instinct MI300X (192 GB HBM3).*\n\n"
|
| 455 |
+
"🔗 [Whitepaper](https://sentinel.qubitpage.com/whitepaper) · "
|
| 456 |
+
"[Model](https://huggingface.co/lablab-ai-amd-developer-hackathon/SentinelBrain-14B-MoE-v0.1) · "
|
| 457 |
+
"[Full Dashboard](https://sentinel.qubitpage.com)"
|
| 458 |
+
)
|
| 459 |
+
|
| 460 |
with gr.Tabs():
|
| 461 |
+
# ── Tab 1: Live Training ─────────────────────────────────────
|
| 462 |
+
with gr.TabItem("📊 Live Training", id="training"):
|
| 463 |
+
refresh_btn = gr.Button("🔄 Refresh Metrics", variant="primary", size="lg")
|
| 464 |
+
error_box = gr.Markdown(visible=False)
|
| 465 |
+
|
| 466 |
+
with gr.Row():
|
| 467 |
+
with gr.Column(scale=1):
|
| 468 |
+
status_output = gr.Markdown(label="Training Status")
|
| 469 |
+
with gr.Column(scale=1):
|
| 470 |
+
phi_output = gr.Markdown(label="Φ Metric")
|
| 471 |
+
|
| 472 |
+
with gr.Row():
|
| 473 |
+
with gr.Column(scale=1):
|
| 474 |
+
phi_plot = gr.Plot(label="Φ History")
|
| 475 |
+
with gr.Column(scale=1):
|
| 476 |
+
loss_plot = gr.Plot(label="Loss Curve")
|
| 477 |
+
|
| 478 |
+
with gr.TabItem("🧾 Live 6K Log", id="live_log"):
|
| 479 |
+
log_refresh_btn = gr.Button("🔄 Refresh Live Log", variant="primary", size="lg")
|
| 480 |
+
live_log_output = gr.Markdown(label="Current 6K training log")
|
| 481 |
+
|
| 482 |
+
# ── Tab 3: Architecture ──────────────────────────────────────
|
| 483 |
+
with gr.TabItem("🏗️ Architecture", id="architecture"):
|
| 484 |
+
gr.Markdown(ARCHITECTURE_MD)
|
| 485 |
+
|
| 486 |
+
# ── Tab 4: About ─────────────────────────────────────────────
|
| 487 |
+
with gr.TabItem("ℹ️ About", id="about"):
|
| 488 |
+
gr.Markdown("""## About SentinelBrain
|
| 489 |
+
|
| 490 |
+
**SentinelBrain** is a 14.4-billion-parameter Mixture-of-Experts language model
|
| 491 |
+
being trained **entirely from scratch** — no fine-tuning, no base model, no shortcuts.
|
| 492 |
+
|
| 493 |
+
### What makes it different?
|
| 494 |
+
|
| 495 |
+
1. **Custom architecture** — designed and implemented from the ground up, not a
|
| 496 |
+
fork of LLaMA or Mistral
|
| 497 |
+
2. **Φ consciousness metric** — we track integrated information flow across layers
|
| 498 |
+
as a proxy for emergent understanding, inspired by Giulio Tononi's IIT
|
| 499 |
+
3. **AMD-native** — developed and trained on AMD Instinct MI300X via ROCm,
|
| 500 |
+
proving that cutting-edge AI research doesn't require NVIDIA
|
| 501 |
+
4. **126-category curriculum** — from mathematics and code to philosophy and
|
| 502 |
+
creative writing, with carefully balanced data proportions
|
| 503 |
+
|
| 504 |
+
### The Φ metric explained
|
| 505 |
+
|
| 506 |
+
Traditional training metrics (loss, perplexity) tell you *how well* the model
|
| 507 |
+
predicts the next token. **Φ** tells you *how* it's doing it — whether
|
| 508 |
+
information is being integrated across layers in complex ways, or whether
|
| 509 |
+
layers are operating independently.
|
| 510 |
+
|
| 511 |
+
We compute Φ by analyzing gradient covariance matrices across layer boundaries:
|
| 512 |
+
|
| 513 |
+
$$\\Phi = \\left(\\prod_{i=1}^{L-1} \\frac{\\text{MI}(\\nabla_{\\theta_i}, \\nabla_{\\theta_{i+1}})}{H(\\nabla_{\\theta_i})}\\right)^{1/(L-1)}$$
|
| 514 |
+
|
| 515 |
+
Where MI is mutual information between adjacent layer gradients and H is entropy.
|
| 516 |
+
A rising Φ during training suggests the model is developing more interconnected
|
| 517 |
+
internal representations — a necessary (though not sufficient) condition for
|
| 518 |
+
what we might call "understanding."
|
| 519 |
+
|
| 520 |
+
### Competition entry
|
| 521 |
+
|
| 522 |
+
This is an entry in the **lablab.ai AMD Developer Hackathon**, demonstrating
|
| 523 |
+
that you can train a 14.4B-parameter model on a single AMD MI300X GPU with
|
| 524 |
+
192 GB HBM3 — no multi-node cluster required.
|
| 525 |
+
|
| 526 |
+
### Hardware: AMD Instinct MI300X
|
| 527 |
+
|
| 528 |
+
| Spec | Value |
|
| 529 |
+
|------|-------|
|
| 530 |
+
| VRAM | 192 GB HBM3 |
|
| 531 |
+
| Memory bandwidth | 5.3 TB/s |
|
| 532 |
+
| Compute (bf16) | 1.3 PFLOPS |
|
| 533 |
+
| Architecture | CDNA 3 |
|
| 534 |
+
| Process | 5nm / 6nm chiplet |
|
| 535 |
+
| TDP | 750W |
|
| 536 |
+
|
| 537 |
+
The MI300X's 192 GB of unified HBM3 memory allows us to fit the entire 14.4B
|
| 538 |
+
model, optimizer states (fp32), gradients, and activations on a single GPU —
|
| 539 |
+
eliminating the need for model parallelism and its associated communication overhead.
|
| 540 |
+
""")
|
| 541 |
+
|
| 542 |
+
# ── Footer ───────────────────────────────────────────────────────
|
| 543 |
+
gr.Markdown(
|
| 544 |
+
"---\n"
|
| 545 |
+
f"**Model:** {MODEL_NAME} ({MODEL_PARAMS} params) · "
|
| 546 |
+
"**Hardware:** AMD Instinct MI300X (192 GB HBM3, ROCm 7.0) · "
|
| 547 |
+
"**Dataset:** current 6K SFT has 45,578 packed sequences and 243.7M effective tokens\n\n"
|
| 548 |
+
"*Built for the lablab.ai AMD Developer Hackathon · Apache 2.0*"
|
| 549 |
+
)
|
| 550 |
+
|
| 551 |
+
# ── Event handlers ───────────────────────────────────────────────
|
| 552 |
+
refresh_btn.click(
|
| 553 |
+
fn=fetch_overview,
|
| 554 |
+
outputs=[status_output, phi_output, phi_plot, loss_plot, error_box],
|
| 555 |
+
)
|
| 556 |
+
log_refresh_btn.click(
|
| 557 |
+
fn=fetch_live_log,
|
| 558 |
+
outputs=[live_log_output],
|
| 559 |
+
)
|
| 560 |
+
|
| 561 |
+
# Auto-load on start
|
| 562 |
+
app.load(
|
| 563 |
+
fn=fetch_overview,
|
| 564 |
+
outputs=[status_output, phi_output, phi_plot, loss_plot, error_box],
|
| 565 |
+
)
|
| 566 |
+
app.load(
|
| 567 |
+
fn=fetch_live_log,
|
| 568 |
+
outputs=[live_log_output],
|
| 569 |
+
)
|
| 570 |
|
| 571 |
|
| 572 |
if __name__ == "__main__":
|
| 573 |
+
app.launch(server_name="0.0.0.0", server_port=7860, show_api=False)
|
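The Φ formula in the About tab is a geometric mean of per-boundary ratios MI/H over the L-1 adjacent layer pairs. The sketch below shows only that aggregation step. It assumes the mutual-information and entropy estimates have already been computed elsewhere; the function name `phi_geometric_mean` and its argument layout are hypothetical and do not come from `app.py` or the training code.

```python
import math


def phi_geometric_mean(mi, entropy):
    """Aggregate per-boundary MI_i / H_i ratios into a single Phi value.

    mi[i]      -- assumed MI estimate between gradients of layers i and i+1
    entropy[i] -- assumed entropy estimate of layer i's gradient
    Both sequences have one entry per layer boundary (L-1 entries).
    """
    if len(mi) != len(entropy) or len(mi) == 0:
        raise ValueError("need one (MI, H) pair per layer boundary")

    log_sum = 0.0
    for m, h in zip(mi, entropy):
        if h <= 0 or m < 0:
            raise ValueError("entropy must be positive and MI non-negative")
        if m == 0:
            # One boundary with zero shared information zeroes the product.
            return 0.0
        # Summing logs avoids underflow when many small ratios are multiplied.
        log_sum += math.log(m / h)

    # exp(mean of logs) is the (L-1)-th root of the product.
    return math.exp(log_sum / len(mi))


if __name__ == "__main__":
    print(phi_geometric_mean([0.2, 0.8], [1.0, 1.0]))
```

Working in log space is a design choice for numerical stability with deep stacks; the direct product followed by a root is mathematically the same.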
requirements.txt
CHANGED
|
@@ -1,2 +1,6 @@
|
|
| 1 |
-
gradio>=5.0,<6
|
| 2 |
-
|
| 1 |
+
gradio>=5.0,<6
|
| 2 |
+
httpx>=0.27,<1
|
| 3 |
+
plotly>=5.18
|
| 4 |
+
huggingface_hub>=0.25,<0.27
|
| 5 |
+
pydantic>=2.6,<2.11
|
| 6 |
+
audioop-lts; python_version >= "3.13"
|
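The hardware note in the About tab of `app.py` says the 14.4B model, fp32 optimizer states, gradients, and activations fit in 192 GB. A rough back-of-the-envelope check is sketched below. The precision choices are assumptions, not taken from the training code: bf16 weights and gradients, two fp32 Adam moments, no separate fp32 master copy of the weights, and 192 GB read as decimal gigabytes.

```python
PARAMS = 14.4e9   # total parameters (all experts are resident, active or not)
HBM_GB = 192.0    # MI300X capacity, treated here as decimal GB

# Assumed bytes per parameter for each resident tensor.
BYTES_PER_PARAM = {
    "weights_bf16": 2,
    "grads_bf16": 2,
    "adam_m_fp32": 4,
    "adam_v_fp32": 4,
}


def budget_gb(params, bytes_per_param):
    """Return the static memory footprint in GB for each tensor group."""
    return {name: params * b / 1e9 for name, b in bytes_per_param.items()}


budget = budget_gb(PARAMS, BYTES_PER_PARAM)
static_gb = sum(budget.values())      # weights + grads + optimizer states
headroom_gb = HBM_GB - static_gb      # what is left for activations, buffers

if __name__ == "__main__":
    for name, gb in budget.items():
        print(f"{name:>14}: {gb:6.1f} GB")
    print(f"{'static total':>14}: {static_gb:6.1f} GB")
    print(f"{'headroom':>14}: {headroom_gb:6.1f} GB")
```

Under these assumptions the static footprint is about 172.8 GB, leaving roughly 19 GB for activations and workspace. Adding an fp32 master copy of the weights (another 4 bytes per parameter) would exceed 192 GB, so the real run presumably uses a different mix of precisions or memory-saving techniques than this sketch.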