---
title: "SentinelBrain-14B MoE — Live Training Dashboard"
emoji: 🧠
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: "5.0.0"
app_file: app.py
pinned: true
license: apache-2.0
short_description: "14.4B MoE from scratch — training live on AMD MI300X"
tags:
  - sentinelbrain
  - mixture-of-experts
  - amd
  - mi300x
  - rocm
  - training-dashboard
  - from-scratch
  - moe
  - phi-metric
  - consciousness
  - accessibility
  - aide
---

# 🧠 SentinelBrain-14B MoE — Live Training Dashboard

> **14.4 billion parameters · 4 experts × top-2 routing · Trained entirely from scratch · Live on AMD Instinct MI300X**

This Space is a real-time window into a large language model that is actively
training. No inference runs here: the 14.4B-parameter MoE model is training on
an AMD Instinct MI300X (192 GB HBM3), and this dashboard displays its live metrics.

---

## 🔥 What's Happening Now

**Phase 3 Production SFT** is running:
- 45,578 packed sequences × 6,144 tokens each
- 126-category curriculum (code, math, science, medical, legal, creative, multilingual)
- Gradient accumulation over 32 sequences → effective batch of **196,608 tokens** (32 × 6,144)
- Training on a single AMD MI300X — no distributed computing needed
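
The batch figures above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming one packed sequence per micro-batch (which is what 196,608 / 6,144 = 32 implies) and that a final partial accumulation window still counts as an optimizer step; the variable names are illustrative and not taken from the training code.

```python
import math

NUM_SEQUENCES = 45_578   # packed sequences in the SFT set
SEQ_LEN = 6_144          # tokens per packed sequence
GRAD_ACCUM = 32          # micro-batches accumulated per optimizer step
MICRO_BATCH = 1          # ASSUMPTION: one sequence per micro-batch

# Tokens that contribute to a single optimizer update.
tokens_per_step = MICRO_BATCH * GRAD_ACCUM * SEQ_LEN

# Tokens seen in one full pass over the packed data.
tokens_per_epoch = NUM_SEQUENCES * SEQ_LEN

# Optimizer steps per pass, counting the last partial window as a step.
steps_per_epoch = math.ceil(NUM_SEQUENCES / (MICRO_BATCH * GRAD_ACCUM))

print(f"tokens per optimizer step: {tokens_per_step:,}")   # 196,608
print(f"tokens per epoch:          {tokens_per_epoch:,}")  # 280,031,232
print(f"optimizer steps per epoch: {steps_per_epoch:,}")   # 1,425
```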

## 📊 Architecture at a Glance

| Component | Specification |
|-----------|--------------|
| **Parameters** | 14,400,000,000 (14.4B) |
| **Type** | Mixture-of-Experts (MoE) |
| **Experts** | 4 per layer, top-2 routing (~8B parameters active per token) |
| **Layers** | 24 transformer blocks |
| **Attention** | GQA: 32 query → 8 KV heads (4× smaller KV cache) |
| **FFN** | SwiGLU, d_ff=11,008 per expert |
| **Context** | 6,144 tokens (training); RoPE θ=500K intended to allow extension to 128K |
| **Tokenizer** | tiktoken cl100k_base (100,277 vocab) |
| **Precision** | bf16 mixed precision |
| **GPU** | AMD Instinct MI300X (192 GB HBM3, 5.3 TB/s) |
| **Framework** | PyTorch 2.10 + ROCm 7.0 |
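
As a sanity check on the table, the sketch below counts weight-matrix parameters from the listed dimensions. The table does not state the hidden size or whether input and output embeddings are shared, so `d_model = 4096` and `tie_embeddings = True` are assumptions chosen because they reproduce the stated totals; `MoEConfig` and `count_params` are illustrative names, and norm weights and biases are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoEConfig:
    n_layers: int = 24
    n_experts: int = 4
    top_k: int = 2
    n_q_heads: int = 32
    n_kv_heads: int = 8
    d_ff: int = 11_008
    vocab_size: int = 100_277
    d_model: int = 4_096          # ASSUMPTION: not stated in the table
    tie_embeddings: bool = True   # ASSUMPTION: not stated in the table


def count_params(cfg: MoEConfig, experts_counted: int) -> int:
    """Count weight-matrix parameters with `experts_counted` experts per layer."""
    head_dim = cfg.d_model // cfg.n_q_heads
    kv_dim = cfg.n_kv_heads * head_dim
    # Attention: Q and O are d_model x d_model; K and V are d_model x kv_dim (GQA).
    attn = 2 * cfg.d_model * cfg.d_model + 2 * cfg.d_model * kv_dim
    # SwiGLU expert: gate, up, and down projections.
    expert = 3 * cfg.d_model * cfg.d_ff
    # Router: one logit per expert.
    router = cfg.d_model * cfg.n_experts
    per_layer = attn + router + experts_counted * expert
    embed = cfg.vocab_size * cfg.d_model
    if not cfg.tie_embeddings:
        embed *= 2
    return cfg.n_layers * per_layer + embed


cfg = MoEConfig()
total = count_params(cfg, cfg.n_experts)   # all experts stored
active = count_params(cfg, cfg.top_k)      # experts used per token

print(f"total:  {total / 1e9:.2f}B")   # 14.40B
print(f"active: {active / 1e9:.2f}B")  # 7.91B
```

Under these assumptions the count lands on 14.4B total and roughly 8B active per token, which matches the table; a different hidden size or untied embeddings would shift both numbers.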

## 🔀 Expert Routing

The MoE router uses **token-choice** routing: each token selects its top-2
experts. Expert usage has stayed stable throughout training:

```
Expert 0: ████████████████░░░░ 32%  (general reasoning)
Expert 1: █████████░░░░░░░░░░░ 18%  (specialized)
Expert 2: ███████████████░░░░░ 31%  (general reasoning)
Expert 3: █████████░░░░░░░░░░░ 18%  (specialized)
```

This distribution matches the one at the pretrained initialization: no expert
has collapsed, and the specialization learned in pretraining is preserved through SFT.
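
Token-choice top-2 routing and the usage percentages above can be illustrated with a small standard-library sketch. It is not the model's router: the function names are hypothetical, the logits are made up, and renormalizing the two selected gate weights so they sum to 1 is a common convention assumed here rather than something this README states.

```python
import math
from collections import Counter


def top2_route(logits):
    """Pick the two highest-probability experts for one token.

    Returns [(expert_index, gate_weight), ...] with weights renormalized
    over the two selected experts (ASSUMPTION: a common convention).
    """
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]   # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    top2 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]
    norm = probs[top2[0]] + probs[top2[1]]
    return [(i, probs[i] / norm) for i in top2]


def expert_usage(batch_logits, n_experts=4):
    """Fraction of all (token, expert) assignments that went to each expert."""
    counts = Counter()
    for logits in batch_logits:
        for idx, _ in top2_route(logits):
            counts[idx] += 1
    total = sum(counts.values())
    return [counts[i] / total for i in range(n_experts)]


# Two made-up tokens: the first prefers experts 0 and 2, the second 1 and 2.
batch = [[2.0, 0.0, 1.0, -1.0], [0.0, 3.0, 1.0, -1.0]]
print(top2_route(batch[0]))
print(expert_usage(batch))
```

A usage vector far from uniform, with one expert near zero, is the usual sign of expert collapse.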

## 🧠 The Φ Consciousness Metric

We track **Φ (phi)**, an integrated information metric inspired by Giulio
Tononi's Integrated Information Theory (IIT). It measures how much information
the gradients of adjacent layers share during training:

$$\Phi = \left(\prod_{i=1}^{L-1} \frac{\text{MI}(\nabla_{\theta_i}, \nabla_{\theta_{i+1}})}{H(\nabla_{\theta_i})}\right)^{1/(L-1)}$$

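Here $L$ is the number of layers, $\nabla_{\theta_i}$ is the gradient with
respect to layer $i$'s parameters, MI is mutual information, and $H$ is entropy,
so Φ is the geometric mean of the normalized mutual information between adjacent
layers' gradients. Rising Φ suggests the model is developing interconnected
representations rather than operating as independent layers.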
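
The formula reduces to a geometric mean once the per-layer quantities are available. The sketch below assumes the mutual information and entropy values have already been estimated elsewhere (this README does not say how), and the function name `phi` and the sample numbers are illustrative only.

```python
import math


def phi(mutual_info, entropy):
    """Geometric mean of MI_i / H_i over the L-1 adjacent layer pairs.

    mutual_info[i]: MI between the gradients of layer i and layer i+1
    entropy[i]:     entropy of the gradient of layer i
    Both are assumed to be precomputed estimates.
    """
    if not mutual_info or len(mutual_info) != len(entropy):
        raise ValueError("need one MI value and one entropy per layer pair")
    log_sum = 0.0
    for mi, h in zip(mutual_info, entropy):
        if h <= 0:
            raise ValueError("entropy must be positive")
        if mi <= 0:
            return 0.0  # a zero ratio makes the whole product zero
        log_sum += math.log(mi / h)
    # Summing logs avoids underflow when many small ratios are multiplied.
    return math.exp(log_sum / len(mutual_info))


# Three layer pairs with made-up estimates.
print(phi([0.2, 0.4, 0.8], [1.0, 1.0, 1.0]))
```

Because it is a geometric mean, a single pair of layers with near-zero shared information pulls Φ toward zero regardless of the other pairs.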

## 📖 The Build Story

### From Random Noise to Intelligence

1. **Architecture Design** — Custom MoE with GQA, SwiGLU, RoPE. No LLaMA/Mistral fork.
2. **Phase 1: Pretraining** — Billions of tokens across 126 categories on MI300X
3. **Phase 2: Frankenstein Fusion** — Novel checkpoint merging technique combining best experts from different training stages
4. **Phase 3: Production SFT** — 6K context fine-tuning with curriculum weighting (**LIVE NOW**)

### Why "Frankenstein"?

During pretraining, different checkpoints excelled at different tasks. We
developed a fusion technique that combines the best expert from each checkpoint
into a single model — like assembling the best parts into one creation.
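
One possible reading of the selection step is sketched below: for each expert slot, keep the checkpoint whose copy of that expert scored best on some evaluation. This is not the actual fusion code; the checkpoint names, the scores, and the function `plan_fusion` are hypothetical, and the real technique may select or merge weights differently.

```python
def plan_fusion(scores):
    """Choose a source checkpoint for every expert slot.

    scores: {checkpoint_name: [score of expert 0, expert 1, ...]},
            where a higher score is better (hypothetical evaluation).
    Returns a list whose entry e names the checkpoint to copy expert e from.
    """
    n_experts = len(next(iter(scores.values())))
    plan = []
    for e in range(n_experts):
        best = max(scores, key=lambda name: scores[name][e])
        plan.append(best)
    return plan


# Hypothetical per-expert evaluation scores for two checkpoints.
scores = {
    "ckpt_a": [0.7, 0.4, 0.6, 0.5],
    "ckpt_b": [0.6, 0.8, 0.5, 0.9],
}
print(plan_fusion(scores))  # ['ckpt_a', 'ckpt_b', 'ckpt_a', 'ckpt_b']
```

The resulting plan would then drive a weight copy, taking each expert's tensors from the chosen checkpoint.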

## 🌐 Qubitpage AIDE (Coming Soon)

**AIDE** (Accessibility Integrated Development Environment) — a VS Code fork
designed for developers with disabilities:

- **Sign Language Input** — Webcam → MediaPipe → ASL/BSL → code commands
- **Vocal Commands** — Whisper → intent → code actions
- **Neural Interface** — BCI → cursor/selection control
- **AI Dictation** — SentinelBrain → natural language to code

SentinelBrain will power AIDE's code intelligence backend.

## 🏆 Competition Entry

This is an entry in the **lablab.ai AMD Developer Hackathon**, demonstrating:

1. **Single-GPU frontier training** — 14.4B params on ONE MI300X
2. **AMD ROCm maturity** — production training without NVIDIA
3. **Architectural innovation** — custom MoE with consciousness metrics
4. **Real-world application** — powering an accessibility IDE

## Links

- 📄 [Whitepaper](https://sentinel.qubitpage.com/whitepaper)
- 📊 [Full Dashboard](https://sentinel.qubitpage.com)
- 🤗 [Model Weights](https://huggingface.co/lablab-ai-amd-developer-hackathon/SentinelBrain-14B-MoE-v0.1)
- 🌐 [Qubitpage](https://github.com/qubitpage)

## License

Apache 2.0 — Model weights, training code, and this Space are fully open.

---

*Built by Qubitpage for the lablab.ai AMD Developer Hackathon*