---
title: SentinelBrain-14B MoE — Live Training Dashboard
emoji: 🧠
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 5.0.0
app_file: app.py
pinned: true
license: apache-2.0
short_description: 14.4B MoE from scratch — training live on AMD MI300X
tags:
- sentinelbrain
- mixture-of-experts
- amd
- mi300x
- rocm
- training-dashboard
- from-scratch
- moe
- phi-metric
- consciousness
- accessibility
- aide
---
🧠 SentinelBrain-14B MoE — Live Training Dashboard
14.4 billion parameters · 4 experts with top-2 routing · Trained entirely from scratch · Live on AMD Instinct MI300X
This Space is a real-time window into a large language model that is still training. No inference runs here: the 14.4B-parameter MoE model is training on an AMD Instinct MI300X (192 GB HBM3), and this dashboard displays its live metrics.
🔥 What's Happening Now
Phase 3, production supervised fine-tuning (SFT), is running:
- 45,578 packed sequences × 6,144 tokens each
- 126-category curriculum (code, math, science, medical, legal, creative, multilingual)
- Gradient accumulation over 32 micro-batches → effective batch of 196,608 tokens (32 × 6,144, i.e. one packed sequence per micro-batch)
- Training on a single AMD MI300X — no distributed computing needed
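The batch figures above follow from simple arithmetic, which the short sketch below reproduces. It assumes one packed sequence per micro-batch; that value is not stated outright, but it is the one that makes the quoted effective batch size come out exactly. The per-epoch numbers are derived here for illustration and are not quoted by the dashboard.

```python
import math

# Figures quoted in the list above.
NUM_SEQUENCES = 45_578
SEQ_LEN = 6_144
GRAD_ACCUM_STEPS = 32

# Assumption: one packed sequence per micro-batch (not stated explicitly,
# but consistent with the quoted effective batch of 196,608 tokens).
MICRO_BATCH_SEQUENCES = 1

# Tokens contributing to a single optimizer update.
tokens_per_optimizer_step = MICRO_BATCH_SEQUENCES * GRAD_ACCUM_STEPS * SEQ_LEN

# Tokens seen in one full pass over the packed dataset.
tokens_per_epoch = NUM_SEQUENCES * SEQ_LEN

# Optimizer updates per pass, counting a final partial accumulation window.
optimizer_steps_per_epoch = math.ceil(
    NUM_SEQUENCES / (MICRO_BATCH_SEQUENCES * GRAD_ACCUM_STEPS)
)

print(f"tokens per optimizer step: {tokens_per_optimizer_step:,}")
print(f"tokens per epoch:          {tokens_per_epoch:,}")
print(f"optimizer steps per epoch: {optimizer_steps_per_epoch:,}")
```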
📊 Architecture at a Glance
| Component | Specification |
|---|---|
| Parameters | 14,400,000,000 (14.4B) |
| Type | Mixture-of-Experts (MoE) |
| Experts | 4 per layer, top-2 routing (~8B active) |
| Layers | 24 transformer blocks |
| Attention | GQA: 32 query → 8 KV heads (4× smaller KV cache) |
| FFN | SwiGLU, d_ff=11,008 per expert |
| Context | 6,144 tokens (training), 128K capable via RoPE θ=500K |
| Tokenizer | tiktoken cl100k_base (100,277 vocab) |
| Precision | bf16 mixed precision |
| GPU | AMD Instinct MI300X (192 GB HBM3, 5.3 TB/s) |
| Framework | PyTorch 2.10 + ROCm 7.0 |
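As a rough sanity check on the table, the sketch below counts parameters from the listed dimensions. Two values are assumptions that the table does not give: a hidden size of 4,096 (`D_MODEL`) and tied input/output embeddings (`TIED_EMBEDDINGS`). Normalization weights and biases are ignored. Under those assumptions the totals come out at roughly 14.4B overall and 7.9B active per token, which is in line with the quoted figures but does not confirm the actual configuration.

```python
# Values taken from the table above.
N_LAYERS = 24
N_EXPERTS = 4
TOP_K = 2
N_Q_HEADS = 32
N_KV_HEADS = 8
D_FF = 11_008
VOCAB = 100_277

# Assumptions (NOT stated in the table).
D_MODEL = 4_096          # hypothetical hidden size
TIED_EMBEDDINGS = True   # hypothetical: input and output embeddings share weights

HEAD_DIM = D_MODEL // N_Q_HEADS


def attention_params() -> int:
    """Q and O projections are full width; K and V use the smaller KV head count."""
    q_and_o = 2 * D_MODEL * (N_Q_HEADS * HEAD_DIM)
    k_and_v = 2 * D_MODEL * (N_KV_HEADS * HEAD_DIM)
    return q_and_o + k_and_v


def expert_params() -> int:
    """A SwiGLU feed-forward block has three matrices: gate, up and down."""
    return 3 * D_MODEL * D_FF


def router_params() -> int:
    """One linear layer mapping each hidden state to a score per expert."""
    return D_MODEL * N_EXPERTS


def embedding_params() -> int:
    return VOCAB * D_MODEL * (1 if TIED_EMBEDDINGS else 2)


def count_params(experts_per_token: int) -> int:
    """Total parameters when `experts_per_token` experts are counted per layer."""
    per_layer = (
        attention_params() + router_params() + experts_per_token * expert_params()
    )
    return N_LAYERS * per_layer + embedding_params()


total = count_params(N_EXPERTS)  # every expert stored in the checkpoint
active = count_params(TOP_K)     # experts actually used for a single token

print(f"total parameters:  {total / 1e9:.2f}B")
print(f"active parameters: {active / 1e9:.2f}B")
```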
🔀 Expert Routing
The MoE router uses token-choice routing with top-2 selection: each token is sent to its two highest-scoring experts. Expert usage is remarkably stable across training:
Expert 0: ████████████████░░░░ 32% (general reasoning)
Expert 1: █████████░░░░░░░░░░░ 18% (specialized)
Expert 2: ███████████████░░░░░ 31% (general reasoning)
Expert 3: █████████░░░░░░░░░░░ 18% (specialized)
This distribution matches the pretrained initialization — no expert collapse, natural specialization preserved through SFT.
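For readers unfamiliar with token-choice top-2 routing, here is a minimal pure-Python sketch of the idea, together with one way to compute per-expert usage shares like those shown above. It is not the project's router: renormalizing the two selected weights so they sum to 1 is a common convention assumed here, and the random logits are placeholder data.

```python
import math
import random
from collections import Counter

NUM_EXPERTS = 4
TOP_K = 2


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, top_k=TOP_K):
    """Return [(expert_index, weight), ...] for one token, best expert first."""
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Assumption: the selected weights are renormalized to sum to 1.
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def expert_usage(all_logits):
    """Share of all token-to-expert assignments that each expert receives."""
    counts = Counter()
    for logits in all_logits:
        for expert, _weight in route_token(logits):
            counts[expert] += 1
    total = sum(counts.values())
    return {e: counts[e] / total for e in range(NUM_EXPERTS)}


# Placeholder router scores for 1,000 tokens (not real training data).
rng = random.Random(0)
fake_logits = [[rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)] for _ in range(1000)]
usage = expert_usage(fake_logits)

for expert, share in usage.items():
    print(f"Expert {expert}: {share:.0%}")
```

Because every token is assigned to exactly two experts, the shares are fractions of assignments and sum to 100% across the four experts.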
🧠 The Φ Consciousness Metric
We track Φ (phi), an integrated-information metric inspired by Giulio Tononi's Integrated Information Theory (IIT). It measures how information flows between layers during training.
Rising Φ indicates the model is developing interconnected representations rather than operating as independent layers.
📖 The Build Story
From Random Noise to Intelligence
- Architecture Design — Custom MoE with GQA, SwiGLU, RoPE. No LLaMA/Mistral fork.
- Phase 1: Pretraining — Billions of tokens across 126 categories on MI300X
- Phase 2: Frankenstein Fusion — Novel checkpoint merging technique combining best experts from different training stages
- Phase 3: Production SFT — 6K context fine-tuning with curriculum weighting (LIVE NOW)
Why "Frankenstein"?
During pretraining, different checkpoints excelled at different tasks. We developed a fusion technique that combines the best expert from each checkpoint into a single model — like assembling the best parts into one creation.
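The fusion procedure itself is not described here, so the following is only one possible reading of the idea: for each expert slot, copy that expert's weights from whichever checkpoint scored best for it, and take everything else from a base checkpoint. The key naming (`experts.<n>.`), the per-expert `scores`, the `fuse_checkpoints` helper and the choice to take the router from the base checkpoint are all hypothetical. Plain lists stand in for tensors so the sketch runs without any framework.

```python
from typing import Dict, List, Tuple

# Parameter name -> weights. Lists of floats stand in for real tensors.
Checkpoint = Dict[str, List[float]]


def expert_prefix(expert: int) -> str:
    """Hypothetical naming scheme for expert parameters inside a state dict."""
    return f"experts.{expert}."


def fuse_checkpoints(
    checkpoints: Dict[str, Checkpoint],
    scores: Dict[str, List[float]],
    base: str,
    num_experts: int,
) -> Tuple[Checkpoint, Dict[int, str]]:
    """Build a fused state dict and report which checkpoint fed each expert.

    checkpoints: checkpoint name -> state dict
    scores:      checkpoint name -> one score per expert (higher is better)
    base:        checkpoint that supplies all non-expert weights
    """
    # Start from a copy of the base checkpoint (attention, embeddings, router).
    fused = {key: list(value) for key, value in checkpoints[base].items()}
    sources = {}
    for expert in range(num_experts):
        # Pick the checkpoint with the highest score for this expert slot.
        best = max(checkpoints, key=lambda name: scores[name][expert])
        sources[expert] = best
        for key, value in checkpoints[best].items():
            if expert_prefix(expert) in key:
                fused[key] = list(value)
    return fused, sources


# Toy example with two checkpoints and two experts.
ckpt_a = {
    "layers.0.attn.wq": [0.1],
    "layers.0.experts.0.w_gate": [1.0],
    "layers.0.experts.1.w_gate": [2.0],
}
ckpt_b = {
    "layers.0.attn.wq": [0.9],
    "layers.0.experts.0.w_gate": [10.0],
    "layers.0.experts.1.w_gate": [20.0],
}
scores = {"a": [0.8, 0.3], "b": [0.5, 0.7]}

fused, sources = fuse_checkpoints(
    {"a": ckpt_a, "b": ckpt_b}, scores, base="a", num_experts=2
)
print(sources)
print(fused)
```

In this toy run expert 0 is kept from checkpoint `a` and expert 1 is taken from checkpoint `b`, while the attention weights stay with the base checkpoint. A real merge would also need to decide how to reconcile router weights, which this sketch sidesteps.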
🌐 Qubitpage AIDE (Coming Soon)
AIDE (Accessibility Integrated Development Environment) — a VS Code fork designed for developers with disabilities:
- Sign Language Input — Webcam → MediaPipe → ASL/BSL → code commands
- Vocal Commands — Whisper → intent → code actions
- Neural Interface — BCI → cursor/selection control
- AI Dictation — SentinelBrain → natural language to code
SentinelBrain will power AIDE's code intelligence backend.
🏆 Competition Entry
This is an entry in the lablab.ai AMD Developer Hackathon, demonstrating:
- Single-GPU frontier training — 14.4B params on ONE MI300X
- AMD ROCm maturity — production training without NVIDIA
- Architectural innovation — custom MoE with consciousness metrics
- Real-world application — powering an accessibility IDE
License
Apache 2.0 — Model weights, training code, and this Space are fully open.
Built by Qubitpage for the lablab.ai AMD Developer Hackathon