metadata
title: SentinelBrain-14B MoE Live Training Dashboard
emoji: 🧠
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 5.0.0
app_file: app.py
pinned: true
license: apache-2.0
short_description: 14.4B MoE from scratch, training live on AMD MI300X
tags:
  - sentinelbrain
  - mixture-of-experts
  - amd
  - mi300x
  - rocm
  - training-dashboard
  - from-scratch
  - moe
  - phi-metric
  - consciousness
  - accessibility
  - aide

🧠 SentinelBrain-14B MoE — Live Training Dashboard

14.4 billion parameters · 4 experts × top-2 routing · Trained entirely from scratch · Live on AMD Instinct MI300X

This Space is a real-time window into a large language model that is still training. No inference runs here: the 14.4B-parameter MoE model is training on an AMD Instinct MI300X (192 GB HBM3), and this dashboard displays its live metrics.


🔥 What's Happening Now

Phase 3 Production SFT is running:

  • 45,578 packed sequences × 6,144 tokens each
  • 126-category curriculum (code, math, science, medical, legal, creative, multilingual)
  • Gradient accumulation over 32 micro-batches of one sequence each → effective batch of 196,608 tokens (32 × 6,144)
  • Training on a single AMD MI300X — no distributed computing needed
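
The batch figures above can be checked with a few lines of arithmetic. This is a minimal sketch, not the training script: the micro-batch size of one sequence is an assumption inferred from 196,608 / 6,144 = 32, and the variable names are illustrative.

```python
import math

SEQUENCES = 45_578   # packed sequences (from the list above)
SEQ_LEN = 6_144      # tokens per packed sequence
GRAD_ACCUM = 32      # micro-batches accumulated per optimizer step
MICRO_BATCH = 1      # assumption: one sequence per micro-batch

tokens_per_step = GRAD_ACCUM * MICRO_BATCH * SEQ_LEN
tokens_per_epoch = SEQUENCES * SEQ_LEN
# The last step of an epoch is partial, so round up.
steps_per_epoch = math.ceil(SEQUENCES / (GRAD_ACCUM * MICRO_BATCH))

print(tokens_per_step)   # 196608
print(tokens_per_epoch)  # 280031232
print(steps_per_epoch)   # 1425
```

Under these assumptions, one pass over the SFT set is roughly 280M tokens and about 1,425 optimizer steps.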

📊 Architecture at a Glance

| Component | Specification |
|---|---|
| Parameters | 14,400,000,000 (14.4B) |
| Type | Mixture-of-Experts (MoE) |
| Experts | 4 per layer, top-2 routing (~8B active) |
| Layers | 24 transformer blocks |
| Attention | GQA: 32 query → 8 KV heads (4× memory saving) |
| FFN | SwiGLU, d_ff=11,008 per expert |
| Context | 6,144 tokens (training), 128K capable via RoPE θ=500K |
| Tokenizer | tiktoken cl100k_base (100,277 vocab) |
| Precision | bf16 mixed precision |
| GPU | AMD Instinct MI300X (192 GB HBM3, 5.3 TB/s) |
| Framework | PyTorch 2.10 + ROCm 7.0 |
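
As a rough sanity check, the total and active parameter counts can be estimated from the table. This sketch rests on assumptions the table does not state: a head dimension of 128 (so a model width of 4,096), tied input/output embeddings, and no biases. Normalization weights are ignored because they are negligible at this scale.

```python
VOCAB = 100_277             # tiktoken cl100k_base
LAYERS = 24
EXPERTS, TOP_K = 4, 2
D_FF = 11_008
Q_HEADS, KV_HEADS = 32, 8
HEAD_DIM = 128              # assumption: not stated in the table
D_MODEL = Q_HEADS * HEAD_DIM  # 4096 under that assumption

def layer_params(num_experts):
    """Weights in one transformer block with `num_experts` expert FFNs counted."""
    q_and_o = 2 * D_MODEL * D_MODEL
    k_and_v = 2 * D_MODEL * (KV_HEADS * HEAD_DIM)  # GQA: fewer KV heads
    router = D_MODEL * EXPERTS
    expert = 3 * D_MODEL * D_FF  # SwiGLU: gate, up, and down projections
    return q_and_o + k_and_v + router + num_experts * expert

embedding = VOCAB * D_MODEL  # assumption: embeddings are tied, so counted once
total = embedding + LAYERS * layer_params(EXPERTS)
active = embedding + LAYERS * layer_params(TOP_K)
kv_saving = Q_HEADS / KV_HEADS

print(f"total  ~ {total / 1e9:.2f}B")   # total  ~ 14.40B
print(f"active ~ {active / 1e9:.2f}B")  # active ~ 7.91B
print(f"KV cache reduction vs. full multi-head attention: {kv_saving:.0f}x")
```

With these assumed dimensions the estimate lands on the stated 14.4B total and roughly 8B active parameters per token, which suggests the assumptions are at least consistent with the table.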

🔀 Expert Routing

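The MoE layers use token-choice routing with top-2 selection: the router scores every expert for each token, and the token is sent to its two highest-scoring experts. Expert usage has stayed stable throughout training: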

Expert 0: ████████████████░░░░ 32%  (general reasoning)
Expert 1: █████████░░░░░░░░░░░ 18%  (specialized)
Expert 2: ███████████████░░░░░ 31%  (general reasoning)
Expert 3: █████████░░░░░░░░░░░ 18%  (specialized)

This distribution matches the one in the pretrained checkpoint that SFT started from, so there is no sign of expert collapse and the specialization learned during pretraining has been preserved.
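
Token-choice top-2 routing and the usage shares shown above can be sketched in plain Python. This is an illustrative sketch, not the model's router: applying a softmax before selection and renormalizing the two gate weights is one common convention and is assumed here, and the function names are hypothetical.

```python
import math

def top2_route(router_logits):
    """Return [(expert_index, gate_weight), ...] for one token's two best experts."""
    # Softmax over all experts, shifted by the max for numerical stability.
    m = max(router_logits)
    exps = [math.exp(x - m) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Token-choice: the token itself picks its two highest-probability experts.
    top2 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]
    # Assumption: renormalize the two gate weights so they sum to 1.
    norm = sum(probs[i] for i in top2)
    return [(i, probs[i] / norm) for i in top2]

def expert_usage(batch_logits, num_experts=4):
    """Share of all token-to-expert assignments that each expert receives."""
    counts = [0] * num_experts
    for logits in batch_logits:
        for i, _ in top2_route(logits):
            counts[i] += 1
    assigned = sum(counts)
    return [c / assigned for c in counts]

# Two toy tokens: the first prefers experts 0 and 2, the second experts 1 and 3.
print(expert_usage([[2.0, 0.5, 1.0, -1.0], [0.0, 3.0, 1.0, 2.0]]))
```

Because every token contributes two assignments, the shares are fractions of assignments rather than of tokens, which is why they sum to about 100% in the chart.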

🧠 The Φ Consciousness Metric

We track Φ (phi) — an integrated information metric inspired by Giulio Tononi's IIT. It measures how information flows between layers during training:

$$\Phi = \left(\prod_{i=1}^{L-1} \frac{\text{MI}(\nabla_{\theta_i}, \nabla_{\theta_{i+1}})}{H(\nabla_{\theta_i})}\right)^{1/(L-1)}$$

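Here MI is the mutual information between the gradients of adjacent layers, H is the entropy of a layer's gradients, and L is the number of layers, so Φ is the geometric mean of the entropy-normalized mutual information over the L − 1 adjacent layer pairs. Rising Φ indicates the model is developing interconnected representations rather than operating as independent layers.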
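
The formula can be illustrated with a small estimator. The source does not say how MI and H are estimated, so everything below is an assumption: each layer is represented by a list of scalar gradient samples, values are discretized into equal-width bins, and entropies come from empirical counts. The function names are hypothetical.

```python
import math
from collections import Counter

def entropy(symbols):
    """Shannon entropy (in nats) of a sequence of discrete symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log(c / n) for c in Counter(symbols).values())

def mutual_information(xs, ys):
    """MI(X; Y) = H(X) + H(Y) - H(X, Y) for paired discrete symbols."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def discretize(values, bins=8):
    """Map real values to equal-width bin indices (assumed binning scheme)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant input: put everything in bin 0
    return [min(int((v - lo) / width), bins - 1) for v in values]

def phi(layer_grads, bins=8):
    """Geometric mean over adjacent layers of MI(g_i, g_{i+1}) / H(g_i)."""
    symbols = [discretize(g, bins) for g in layer_grads]
    log_sum = 0.0
    for a, b in zip(symbols, symbols[1:]):
        h = entropy(a)
        mi = mutual_information(a, b)
        if h <= 0 or mi <= 0:
            return 0.0  # one zero factor makes the whole product zero
        log_sum += math.log(mi / h)
    return math.exp(log_sum / (len(symbols) - 1))

# Identical gradient samples in every layer: each ratio is 1, so phi is 1.
print(phi([[0.1, 0.5, 0.9, 0.3]] * 3))
```

Summing logs instead of multiplying ratios keeps the geometric mean numerically stable when there are many layers.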

📖 The Build Story

From Random Noise to Intelligence

  1. Architecture Design — Custom MoE with GQA, SwiGLU, RoPE. No LLaMA/Mistral fork.
  2. Phase 1: Pretraining — Billions of tokens across 126 categories on MI300X
  3. Phase 2: Frankenstein Fusion — Novel checkpoint merging technique combining best experts from different training stages
  4. Phase 3: Production SFT — 6K context fine-tuning with curriculum weighting (LIVE NOW)

Why "Frankenstein"?

During pretraining, different checkpoints excelled at different tasks. We developed a fusion technique that combines the best expert from each checkpoint into a single model — like assembling the best parts into one creation.
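
The idea of taking each expert from the checkpoint where it performed best can be sketched as a merge over state dicts. The actual fusion procedure is not described in this README, so this is only an illustration: the parameter naming pattern (`.experts.<idx>.`), the choice of a base checkpoint for shared weights, and the function name are all hypothetical.

```python
def fuse_checkpoints(checkpoints, base, expert_source):
    """Merge checkpoints by expert.

    checkpoints:   dict of checkpoint name -> state dict (param name -> weights)
    base:          checkpoint that supplies all non-expert (shared) weights
    expert_source: dict of expert index -> checkpoint name to copy that expert from
    """
    fused = dict(checkpoints[base])  # shallow copy; shared weights come from base
    for name in list(fused):
        for idx, src in expert_source.items():
            # The trailing dot keeps expert 1 from matching expert 10, 11, ...
            if f".experts.{idx}." in name:
                fused[name] = checkpoints[src][name]
    return fused

# Toy example with strings standing in for weight tensors.
ckpts = {
    "early": {"layers.0.attn.w": "attn-early",
              "layers.0.experts.0.w": "e0-early",
              "layers.0.experts.1.w": "e1-early"},
    "late":  {"layers.0.attn.w": "attn-late",
              "layers.0.experts.0.w": "e0-late",
              "layers.0.experts.1.w": "e1-late"},
}
fused = fuse_checkpoints(ckpts, base="late", expert_source={0: "early", 1: "late"})
print(fused)
```

A real merge would also have to decide where the router weights come from, since a router trained alongside one set of experts may not suit another.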

🌐 Qubitpage AIDE (Coming Soon)

AIDE (Accessibility Integrated Development Environment) — a VS Code fork designed for developers with disabilities:

  • Sign Language Input — Webcam → MediaPipe → ASL/BSL → code commands
  • Vocal Commands — Whisper → intent → code actions
  • Neural Interface — BCI → cursor/selection control
  • AI Dictation — SentinelBrain → natural language to code

SentinelBrain will power AIDE's code intelligence backend.

🏆 Competition Entry

This is an entry in the lablab.ai AMD Developer Hackathon, demonstrating:

  1. Single-GPU frontier training — 14.4B params on ONE MI300X
  2. AMD ROCm maturity — production training without NVIDIA
  3. Architectural innovation — custom MoE with consciousness metrics
  4. Real-world application — powering an accessibility IDE

License

Apache 2.0 — Model weights, training code, and this Space are fully open.


Built by Qubitpage for the lablab.ai AMD Developer Hackathon