
title: SentinelBrain-14B MoE — Live Training Dashboard
emoji: 🧠
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 5.0.0
app_file: app.py
pinned: true
license: apache-2.0
short_description: 14.4B MoE from scratch — training live on AMD MI300X
tags:
  - sentinelbrain
  - mixture-of-experts
  - amd
  - mi300x
  - rocm
  - training-dashboard
  - from-scratch
  - moe
  - phi-metric
  - consciousness
  - accessibility
  - aide

🧠 SentinelBrain-14B MoE — Live Training Dashboard

14.4 billion parameters · 4 experts × top-2 routing · Trained entirely from scratch · Live on AMD Instinct MI300X

This Space is a real-time window into an actively training large language model. No inference runs here — the 14.4B parameter MoE model is training on an AMD Instinct MI300X (192 GB HBM3) and this dashboard displays live metrics.

🔥 What's Happening Now

Phase 3 Production SFT is running:

45,578 packed sequences × 6,144 tokens each
126-category curriculum (code, math, science, medical, legal, creative, multilingual)
Gradient accumulation over 32 micro-batches of one 6,144-token sequence each → effective batch of 196,608 tokens
Training on a single AMD MI300X — no distributed computing needed
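
The batch arithmetic above can be checked in a few lines. This is a minimal sketch that assumes one packed sequence per micro-batch (which is what the 196,608-token figure implies) and a single pass over the data; the variable names are illustrative and are not taken from the training code.

```python
import math

SEQUENCES = 45_578        # packed sequences in the Phase 3 SFT set
SEQ_LEN = 6_144           # tokens per packed sequence
GRAD_ACCUM = 32           # micro-batches accumulated per optimizer step
MICRO_BATCH = 1           # assumption: one sequence per micro-batch

# Tokens that contribute to a single optimizer update.
tokens_per_step = GRAD_ACCUM * MICRO_BATCH * SEQ_LEN

# Tokens seen in one full pass over the packed dataset.
tokens_per_epoch = SEQUENCES * SEQ_LEN

# Optimizer updates per pass, counting the final partial batch.
steps_per_epoch = math.ceil(SEQUENCES / (GRAD_ACCUM * MICRO_BATCH))

print(f"tokens per optimizer step: {tokens_per_step:,}")
print(f"tokens per epoch:          {tokens_per_epoch:,}")
print(f"optimizer steps per epoch: {steps_per_epoch:,}")
```

Under these assumptions, one pass covers about 280 million tokens in 1,425 optimizer steps.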

📊 Architecture at a Glance

Component	Specification
Parameters	14,400,000,000 (14.4B)
Type	Mixture-of-Experts (MoE)
Experts	4 per layer, top-2 routing (~8B parameters active per token)
Layers	24 transformer blocks
Attention	GQA: 32 query → 8 KV heads (4× smaller KV cache)
FFN	SwiGLU, d_ff=11,008 per expert
Context	6,144 tokens (training), 128K capable via RoPE θ=500K
Tokenizer	tiktoken cl100k_base (100,277 vocab)
Precision	bf16 mixed precision
GPU	AMD Instinct MI300X (192 GB HBM3, 5.3 TB/s)
Framework	PyTorch 2.10 + ROCm 7.0
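
As a rough sanity check on the table, the sketch below estimates the total and active parameter counts and the KV-cache saving from GQA. The hidden size (4,096), head size (128), tied input/output embeddings, absence of bias terms, and two normalization gain vectors per block are all assumptions made for illustration; none of them is stated in the table.

```python
# Values taken from the table above.
LAYERS, EXPERTS, TOP_K = 24, 4, 2
Q_HEADS, KV_HEADS = 32, 8
D_FF, VOCAB, CONTEXT = 11_008, 100_277, 6_144

# Assumptions (not stated in the table): hidden size, head size,
# tied embeddings, no bias terms, bf16 (2-byte) KV cache entries.
D_MODEL, HEAD_DIM, BYTES_BF16 = 4_096, 128, 2


def parameter_counts():
    """Return (total, active-per-token) parameter estimates."""
    expert = 3 * D_MODEL * D_FF                    # SwiGLU: gate, up, down
    attn = (2 * D_MODEL * Q_HEADS * HEAD_DIM       # Q and output projections
            + 2 * D_MODEL * KV_HEADS * HEAD_DIM)   # K and V projections
    router = D_MODEL * EXPERTS                     # one score per expert
    norms = 2 * D_MODEL                            # assumed: two gain vectors per block
    shared = LAYERS * (attn + router + norms) + VOCAB * D_MODEL + D_MODEL
    total = shared + LAYERS * EXPERTS * expert     # every expert is stored
    active = shared + LAYERS * TOP_K * expert      # only top-2 run per token
    return total, active


def kv_cache_bytes(kv_heads, tokens=CONTEXT):
    """Bytes needed to cache K and V for one sequence of `tokens` tokens."""
    return 2 * LAYERS * kv_heads * HEAD_DIM * BYTES_BF16 * tokens


total, active = parameter_counts()
print(f"total params:  {total / 1e9:.2f}B")
print(f"active params: {active / 1e9:.2f}B")
print(f"KV cache, GQA (8 heads):  {kv_cache_bytes(KV_HEADS) / 2**20:.0f} MiB")
print(f"KV cache, MHA (32 heads): {kv_cache_bytes(Q_HEADS) / 2**20:.0f} MiB")
```

With these assumed dimensions the estimate lands at roughly 14.4B total and 7.9B active parameters, in line with the table, and the cache for one 6,144-token sequence is 576 MiB with 8 KV heads versus 2,304 MiB with 32.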

🔀 Expert Routing

The MoE routing uses token-choice with top-2 selection: each token is sent to the two experts its router scores highest. Expert usage has stayed stable across training:

Expert 0: ████████████████░░░░ 32%  (general reasoning)
Expert 1: █████████░░░░░░░░░░░ 18%  (specialized)
Expert 2: ███████████████░░░░░ 31%  (general reasoning)
Expert 3: █████████░░░░░░░░░░░ 18%  (specialized)

This distribution matches that of the pretrained checkpoint SFT started from: no expert has collapsed, and the specialization that emerged during pretraining is preserved through SFT.
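
Token-choice top-2 routing and the usage shares shown above can be sketched as follows. This is a NumPy illustration, not the model's router: the function names are hypothetical, and renormalizing the gate weights over only the two selected experts is an assumption.

```python
import numpy as np


def top2_route(hidden, router_weight):
    """Pick the two highest-scoring experts for every token.

    hidden:        (tokens, d_model) token representations
    router_weight: (d_model, num_experts) linear router
    Returns (indices, gates), both of shape (tokens, 2), best expert first.
    """
    logits = hidden @ router_weight
    # argsort is ascending, so take the last two columns and reverse them.
    top2 = np.argsort(logits, axis=-1)[:, -2:][:, ::-1]
    top2_logits = np.take_along_axis(logits, top2, axis=-1)
    # Softmax over the two selected experts only (assumption).
    top2_logits = top2_logits - top2_logits.max(axis=-1, keepdims=True)
    gates = np.exp(top2_logits)
    gates /= gates.sum(axis=-1, keepdims=True)
    return top2, gates


def expert_usage(top2, num_experts):
    """Share of all routing assignments that went to each expert."""
    counts = np.bincount(top2.ravel(), minlength=num_experts)
    return counts / counts.sum()


rng = np.random.default_rng(0)
hidden = rng.normal(size=(1000, 16))       # toy sizes, for illustration
router_weight = rng.normal(size=(16, 4))
indices, gates = top2_route(hidden, router_weight)
usage = expert_usage(indices, num_experts=4)
print(np.round(usage, 2))
```

Here `usage` sums to 1 across the four experts, which is the same kind of share the bars above report; an expert whose share drifts toward zero would be the signature of expert collapse.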

🧠 The Φ Consciousness Metric

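We track Φ (phi), an integrated-information metric inspired by Giulio Tononi's Integrated Information Theory (IIT). It measures how strongly adjacent layers share information during training. For each pair of adjacent layers, the mutual information MI between their gradients is normalized by the entropy H of the first layer's gradient, and Φ is the geometric mean of these ratios over all L−1 adjacent pairs, where L is the number of layers and ∇θ_i is the gradient with respect to layer i's parameters: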

$\Phi = \left(\prod_{i=1}^{L-1} \frac{\text{MI}(\nabla_{\theta_i}, \nabla_{\theta_{i+1}})}{H(\nabla_{\theta_i})}\right)^{1/(L-1)}$

We read a rising Φ as a sign that the model is developing interconnected representations rather than operating as a stack of independent layers.
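
One way the formula could be evaluated in practice is sketched below. The source does not say how MI and H are estimated, so this sketch makes two loud assumptions: each layer's gradient is summarized as one scalar per training step (for example its norm), and MI and entropy are estimated from a simple histogram over a window of steps. The function names are hypothetical.

```python
import numpy as np


def normalized_mi(x, y, bins=8):
    """Histogram estimate of MI(x, y) / H(x), which lies in [0, 1]."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1)            # marginal distribution of x
    py = pxy.sum(axis=0)            # marginal distribution of y
    nz = pxy > 0                    # skip empty cells to avoid log(0)
    mi = np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz]))
    h_x = -np.sum(px[px > 0] * np.log(px[px > 0]))
    return float(mi / h_x) if h_x > 0 else 0.0


def phi(grad_stats, bins=8):
    """Geometric mean of normalized MI over adjacent layer pairs.

    grad_stats has shape (steps, L): one scalar gradient statistic
    per layer per training step (an assumption of this sketch).
    """
    num_layers = grad_stats.shape[1]
    ratios = np.array([
        normalized_mi(grad_stats[:, i], grad_stats[:, i + 1], bins)
        for i in range(num_layers - 1)
    ])
    if np.any(ratios <= 0):
        return 0.0                  # one uncoupled pair zeroes the product
    return float(np.exp(np.mean(np.log(ratios))))


rng = np.random.default_rng(0)
shared = rng.normal(size=5000)
coupled = np.stack([shared, shared, shared], axis=1)   # identical layers
independent = rng.normal(size=(5000, 4))               # unrelated layers
print(phi(coupled), phi(independent))
```

In this toy setup, identical gradient statistics give Φ = 1 and independent ones give a value close to 0, which are the two extremes the metric is meant to separate. Histogram estimators are biased upward on small windows, so the window length and bin count matter.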

📖 The Build Story

From Random Noise to Intelligence

Architecture Design — Custom MoE with GQA, SwiGLU, RoPE. No LLaMA/Mistral fork.
Phase 1: Pretraining — Billions of tokens across 126 categories on MI300X
Phase 2: Frankenstein Fusion — Novel checkpoint merging technique combining best experts from different training stages
Phase 3: Production SFT — 6K context fine-tuning with curriculum weighting (LIVE NOW)

Why "Frankenstein"?

During pretraining, different checkpoints excelled at different tasks. We developed a fusion technique that combines the best expert from each checkpoint into a single model — like assembling the best parts into one creation.
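
The idea of taking each expert from a different checkpoint can be illustrated with plain dictionaries standing in for model state dicts. This is only a sketch of expert-wise checkpoint merging in general, not the fusion technique used here: the key pattern `.experts.<index>.`, the function name, and the choice to take all non-expert weights from one base checkpoint are assumptions.

```python
import re

EXPERT_KEY = re.compile(r"\.experts\.(\d+)\.")   # hypothetical key layout


def fuse_experts(checkpoints, expert_sources, base):
    """Build one state dict whose experts come from different checkpoints.

    checkpoints:    name -> state dict (parameter name -> weights)
    expert_sources: expert index -> name of the checkpoint to take it from
    base:           checkpoint supplying every non-expert parameter
    """
    fused = dict(checkpoints[base])               # shallow copy of the base
    for key in list(fused):
        match = EXPERT_KEY.search(key)
        if match:                                 # an expert parameter
            source = expert_sources[int(match.group(1))]
            fused[key] = checkpoints[source][key]
    return fused


# Toy checkpoints; strings stand in for weight tensors.
early = {"layers.0.attn.wq": "early-attn",
         "layers.0.experts.0.w1": "early-e0",
         "layers.0.experts.1.w1": "early-e1"}
late = {"layers.0.attn.wq": "late-attn",
        "layers.0.experts.0.w1": "late-e0",
        "layers.0.experts.1.w1": "late-e1"}

fused = fuse_experts({"early": early, "late": late},
                     expert_sources={0: "early", 1: "late"},
                     base="late")
print(fused)
```

In practice the router weights also depend on which experts sit behind them, so a merge like this would normally be followed by further training to let the router adapt.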

🌐 Qubitpage AIDE (Coming Soon)

AIDE (Accessibility Integrated Development Environment) — a VS Code fork designed for developers with disabilities:

Sign Language Input — Webcam → MediaPipe → ASL/BSL → code commands
Vocal Commands — Whisper → intent → code actions
Neural Interface — BCI → cursor/selection control
AI Dictation — SentinelBrain → natural language to code

SentinelBrain will power AIDE's code intelligence backend.

🏆 Competition Entry

This is an entry in the lablab.ai AMD Developer Hackathon, demonstrating:

Single-GPU frontier training — 14.4B params on ONE MI300X
AMD ROCm maturity — production training without NVIDIA
Architectural innovation — custom MoE with consciousness metrics
Real-world application — powering an accessibility IDE

License

Apache 2.0 — Model weights, training code, and this Space are fully open.

Built by Qubitpage for the lablab.ai AMD Developer Hackathon