---
title: "SentinelBrain-14B MoE: Live Training Dashboard"
emoji: 🧠
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: "5.0.0"
app_file: app.py
pinned: true
license: apache-2.0
short_description: "14.4B MoE from scratch, training live on AMD MI300X"
tags:
  - sentinelbrain
  - mixture-of-experts
  - amd
  - mi300x
  - rocm
  - training-dashboard
  - from-scratch
  - moe
  - phi-metric
  - consciousness
  - accessibility
  - aide
---

# 🧠 SentinelBrain-14B MoE: Live Training Dashboard

> **14.4 billion parameters · 4 experts × top-2 routing · Trained entirely from scratch · Live on AMD Instinct MI300X**

This Space is a real-time window into an actively training large language model. No inference runs here: the 14.4B-parameter MoE model is training on an AMD Instinct MI300X (192 GB HBM3), and this dashboard displays its live metrics.

---

## 🔥 What's Happening Now

**Phase 3 Production SFT** is running:

- 45,578 packed sequences × 6,144 tokens each (≈280M tokens per epoch)
- 126-category curriculum (code, math, science, medical, legal, creative, multilingual)
- Gradient accumulation over 32 sequences → effective batch of 32 × 6,144 = **196,608 tokens** per optimizer step
- Training on a single AMD MI300X: no distributed training needed

## 📊 Architecture at a Glance

| Component | Specification |
|-----------|---------------|
| **Parameters** | 14,400,000,000 (14.4B) |
| **Type** | Mixture-of-Experts (MoE) |
| **Experts** | 4 per layer, top-2 routing (~8B active per token) |
| **Layers** | 24 transformer blocks |
| **Attention** | GQA: 32 query heads → 8 KV heads (4× smaller KV cache) |
| **FFN** | SwiGLU, d_ff = 11,008 per expert |
| **Context** | 6,144 tokens (training); 128K capable via RoPE θ = 500K |
| **Tokenizer** | tiktoken cl100k_base (100,277 vocab) |
| **Precision** | bf16 mixed precision |
| **GPU** | AMD Instinct MI300X (192 GB HBM3, 5.3 TB/s) |
| **Framework** | PyTorch 2.10 + ROCm 7.0 |

## 🔀 Expert Routing

The MoE routing uses **token-choice** with top-2 selection: each token's router scores all four experts and sends the token to its two highest-scoring ones.
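To make token-choice top-2 routing concrete, here is a minimal pure-Python sketch under stated assumptions. It applies a softmax to each token's router logits, keeps the two most probable experts, and renormalizes their two weights to sum to 1. That renormalization is a common convention (Mixtral does the equivalent), but whether SentinelBrain renormalizes is not stated here. `top2_route` and `expert_usage` are hypothetical helper names, and "usage" below means each expert's share of routing slots, which may differ from how the dashboard computes its percentages.

```python
import math


def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top2_route(router_logits):
    """router_logits: one list of per-expert logits per token.

    Returns, for each token, [(expert_index, weight), (expert_index, weight)],
    where the two weights are renormalized to sum to 1 (an assumption).
    """
    routes = []
    for logits in router_logits:
        probs = softmax(logits)
        # Token choice: each token independently picks its two best experts.
        top = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)[:2]
        norm = sum(probs[e] for e in top)
        routes.append([(e, probs[e] / norm) for e in top])
    return routes


def expert_usage(routes, num_experts):
    # Fraction of all routing slots (2 per token) assigned to each expert.
    counts = [0] * num_experts
    for token_routes in routes:
        for expert, _ in token_routes:
            counts[expert] += 1
    total = sum(counts)
    return [c / total for c in counts]


routes = top2_route([[2.0, 1.0, 0.0, -1.0], [0.0, 0.0, 3.0, 1.0]])
print(routes)
print(expert_usage(routes, num_experts=4))
```

Because every token fills exactly two slots, the usage fractions always sum to 1; an expert "collapse" would show up as one or two experts absorbing nearly all slots.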
Expert usage is remarkably stable across training:

```
Expert 0: ████████████████░░░░ 32% (general reasoning)
Expert 1: █████████░░░░░░░░░░░ 18% (specialized)
Expert 2: ███████████████░░░░░ 31% (general reasoning)
Expert 3: █████████░░░░░░░░░░░ 18% (specialized)
```

This distribution matches the one measured on the pretrained checkpoint that SFT was initialized from: there is no expert collapse, and the specialization that emerged naturally during pretraining is preserved through SFT.

## 🧠 The Φ Consciousness Metric

We track **Φ (phi)**, an integrated-information metric inspired by Giulio Tononi's Integrated Information Theory (IIT). It measures how much information is shared between adjacent layers during training, computed from their gradients:

$$\Phi = \left(\prod_{i=1}^{L-1} \frac{\text{MI}(\nabla_{\theta_i}, \nabla_{\theta_{i+1}})}{H(\nabla_{\theta_i})}\right)^{1/(L-1)}$$

Here $L$ is the number of layers, $\text{MI}$ is mutual information, and $H$ is entropy, so $\Phi$ is the geometric mean, over the $L-1$ pairs of adjacent layers, of the mutual information between their gradients normalized by the entropy of the earlier layer's gradient.

Rising Φ indicates that the model is developing interconnected representations rather than operating as a stack of independent layers.

## 📖 The Build Story

### From Random Noise to Intelligence

1. **Architecture design.** Custom MoE with GQA, SwiGLU, and RoPE. Not a LLaMA/Mistral fork.
2. **Phase 1: Pretraining.** Billions of tokens across 126 categories on MI300X.
3. **Phase 2: Frankenstein Fusion.** A novel checkpoint-merging technique that combines the best experts from different training stages.
4. **Phase 3: Production SFT.** 6K-context fine-tuning with curriculum weighting (**LIVE NOW**).

### Why "Frankenstein"?

During pretraining, different checkpoints excelled at different tasks. We developed a fusion technique that combines the best expert from each checkpoint into a single model, like assembling the best parts into one creation.
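The fusion algorithm itself is not published here, so the following is only a hypothetical sketch of one way to "combine the best expert from each checkpoint". For every expert index it keeps the copy from whichever checkpoint scored highest on some per-expert evaluation, and it takes all shared weights (attention, router, embeddings) from a chosen base checkpoint. The `fuse_checkpoints` function, the `.experts.<i>.` parameter-name pattern, and the per-expert scores are all assumptions. Scoring each expert index once across all layers is a simplification; a real merge would likely score each layer's experts separately and might need to re-align the router afterwards.

```python
import re

# Hypothetical naming scheme: expert weights contain ".experts.<index>."
# e.g. "layers.3.mlp.experts.2.w1.weight"; everything else is shared.
EXPERT_KEY = re.compile(r"\.experts\.(\d+)\.")


def fuse_checkpoints(checkpoints, expert_scores, base):
    """Build a fused state dict from several checkpoints.

    checkpoints:   {checkpoint_name: {param_name: tensor_or_array}}
    expert_scores: {checkpoint_name: [score for expert 0, expert 1, ...]},
                   higher is better (how scores are obtained is an assumption)
    base:          checkpoint that supplies every non-expert weight
    Returns (fused_state_dict, winners), where winners[e] names the
    checkpoint whose copy of expert e was kept.
    """
    num_experts = len(next(iter(expert_scores.values())))
    # For each expert slot, pick the checkpoint whose copy scored best.
    winners = [
        max(expert_scores, key=lambda ck: expert_scores[ck][e])
        for e in range(num_experts)
    ]
    fused = {}
    for name in checkpoints[base]:
        match = EXPERT_KEY.search(name)
        # Expert weights come from the winning checkpoint; shared weights from base.
        source = winners[int(match.group(1))] if match else base
        fused[name] = checkpoints[source][name]
    return fused, winners


ckpts = {
    "step_10k": {"emb": "E10", "layers.0.experts.0.w": "A0", "layers.0.experts.1.w": "A1"},
    "step_20k": {"emb": "E20", "layers.0.experts.0.w": "B0", "layers.0.experts.1.w": "B1"},
}
scores = {"step_10k": [0.9, 0.1], "step_20k": [0.2, 0.8]}
fused, winners = fuse_checkpoints(ckpts, scores, base="step_10k")
print(winners)
print(fused)
```

The function only reads and copies values by key, so the same logic would apply unchanged to PyTorch state dicts.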
## ๐ŸŒ Qubitpage AIDE (Coming Soon) **AIDE** (Accessibility Integrated Development Environment) โ€” a VS Code fork designed for developers with disabilities: - **Sign Language Input** โ€” Webcam โ†’ MediaPipe โ†’ ASL/BSL โ†’ code commands - **Vocal Commands** โ€” Whisper โ†’ intent โ†’ code actions - **Neural Interface** โ€” BCI โ†’ cursor/selection control - **AI Dictation** โ€” SentinelBrain โ†’ natural language to code SentinelBrain will power AIDE's code intelligence backend. ## ๐Ÿ† Competition Entry This is an entry in the **lablab.ai AMD Developer Hackathon**, demonstrating: 1. **Single-GPU frontier training** โ€” 14.4B params on ONE MI300X 2. **AMD ROCm maturity** โ€” production training without NVIDIA 3. **Architectural innovation** โ€” custom MoE with consciousness metrics 4. **Real-world application** โ€” powering an accessibility IDE ## Links - ๐Ÿ“„ [Whitepaper](https://sentinel.qubitpage.com/whitepaper) - ๐Ÿ“Š [Full Dashboard](https://sentinel.qubitpage.com) - ๐Ÿค— [Model Weights](https://huggingface.co/lablab-ai-amd-developer-hackathon/SentinelBrain-14B-MoE-v0.1) - ๐ŸŒ [Qubitpage](https://github.com/qubitpage) ## License Apache 2.0 โ€” Model weights, training code, and this Space are fully open. --- *Built by Qubitpage for the lablab.ai AMD Developer Hackathon*