---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- boreal
- deltanet
- hybrid
- linear-attention
- swiglu
- rmsnorm
- rope
- gqa
- pretraining
- tst
- crucible
- ddm
- submodular
- data-curation
- sovereign-ai
- canadian-ai
- community
- canada
pipeline_tag: text-generation
base_model: GestaltLabs/BOREAL-2B
---

![BOREAL](https://huggingface.co/GestaltLabs/BOREAL-2B/resolve/main/Boreal.png)

# BOREAL-2B — Canadian Sovereign AI

**B**alanced **O**rthogonal **R**ecurrent **E**xpert **A**ttention **L**ayers

Built in Toronto. Apache 2.0.

A 2-billion-parameter dense hybrid language model, designed to be pretrained from scratch on 500B–2T tokens. BOREAL-2B is the first model in the BOREAL family intended for actual downstream use — the one you download, fine-tune, quantize, and build on. It carries forward the Gated DeltaNet architecture validated by BOREAL-250M and scales it to a size where benchmarks become meaningful.

It targets competitive performance against Qwen3-1.7B and SmolLM2-1.7B while offering native 64K context — 4x what pure Transformers at this scale can practically support.

## Architecture

| Component | Detail |
|-----------|--------|
| **Type** | Dense hybrid — Gated DeltaNet + GQA |
| **Parameters** | ~2B |
| **Hidden size** | 2,048 |
| **Layers** | 32 (24 DeltaNet + 8 full attention) |
| **Ratio** | 3:1 linear-to-full attention |
| **Full attention** | GQA: 16 query heads, 4 KV heads, head_dim=256 |
| **DeltaNet** | Gated linear attention: 8 QK heads, 16 V heads, head_dim=128 |
| **Conv kernel** | 4 |
| **FFN** | SwiGLU, intermediate=6,144 |
| **Norm** | RMSNorm, eps=1e-6 |
| **Position** | RoPE, theta=10M, partial_rotary_factor=0.25 |
| **Output gate** | Swish-gated |
| **Vocab** | 151,936 (Qwen3 tokenizer) |
| **Context** | 65,536 tokens native, extensible to 256K |
| **MTP** | 1 multi-token prediction head |

## Training

| Parameter | Value |
|-----------|-------|
| **Data tokens** | 500B–2T |
| **Corpus** | FineWeb-Edu + StarCoder2 + OpenWebMath + curated multilingual |
| **Method** | Token Superposition Training (Nous Research) |
| **TST config** | s=4, r=0.5 |
| **Optimizer** | AdamW (β₁=0.9, β₂=0.95) |
| **Peak LR** | 3e-4 |
| **Schedule** | Cosine decay to 10% of peak |
| **Weight decay** | 0.1 |
| **Batch size** | ~4M tokens/step |
| **Precision** | BF16 weights, FP32 DeltaNet states |
| **Location** | Toronto, Ontario, Canada |

### Data Pipeline

Built on **Crucible** — RUPS skyline weighting + EESD submodular diversity selection with formal (1 - 1/e) approximation guarantees — and the **DDM analyzer**, which models reasoning as Ornstein-Uhlenbeck evidence accumulation. This is the same pipeline that produced Harmonic-9B and Ornstein-27B, where 799 DDM-curated rows outperformed datasets 20–100x larger.

### Training Phases

```
Phase 1 (TST):       First 50% of tokens in superposition mode
Phase 2 (Recovery):  Remaining 50% as standard autoregressive NTP
Phase 3 (Extension): Mid-training at 64K context, YaRN scaling
Phase 4 (Anneal):    Crucible-selected high-quality data, DDM loss weights
```

## Expected Performance

| Benchmark | Target | Comparison |
|-----------|--------|------------|
| HellaSwag | 55–62% | Qwen3-1.7B: ~58% |
| ARC-Easy | 65–72% | Qwen3-1.7B: ~68% |
| PIQA | 72–78% | Qwen3-1.7B: ~75% |
| WinoGrande | 58–64% | Qwen3-1.7B: ~60% |
| MMLU (5-shot) | 28–35% | Qwen3-1.7B: ~32% |

BOREAL-2B targets parity with Qwen3-1.7B while supporting 4x the native context length and using roughly half the inference memory at long context.
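The memory claim follows from simple accounting: only the 8 GQA layers accumulate a KV cache that grows with sequence length, while the 24 DeltaNet layers keep a fixed-size recurrent state. Below is a back-of-envelope sketch (our illustration, not an official measurement) against a hypothetical pure-Transformer baseline that uses the same GQA configuration in all 32 layers:

```python
# Back-of-envelope KV-cache accounting for BOREAL-2B vs. a hypothetical
# 32-layer pure-Transformer baseline with the same GQA configuration.
# Ignores the small fixed-size FP32 DeltaNet states and activation memory.
BYTES_BF16 = 2
N_KV_HEADS = 4
HEAD_DIM = 256
CONTEXT = 65_536                     # native 64K context
WEIGHT_BYTES = 2e9 * BYTES_BF16      # ~2B parameters in BF16

def kv_cache_bytes(n_attn_layers: int, tokens: int) -> int:
    # Each attention layer stores K and V per token: 2 * kv_heads * head_dim * bytes.
    per_token = 2 * N_KV_HEADS * HEAD_DIM * BYTES_BF16
    return n_attn_layers * per_token * tokens

for name, layers in [("BOREAL-2B (8 attention layers)", 8),
                     ("pure-Transformer baseline (32 layers)", 32)]:
    kv = kv_cache_bytes(layers, CONTEXT)
    print(f"{name}: KV {kv / 2**30:.1f} GiB, "
          f"KV + weights {(kv + WEIGHT_BYTES) / 2**30:.1f} GiB")

# BOREAL-2B (8 attention layers): KV 2.0 GiB, KV + weights 5.7 GiB
# pure-Transformer baseline (32 layers): KV 8.0 GiB, KV + weights 11.7 GiB
```

At 64K tokens the hybrid holds a quarter of the baseline's KV cache; once BF16 weights are included, the total lands at roughly half, consistent with the claim above.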
## The BOREAL Family

Every model trained in Canada. Every weight learned from random init.

| Model | Params | Type | Context | Status |
|-------|--------|------|---------|--------|
| **[BOREAL-250M](https://huggingface.co/GestaltLabs/BOREAL-250M)** | 250M | Dense | 32K | Seeking compute |
| **BOREAL-2B** | 2B | Dense | 64K | Seeking compute |
| **[BOREAL-10B-MoE](https://huggingface.co/GestaltLabs/BOREAL-10B-MoE)** | ~10B / ~2B active | DeltaNet + MoE | 256K | Cluster required |

## License

Apache 2.0. Built for Canadian researchers, startups, and institutions. No strings. No API keys. No foreign dependency.

## Author

Built by [DJLougen](https://huggingface.co/DJLougen) / [GestaltLabs](https://huggingface.co/GestaltLabs). University of Toronto. Toronto, Canada.

[☕ Support sovereign AI on Ko-fi](https://ko-fi.com/djlougen)
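## Usage (illustrative)

Once checkpoints are published, loading should follow the standard `transformers` pattern. A minimal sketch, assuming the released repo ships custom modeling code (hence `trust_remote_code=True`); the exact API may differ at release:

```python
# Illustrative quickstart, not an official tested snippet: assumes released
# weights and custom modeling code in the repo (trust_remote_code=True).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "GestaltLabs/BOREAL-2B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # matches the BF16 training precision
    device_map="auto",
    trust_remote_code=True,
)

prompt = "The boreal forest stretches"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```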