
Sentinel Prime Nano: Sparse MoE Language Model

Sentinel Prime is a from-scratch sparse Mixture of Experts (MoE) transformer built by QubitPage Research.

Architecture

| Parameter           | Value                            |
|---------------------|----------------------------------|
| Total Parameters    | 322,435,584                      |
| Active Parameters   | ~161,217,792 per token           |
| Hidden Dimension    | 768                              |
| Layers              | 12                               |
| Attention Heads     | 12 (Q) / 4 (KV)                  |
| FFN Dimension       | 2048                             |
| Experts             | 4 total, top-2 active            |
| Vocab Size          | 100,277 (tiktoken cl100k_base)   |
| Max Sequence Length | 1024                             |
| Position Encoding   | RoPE (theta = 500000.0)          |
| Normalization       | RMSNorm                          |
| FFN Type            | SwiGLU                           |
| Attention           | Grouped Query Attention (GQA)    |
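
For reference, the hyperparameters above can be collected into a single configuration object. The sketch below is a minimal, hypothetical Python dataclass; it is not the repository's actual SentinelBrainConfig, and its field names are illustrative only.

from dataclasses import dataclass

@dataclass
class SentinelPrimeNanoConfig:      # hypothetical name, for illustration only
    hidden_size: int = 768          # hidden dimension
    num_layers: int = 12            # transformer blocks
    num_query_heads: int = 12       # query heads
    num_kv_heads: int = 4           # key/value heads (GQA)
    ffn_dim: int = 2048             # SwiGLU feed-forward width
    num_experts: int = 4            # experts per MoE layer
    experts_per_token: int = 2      # top-2 routing
    vocab_size: int = 100_277       # tiktoken cl100k_base
    max_seq_len: int = 1024
    rope_theta: float = 500_000.0
    norm: str = "rmsnorm"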

Key Features

  • Sparse MoE: Only 2 of 4 experts active per token (see the routing sketch after this list)
  • GQA: Memory-efficient grouped query attention
  • SwiGLU: LLaMA/Mistral-style feed-forward
  • RoPE: Rotary position embeddings for length generalization
  • From Scratch: No pretrained weights, trained from random initialization
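
To make the top-2 routing concrete, here is a minimal PyTorch sketch of a 4-expert MoE layer with SwiGLU experts. It is an illustration under the hyperparameters stated above, not the repository's implementation; the module names (SwiGLUExpert, Top2MoE) are invented, and details such as load-balancing losses are omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUExpert(nn.Module):
    # One expert: LLaMA/Mistral-style SwiGLU feed-forward
    def __init__(self, d_model=768, d_ff=2048):
        super().__init__()
        self.gate = nn.Linear(d_model, d_ff, bias=False)
        self.up = nn.Linear(d_model, d_ff, bias=False)
        self.down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

class Top2MoE(nn.Module):
    # Sparse MoE layer: each token is processed only by its 2 highest-scoring experts
    def __init__(self, d_model=768, d_ff=2048, n_experts=4, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(SwiGLUExpert(d_model, d_ff) for _ in range(n_experts))
        self.top_k = top_k

    def forward(self, x):                      # x: (batch, seq, d_model)
        scores = self.router(x)                # (batch, seq, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)      # normalize over the selected experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e)      # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

Only two of the four expert FFNs run for each token, which is why the active parameter count per token (~161M) is well below the 322M total.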

Training

  • Data: FineWeb-Edu (educational web text)
  • Tokens Seen: 698,368
  • Best Validation Loss: 10.1536
  • Hardware: NVIDIA RTX 3060 12GB
  • Framework: PyTorch 2.11.0+cu126

Usage

# The custom model and tokenizer classes ship with this repository
# (hf_model.py and hf_tokenizer.py); import them directly.
from hf_model import SentinelBrainConfig, SentinelBrainForCausalLM
from hf_tokenizer import SentinelBrainTokenizer

# Load the pretrained weights from the Hub
model = SentinelBrainForCausalLM.from_pretrained("qubitpage/sentinel-prime-nano")
tokenizer = SentinelBrainTokenizer()

# Encode a prompt, generate up to 50 new tokens, and decode the result
input_ids = tokenizer("The meaning of life is", return_tensors="pt")["input_ids"]
output = model.generate(input_ids, max_new_tokens=50)
print(tokenizer.decode(output[0]))
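
The example above uses greedy decoding (the generate default). Assuming SentinelBrainForCausalLM exposes the standard transformers generation interface, as the generate call above suggests, sampling parameters can also be passed; the values below are arbitrary illustrations, not recommended settings for this model.

output = model.generate(
    input_ids,
    max_new_tokens=50,
    do_sample=True,     # sample instead of greedy decoding
    temperature=0.8,    # illustrative value
    top_p=0.95,         # nucleus sampling; illustrative value
)
print(tokenizer.decode(output[0]))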

License

Apache 2.0
