# Noeum-1-Nano-Base
A 0.6B MoE foundation model trained entirely from scratch.
## ⚠ Base Model Notice
This is the raw pre-trained base model. It is designed for text completion and few-shot learning. It has not undergone Supervised Fine-Tuning (SFT) or Reinforcement Learning (RLHF).
- For Chat & Reasoning: Use the Instruct version (Coming Soon).
- For Fine-tuning: This serves as a highly efficient starting point.
## Overview
Noeum-1-Nano-Base is a nano-scale Mixture-of-Experts (MoE) foundation model. Despite its compact size (0.6B total parameters, with only ~0.2B active per token during inference), it delivers performance competitive with significantly larger dense models.
Built entirely from scratch by Noeum (an independent Austrian AI lab), this model validates a high-efficiency training hypothesis: that a small, high-signal-density corpus (18B tokens) can achieve competitive performance without brute-force scaling.
## Key Features
- Architecture: Custom Sparse MoE with 8 routed experts and 1 shared expert.
- Efficiency: Trained on only 18B tokens (approx. 1/100th of standard Llama/Qwen training runs).
- Data Sources: A curated mix of arXiv (Math/CS), GitHub (Python), Wikipedia, and FineWeb-Edu.
## Benchmarks
Despite the extreme disparity in training volume (18B vs 2T+ tokens), Noeum-1-Nano-Base establishes strong baselines on standard zero-shot and few-shot tasks.
| Task | Metric | Noeum-1-Nano-Base | Domain |
|---|---|---|---|
| SciQ | Accuracy | 77.5% | Scientific Knowledge |
| MRPC | F1 Score | 81.2% | Semantic Equivalence |
| BoolQ | Accuracy | 62.0% | Reading Comprehension |
| PIQA | Accuracy | 62.9% | Physical Commonsense |
| ARC-Easy | Accuracy | 47.1% | General Reasoning |
## Quickstart
This model uses a custom architecture, so you must pass `trust_remote_code=True` when loading it.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Path to the base Noeum model (local checkout or Hub ID)
MODEL_PATH = "./base/Noeum-hf-base"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"


def main():
    print(f"--- Evaluating Base Model Noeum on {DEVICE} ---")

    # 1. Load resources
    tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token

    model = AutoModelForCausalLM.from_pretrained(
        MODEL_PATH,
        trust_remote_code=True,
        torch_dtype=torch.float16 if DEVICE == "cuda" else torch.float32,
    ).to(DEVICE)
    model.eval()

    # Helper function for generation
    def run_test(test_name, prompt, max_new=50, temp=0.7):
        print(f"\n=== {test_name} ===")
        print(f"Input Pattern:\n{prompt.strip()}")
        print("-" * 20)

        inputs = tokenizer(prompt, return_tensors="pt").to(DEVICE)
        with torch.no_grad():
            output_ids = model.generate(
                **inputs,
                max_new_tokens=max_new,
                do_sample=True,
                temperature=temp,
                top_p=0.9,
                use_cache=False,  # required for this custom MoE architecture
                pad_token_id=tokenizer.pad_token_id,
                eos_token_id=tokenizer.eos_token_id,
            )

        # Decode only the NEW tokens to see exactly what the model added
        new_tokens = output_ids[0][inputs.input_ids.shape[1]:]
        output_text = tokenizer.decode(new_tokens, skip_special_tokens=True)
        print(f"Model Completion:\n{output_text}")
        print("=" * 30)

    # ==========================================================================
    # TEST 1: Few-Shot Knowledge
    # Base models need examples to know they should answer, not ask questions.
    # ==========================================================================
    few_shot_prompt = """
Q: What is the capital of Germany?
A: Berlin
Q: What is the capital of Spain?
A: Madrid
Q: What is the capital of France?
A:"""
    run_test("Test 1: Few-Shot Knowledge", few_shot_prompt, max_new=10, temp=0.1)

    # ==========================================================================
    # TEST 2: Story Continuation
    # Tests the model's ability to maintain narrative flow and grammar.
    # ==========================================================================
    story_prompt = (
        "The spaceship landed silently on the unknown planet. "
        "The captain opened the hatch and saw"
    )
    run_test("Test 2: Creative Writing", story_prompt, max_new=60, temp=0.8)

    # ==========================================================================
    # TEST 3: Logic/Code Pattern
    # Base models are often good at completing structured patterns or code.
    # ==========================================================================
    code_prompt = """
def add(a, b):
    return a + b

def multiply(a, b):"""
    run_test("Test 3: Code/Pattern Completion", code_prompt, max_new=30, temp=0.2)


if __name__ == "__main__":
    main()
```
## Architecture Details
| Component | Specification |
|---|---|
| Type | Mixture-of-Experts (MoE) |
| Total Params | 0.6B |
| Active Params | ~0.2B |
| Experts | 8 Routed, 1 Shared (Top-2 Router) |
| Layers | 24 |
| Attention | 12 Heads (GQA), 768 Hidden Dim |
| Context Window | 2048 Tokens |
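The routing scheme in the table can be sketched as a minimal PyTorch layer. The 768 hidden dim, 8 routed experts, 1 shared expert, and top-2 router come from the table above; the expert FFN width (2048) and SiLU activation are illustrative assumptions, not the released weights.

```python
import torch
import torch.nn as nn


class SparseMoE(nn.Module):
    """Illustrative sparse MoE layer: top-2 routing over 8 experts plus
    1 always-on shared expert. Expert MLP shape is an assumption."""

    def __init__(self, hidden=768, n_experts=8, top_k=2, ffn=2048):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden, ffn), nn.SiLU(), nn.Linear(ffn, hidden))
            for _ in range(n_experts)
        )
        self.shared = nn.Sequential(nn.Linear(hidden, ffn), nn.SiLU(), nn.Linear(ffn, hidden))

    def forward(self, x):  # x: (tokens, hidden)
        probs = self.router(x).softmax(dim=-1)              # (tokens, n_experts)
        weights, idx = torch.topk(probs, self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize top-k
        out = self.shared(x)                                 # shared expert sees every token
        for k in range(self.top_k):                          # dispatch to selected experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k : k + 1] * expert(x[mask])
        return out


x = torch.randn(4, 768)
y = SparseMoE()(x)
print(y.shape)  # torch.Size([4, 768])
```

Because only 2 of 8 routed experts fire per token (plus the shared expert), roughly a third of the total parameters are active per forward pass, which is consistent with the ~0.2B-of-0.6B figure above.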
## About Noeum
Noeum is an independent AI research lab based in Austria. We execute the full AI pipeline—from architecture design and pre-training to alignment—entirely in-house.
Our Philosophy: Iterate fast at nano-scale; scale only what works. Noeum-1-Nano serves as a proof-of-concept for our "High-Signal" training stack, demonstrating that architectural intelligence can rival brute-force compute.
🌐 Website: [noeum.ai](https://noeum.ai)

📧 Contact: contact@noeum.ai