
Noeum-1-Nano-Base

A 0.6B MoE foundation model trained entirely from scratch.



⚠ Base Model Notice

This is the raw pre-trained base model. It is designed for text completion and few-shot learning. It has not undergone Supervised Fine-Tuning (SFT) or Reinforcement Learning (RLHF).

  • For Chat & Reasoning: Use the Instruct version (Coming Soon).
  • For Fine-tuning: This serves as a highly efficient starting point.

Overview

Noeum-1-Nano-Base is a nano-scale Mixture-of-Experts (MoE) foundation model. Despite its compact size (0.6B total parameters, with only ~0.2B active during inference), it is competitive with significantly larger dense models on standard zero-shot and few-shot benchmarks.

Built entirely from scratch by Noeum (an independent Austrian AI lab), this model validates a high-efficiency training hypothesis: using high-signal density data (18B tokens) to achieve competitive performance without brute-force scaling.

Key Features

  • Architecture: Custom Sparse MoE with 8 routed experts and 1 shared expert.
  • Efficiency: Trained on only 18B tokens (approx. 1/100th of standard Llama/Qwen training runs).
  • Data Sources: A curated mix of arXiv (Math/CS), GitHub (Python), Wikipedia, and FineWeb-Edu.
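
The expert layout above (8 routed experts, 1 shared expert, top-2 routing) can be sketched in plain PyTorch. This is an illustrative toy, not the model's actual implementation: the FFN width, activation function, and gating details are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    """Illustrative sparse MoE block: 8 routed experts plus 1 always-on
    shared expert, gated by a top-2 softmax router. Sizes are hypothetical."""

    def __init__(self, hidden=768, ffn=1536, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(hidden, n_experts, bias=False)
        make_ffn = lambda: nn.Sequential(
            nn.Linear(hidden, ffn), nn.SiLU(), nn.Linear(ffn, hidden))
        self.experts = nn.ModuleList(make_ffn() for _ in range(n_experts))
        self.shared_expert = make_ffn()
        self.top_k = top_k

    def forward(self, x):                     # x: (tokens, hidden)
        weights, idx = torch.topk(self.router(x), self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)  # normalize over the selected experts
        out = self.shared_expert(x)           # the shared expert sees every token
        for e, expert in enumerate(self.experts):
            # Combined gate weight for expert e (zero where it was not selected).
            w = torch.where(idx == e, weights,
                            torch.zeros_like(weights)).sum(-1, keepdim=True)
            if w.any():  # toy version computes densely; real kernels dispatch only routed tokens
                out = out + w * expert(x)
        return out
```

Only two of the eight routed experts contribute to each token, which is how the total parameter count (all experts) can be roughly three times the active count (the router's top-2 selection plus the shared expert).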

Benchmarks

Despite the extreme disparity in training volume (18B vs 2T+ tokens), Noeum-1-Nano-Base establishes strong baselines on standard zero-shot and few-shot tasks.

Task       Metric     Noeum-1-Nano-Base   Domain
--------   --------   -----------------   ---------------------
SciQ       Accuracy   77.5%               Scientific Knowledge
MRPC       F1 Score   81.2%               Semantic Equivalence
BoolQ      Accuracy   62.0%               Reading Comprehension
PIQA       Accuracy   62.9%               Physical Commonsense
ARC-Easy   Accuracy   47.1%               General Reasoning

Quickstart

This model uses a custom architecture. You must set trust_remote_code=True to load it.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Path to your base noeum model
MODEL_PATH = "./base/Noeum-hf-base"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"


def main():
    print(f"--- Evaluating Base Model Noeum on {DEVICE} ---")

    # 1. Load Resources
    tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token

    model = AutoModelForCausalLM.from_pretrained(
        MODEL_PATH,
        trust_remote_code=True,
        torch_dtype=torch.float16 if DEVICE == "cuda" else torch.float32
    ).to(DEVICE)
    model.eval()

    # Helper function for generation
    def run_test(test_name, prompt, max_new=50, temp=0.7):
        print(f"\n=== {test_name} ===")
        print(f"Input Pattern:\n{prompt.strip()}")
        print("-" * 20)

        inputs = tokenizer(prompt, return_tensors="pt").to(DEVICE)

        with torch.no_grad():
            output_ids = model.generate(
                **inputs,
                max_new_tokens=max_new,
                do_sample=True,
                temperature=temp,
                top_p=0.9,
                use_cache=False,  # Required for compatibility with this custom MoE architecture
                pad_token_id=tokenizer.pad_token_id,
                eos_token_id=tokenizer.eos_token_id
            )

        # Decode only the NEW tokens to see exactly what the model added
        new_tokens = output_ids[0][inputs.input_ids.shape[1]:]
        output_text = tokenizer.decode(new_tokens, skip_special_tokens=True)
        print(f"Model Completion:\n{output_text}")
        print("=" * 30)

    # ==============================================================================
    # TEST 1: Few-Shot Knowledge
    # Base models need examples to know they should answer, not ask more questions.
    # ==============================================================================
    few_shot_prompt = """
Q: What is the capital of Germany?
A: Berlin
Q: What is the capital of Spain?
A: Madrid
Q: What is the capital of France?
A:"""
    run_test("Test 1: Few-Shot Knowledge", few_shot_prompt, max_new=10, temp=0.1)

    # ==============================================================================
    # TEST 2: Story Continuation
    # Tests the model's ability to maintain narrative flow and grammar.
    # ==============================================================================
    story_prompt = "The spaceship landed silently on the unknown planet. The captain opened the hatch and saw"
    run_test("Test 2: Creative Writing", story_prompt, max_new=60, temp=0.8)

    # ==============================================================================
    # TEST 3: Logic/Code Pattern
    # Base models are often good at completing structured patterns or code.
    # ==============================================================================
    code_prompt = """
def add(a, b):
    return a + b

def multiply(a, b):"""
    run_test("Test 3: Code/Pattern Completion", code_prompt, max_new=30, temp=0.2)


if __name__ == "__main__":
    main()

Architecture Details

Component        Specification
---------        -------------
Type             Mixture-of-Experts (MoE)
Total Params     0.6B
Active Params    ~0.2B
Experts          8 Routed, 1 Shared (Top-2 Router)
Layers           24
Attention        12 Heads (GQA), 768 Hidden Dim
Context Window   2048 Tokens
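
As a sanity check, a back-of-envelope parameter count can be derived from these specs. The FFN intermediate width and vocabulary size are not published here, so the values used below are illustrative guesses, and the attention estimate ignores GQA savings:

```python
def moe_param_estimate(hidden=768, layers=24, n_routed=8, n_shared=1,
                       top_k=2, ffn=1536, vocab=32000):
    """Rough MoE parameter count. `ffn` and `vocab` are guesses; the
    model card does not publish them."""
    attn = 4 * hidden * hidden                  # Q/K/V/O projections (ignores GQA savings)
    expert = 2 * hidden * ffn                   # up + down projection per expert
    per_layer_total = attn + (n_routed + n_shared) * expert
    per_layer_active = attn + (top_k + n_shared) * expert
    embed = vocab * hidden                      # token embeddings (assumes tied output head)
    total = layers * per_layer_total + embed
    active = layers * per_layer_active + embed
    return total, active

total, active = moe_param_estimate()
print(f"total = {total / 1e9:.2f}B, active = {active / 1e9:.2f}B")
```

Under these assumed values the estimate lands near 0.6B total and 0.25B active, in the same ballpark as the card's figures; the exact split depends on the unpublished FFN and vocabulary sizes.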

About Noeum


Noeum is an independent AI research lab based in Austria. We execute the full AI pipeline—from architecture design and pre-training to alignment—entirely in-house.

Our Philosophy: Iterate fast at nano-scale; scale only what works. Noeum-1-Nano serves as a proof-of-concept for our "High-Signal" training stack, demonstrating that architectural intelligence can rival brute-force compute.


🌐 Website: noeum.ai
📧 Contact: contact@noeum.ai
