ANDREA-12M

Autonomous Neural Data Recipe for Education and Agency

A 12.8M parameter language model grown on a single RTX 4090 using a bandit-controlled curriculum. Part of the permacomputer project — open source, open data, open weights.

Model Details

  • Parameters: 12.8M
  • Architecture: Transformer decoder, 384d/12h/6L
  • Embedding dim: 384
  • Heads: 12
  • Layers: 6
  • Context: 1024 tokens
  • Tokenizer: Harris morpheme (2048 segments, 2305 vocab)
  • Training steps: 43,587
  • Final SMMA loss: 2.0
  • Best single-step loss: 0.21
  • Training time: ~72 hours
  • Hardware: single NVIDIA RTX 4090 (24GB VRAM, 1.4GB used)
  • CUDA engine: microgpt_cuda.cu (custom, FP32)
  • Born: 2026-03-21 12:53 UTC (08:53 EDT)
  • License: AGPL-3.0

Files

  • ANDREA-12M.bin — step 43,587 — final checkpoint (SMMA 2.0)
  • ANDREA-12M-best.bin — step 42,300 — best checkpoint (lowest loss during training)
  • harris_segments.json — Harris tokenizer segments (required for inference and fine-tuning)
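The segment inventory in harris_segments.json can be consumed by a greedy longest-match encoder. A minimal sketch, assuming the JSON holds an iterable of segment strings; the actual Harris morpheme algorithm is specified in the white paper, so treat this as an illustration rather than the project's tokenizer:

```python
def segment(text, segments):
    """Greedy longest-match segmentation over a segment inventory (sketch).

    `segments` is assumed to be an iterable of segment strings, e.g. loaded
    from harris_segments.json with json.load(); the real Harris tokenizer
    may segment differently.
    """
    # try longer segments first so the longest match wins
    inventory = sorted(set(segments), key=len, reverse=True)
    out, i = [], 0
    while i < len(text):
        for seg in inventory:
            if text.startswith(seg, i):
                out.append(seg)
                i += len(seg)
                break
        else:
            out.append(text[i])  # fall back to single characters
            i += 1
    return out
```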

Checkpoint format

Binary, little-endian: [int32 step][int32 n_params][n_params × float32 weights][n_params × float32 m][n_params × float32 v]

  • Weights: model parameters (12.8M floats, ~49MB)
  • m: Adam first moment (same size)
  • v: Adam second moment (same size)
  • Total: ~147MB per checkpoint

Use either checkpoint to resume fine-tuning (weights + optimizer state preserved) or extract weights only for inference (first n_params floats after the 8-byte header).
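Given the layout above, inference needs only the 8-byte header and the first n_params floats. A minimal reader sketch (load_weights is a hypothetical helper name, not part of the repo):

```python
import struct

import numpy as np


def load_weights(path):
    """Read an ANDREA checkpoint and return (step, weights) for inference.

    Format (little-endian): int32 step, int32 n_params, then n_params
    float32 weights. The Adam moments m and v that follow are not read.
    """
    with open(path, 'rb') as f:
        step, n_params = struct.unpack('<ii', f.read(8))
        weights = np.frombuffer(f.read(n_params * 4), dtype='<f4')
    return step, weights
```

To resume fine-tuning instead, keep reading: two more blocks of n_params float32 values (m, then v) follow the weights.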

Training Data

Trained on a curated mix of open conversational and educational data:

  • NousResearch/Hermes-3-Dataset (general, creative, roleplay) — 590K conversations
  • Dictionary — 88K word definitions distilled from Hermes 3 8B
  • Gutenberg — public domain literature (Project Gutenberg)
  • Additional: chat, smoltalk, oasst, dolly, IRC, repo-docs

Data mix controlled by a UCB1 multi-armed bandit with dice-based phase control. The bandit dynamically adjusts source weights during training based on per-source loss trajectories. Full curriculum specification in the white paper.
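The UCB1 rule itself is standard: each source arm is scored by its mean reward plus an exploration bonus of sqrt(2 ln N / n_i). A sketch of the selection step only; the project's reward shaping, dice-based phases, and source floors are not reproduced here:

```python
import math


def ucb1_pick(counts, rewards):
    """Pick a data-source arm by UCB1 (sketch).

    counts[i]  — number of times source i was sampled
    rewards[i] — cumulative reward for source i (e.g. loss improvement)
    """
    # sample every arm once before applying the bonus formula
    for i, c in enumerate(counts):
        if c == 0:
            return i
    total = sum(counts)
    scores = [rewards[i] / counts[i] + math.sqrt(2 * math.log(total) / counts[i])
              for i in range(len(counts))]
    return max(range(len(counts)), key=scores.__getitem__)
```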

Training Recipe

  • Harris morpheme tokenizer (2048 segments)
  • Cosine LR schedule with warm restart at step 25K (0.0004 peak)
  • Phase-based bandit: 2 focus arms, 1d3 dice, source floors
  • Checkpoints every 100 steps, SIGTERM-safe
  • Per-source reward attribution, epoch penalty, coverage tracking
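The cosine schedule with a warm restart can be sketched as below; only the 0.0004 peak, the 25K restart step, and the total step count come from the recipe, while the warmup length and the decay-to-zero floor are assumptions:

```python
import math

PEAK_LR = 4e-4          # peak LR from the recipe
RESTART_STEP = 25_000   # warm restart point from the recipe
TOTAL_STEPS = 43_587    # final step count

def lr_at(step, warmup=500):
    """Cosine LR with one warm restart at RESTART_STEP (sketch).

    Each cycle warms up linearly for `warmup` steps (assumed length),
    then decays from PEAK_LR toward zero on a half cosine.
    """
    if step >= RESTART_STEP:
        start, end = RESTART_STEP, TOTAL_STEPS
    else:
        start, end = 0, RESTART_STEP
    t = step - start
    if t < warmup:
        return PEAK_LR * t / warmup
    progress = (t - warmup) / max(1, (end - start) - warmup)
    return 0.5 * PEAK_LR * (1 + math.cos(math.pi * progress))
```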

Capabilities

ANDREA-12M learns patterns, not facts. At 12.8M parameters it produces:

  • Correct Q&A turn structure (> question / < answer)
  • Definition-style responses
  • Multi-sentence outputs with plausible grammar
  • Instruction-following scaffolding ("explain", "define", "describe")

It does NOT produce factually accurate content — it's a pattern machine. Factual accuracy requires scaling to ANDREA-120M (planned).

Usage

```python
# Inference via microgpt
from microgpt import load_model, generate_fast

model = load_model('ANDREA-12M.json')
# positional args match the model details: 384 embedding dim,
# 12 heads, 6 layers, 1024-token context
results = generate_fast(model['state_dict'], model['uchars'], model['bos'],
                        384, 12, 6, 1024, prefix='> what is an apple? / <')
print(results[0][0])
```

White Paper

ANDREA-12M-WHITEPAPER.pdf — full technical paper covering architecture, bandit curriculum, data sources, training recipe, and results.

Source: whitepaper/ANDREA/WHITEPAPER.rst in the uncloseai-cli repository.

Citation

ANDREA: Autonomous Neural Data Recipe for Education and Agency
TimeHexOn, foxhop, russell@unturf
March 2026, permacomputer.com

License

AGPL-3.0. Code outlasts authors. Infrastructure outlasts builders.

