# LeWorld Memory Architecture 🧠⚡

A CPU-inspired hierarchical neural architecture where 3 Small LeWorld Models (SLMs) compete to find the most useful memory for 1 Big LeWorld Model (BLM) to predict the next world state.
## Architecture

| Component | Parameters | Role |
|---|---|---|
| Artificial Memory | 21K | Bit-level storage (64K words × 32 bits) + learned bit encoder/decoder |
| SLM-0 | 745K | State → memory address range |
| SLM-1 | 745K | State → memory address range |
| SLM-2 | 745K | State → memory address range |
| BLM | 11.2M | SLM selector [1,0,1] + next-state predictor + info requester |
| Total | 13.5M | |
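The memory component can be sketched as a plain bit array read by address range. This is a minimal illustration of the storage semantics only; the `bits` layout and the `read_range`/`write` helpers are assumptions for this sketch, not the repo's API, and the learned bit encoder/decoder is omitted.

```python
import numpy as np

class ArtificialMemory:
    """Bit-level storage: 64K words x 32 bits, addressed like RAM."""

    def __init__(self, num_words=65536, word_bits=32):
        # Each word is a row of 0/1 bits; a learned encoder/decoder
        # (not shown) would map these bits to and from embeddings.
        self.bits = np.zeros((num_words, word_bits), dtype=np.uint8)

    def read_range(self, start, end):
        # READ(addr_range): return the bit rows in [start, end).
        return self.bits[start:end]

    def write(self, addr, word):
        self.bits[addr] = word

mem = ArtificialMemory()
mem.write(3, np.ones(32, dtype=np.uint8))
chunk = mem.read_range(0, 8)   # 8 words x 32 bits
print(chunk.shape)             # (8, 32)
print(int(chunk[3].sum()))     # 32
```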
## Key Ideas

- CPU-Style Memory: actual bit-level storage (64K × 32-bit words), accessed by address ranges just like RAM
- Product-Key Addressing: SLMs output addresses by predicting a high byte (256 choices) + a low byte (256 choices) = 65K addresses with only 512 logits
- Binary SLM Routing: BLM selects which SLMs to trust via a Straight-Through Sigmoid: hard [1,0,1] in the forward pass, differentiable in the backward pass
- Active Information Request: BLM generates "what do I need next?" queries that modulate SLM memory search at the next timestep
- 3-Phase Training: pre-train → joint end-to-end → info-request refinement with a paired-branch reward
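The product-key trick above can be sketched in a few lines: 256 + 256 logits factorize a 65,536-entry address space. The variable names and the greedy (argmax) decode are assumptions for illustration; the actual model may sample or use a range rather than a single address.

```python
import numpy as np

rng = np.random.default_rng(0)

# An SLM address head emits only 512 logits:
# the first 256 score the high byte, the last 256 score the low byte.
logits = rng.standard_normal(512)
high_logits, low_logits = logits[:256], logits[256:]

# Greedy decode: pick one high byte and one low byte, then combine.
high = int(np.argmax(high_logits))   # 0..255
low = int(np.argmax(low_logits))     # 0..255
addr = (high << 8) | low             # 0..65535

print(high, low, addr)
```

The payoff is output size: a flat softmax over all addresses would need 65,536 logits, while the factorized head needs only 512.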
## Data Flow

```
          ┌───────────────────────────────┐
          │       ARTIFICIAL MEMORY       │
          │  [0][1][0][1]...[1][0][1][0]  │
          │    64K words × 32 bits each   │
          └──────────────┬────────────────┘
                         │ READ(addr_range)
       ┌─────────────────┼─────────────────┐
┌──────▼───────┐  ┌──────▼───────┐  ┌──────▼───────┐
│    SLM-0     │  │    SLM-1     │  │    SLM-2     │
│    (745K)    │  │    (745K)    │  │    (745K)    │
│  past_state  │  │  past_state  │  │  past_state  │
│  curr_state  │  │  curr_state  │  │  curr_state  │
│  character   │  │  character   │  │  character   │
│    → addr    │  │    → addr    │  │    → addr    │
└──────┬───────┘  └──────┬───────┘  └──────┬───────┘
       │                 │                 │
       └────────► BLM (11.2M) ◄───────────┘
                  mask = [1, 0, 1]
                         │
                         ├─► next_state prediction
                         └─► "what info do I need next?"
```
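The `mask = [1, 0, 1]` step in the diagram comes from the straight-through sigmoid mentioned under Key Ideas. A minimal PyTorch sketch of that estimator, assuming the standard `hard + soft - soft.detach()` formulation (the function name and example logits are hypothetical):

```python
import torch

def straight_through_mask(logits):
    """Hard binary mask in the forward pass, sigmoid gradient in the backward pass."""
    soft = torch.sigmoid(logits)
    hard = (soft > 0.5).float()
    # Forward value equals `hard`; gradients flow through `soft` only.
    return hard + soft - soft.detach()

# One routing decision over the 3 SLMs.
logits = torch.tensor([2.0, -1.0, 0.5], requires_grad=True)
mask = straight_through_mask(logits)
print(mask.tolist())              # [1.0, 0.0, 1.0]

# The mask is hard, yet the routing logits still receive gradients.
mask.sum().backward()
print(logits.grad is not None)    # True
```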
## Files

| File | Description |
|---|---|
| `leworld_architecture.py` | All model definitions: Memory, SLM, BLM, full system (~990 lines) |
| `leworld_training.py` | 3-phase training pipeline, data generation, evaluation (~820 lines) |
| `PLAN.md` | Complete design document with literature references |
## Quick Start

```python
from leworld_architecture import LeWorldSystem, MemoryConfig, SLMConfig, BLMConfig
from leworld_training import run_training, TrainingConfig

system = LeWorldSystem(MemoryConfig(), SLMConfig(), BLMConfig())
metrics = run_training(system, TrainingConfig())
```
## Literature Foundation

See `PLAN.md` for the design document and literature references.
## Verified Results (demo run)

- Phase 1: SLM loss 12.87 → 7.13, BLM loss 0.39 → 0.33
- Phase 2: routing becomes diverse; SLM usage: [0.72, 0.79, 0.67]
- Phase 3: info-request improves predictions by 19.5 loss units vs. baseline
- Final: MSE = 0.36, routing entropy = 0.70
- Per-step MSE: [0.64, 0.44, 0.31, 0.23, 0.19] (improves over time)
- Routing patterns: [1,0,1] → [0,1,1] → [1,1,1] → [1,1,0] → [0,1,0]