Add README
README.md
ADDED

# LeWorld Memory Architecture 🧠⚡

A CPU-inspired hierarchical neural architecture where **3 Small LeWorld Models (SLMs)** compete to find the most useful memory for **1 Big LeWorld Model (BLM)** to predict the next world state.

## Architecture

| Component | Parameters | Role |
|-----------|-----------|------|
| **Artificial Memory** | 21K | Bit-level storage (64K words × 32 bits) + learned bit encoder/decoder |
| **SLM-0** | 745K | State → memory address range |
| **SLM-1** | 745K | State → memory address range |
| **SLM-2** | 745K | State → memory address range |
| **BLM** | 11.2M | SLM selector `[1,0,1]` + next-state predictor + info requester |
| **Total** | **13.5M** | |
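
The memory is plain storage; only the bit encoder/decoder around it is learned. A minimal sketch of the store, assuming PyTorch (`memory` and `read_range` are illustrative names, not the repo's actual API):

```python
import torch

NUM_WORDS, WORD_BITS = 65_536, 32           # 64K words × 32 bits, as in the table

# Raw bit store: one float per bit, values in {0.0, 1.0}.
memory = torch.zeros(NUM_WORDS, WORD_BITS)

def read_range(start: int, length: int) -> torch.Tensor:
    """RAM-style read of a contiguous address range: returns (length, 32) bits."""
    return memory[start : start + length]

chunk = read_range(1024, 8)                 # 8 words starting at address 1024
print(chunk.shape)                          # torch.Size([8, 32])
```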

## Key Ideas

1. **CPU-Style Memory**: actual bit-level storage (64K × 32-bit words), accessed by address ranges, just like RAM
2. **Product-Key Addressing**: SLMs output addresses by predicting a high byte (256 choices) + a low byte (256 choices) = 65,536 addresses from only 512 logits (see the first sketch after this list)
3. **Binary SLM Routing**: the BLM selects which SLMs to trust via a Straight-Through Sigmoid: hard `[1,0,1]` in the forward pass, differentiable in the backward pass (second sketch below)
4. **Active Information Request**: the BLM generates "what do I need next?" queries that modulate SLM memory search at the next timestep
5. **3-Phase Training**: pre-train → joint end-to-end → info-request refinement with a paired-branch reward
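
The product-key trick from idea 2, as a minimal sketch (assuming PyTorch; `product_key_address` is an illustrative name, not the repo's API):

```python
import torch

def product_key_address(high_logits: torch.Tensor,
                        low_logits: torch.Tensor) -> torch.Tensor:
    """Compose a 16-bit address from two 8-bit predictions.

    high_logits, low_logits: (batch, 256) each -- 512 logits in total,
    yet together they index all 256 * 256 = 65,536 memory words.
    """
    high = high_logits.argmax(dim=-1)   # (batch,) values in [0, 255]
    low = low_logits.argmax(dim=-1)     # (batch,) values in [0, 255]
    return high * 256 + low             # (batch,) values in [0, 65535]

addrs = product_key_address(torch.randn(4, 256), torch.randn(4, 256))
```

Splitting the address into two bytes keeps each SLM's output head at 512 logits instead of one 65,536-way softmax.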
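
And idea 3: the straight-through trick keeps a hard binary mask in the forward pass while letting gradients flow through the sigmoid. A minimal sketch under the same assumptions (illustrative, not the repo's implementation):

```python
import torch

def st_sigmoid(logits: torch.Tensor) -> torch.Tensor:
    """Hard 0/1 mask forward, sigmoid gradient backward."""
    soft = torch.sigmoid(logits)
    hard = (soft > 0.5).float()
    # Forward value equals `hard`; the gradient flows through `soft` only.
    return hard + (soft - soft.detach())

mask = st_sigmoid(torch.tensor([2.3, -1.1, 0.7]))  # tensor([1., 0., 1.])
```
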
## Data Flow

```
        ┌─────────────────────────────┐
        │      ARTIFICIAL MEMORY      │
        │ [0][1][0][1]...[1][0][1][0] │
        │   64K words × 32 bits each  │
        └──────────────┬──────────────┘
                       │ READ(addr_range)
       ┌───────────────┼───────────────┐
       ▼               ▼               ▼
┌────────────┐  ┌────────────┐  ┌────────────┐
│   SLM-0    │  │   SLM-1    │  │   SLM-2    │
│   (745K)   │  │   (745K)   │  │   (745K)   │
│ past_state │  │ past_state │  │ past_state │
│ curr_state │  │ curr_state │  │ curr_state │
│ character  │  │ character  │  │ character  │
│   → addr   │  │   → addr   │  │   → addr   │
└──────┬─────┘  └──────┬─────┘  └──────┬─────┘
       │               │               │
       └───────────────┼───────────────┘
                       ▼
                 BLM (11.2M)
               mask = [1, 0, 1]
        → next_state prediction
        → "what info do I need next?"
```
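
One timestep of this flow as hypothetical glue code; every name here (`step`, `system.slms`, `system.blm`, the argument list) is invented for illustration and is not the actual interface in `leworld_architecture.py`:

```python
def step(system, past_state, curr_state, character, info_request):
    # Each SLM maps its inputs to an address range and reads memory there;
    # the previous timestep's info request modulates the search (Key Idea 4).
    reads = [slm(past_state, curr_state, character, info_request)
             for slm in system.slms]

    # The BLM gates the three readouts with a hard binary mask (e.g. [1, 0, 1]),
    # predicts the next world state, and emits the next info request.
    mask, next_state, next_request = system.blm(curr_state, reads)
    return next_state, next_request
```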

## Files

| File | Description |
|------|-------------|
| `leworld_architecture.py` | All model definitions: Memory, SLM, BLM, full system (~990 lines) |
| `leworld_training.py` | 3-phase training pipeline, data generation, evaluation (~820 lines) |
| `PLAN.md` | Complete design document with literature references |

## Quick Start

```python
from leworld_architecture import LeWorldSystem, MemoryConfig, SLMConfig, BLMConfig
from leworld_training import run_training, TrainingConfig

# Build the system
system = LeWorldSystem(MemoryConfig(), SLMConfig(), BLMConfig())

# Train (3 phases: pre-train → joint → refine)
metrics = run_training(system, TrainingConfig())
```

## Literature Foundation

| Paper | What we borrowed |
|-------|-----------------|
| [Gumbel-Softmax](https://arxiv.org/abs/1611.01144) | Straight-Through sigmoid for binary routing |
| [Switch Transformers](https://arxiv.org/abs/2101.03961) | Gate-value scaling, load-balance loss |
| [Product Key Memory](https://arxiv.org/abs/1907.05242) | Address decomposition into sub-keys |
| [LM2](https://arxiv.org/abs/2502.06049) | LSTM-style memory gates |
| [NAMM](https://arxiv.org/abs/2410.13166) | Binary memory eviction |
| [ProactAgent](https://arxiv.org/abs/2604.20572) | Paired-branch reward for retrieval decisions |
| [Mamba](https://arxiv.org/abs/2312.00752) | Explicit state maintenance |

## Verified Results (demo run)

```
Phase 1: SLM loss 12.87 → 7.13, BLM loss 0.39 → 0.33
Phase 2: Routing becomes diverse → SLM usage: [0.72, 0.79, 0.67]
Phase 3: Info-request improves predictions by 19.5 loss units vs baseline

Final: MSE = 0.36, Routing entropy = 0.70
Per-step MSE: [0.64, 0.44, 0.31, 0.23, 0.19] → improves over time
Routing patterns: [1,0,1] → [0,1,1] → [1,1,1] → [1,1,0] → [0,1,0]
```