Add README
README.md
ADDED

# LeWorld Memory Architecture 🧠⚡

A CPU-inspired hierarchical neural architecture where **3 Small LeWorld Models (SLMs)** compete to find the most useful memory for **1 Big LeWorld Model (BLM)** to predict the next world state.

## Architecture

| Component | Parameters | Role |
|-----------|-----------|------|
| **Artificial Memory** | 21K | Bit-level storage (64K words × 32 bits) + learned bit encoder/decoder |
| **SLM-0** | 745K | State → memory address range |
| **SLM-1** | 745K | State → memory address range |
| **SLM-2** | 745K | State → memory address range |
| **BLM** | 11.2M | SLM selector `[1,0,1]` + next-state predictor + info requester |
| **Total** | **13.5M** | |
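
The memory is plain storage; only the bit encoder/decoder around it is learned. A minimal sketch of the store, assuming PyTorch (`memory` and `read_range` are illustrative names, not the repo's actual API):

```python
import torch

NUM_WORDS, WORD_BITS = 65_536, 32           # 64K words × 32 bits, as in the table

# Raw bit store: one float per bit, values in {0.0, 1.0}.
memory = torch.zeros(NUM_WORDS, WORD_BITS)

def read_range(start: int, length: int) -> torch.Tensor:
    """RAM-style read of a contiguous address range: returns (length, 32) bits."""
    return memory[start : start + length]

chunk = read_range(1024, 8)                 # 8 words starting at address 1024
print(chunk.shape)                          # torch.Size([8, 32])
```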

## Key Ideas

1. **CPU-Style Memory**: actual bit-level storage (64K × 32-bit words), accessed by address ranges, just like RAM
2. **Product-Key Addressing**: SLMs output addresses by predicting a high byte (256 choices) + a low byte (256 choices) = 65,536 addresses from only 512 logits (see the first sketch after this list)
3. **Binary SLM Routing**: the BLM selects which SLMs to trust via a Straight-Through Sigmoid: hard `[1,0,1]` in the forward pass, differentiable in the backward pass (second sketch below)
4. **Active Information Request**: the BLM generates "what do I need next?" queries that modulate SLM memory search at the next timestep
5. **3-Phase Training**: pre-train → joint end-to-end → info-request refinement with a paired-branch reward
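
The product-key trick from idea 2, as a minimal sketch (assuming PyTorch; `product_key_address` is an illustrative name, not the repo's API):

```python
import torch

def product_key_address(high_logits: torch.Tensor,
                        low_logits: torch.Tensor) -> torch.Tensor:
    """Compose a 16-bit address from two 8-bit predictions.

    high_logits, low_logits: (batch, 256) each -- 512 logits in total,
    yet together they index all 256 * 256 = 65,536 memory words.
    """
    high = high_logits.argmax(dim=-1)   # (batch,) values in [0, 255]
    low = low_logits.argmax(dim=-1)     # (batch,) values in [0, 255]
    return high * 256 + low             # (batch,) values in [0, 65535]

addrs = product_key_address(torch.randn(4, 256), torch.randn(4, 256))
```

Splitting the address into two bytes keeps each SLM's output head at 512 logits instead of one 65,536-way softmax.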
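
And idea 3: the straight-through trick keeps a hard binary mask in the forward pass while letting gradients flow through the sigmoid. A minimal sketch under the same assumptions (illustrative, not the repo's implementation):

```python
import torch

def st_sigmoid(logits: torch.Tensor) -> torch.Tensor:
    """Hard 0/1 mask forward, sigmoid gradient backward."""
    soft = torch.sigmoid(logits)
    hard = (soft > 0.5).float()
    # Forward value equals `hard`; the gradient flows through `soft` only.
    return hard + (soft - soft.detach())

mask = st_sigmoid(torch.tensor([2.3, -1.1, 0.7]))  # tensor([1., 0., 1.])
```
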
## Data Flow

```
        ┌─────────────────────────────┐
        │      ARTIFICIAL MEMORY      │
        │ [0][1][0][1]...[1][0][1][0] │
        │   64K words × 32 bits each  │
        └──────────────┬──────────────┘
                       │ READ(addr_range)
       ┌───────────────┼───────────────┐
       ▼               ▼               ▼
┌────────────┐  ┌────────────┐  ┌────────────┐
│   SLM-0    │  │   SLM-1    │  │   SLM-2    │
│   (745K)   │  │   (745K)   │  │   (745K)   │
│ past_state │  │ past_state │  │ past_state │
│ curr_state │  │ curr_state │  │ curr_state │
│ character  │  │ character  │  │ character  │
│   → addr   │  │   → addr   │  │   → addr   │
└──────┬─────┘  └──────┬─────┘  └──────┬─────┘
       │               │               │
       └───────────────┼───────────────┘
                       ▼
                 BLM (11.2M)
               mask = [1, 0, 1]
        → next_state prediction
        → "what info do I need next?"
```
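
One timestep of this flow as hypothetical glue code; every name here (`step`, `system.slms`, `system.blm`, the argument list) is invented for illustration and is not the actual interface in `leworld_architecture.py`:

```python
def step(system, past_state, curr_state, character, info_request):
    # Each SLM maps its inputs to an address range and reads memory there;
    # the previous timestep's info request modulates the search (Key Idea 4).
    reads = [slm(past_state, curr_state, character, info_request)
             for slm in system.slms]

    # The BLM gates the three readouts with a hard binary mask (e.g. [1, 0, 1]),
    # predicts the next world state, and emits the next info request.
    mask, next_state, next_request = system.blm(curr_state, reads)
    return next_state, next_request
```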

## Files

| File | Description |
|------|-------------|
| `leworld_architecture.py` | All model definitions: Memory, SLM, BLM, full system (~990 lines) |
| `leworld_training.py` | 3-phase training pipeline, data generation, evaluation (~820 lines) |
| `PLAN.md` | Complete design document with literature references |

## Quick Start

```python
from leworld_architecture import LeWorldSystem, MemoryConfig, SLMConfig, BLMConfig
from leworld_training import run_training, TrainingConfig

# Build the system
system = LeWorldSystem(MemoryConfig(), SLMConfig(), BLMConfig())

# Train (3 phases: pre-train → joint → refine)
metrics = run_training(system, TrainingConfig())
```

## Literature Foundation

| Paper | What we borrowed |
|-------|-----------------|
| [Gumbel-Softmax](https://arxiv.org/abs/1611.01144) | Straight-Through sigmoid for binary routing |
| [Switch Transformers](https://arxiv.org/abs/2101.03961) | Gate-value scaling, load-balance loss |
| [Product Key Memory](https://arxiv.org/abs/1907.05242) | Address decomposition into sub-keys |
| [LM2](https://arxiv.org/abs/2502.06049) | LSTM-style memory gates |
| [NAMM](https://arxiv.org/abs/2410.13166) | Binary memory eviction |
| [ProactAgent](https://arxiv.org/abs/2604.20572) | Paired-branch reward for retrieval decisions |
| [Mamba](https://arxiv.org/abs/2312.00752) | Explicit state maintenance |

## Verified Results (demo run)

```
Phase 1: SLM loss 12.87 → 7.13, BLM loss 0.39 → 0.33
Phase 2: Routing becomes diverse → SLM usage: [0.72, 0.79, 0.67]
Phase 3: Info-request improves predictions by 19.5 loss units vs baseline

Final: MSE = 0.36, Routing entropy = 0.70
Per-step MSE: [0.64, 0.44, 0.31, 0.23, 0.19] → improves over time
Routing patterns: [1,0,1] → [0,1,1] → [1,1,1] → [1,1,0] → [0,1,0]
```