inv0krr commited on
Commit
2f44c12
Β·
verified Β·
1 Parent(s): 9585a45

Add README

Browse files
Files changed (1) hide show
  1. README.md +92 -0
README.md ADDED
@@ -0,0 +1,92 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # LeWorld Memory Architecture 🧠⚑
2
+
3
+ A CPU-inspired hierarchical neural architecture where **3 Small LeWorld Models (SLMs)** compete to find the most useful memory for **1 Big LeWorld Model (BLM)** to predict the next world state.
4
+
5
+ ## Architecture
6
+
7
+ | Component | Parameters | Role |
8
+ |-----------|-----------|------|
9
+ | **Artificial Memory** | 21K | Bit-level storage (64K words Γ— 32 bits) + learned bit encoder/decoder |
10
+ | **SLM-0** | 745K | State β†’ memory address range |
11
+ | **SLM-1** | 745K | State β†’ memory address range |
12
+ | **SLM-2** | 745K | State β†’ memory address range |
13
+ | **BLM** | 11.2M | SLM selector `[1,0,1]` + next-state predictor + info requester |
14
+ | **Total** | **13.5M** | |
15
+
16
+ ## Key Ideas
17
+
18
+ 1. **CPU-Style Memory**: Actual bit-level storage (64K Γ— 32-bit words), accessed by address ranges β€” just like RAM
19
+ 2. **Product-Key Addressing**: SLMs output addresses by predicting high byte (256 choices) + low byte (256 choices) = 65K addresses with only 512 logits
20
+ 3. **Binary SLM Routing**: BLM selects which SLMs to trust via Straight-Through Sigmoid β†’ hard `[1,0,1]` in forward, differentiable in backward
21
+ 4. **Active Information Request**: BLM generates "what do I need next?" queries that modulate SLM memory search at the next timestep
22
+ 5. **3-Phase Training**: Pre-train β†’ Joint end-to-end β†’ Info-request refinement with paired-branch reward
23
+
24
+ ## Data Flow
25
+
26
+ ```
27
+ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
28
+ β”‚ ARTIFICIAL MEMORY β”‚
29
+ β”‚ [0][1][0][1]...[1][0][1][0] β”‚
30
+ β”‚ 64K words Γ— 32 bits each β”‚
31
+ └──────────┬──────────────────-β”€β”˜
32
+ β”‚ READ(addr_range)
33
+ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
34
+ β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
35
+ β”‚ SLM-0 β”‚ β”‚ SLM-1 β”‚ β”‚ SLM-2 β”‚
36
+ β”‚ (745K) β”‚ β”‚ (745K) β”‚ β”‚ (745K) β”‚
37
+ β”‚ past_state β”‚ β”‚ past_state β”‚ β”‚ past_state β”‚
38
+ β”‚ curr_state β”‚ β”‚ curr_state β”‚ β”‚ curr_state β”‚
39
+ β”‚ character. β”‚ β”‚ character. β”‚ β”‚ character. β”‚
40
+ β”‚ β†’ addr β”‚ β”‚ β†’ addr β”‚ β”‚ β†’ addr β”‚
41
+ β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
42
+ β”‚ β”‚ β”‚
43
+ └──────────► BLM (11.2M) β—„β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
44
+ mask = [1, 0, 1]
45
+ β†’ next_state prediction
46
+ β†’ "what info do I need next?"
47
+ ```
48
+
49
+ ## Files
50
+
51
+ | File | Description |
52
+ |------|-------------|
53
+ | `leworld_architecture.py` | All model definitions: Memory, SLM, BLM, full system (~990 lines) |
54
+ | `leworld_training.py` | 3-phase training pipeline, data generation, evaluation (~820 lines) |
55
+ | `PLAN.md` | Complete design document with literature references |
56
+
57
+ ## Quick Start
58
+
59
+ ```python
60
+ from leworld_architecture import LeWorldSystem, MemoryConfig, SLMConfig, BLMConfig
61
+ from leworld_training import run_training, TrainingConfig
62
+
63
+ # Build system
64
+ system = LeWorldSystem(MemoryConfig(), SLMConfig(), BLMConfig())
65
+
66
+ # Train (3 phases: pre-train β†’ joint β†’ refine)
67
+ metrics = run_training(system, TrainingConfig())
68
+ ```
69
+
70
+ ## Literature Foundation
71
+
72
+ | Paper | What we borrowed |
73
+ |-------|-----------------|
74
+ | [Gumbel-Softmax](https://arxiv.org/abs/1611.01144) | Straight-Through sigmoid for binary routing |
75
+ | [Switch Transformers](https://arxiv.org/abs/2101.03961) | Gate-value scaling, load balance loss |
76
+ | [Product Key Memory](https://arxiv.org/abs/1907.05242) | Address decomposition into sub-keys |
77
+ | [LM2](https://arxiv.org/abs/2502.06049) | LSTM-style memory gates |
78
+ | [NAMM](https://arxiv.org/abs/2410.13166) | Binary memory eviction |
79
+ | [ProactAgent](https://arxiv.org/abs/2604.20572) | Paired-branch reward for retrieval decisions |
80
+ | [Mamba](https://arxiv.org/abs/2312.00752) | Explicit state maintenance |
81
+
82
+ ## Verified Results (demo run)
83
+
84
+ ```
85
+ Phase 1: SLM loss 12.87 β†’ 7.13, BLM loss 0.39 β†’ 0.33
86
+ Phase 2: Routing becomes diverse β€” SLM usage: [0.72, 0.79, 0.67]
87
+ Phase 3: Info-request improves predictions by 19.5 loss units vs baseline
88
+
89
+ Final: MSE=0.36, Routing entropy=0.70
90
+ Per-step MSE: [0.64, 0.44, 0.31, 0.23, 0.19] ← improves over time
91
+ Routing patterns: [1,0,1] β†’ [0,1,1] β†’ [1,1,1] β†’ [1,1,0] β†’ [0,1,0]
92
+ ```