Update README for v5 router
README.md
CHANGED
library_name: mlx
---

# Vibe Coding Router v5

A three-tier cascaded router for coding tasks that routes prompts between:

- **Sonnet**: Claude Sonnet 4.6 (medium-complexity cloud)
- **Opus**: Claude Opus 4.6 (max-capability cloud)

## What's New in v5

v4 suffered from **inverted routing**: simple queries went to cloud while complex ones stayed local. The root cause was a length-quality anti-correlation in the training data, amplified by the PID loss's reward weighting. v5 fixes this with:

1. **7 new complexity features** (45 handcrafted total): `is_coding_task`, `junk_score`, `scope_breadth`, `imperative_verb_density`, `noun_phrase_density`, `interaction_complexity`, `requirement_clause_count`
2. **Centered complexity premium**: adjusts training margins by `premium * (complexity_score - center)` so complex tasks push toward cloud and simple tasks push toward local
3. **Junk prompt clamping**: 75 junk/greeting prompts neutralized (p_teacher=0.5, margin=0.0)
4. **Reward weight cap**: the PID loss reward weight is capped at 0.5 to prevent outlier margins from dominating training
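
The centered-premium adjustment in item 2 can be sketched in a few lines. The function and argument names below are illustrative, not the repo's actual API; the defaults mirror the premium=2.0, center=0.3 settings listed under Training:

```python
# Hypothetical helper illustrating the centered complexity premium;
# names are assumptions, only the formula comes from the README.
def adjust_margin(base_margin: float, complexity_score: float,
                  premium: float = 2.0, center: float = 0.3) -> float:
    """Shift a training margin toward cloud for complex prompts and
    toward local for simple ones."""
    return base_margin + premium * (complexity_score - center)

cloud_push = adjust_margin(0.0, 0.9)   # positive: pushed toward cloud
local_push = adjust_margin(0.0, 0.1)   # negative: pushed toward local
```

Because the premium is centered rather than purely additive, a prompt at exactly the center score leaves the margin unchanged.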

## Architecture

Two cascaded binary MLP routers trained with **Privileged Information Distillation (PID)**:

- **Router A** (local vs cloud): 77-dim → [32, 16] → 1, dropout=0.2, LayerNorm+ReLU
- **Router B** (sonnet vs opus): 77-dim → [128, 64] → 1, dropout=0.0, LayerNorm+ReLU

Features: 45 handcrafted code features + 32 PCA-reduced sentence embeddings (all-MiniLM-L6-v2).
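
For intuition, Router A is small enough to trace by hand. The sketch below runs the same shape (77 → 32 → 16 → 1, LayerNorm + ReLU on hidden layers, sigmoid output) as a plain NumPy forward pass with random weights; it is an illustration only, since the trained weights live in `router_a.safetensors` and are served via MLX:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_norm(x, eps=1e-5):
    # Per-vector LayerNorm (no learned scale/shift, for brevity)
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def router_a_forward(x, sizes=(77, 32, 16, 1)):
    """Random-weight stand-in for Router A's topology:
    77 -> 32 -> 16 -> 1, LayerNorm + ReLU hidden layers,
    sigmoid output interpreted as p_cloud."""
    for fan_in, fan_out in zip(sizes[:-2], sizes[1:-1]):
        w = rng.normal(0.0, fan_in ** -0.5, (fan_in, fan_out))
        x = np.maximum(layer_norm(x @ w), 0.0)   # LayerNorm + ReLU
    w_out = rng.normal(0.0, sizes[-2] ** -0.5, (sizes[-2], sizes[-1]))
    return 1.0 / (1.0 + np.exp(-(x @ w_out).item()))  # sigmoid

p_cloud = router_a_forward(rng.normal(size=77))  # a probability in (0, 1)
```

Dropout is omitted since it is inactive at inference time.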

## Training

- **Data**: 1,644 coding prompts with real quality scores from all three models
- **Judge**: GPT-5.4 scoring correctness, completeness, code quality, explanation
- **Loss**: PID (reward-weighted CE + KL divergence), β_kl=0.02, reward_cap=0.5
- **Label smoothing**: ε=0.05, cost-aware margin for Router B (cost_premium=0.03)
- **Complexity premium**: 2.0, centered at 0.3
- **HP sweep**: 108 configurations, 3-way split (1150 train / 247 val / 247 test)
- **Threshold A**: 0.60 (manually tuned for routing behavior; see note below)
- **Threshold B**: 0.474 (calibrated on validation set)
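
A minimal per-example sketch of the loss bullet above. This is one plausible reading of "reward-weighted CE + KL divergence"; the names and exact form are assumptions, not the repo's implementation:

```python
import math

def pid_loss(p_student, p_teacher, y, reward,
             beta_kl=0.02, reward_cap=0.5):
    """Sketch: reward-weighted cross-entropy on the hard label y,
    plus a KL term pulling the student toward the teacher's soft
    routing probability."""
    eps = 1e-7
    p_s = min(max(p_student, eps), 1 - eps)
    p_t = min(max(p_teacher, eps), 1 - eps)
    w = min(reward, reward_cap)  # v5's cap prevents outlier dominance
    ce = -(y * math.log(p_s) + (1 - y) * math.log(1 - p_s))
    kl = (p_t * math.log(p_t / p_s)
          + (1 - p_t) * math.log((1 - p_t) / (1 - p_s)))
    return w * ce + beta_kl * kl
```

Under this reading, a junk prompt clamped to p_teacher=0.5 with zero reward contributes only a small KL pull toward 0.5, which is consistent with how v5 neutralizes those examples.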

### Threshold Note

The utility-optimal Router A threshold (0.01) routes almost nothing to local, because cloud quality is genuinely equal or better on nearly all prompts. The manual threshold of 0.60 trades ~1.4% utility for the intended routing behavior: simple, fast tasks run locally with zero network latency, while complex tasks go to the cloud.
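
Putting the two thresholds together, the cascade decision reduces to a few lines. This is a standalone sketch; the packaged `ThreeTierRouter` presumably wraps this logic together with feature extraction and the two MLP forward passes:

```python
def route(p_cloud: float, p_opus: float,
          threshold_a: float = 0.60, threshold_b: float = 0.474) -> str:
    """Cascade: Router A gates local vs cloud; Router B (consulted
    only for cloud traffic) picks sonnet vs opus. The tie-breaking
    direction at each threshold is an assumption."""
    if p_cloud < threshold_a:
        return "local"
    return "opus" if p_opus >= threshold_b else "sonnet"

route(0.20, 0.90)  # "local": Router A confidence below 0.60
route(0.95, 0.80)  # "opus": both thresholds cleared
route(0.95, 0.10)  # "sonnet": cloud, but below the opus threshold
```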

## Real-World Routing (28 test queries, threshold_a=0.60)

| Category | Local | Sonnet | Opus |
|----------|-------|--------|------|
| Simple (8) | 5 (62%) | 0 | 3 (38%) |
| Medium (8) | 3 (38%) | 0 | 5 (62%) |
| Complex (6) | 1 (17%) | 1 (17%) | 4 (67%) |

v4 comparison: simple→local was 0/8 (now 5/8), complex→local was 6/6 (now 1/6).

## Test Set Results (calibrated thresholds)

| Metric | Value |
|--------|-------|
| Utility | 0.6205 |
| Oracle Utility | 0.7179 |
| Regret | 0.0973 |

## Files

- `router_a.safetensors` – Router A weights (32×16 MLP, 13 KB)
- `router_b.safetensors` – Router B weights (128×64 MLP, 76 KB)
- `config.json` – model config, thresholds, hyperparameters, and training results
- `scaler.pkl` – StandardScaler for feature normalization
- `embedding_extractor.pkl` – PCA-reduced sentence-transformers extractor
- `sweep_results.json` – full results of the 108-config HP sweep

## Usage

```python
from router.three_tier_inference import ThreeTierRouter

router = ThreeTierRouter("models/three_tier_v5")
result = router.route("Write a Python function to sort a list")

# result.decision: "local", "sonnet", or "opus"
# result.p_cloud: probability of routing to cloud
# result.p_opus: probability of opus (only meaningful when routed to cloud)
```
|