darkolorin committed
Commit e340d72 · verified · 1 Parent(s): 626e786

Update README for v5 router

Files changed (1): README.md +45 -30
README.md CHANGED
@@ -9,7 +9,7 @@ tags:
  library_name: mlx
  ---
 
- # Vibe Coding Router v4
+ # Vibe Coding Router v5
 
  A three-tier cascaded router for coding tasks that routes prompts between:
 
@@ -17,59 +17,74 @@ A three-tier cascaded router for coding tasks that routes prompts between:
  - **Sonnet**: Claude Sonnet 4.6 (medium-complexity cloud)
  - **Opus**: Claude Opus 4.6 (max-capability cloud)
 
+ ## What's New in v5
+
+ v4 suffered from **inverted routing** — simple queries went to cloud while complex ones stayed local. Root cause: length-quality anti-correlation in training data combined with PID loss reward-weight amplification. v5 fixes this with:
+
+ 1. **7 new complexity features** (45 handcrafted total): `is_coding_task`, `junk_score`, `scope_breadth`, `imperative_verb_density`, `noun_phrase_density`, `interaction_complexity`, `requirement_clause_count`
+ 2. **Centered complexity premium**: Adjusts training margins by `premium * (complexity_score - center)` so complex tasks push toward cloud and simple tasks push toward local
+ 3. **Junk prompt clamping**: 75 junk/greeting prompts neutralized (p_teacher=0.5, margin=0.0)
+ 4. **Reward weight cap**: PID loss reward_weight capped at 0.5 to prevent outlier margin dominance
+
  ## Architecture
 
  Two cascaded binary MLP routers trained with **Privileged Information Distillation (PID)**:
 
- - **Router A** (local vs cloud): 70-dim input -> [64, 32] -> 1, dropout=0.2
- - **Router B** (sonnet vs opus): 70-dim input -> [32, 16] -> 1, dropout=0.0
+ - **Router A** (local vs cloud): 77-dim input -> [32, 16] -> 1, dropout=0.2, LayerNorm+ReLU
+ - **Router B** (sonnet vs opus): 77-dim input -> [128, 64] -> 1, dropout=0.0, LayerNorm+ReLU
 
- Features: 38 handcrafted code features + 32 PCA-reduced sentence embeddings (all-MiniLM-L6-v2).
+ Features: 45 handcrafted code features + 32 PCA-reduced sentence embeddings (all-MiniLM-L6-v2).
 
  ## Training
 
  - **Data**: 1,644 coding prompts with real quality scores from all three models
  - **Judge**: GPT-5.4 scoring correctness, completeness, code quality, explanation
- - **Loss**: PID (reward-weighted CE + KL divergence), β_kl=0.02
- - **Label smoothing**: epsilon=0.05, cost-aware margin for Router B (cost_premium=0.03)
+ - **Loss**: PID (reward-weighted CE + KL divergence), β_kl=0.02, reward_cap=0.5
+ - **Label smoothing**: ε=0.05, cost-aware margin for Router B (cost_premium=0.03)
+ - **Complexity premium**: 2.0, centered at 0.3
  - **HP sweep**: 108 configurations, 3-way split (1150 train / 247 val / 247 test)
- - **Thresholds**: calibrated on validation set only
+ - **Threshold A**: 0.60 (manually tuned for routing behavior — see note below)
+ - **Threshold B**: 0.474 (calibrated on validation set)
 
- ## Test Set Results
-
- | Metric | Value |
- |--------|-------|
- | Utility | 0.6349 |
- | Regret | 0.0830 |
- | vs Always-Opus | +0.63% utility, 40.9% cost savings |
-
- ## Routing Distribution (test set)
-
- | Tier | Rate | Use Case |
- |------|------|----------|
- | Local | 19.4% | Simple tasks, explanations, basic code gen |
- | Sonnet | 21.5% | Medium complexity, standard debugging |
- | Opus | 59.1% | Architecture, complex multi-file tasks |
-
- ## Thresholds
-
- - Router A: 0.474 (p(cloud) >= threshold -> route to cloud)
- - Router B: 0.474 (p(opus) >= threshold -> route to Opus, else Sonnet)
+ ### Threshold Note
+
+ The utility-optimal Router A threshold (0.01) routes almost nothing to local because cloud quality is genuinely equal or better on nearly all prompts. The manual threshold of 0.60 trades ~1.4% utility for correct routing intuition: simple/fast tasks run locally with zero latency, while complex tasks go to cloud.
+
+ ## Real-World Routing (28 test queries, threshold_a=0.60)
+
+ | Category | Local | Sonnet | Opus |
+ |----------|-------|--------|------|
+ | Simple (8) | 5 (62%) | 0 | 3 (38%) |
+ | Medium (8) | 3 (38%) | 0 | 5 (62%) |
+ | Complex (6) | 1 (17%) | 1 (17%) | 4 (67%) |
+
+ v4 comparison: simple→local was 0/8 (now 5/8), complex→local was 6/6 (now 1/6).
+
+ ## Test Set Results (calibrated thresholds)
+
+ | Metric | Value |
+ |--------|-------|
+ | Utility | 0.6205 |
+ | Oracle Utility | 0.7179 |
+ | Regret | 0.0973 |
 
  ## Files
 
- - `router_a.safetensors` - Router A weights (64x32 MLP)
- - `router_b.safetensors` - Router B weights (32x16 MLP)
- - `config.json` - Model config, thresholds, training results
- - `scaler.pkl` - StandardScaler for feature normalization
- - `embedding_extractor.pkl` - PCA-reduced sentence-transformers extractor
+ - `router_a.safetensors` — Router A weights (32×16 MLP, 13KB)
+ - `router_b.safetensors` — Router B weights (128×64 MLP, 76KB)
+ - `config.json` — Model config, thresholds, HP, training results
+ - `scaler.pkl` — StandardScaler for feature normalization
+ - `embedding_extractor.pkl` — PCA-reduced sentence-transformers extractor
+ - `sweep_results.json` — Full 108-config HP sweep results
 
  ## Usage
 
  ```python
  from router.three_tier_inference import ThreeTierRouter
 
- router = ThreeTierRouter("models/three_tier_v4")
- tier, probs = router.route("Write a Python function to sort a list")
- # tier: "local", "sonnet", or "opus"
+ router = ThreeTierRouter("models/three_tier_v5")
+ result = router.route("Write a Python function to sort a list")
+ # result.decision: "local", "sonnet", or "opus"
+ # result.p_cloud: probability of cloud routing
+ # result.p_opus: probability of opus (if routed to cloud)
  ```
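
The two decision rules the updated README describes (the two-stage threshold cascade and the centered complexity premium) can be sketched in a few lines. This is an illustration, not the actual `router` package code: the function names and pure-Python shape are assumptions; only the threshold values (0.60, 0.474) and premium settings (2.0, centered at 0.3) come from the README.

```python
# Illustrative sketch of the v5 routing rules. The real routers are trained
# MLPs over 77-dim features; here p_cloud and p_opus are taken as given.

THRESHOLD_A = 0.60   # manually tuned: p(cloud) >= 0.60 leaves the local model
THRESHOLD_B = 0.474  # calibrated on validation: p(opus) >= 0.474 picks Opus

def route(p_cloud: float, p_opus: float) -> str:
    """Cascade: Router A picks local vs cloud; Router B runs only for cloud."""
    if p_cloud < THRESHOLD_A:
        return "local"
    return "opus" if p_opus >= THRESHOLD_B else "sonnet"

def complexity_margin(complexity_score: float,
                      premium: float = 2.0, center: float = 0.3) -> float:
    """Centered complexity premium applied to training margins: positive for
    complex prompts (pushes toward cloud), negative for simple ones."""
    return premium * (complexity_score - center)
```

Note that Router B never fires once Router A votes below 0.60, so `route(0.45, 0.9)` stays local even with a high Opus probability.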