---
license: apache-2.0
tags:
  - routing
  - code
  - mlx
  - pid
  - cascade
library_name: mlx
---

# Vibe Coding Router v5

A three-tier cascaded router for coding tasks that routes each prompt to one of:

- Local: Qwen3-Coder-Next (80B / 3B-active MoE, on-device via MLX)
- Sonnet: Claude Sonnet 4.6 (medium-complexity cloud)
- Opus: Claude Opus 4.6 (max-capability cloud)

## What's New in v5

v4 suffered from inverted routing: simple queries went to the cloud while complex ones stayed local. Root cause: a length-quality anti-correlation in the training data, amplified by the reward weighting in the PID loss. v5 fixes this with:

1. 7 new complexity features (45 handcrafted total): `is_coding_task`, `junk_score`, `scope_breadth`, `imperative_verb_density`, `noun_phrase_density`, `interaction_complexity`, `requirement_clause_count`
2. Centered complexity premium: adjusts training margins by `premium * (complexity_score - center)` so complex tasks push toward cloud and simple tasks push toward local
3. Junk prompt clamping: 75 junk/greeting prompts neutralized (`p_teacher=0.5`, `margin=0.0`)
4. Reward weight cap: PID loss `reward_weight` capped at 0.5 to prevent outlier margins from dominating
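The centered premium (fix 2) can be sketched as follows. This is a minimal illustration under the stated hyperparameters; the function name and signature are hypothetical, not the actual training code:

```python
def adjust_margin(raw_margin: float, complexity_score: float,
                  premium: float = 2.0, center: float = 0.3) -> float:
    """Shift a training margin by premium * (complexity - center).

    Complexity scores above `center` add a positive offset (push toward
    cloud); scores below `center` subtract (push toward local).
    """
    return raw_margin + premium * (complexity_score - center)

# A complex prompt (score 0.8) gains +1.0 of margin toward cloud;
# a trivial prompt (score 0.0) loses 0.6 of margin, pushing toward local.
complex_margin = adjust_margin(0.0, 0.8)
trivial_margin = adjust_margin(0.0, 0.0)
```

Centering at 0.3 rather than 0 is what lets the premium push in both directions: without the `center` term, every prompt would be nudged toward cloud.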

## Architecture

Two cascaded binary MLP routers trained with Privileged Information Distillation (PID):

- Router A (local vs cloud): 77-dim → [32, 16] → 1, dropout=0.2, LayerNorm+ReLU
- Router B (sonnet vs opus): 77-dim → [128, 64] → 1, dropout=0.0, LayerNorm+ReLU

Features: 45 handcrafted code features + 32 PCA-reduced sentence-embedding dimensions (`all-MiniLM-L6-v2`), for 77 input dimensions total.
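Shape-wise, the feature assembly and a Router A forward pass look like the sketch below. The weights here are random stand-ins, the exact placement of LayerNorm relative to the linear layer is an assumption, and dropout (train-time only) is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, w, b):
    h = w @ x + b
    # LayerNorm over the hidden features (post-linear placement assumed),
    # followed by ReLU, matching the "LayerNorm+ReLU" description.
    h = (h - h.mean()) / (h.std() + 1e-5)
    return np.maximum(h, 0.0)

# 45 handcrafted features + 32 PCA-reduced embedding dims = 77 inputs
handcrafted = rng.normal(size=45)
embedding = rng.normal(size=32)
x = np.concatenate([handcrafted, embedding])  # shape (77,)

# Router A topology: 77 -> 32 -> 16 -> 1
h1 = layer(x, rng.normal(size=(32, 77)), np.zeros(32))
h2 = layer(h1, rng.normal(size=(16, 32)), np.zeros(16))
logit = float(rng.normal(size=16) @ h2)
p_cloud = 1.0 / (1.0 + np.exp(-logit))  # sigmoid over the single output
```

Router B follows the same pattern with [128, 64] hidden widths and its output interpreted as `p_opus`.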

## Training

- Data: 1,644 coding prompts with real quality scores from all three models
- Judge: GPT-5.4 scoring correctness, completeness, code quality, and explanation
- Loss: PID (reward-weighted CE + KL divergence), `β_kl=0.02`, `reward_cap=0.5`
- Label smoothing: `ε=0.05`; cost-aware margin for Router B (`cost_premium=0.03`)
- Complexity premium: 2.0, centered at 0.3
- HP sweep: 108 configurations, 3-way split (1,150 train / 247 val / 247 test)
- Threshold A: 0.60 (manually tuned for routing behavior; see note below)
- Threshold B: 0.474 (calibrated on the validation set)
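A per-example sketch of the loss under these hyperparameters, reading "reward-weighted CE + KL divergence" literally. This is one plausible interpretation, not the actual training code; the function and its signature are hypothetical:

```python
import math

def pid_loss(p_student: float, p_teacher: float, reward: float,
             beta_kl: float = 0.02, reward_cap: float = 0.5,
             eps: float = 0.05) -> float:
    """Reward-weighted cross-entropy plus a KL pull toward the teacher."""
    # Hard label from the teacher probability, smoothed with epsilon = 0.05
    y = (1.0 if p_teacher >= 0.5 else 0.0) * (1 - eps) + eps / 2
    # Cap the reward weight at 0.5 so outlier margins cannot dominate (v5 fix 4)
    w = min(reward, reward_cap)
    ce = -(y * math.log(p_student) + (1 - y) * math.log(1 - p_student))
    kl = (p_teacher * math.log(p_teacher / p_student)
          + (1 - p_teacher) * math.log((1 - p_teacher) / (1 - p_student)))
    return w * ce + beta_kl * kl
```

Note how a clamped junk prompt (`p_teacher=0.5`) contributes zero KL when the student also predicts 0.5, which is exactly the neutralization described above.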

## Threshold Note

The utility-optimal Router A threshold (0.01) routes almost nothing to local, because cloud quality is genuinely equal or better on nearly all prompts. The manual threshold of 0.60 trades ~1.4% utility for the intended routing behavior: simple, fast tasks run locally with no network latency, while complex tasks go to the cloud.
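Putting the two thresholds together, the cascade decision reduces to the sketch below. The real logic lives inside `ThreeTierRouter`; this standalone helper is hypothetical:

```python
def decide(p_cloud: float, p_opus: float,
           threshold_a: float = 0.60, threshold_b: float = 0.474) -> str:
    """Two-stage cascade: Router A picks local vs cloud,
    then Router B picks the cloud tier."""
    if p_cloud < threshold_a:
        return "local"  # below the manual threshold: stay on-device
    return "opus" if p_opus >= threshold_b else "sonnet"
```

Because `threshold_a` is deliberately set above the utility-optimal 0.01, a prompt needs a fairly confident cloud vote before Router B is consulted at all.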

## Real-World Routing (28 test queries, `threshold_a=0.60`)

| Category    | Local   | Sonnet  | Opus    |
|-------------|---------|---------|---------|
| Simple (8)  | 5 (62%) | 0       | 3 (38%) |
| Medium (8)  | 3 (38%) | 0       | 5 (62%) |
| Complex (6) | 1 (17%) | 1 (17%) | 4 (67%) |

v4 comparison: simple→local was 0/8 (now 5/8); complex→local was 6/6 (now 1/6).

## Test Set Results (calibrated thresholds)

| Metric         | Value  |
|----------------|--------|
| Utility        | 0.6205 |
| Oracle Utility | 0.7179 |
| Regret         | 0.0973 |

## Files

- `router_a.safetensors` — Router A weights (32×16 MLP, 13 KB)
- `router_b.safetensors` — Router B weights (128×64 MLP, 76 KB)
- `config.json` — Model config, thresholds, hyperparameters, training results
- `scaler.pkl` — StandardScaler for feature normalization
- `embedding_extractor.pkl` — PCA-reduced sentence-transformers extractor
- `sweep_results.json` — Full 108-config HP sweep results

## Usage

```python
from router.three_tier_inference import ThreeTierRouter

router = ThreeTierRouter("models/three_tier_v5")
result = router.route("Write a Python function to sort a list")
# result.decision: "local", "sonnet", or "opus"
# result.p_cloud: probability of cloud routing
# result.p_opus: probability of opus (if routed to cloud)
```