v0.8.6 NIAH RULER calibration — anti-bullshit pack #12
The 🔍 NIAH→Reason mode predicted pass rates from architectural
inputs (γ_Padé, d_horizon, GQA pressure, SWA boundary) using a
heuristic logistic. That heuristic was calibrated against rough
RULER bands but never validated against per-model-per-context
ground truth. Anti-bullshit principle: if measured data exists,
USE the measured data.
Layered RULER calibration on top of the existing predictor:
- NEW data/ruler_kb.json — 12 models from RULER paper Table 3 +
DeepWiki leaderboard. Each row carries 4K/8K/16K/32K/64K/128K
aggregate scores, claimed-vs-effective context, params, and
multi-name aliases (org/name + bare-name + unsloth mirrors so
autocomplete-pasted ids hit). Models: GPT-4-1106, Command-R-35B,
Yi-34B-200K, Mixtral-8x7B/-8x22B, Mistral-7B-v0.2, ChatGLM3-6B,
LWM-7B, Llama-3.1-70B-Instruct, Gemini-1.5-Pro, Jamba-1.5-Large,
Qwen2.5-14B-1M.
- `loadRulerKB()` + `lookupRulerModel()` + `calibrateNIAH()` in
niah_reasoning.js. Lookup is case-insensitive on the alias_index
with org/name + bare-name + lowercase fallback. Linear-interpolate
in log-context between bracketing samples; clamp + flag
extrapolation outside the 4K-128K measured range.
- Per-task back-out: RULER aggregate × retrieval_factor (1.04) for
NIAH, × reasoning_factor (0.78) for multi-hop QA. Factors derived
from RULER paper Appendix Tables 13-16 (top-tier models score
retrieval 95-100%, QA ~70%, the canonical ~25pp gap). Honest
range surfaced inline (retrieval 0.95-1.10×, reasoning 0.60-0.85×).
- UI: green-bordered "📊 RULER-calibrated" panel above the existing
architecture breakdown when the model id matches the KB. Shows
measured RULER aggregate at T_eval, derived NIAH/reasoning rates,
and a side-by-side delta vs the heuristic prediction (color-coded
+/- pp). Extrapolation warning for T_eval > 128K. Source citation
links to the paper. KB-miss path shows an explicit "calibration
unavailable, heuristic only" hint.
- 15 i18n keys × 4 langs (EN/ES/FR/ZH) = 60 keys.
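The interpolate-then-back-out pipeline described above reduces to a few lines. A condensed, self-contained sketch (the score table is the Llama-3.1-70B-Instruct row from data/ruler_kb.json; factor constants inlined; helper names here are illustrative, not the module's exports):

```javascript
// (1) Linear interpolation of the published RULER aggregate in
// log2(context), clamped outside the measured 4K–128K range.
// (2) Per-task back-out via the retrieval/reasoning factor priors.
const LEVELS = [4096, 8192, 16384, 32768, 65536, 131072];
const SCORES = [96.5, 95.8, 95.4, 94.8, 88.4, 66.6]; // Llama-3.1-70B row

function interpAggregate(T) {
  if (T <= LEVELS[0]) return SCORES[0];            // clamp low
  const last = LEVELS.length - 1;
  if (T >= LEVELS[last]) return SCORES[last];      // clamp high (flagged as extrapolation in the real code)
  for (let i = 0; i < last; i++) {
    if (T >= LEVELS[i] && T <= LEVELS[i + 1]) {
      const t = (Math.log2(T) - Math.log2(LEVELS[i])) /
                (Math.log2(LEVELS[i + 1]) - Math.log2(LEVELS[i]));
      return SCORES[i] + (SCORES[i + 1] - SCORES[i]) * t;
    }
  }
}

// Back out per-task pass rates from the aggregate
// (factors from task_breakdown_priors in the KB).
const backOut = (agg) => ({
  niah: Math.min(1, agg * 1.04 / 100),
  reasoning: Math.min(1, agg * 0.78 / 100),
});
```

At 32K the query lands exactly on the measured 94.8 sample, so back-out gives NIAH ≈ 0.99 and reasoning ≈ 0.74; at 128K the clamp returns 66.6 → ≈ 0.69 / 0.52, the same numbers quoted in the verification notes below.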
Verified locally: 12 models load, alias lookup hits HF org/name +
bare names + lowercase variants, calibration arithmetic produces
sensible numbers (Llama-3.1-70B @ 32K → RULER 94.8% → NIAH 99%
reasoning 74%; @ 128K → 66.6% → 69% / 52% — the canonical RULER
"reasoning collapses at long context" finding).
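The alias-lookup tolerance verified above (org/name, bare name, lowercase, autocomplete-pasted HF ids) boils down to one normalization chain. A minimal sketch, using a tiny stand-in index instead of the real KB's alias_index:

```javascript
// Stand-in for the alias→canonical index that loadRulerKB() builds.
// Keys are lowercased aliases; values are canonical KB row ids.
const ALIAS_INDEX = {
  "meta-llama/llama-3.1-70b-instruct": "llama3.1-70b-instruct",
  "llama-3.1-70b-instruct": "llama3.1-70b-instruct",
  "llama3.1-70b-instruct": "llama3.1-70b-instruct",
};

// Mirrors lookupRulerModel(): exact lowercase match first, then retry
// with the segment after the last "/" so pasted "{org}/{name}" ids
// still hit even when the org prefix isn't in the alias list.
function resolveAlias(modelId) {
  if (!modelId) return null;
  const k = String(modelId).trim().toLowerCase();
  if (ALIAS_INDEX[k]) return ALIAS_INDEX[k];
  const tail = k.includes("/") ? k.split("/").pop() : null;
  return (tail && ALIAS_INDEX[tail]) || null;
}
```

Note the tail fallback is what lets an unlisted mirror org (any `whatever/Llama-3.1-70B-Instruct`) still resolve.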
Refs:
- https://arxiv.org/abs/2404.06654 (Hsieh et al., COLM 2024)
- https://github.com/NVIDIA/RULER
Closes the v0.8 roadmap #5 commitment ("RULER-backed NIAH calibrator
to upgrade the predictor from heuristic to calibrated").
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- data/ruler_kb.json +116 -0
- js/i18n.js +60 -0
- js/main.js +82 -3
- js/niah_reasoning.js +145 -0
data/ruler_kb.json

@@ -0,0 +1,116 @@
+{
+  "version": "1.0",
+  "compiled": "2026-05-07",
+  "source": {
+    "primary": "RULER paper Table 3 (Hsieh et al., COLM 2024) — arxiv.org/abs/2404.06654",
+    "secondary": "DeepWiki/NVIDIA/RULER aggregated leaderboard (~34 models)",
+    "license": "Numbers reproduced for review/calibration; see paper for evaluation methodology."
+  },
+  "task_breakdown_priors": {
+    "comment": "RULER aggregate score is the mean of 13 tasks across 4 categories. Per the paper's Appendix Tables 13-16, top-tier models (GPT-4) score: retrieval 95-100%, variable-tracking 99%, aggregation 93%, QA 70%. The ~25pp gap between retrieval and QA is the canonical 'retrieval-vs-reasoning' finding. We use these ratios to back out per-task estimates from the published aggregate.",
+    "retrieval_factor": 1.04,
+    "reasoning_factor": 0.78,
+    "retrieval_factor_caveat": "NIAH-single typically scores 95-99% for top models even when aggregate drops; this multiplier underestimates NIAH on small-model regimes. Honest range: 0.95×–1.10× aggregate.",
+    "reasoning_factor_caveat": "Multi-hop QA degrades faster than aggregate. At long context (>64K) the ratio can drop below 0.6×. Honest range: 0.60×–0.85× aggregate."
+  },
+  "models": {
+    "gpt-4-1106-preview": {
+      "ruler_avg": {"4k": 96.6, "8k": 96.3, "16k": 95.2, "32k": 93.2, "64k": 87.0, "128k": 81.2},
+      "claimed_context": 128000,
+      "effective_context": 64000,
+      "params_b": null,
+      "id_aliases": ["openai/gpt-4-1106-preview", "gpt-4-1106-preview", "gpt-4-turbo"],
+      "category": "frontier_api"
+    },
+    "command-r-35b": {
+      "ruler_avg": {"4k": 93.8, "8k": 93.3, "16k": 92.4, "32k": 89.5, "64k": 84.9, "128k": 76.0},
+      "claimed_context": 128000,
+      "effective_context": 32000,
+      "params_b": 35,
+      "id_aliases": ["CohereForAI/c4ai-command-r-v01", "command-r-35b", "c4ai-command-r-v01"],
+      "category": "open"
+    },
+    "yi-34b-200k": {
+      "ruler_avg": {"4k": 93.3, "8k": 92.2, "16k": 91.3, "32k": 87.5, "64k": 83.2, "128k": 77.3},
+      "claimed_context": 200000,
+      "effective_context": 32000,
+      "params_b": 34,
+      "id_aliases": ["01-ai/Yi-34B-200K", "yi-34b-200k", "Yi-34B-200K"],
+      "category": "open"
+    },
+    "mixtral-8x7b": {
+      "ruler_avg": {"4k": 94.9, "8k": 92.1, "16k": 92.5, "32k": 85.9, "64k": 72.4, "128k": 44.5},
+      "claimed_context": 32000,
+      "effective_context": 32000,
+      "params_b": 47,
+      "id_aliases": ["mistralai/Mixtral-8x7B-Instruct-v0.1", "mixtral-8x7b-instruct", "Mixtral-8x7B-Instruct-v0.1"],
+      "category": "open_moe"
+    },
+    "mistral-7b-v0.2": {
+      "ruler_avg": {"4k": 93.6, "8k": 91.2, "16k": 87.2, "32k": 75.4, "64k": 49.0, "128k": 13.8},
+      "claimed_context": 32000,
+      "effective_context": 16000,
+      "params_b": 7,
+      "id_aliases": ["mistralai/Mistral-7B-Instruct-v0.2", "mistral-7b-v0.2", "Mistral-7B-Instruct-v0.2"],
+      "category": "open"
+    },
+    "chatglm3-6b-128k": {
+      "ruler_avg": {"4k": 87.8, "8k": 83.4, "16k": 78.6, "32k": 69.9, "64k": 56.0, "128k": 42.0},
+      "claimed_context": 128000,
+      "effective_context": 4000,
+      "params_b": 6,
+      "id_aliases": ["THUDM/chatglm3-6b-128k", "chatglm3-6b-128k"],
+      "category": "open"
+    },
+    "lwm-7b": {
+      "ruler_avg": {"4k": 82.3, "8k": 78.4, "16k": 73.7, "32k": 69.1, "64k": 68.1, "128k": 65.0},
+      "claimed_context": 1000000,
+      "effective_context": 4000,
+      "params_b": 7,
+      "id_aliases": ["LargeWorldModel/LWM-Text-Chat-1M", "lwm-7b", "LWM-Text-Chat-1M"],
+      "category": "open"
+    },
+    "llama3.1-70b-instruct": {
+      "ruler_avg": {"4k": 96.5, "8k": 95.8, "16k": 95.4, "32k": 94.8, "64k": 88.4, "128k": 66.6},
+      "claimed_context": 128000,
+      "effective_context": 64000,
+      "params_b": 70,
+      "id_aliases": ["meta-llama/Llama-3.1-70B-Instruct", "llama-3.1-70b", "llama3.1-70b-instruct", "Meta-Llama-3.1-70B-Instruct", "Llama-3.1-70B-Instruct", "unsloth/Meta-Llama-3.1-70B-Instruct", "unsloth/Llama-3.1-70B-Instruct"],
+      "category": "open_frontier"
+    },
+    "mixtral-8x22b-instruct": {
+      "ruler_avg": {"4k": 95.6, "8k": 94.9, "16k": 93.4, "32k": 90.9, "64k": 84.7, "128k": 31.7},
+      "claimed_context": 64000,
+      "effective_context": 32000,
+      "params_b": 141,
+      "id_aliases": ["mistralai/Mixtral-8x22B-Instruct-v0.1", "mixtral-8x22b-instruct", "Mixtral-8x22B-Instruct-v0.1"],
+      "category": "open_moe"
+    },
+    "gemini-1.5-pro": {
+      "ruler_avg": {"4k": 96.7, "8k": 95.8, "16k": 96.0, "32k": 95.9, "64k": 95.9, "128k": 94.4},
+      "claimed_context": 1000000,
+      "effective_context": 128000,
+      "params_b": null,
+      "id_aliases": ["google/gemini-1.5-pro", "gemini-1.5-pro"],
+      "category": "frontier_api"
+    },
+    "jamba-1.5-large": {
+      "ruler_avg": {"4k": 96.3, "8k": 96.2, "16k": 96.0, "32k": 96.0, "64k": 96.0, "128k": 96.0},
+      "claimed_context": 256000,
+      "effective_context": 128000,
+      "params_b": 94,
+      "id_aliases": ["ai21labs/AI21-Jamba-1.5-Large", "jamba-1.5-large", "AI21-Jamba-1.5-Large"],
+      "category": "open_hybrid_ssm"
+    },
+    "qwen2.5-14b-instruct-1m": {
+      "ruler_avg": {"4k": 97.5, "8k": 96.8, "16k": 95.5, "32k": 94.0, "64k": 92.5, "128k": 92.2},
+      "claimed_context": 1000000,
+      "effective_context": 128000,
+      "params_b": 14,
+      "id_aliases": ["Qwen/Qwen2.5-14B-Instruct-1M", "qwen2.5-14b-1m", "Qwen2.5-14B-Instruct-1M", "qwen2.5-14b-instruct-1m"],
+      "category": "open_frontier"
+    }
+  },
+  "context_levels": [4096, 8192, 16384, 32768, 65536, 131072],
+  "context_level_keys": ["4k", "8k", "16k", "32k", "64k", "128k"]
+}
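Rows of this shape admit a quick structural sanity check (a hypothetical helper, not part of the commit; a two-row excerpt is inlined so it runs standalone): every row should carry all six context keys with scores in [0, 100], and effective_context should never exceed claimed_context.

```javascript
// Structural invariants for ruler_kb.json model rows.
const KEYS = ["4k", "8k", "16k", "32k", "64k", "128k"];

const kbExcerpt = {
  "llama3.1-70b-instruct": {
    ruler_avg: { "4k": 96.5, "8k": 95.8, "16k": 95.4, "32k": 94.8, "64k": 88.4, "128k": 66.6 },
    claimed_context: 128000, effective_context: 64000,
  },
  "mistral-7b-v0.2": {
    ruler_avg: { "4k": 93.6, "8k": 91.2, "16k": 87.2, "32k": 75.4, "64k": 49.0, "128k": 13.8 },
    claimed_context: 32000, effective_context: 16000,
  },
};

function validateKB(models) {
  const problems = [];
  for (const [id, m] of Object.entries(models)) {
    for (const k of KEYS) {
      const v = m.ruler_avg[k];
      // All six samples must be present and plausible percentages.
      if (typeof v !== "number" || v < 0 || v > 100) problems.push(`${id}: bad ${k}`);
    }
    // Effective length can be shorter than claimed, never longer.
    if (m.effective_context > m.claimed_context) problems.push(`${id}: effective > claimed`);
  }
  return problems;
}
```

Note the check deliberately does not assert monotonically decreasing scores: the published numbers are not monotone (e.g. Mixtral-8x7B's 16K sample is above its 8K sample).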
js/i18n.js

@@ -439,6 +439,21 @@ export const TRANSLATIONS = {
     "niah.label.safe_ctx": "Safe reasoning context",
     "niah.section.breakdown": "Architecture breakdown",
     "niah.section.reco": "Recommendation",
+    "niah.calib.heading": "RULER-calibrated (NVIDIA published data)",
+    "niah.calib.matched": "Matched <code>{alias}</code> → KB row <code>{canonical}</code>.",
+    "niah.calib.aggregate": "RULER aggregate",
+    "niah.calib.interp": "interpolated between",
+    "niah.calib.extrapolated": "extrapolated outside RULER's measured range",
+    "niah.calib.col.heuristic": "Heuristic",
+    "niah.calib.col.calibrated": "RULER-calibrated",
+    "niah.calib.col.delta": "Δ",
+    "niah.calib.factors": "Per-task factors from RULER paper Appendix Tables 13-16:",
+    "niah.calib.factors_caveat": "honest range: retrieval 0.95-1.10×, reasoning 0.60-0.85×",
+    "niah.calib.claimed_vs_effective": "Paper-reported",
+    "niah.calib.claimed": "claimed",
+    "niah.calib.effective": "effective",
+    "niah.calib.source": "Source",
+    "niah.calib.miss": "RULER calibration unavailable for this model — using architectural heuristic only. Add to data/ruler_kb.json if you have measured numbers.",
     "niah.section.sweep": "Pass rate sweep across context lengths",
     "niah.field.dhorizon": "d_horizon (effective)",
     "niah.field.ratio": "T_eval / d_horizon",

@@ -1597,6 +1612,21 @@ export const TRANSLATIONS = {
     "niah.label.safe_ctx": "Contexto seguro de reasoning",
     "niah.section.breakdown": "Desglose arquitectónico",
     "niah.section.reco": "Recomendación",
+    "niah.calib.heading": "Calibrado con RULER (datos publicados por NVIDIA)",
+    "niah.calib.matched": "Coincide <code>{alias}</code> → fila KB <code>{canonical}</code>.",
+    "niah.calib.aggregate": "Agregado RULER",
+    "niah.calib.interp": "interpolado entre",
+    "niah.calib.extrapolated": "extrapolado fuera del rango medido por RULER",
+    "niah.calib.col.heuristic": "Heurística",
+    "niah.calib.col.calibrated": "Calibrado RULER",
+    "niah.calib.col.delta": "Δ",
+    "niah.calib.factors": "Factores por tarea del paper RULER, Apéndice Tablas 13-16:",
+    "niah.calib.factors_caveat": "rango honesto: retrieval 0.95-1.10×, reasoning 0.60-0.85×",
+    "niah.calib.claimed_vs_effective": "Reportado en paper",
+    "niah.calib.claimed": "claimed",
+    "niah.calib.effective": "effective",
+    "niah.calib.source": "Fuente",
+    "niah.calib.miss": "Calibración RULER no disponible para este modelo — usando solo heurística arquitectónica. Añade a data/ruler_kb.json si tienes números medidos.",
     "niah.section.sweep": "Barrido de tasas pass por longitud de contexto",
     "niah.field.dhorizon": "d_horizon (efectivo)",
     "niah.field.ratio": "T_eval / d_horizon",

@@ -2619,6 +2649,21 @@ export const TRANSLATIONS = {
     "niah.label.safe_ctx": "Contexte sûr pour reasoning",
     "niah.section.breakdown": "Détail architectural",
     "niah.section.reco": "Recommandation",
+    "niah.calib.heading": "Calibré avec RULER (données publiées par NVIDIA)",
+    "niah.calib.matched": "Correspond <code>{alias}</code> → ligne KB <code>{canonical}</code>.",
+    "niah.calib.aggregate": "Agrégat RULER",
+    "niah.calib.interp": "interpolé entre",
+    "niah.calib.extrapolated": "extrapolé hors de la plage mesurée par RULER",
+    "niah.calib.col.heuristic": "Heuristique",
+    "niah.calib.col.calibrated": "Calibré RULER",
+    "niah.calib.col.delta": "Δ",
+    "niah.calib.factors": "Facteurs par tâche du paper RULER, Appendice Tables 13-16 :",
+    "niah.calib.factors_caveat": "plage honnête : retrieval 0.95-1.10×, reasoning 0.60-0.85×",
+    "niah.calib.claimed_vs_effective": "Rapporté dans le paper",
+    "niah.calib.claimed": "claimed",
+    "niah.calib.effective": "effective",
+    "niah.calib.source": "Source",
+    "niah.calib.miss": "Calibration RULER indisponible pour ce modèle — utilisation de l'heuristique architecturale seule. Ajoutez à data/ruler_kb.json si vous avez des chiffres mesurés.",
     "niah.section.sweep": "Balayage des taux par longueur de contexte",
     "niah.field.dhorizon": "d_horizon (effectif)",
     "niah.field.ratio": "T_eval / d_horizon",

@@ -3641,6 +3686,21 @@ export const TRANSLATIONS = {
     "niah.label.safe_ctx": "Reasoning 安全上下文",
     "niah.section.breakdown": "架构细节",
     "niah.section.reco": "建议",
+    "niah.calib.heading": "RULER 校准(NVIDIA 已发布数据)",
+    "niah.calib.matched": "匹配 <code>{alias}</code> → KB 行 <code>{canonical}</code>。",
+    "niah.calib.aggregate": "RULER 聚合分",
+    "niah.calib.interp": "在以下之间插值",
+    "niah.calib.extrapolated": "外推到 RULER 已测范围之外",
+    "niah.calib.col.heuristic": "启发式",
+    "niah.calib.col.calibrated": "RULER 校准",
+    "niah.calib.col.delta": "Δ",
+    "niah.calib.factors": "来自 RULER 论文附录表 13-16 的每任务因子:",
+    "niah.calib.factors_caveat": "诚实范围:retrieval 0.95-1.10×,reasoning 0.60-0.85×",
+    "niah.calib.claimed_vs_effective": "论文报告",
+    "niah.calib.claimed": "claimed",
+    "niah.calib.effective": "effective",
+    "niah.calib.source": "来源",
+    "niah.calib.miss": "此模型暂无 RULER 校准——仅使用架构启发式。如有实测数字,请添加到 data/ruler_kb.json。",
     "niah.section.sweep": "按上下文长度扫描通过率",
     "niah.field.dhorizon": "d_horizon(有效)",
     "niah.field.ratio": "T_eval / d_horizon",
js/main.js

@@ -18,7 +18,7 @@ import { rateAllBenchmarks, BENCHMARK_DB } from "./contamination_prior.js";
 import { predictQuantShift, predictAllSchemes, QUANT_SCHEMES } from "./quant_regime.js";
 import { attachAllHfAutocompletes } from "./hf_autocomplete.js";
 import { computeDriftBound, FRAMEWORKS as DRIFT_FRAMEWORKS, DTYPES as DRIFT_DTYPES } from "./cross_drift.js";
-import { predictNIAHReasoning, sweepContextLengths } from "./niah_reasoning.js";
+import { predictNIAHReasoning, sweepContextLengths, loadRulerKB, calibrateNIAH, listRulerModels } from "./niah_reasoning.js";
 import {
   loadSaturationKB, classifyAll, classifyBenchmark,
   listBenchmarks, attribution as saturationAttribution, tryFetchLive,

@@ -1419,7 +1419,7 @@ async function niahFetchConfig() {
   }
 }
 
-function renderNIAHCard(result, modelId) {
+function renderNIAHCard(result, modelId, calib = null) {
   const escapeHtml = (s) => String(s).replace(/[&<>"']/g, c =>
     ({"&":"&amp;","<":"&lt;",">":"&gt;",'"':"&quot;","'":"&#39;"}[c]));
   const fmtN = (x) => x === null || x === undefined ? "—" : Number(x).toLocaleString();

@@ -1430,6 +1430,80 @@ function renderNIAHCard(result, modelId) {
     ? tFmt("niah.safe_context", { ctx: result.safe_context })
     : (t("niah.safe_context_none") || "No safe context found below your target — model fails reasoning even at small contexts.");
 
+  // RULER calibration block — appears only when KB lookup hits.
+  // Shows measured RULER aggregate, derived NIAH/reasoning, and the
+  // delta vs the heuristic so users see when the predictor was off.
+  let calibBlock = "";
+  if (calib) {
+    const fmtPct = (v) => `${(v * 100).toFixed(0)}%`;
+    const fmtDelta = (d) => {
+      if (d == null) return "—";
+      const pp = Math.round(d * 100);
+      const sign = pp > 0 ? "+" : "";
+      const col = Math.abs(pp) >= 10 ? "#f0883e" : Math.abs(pp) >= 5 ? "#d29922" : "#8b949e";
+      return `<span style="color:${col};">${sign}${pp} pp</span>`;
+    };
+    const extrapNote = calib.extrapolated
+      ? `<span class="subtle" style="color:#d29922;font-size:0.85em;"> ⚠ ${t("niah.calib.extrapolated") || "extrapolated outside RULER's measured range"}</span>`
+      : "";
+    calibBlock = `
+      <details class="unmask-panel" open style="border-left:3px solid #3fb950;">
+        <summary class="unmask-panel-title">📊 ${t("niah.calib.heading") || "RULER-calibrated (NVIDIA published data)"}</summary>
+        <p>${tFmt("niah.calib.matched", {
+          alias: escapeHtml(calib.matched_alias),
+          canonical: escapeHtml(calib.canonical_id),
+        }) || `Matched <code>${escapeHtml(calib.matched_alias)}</code> → KB row <code>${escapeHtml(calib.canonical_id)}</code>.`}</p>
+        <p>
+          <strong>${t("niah.calib.aggregate") || "RULER aggregate"} @ ${fmtN(result.T_eval)}:</strong>
+          <code>${calib.ruler_avg_pct}%</code>
+          <span class="subtle">(${t("niah.calib.interp") || "interpolated between"} ${calib.interp_anchor})</span>${extrapNote}
+        </p>
+        <table class="arena-table" style="margin-top:0.5em;">
+          <thead><tr>
+            <th></th>
+            <th>${t("niah.calib.col.heuristic") || "Heuristic"}</th>
+            <th>${t("niah.calib.col.calibrated") || "RULER-calibrated"}</th>
+            <th>${t("niah.calib.col.delta") || "Δ"}</th>
+          </tr></thead>
+          <tbody>
+            <tr>
+              <td><strong>NIAH</strong></td>
+              <td>${fmtPct(result.niah_rate)}</td>
+              <td><strong>${fmtPct(calib.niah_calibrated)}</strong></td>
+              <td>${fmtDelta(calib.delta_niah)}</td>
+            </tr>
+            <tr>
+              <td><strong>${t("niah.label.reasoning") || "Reasoning"}</strong></td>
+              <td>${fmtPct(result.reasoning_rate)}</td>
+              <td><strong>${fmtPct(calib.reasoning_calibrated)}</strong></td>
+              <td>${fmtDelta(calib.delta_reasoning)}</td>
+            </tr>
+          </tbody>
+        </table>
+        <p class="recipe-desc subtle" style="font-size:0.82em;">
+          ${t("niah.calib.factors") || "Per-task factors from RULER paper Appendix Tables 13-16:"}
+          retrieval = ${calib.retrieval_factor}× aggregate,
+          reasoning = ${calib.reasoning_factor}× aggregate
+          (${t("niah.calib.factors_caveat") || "honest range: retrieval 0.95-1.10×, reasoning 0.60-0.85×"}).
+        </p>
+        <p class="recipe-desc subtle" style="font-size:0.82em;">
+          ${t("niah.calib.claimed_vs_effective") || "Paper-reported"}:
+          ${t("niah.calib.claimed") || "claimed"} ${fmtN(calib.claimed_context)} /
+          ${t("niah.calib.effective") || "effective"} ${fmtN(calib.effective_context)}.
+          ${t("niah.calib.source") || "Source"}:
+          <a href="${calib.source_url}" target="_blank" rel="noopener noreferrer">RULER paper (Hsieh et al., COLM 2024)</a>
+        </p>
+      </details>
+    `;
+  } else if (modelId) {
+    // KB miss — explicitly state we're heuristic-only.
+    calibBlock = `
+      <p class="recipe-desc subtle" style="font-size:0.85em;margin-top:0.5em;">
+        💡 ${t("niah.calib.miss") || "RULER calibration unavailable for this model — using architectural heuristic only. Add to data/ruler_kb.json if you have measured numbers."}
+      </p>
+    `;
+  }
+
   return `
     <div class="unmask-result">
       <div class="unmask-hero" style="border-color: ${color};">

@@ -1442,6 +1516,7 @@ function renderNIAHCard(result, modelId) {
       </div>
     </div>
     <div class="unmask-details">
+      ${calibBlock}
       <details class="unmask-panel" open>
         <summary class="unmask-panel-title">${t("niah.section.breakdown") || "Architecture breakdown"}</summary>
         <ul>

@@ -1512,7 +1587,11 @@ async function runNIAHPredict() {
     return;
   }
   const result = predictNIAHReasoning(cfg, T_eval);
-  $("niah-output").innerHTML = renderNIAHCard(result, __niahLastModelId);
+  // Ensure RULER KB is loaded once; idempotent. No-op if already loaded.
+  await loadRulerKB();
+  // Calibrate against published RULER measurements if available.
+  const calib = calibrateNIAH(__niahLastModelId, T_eval, result);
+  $("niah-output").innerHTML = renderNIAHCard(result, __niahLastModelId, calib);
   $("niah-status").textContent = tFmt("niah.status.done", {
     verdict: t(`niah.verdict.${result.verdict}`) || result.verdict,
     niah: (result.niah_rate * 100).toFixed(0),
js/niah_reasoning.js

@@ -141,3 +141,148 @@ export function sweepContextLengths(config, lengths = null) {
   );
   return defaults.map(T => predictNIAHReasoning(config, T));
 }
+
+
+// =============================================================================
+// RULER calibration (v0.8.6 anti-bullshit pack #12)
+// =============================================================================
+//
+// The heuristic predictor above is a Padé-canonical extrapolation from
+// architectural inputs. It's calibrated against ROUGH RULER bands, but
+// for any specific (model, context) pair where NVIDIA published a
+// measurement, the published number is GROUND TRUTH. This block layers
+// calibration on top: when the user's model id matches a row in
+// data/ruler_kb.json, we interpolate the published RULER aggregate at
+// the requested T_eval and back out per-task estimates via the paper's
+// retrieval-vs-reasoning factor band.
+//
+// Anti-bullshit principle: if measured data exists, USE the measured
+// data, don't ship a heuristic guess that contradicts it. Surface the
+// heuristic-vs-calibrated delta so users see when our predictor was
+// over- or under-confident vs the published ground truth.
+
+let _rulerKb = null;
+
+export async function loadRulerKB(url = "./data/ruler_kb.json") {
+  if (_rulerKb) return _rulerKb;
+  try {
+    const res = await fetch(url);
+    if (!res.ok) throw new Error(`RULER KB fetch failed: ${res.status}`);
+    _rulerKb = await res.json();
+    // Build alias→canonical reverse index for fast lookup. Lowercase
+    // for case-insensitive matching of user-pasted ids.
+    _rulerKb._aliasIndex = {};
+    for (const [canon, m] of Object.entries(_rulerKb.models)) {
+      _rulerKb._aliasIndex[canon.toLowerCase()] = canon;
+      for (const a of m.id_aliases || []) {
+        _rulerKb._aliasIndex[a.toLowerCase()] = canon;
+      }
+    }
+    return _rulerKb;
+  } catch (e) {
+    return null;
+  }
+}
+
+export function getRulerKB() { return _rulerKb; }
+
+// Lookup a model in the KB. Tolerates: bare canonical key, any listed
+// alias, or HF "{org}/{name}" form. Returns the model entry or null.
+export function lookupRulerModel(modelId) {
+  if (!_rulerKb || !modelId) return null;
+  const k = String(modelId).trim().toLowerCase();
+  const canon = _rulerKb._aliasIndex[k];
+  if (canon) return { canonical: canon, ..._rulerKb.models[canon] };
+  // Try the post-`/` segment too (e.g. "meta-llama/Llama-3.1-70B-Instruct"
+  // → "Llama-3.1-70B-Instruct")
+  const tail = k.includes("/") ? k.split("/").pop() : null;
+  if (tail) {
+    const c2 = _rulerKb._aliasIndex[tail];
+    if (c2) return { canonical: c2, ..._rulerKb.models[c2] };
+  }
+  return null;
+}
+
+// Linear-interpolate RULER aggregate score between bracketing context
+// samples. Returns null when T_eval is outside the bracketed range
+// (we extrapolate cautiously: clamp at the nearest endpoint).
+function interpolateRulerAvg(rulerEntry, T_eval) {
+  const levels = [4096, 8192, 16384, 32768, 65536, 131072];
+  const keys = ["4k", "8k", "16k", "32k", "64k", "128k"];
+  const vals = keys.map(k => rulerEntry.ruler_avg[k]).filter(v => typeof v === "number");
+  if (vals.length === 0) return null;
+  // Below smallest sample → clamp at first
+  if (T_eval <= levels[0]) {
+    return { value: rulerEntry.ruler_avg[keys[0]], extrapolated: T_eval < levels[0], anchor: keys[0] };
+  }
+  // Above largest sample → clamp at last (extrapolation flag set)
+  if (T_eval >= levels[levels.length - 1]) {
+    return { value: rulerEntry.ruler_avg[keys[keys.length - 1]], extrapolated: T_eval > levels[levels.length - 1], anchor: keys[keys.length - 1] };
+  }
+  // Find bracketing pair
+  for (let i = 0; i < levels.length - 1; i++) {
+    if (T_eval >= levels[i] && T_eval <= levels[i + 1]) {
+      const a = rulerEntry.ruler_avg[keys[i]];
+      const b = rulerEntry.ruler_avg[keys[i + 1]];
+      // Linear in log-context (RULER scores degrade roughly linearly
+      // in log T near the effective-length boundary)
+      const t = (Math.log2(T_eval) - Math.log2(levels[i])) /
+                (Math.log2(levels[i + 1]) - Math.log2(levels[i]));
+      return { value: a + (b - a) * t, extrapolated: false, anchor: `${keys[i]}↔${keys[i + 1]}` };
+    }
+  }
+  return null;
+}
+
+// Calibrate a heuristic prediction against the published RULER
+// aggregate. Returns null if the model isn't in the KB. Returns a
+// calibration object otherwise: measured aggregate, derived NIAH and
+// reasoning rates, and the delta vs heuristic.
+export function calibrateNIAH(modelId, T_eval, heuristicResult) {
+  const entry = lookupRulerModel(modelId);
+  if (!entry || !_rulerKb) return null;
+
+  const interp = interpolateRulerAvg(entry, T_eval);
+  if (!interp) return null;
+
+  const aggregate = interp.value; // 0-100 scale per RULER convention
+  const priors = _rulerKb.task_breakdown_priors || {
+    retrieval_factor: 1.04,
+    reasoning_factor: 0.78,
+  };
+  const niahCalibrated = Math.min(1.0, (aggregate * priors.retrieval_factor) / 100);
+  const reasoningCalibrated = Math.min(1.0, (aggregate * priors.reasoning_factor) / 100);
+
+  return {
+    canonical_id: entry.canonical,
+    matched_alias: modelId,
+    ruler_avg_pct: Math.round(aggregate * 10) / 10,
+    interp_anchor: interp.anchor,
+    extrapolated: interp.extrapolated,
+    claimed_context: entry.claimed_context,
+    effective_context: entry.effective_context,
+    niah_calibrated: Math.round(niahCalibrated * 100) / 100,
+    reasoning_calibrated: Math.round(reasoningCalibrated * 100) / 100,
+    delta_niah: heuristicResult
+      ? Math.round((niahCalibrated - heuristicResult.niah_rate) * 100) / 100
+      : null,
+    delta_reasoning: heuristicResult
+      ? Math.round((reasoningCalibrated - heuristicResult.reasoning_rate) * 100) / 100
+      : null,
+    retrieval_factor: priors.retrieval_factor,
+    reasoning_factor: priors.reasoning_factor,
+    source_url: _rulerKb.source?.primary || "",
+  };
+}
+
+// List all models in the KB (for UI dropdown / "did you mean" hint).
+export function listRulerModels() {
+  if (!_rulerKb) return [];
+  return Object.entries(_rulerKb.models).map(([k, v]) => ({
+    canonical: k,
+    aliases: v.id_aliases || [],
+    claimed_context: v.claimed_context,
+    effective_context: v.effective_context,
+    category: v.category,
+  }));
+}