Commit cb542c8 · Parent(s): 77f164d

docs(v0.5.3): README changelog + HF Post communicating audit-driven fixes

- README.md: new "What's new in v0.5.3" section listing all fixes,
  calibration audit table (panel n=22), and paper §5.2 erratum reference.
- docs/hf-post-v053-fix.md: public-facing post for HF Space community,
  with model-by-model impact table, what was fixed / not affected,
  and verification instructions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- README.md +72 -0
- docs/hf-post-v053-fix.md +160 -0
README.md (CHANGED)

@@ -211,6 +211,78 @@ detect anomalous checkpoints.

---

## What's new in v0.5.3 (2026-05-02) — 🔧 Audit-driven bug fixes

The TAF Agent was **applied to its own author's paper** (recursive Sócrates audit)
and to the agent's own formula implementations. Several real bugs were detected
and corrected. **All v0.5.0–v0.5.2 users running diagnostics on Phase B models
(γ > 1: LLaMA-2/3, Mistral, Gemma) or on near-Hagedorn models (e.g. Qwen2.5-7B
at γ = 0.997) received incorrect KV-compression recommendations.** This release
fixes all known issues.

### Critical fixes

- **`D_f_closed` (KV compression window)**: replaced the asymptotic and Hagedorn-buffer
branches with a **discrete cumulative sum**. The old code clamped Phase B (γ > 1) to
N when the true value was well below N (LLaMA-3-8B at γ = 1.046 with N = 2000 and
f = 0.9 should compress to ~750 tokens, about 37 % of N; the old code returned 2000).
In the boundary band γ ∈ [0.99, 1.01] it was off by a factor of ~2. The result is
now exact for any γ.

- **`partition_Z(γ=1, N)`**: was `log(N + 0.5)`, missing the Euler-Mascheroni
constant γ_E ≈ 0.577 (a ~7 % underestimate of H_N at N = 2000). Now `log(N) + γ_E`.

- **`free_energy_F`**: returned `−log(Z)` (the β·F convention). Now returns `−log(Z)/γ`,
consistent with the Helmholtz definition F = −T·log(Z) with T = 1/γ, and with the
thermodynamic identity S = γ·(U − F).

- **`γ_pred`**: replaced the obsolete `C/lnθ` heuristic with `γ_Padé(θ, T_eval)`
(paper §3.3).

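The `partition_Z` correction in the list above is easy to check on its own. At γ = 1 the partition function is the harmonic number H_N. The standalone sketch below compares both closed forms against the exact sum. Its function names are illustrative and are not the agent's API.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant γ_E


def harmonic(n: int) -> float:
    """Exact H_N = sum over d = 1..N of 1/d, the γ = 1 partition function."""
    return math.fsum(1.0 / d for d in range(1, n + 1))


def z_gamma1_old(n: int) -> float:
    """Approximation used before v0.5.3: log(N + 0.5)."""
    return math.log(n + 0.5)


def z_gamma1_new(n: int) -> float:
    """Approximation used from v0.5.3: log(N) + γ_E."""
    return math.log(n) + EULER_GAMMA


if __name__ == "__main__":
    n = 2000
    exact = harmonic(n)
    for label, approx in (("old", z_gamma1_old(n)), ("new", z_gamma1_new(n))):
        rel_err = (exact - approx) / exact
        print(f"{label}: {approx:.6f} vs H_N = {exact:.6f} (relative error {rel_err:.2%})")
```

At N = 2000 the old form undershoots H_N by about 7 %. The new form, log(N) + γ_E, differs from H_N only by the small 1/(2N) tail term.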
### Calibration audit (cross-panel re-check, n=22)

Re-running the empirical δ corrections of `gamma_decompose` against the
panel revealed:

| Constant | Hardcoded | Panel re-audit | Verdict |
|---|---|---|---|
| δ_GQA | +0.11 | +0.115 | ✓ replicates |
| δ_SWA | −0.21 | originally fit on **n=1 model** | ✗ disabled (insufficient data) |
| δ_post_IH | −0.15 | group-mean ≈ 0 (n=16 yes / 6 no) | ⚠ flagged exploratory |
| δ_instruct (v2) | −0.10 | n=3, p=0.06 (already noted) | ⚠ flagged exploratory |

`gamma_decompose` and `gamma_decompose_v2` now return per-axis status fields
(`delta_SWA_status`, `delta_post_IH_status`, etc.) and a top-level
`calibration_warning` so consumers can detect which corrections are reliable.

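A downstream consumer could use these fields roughly as follows. This is a hypothetical sketch, not the agent's API. It assumes three things:

- the decomposition result is a plain dict;
- the status keys follow the `delta_<axis>_status` pattern named above;
- any status containing `exploratory` or `disabled` marks an unreliable correction.

Apart from `exploratory_n1_disabled`, the status strings in the example input are made up.

```python
from typing import Any

# Substrings that, by assumption, mark a correction as not yet reliable.
UNRELIABLE_MARKERS = ("exploratory", "disabled")


def unreliable_axes(result: dict[str, Any]) -> list[str]:
    """Return axes whose `delta_<axis>_status` field flags them as unreliable.

    Hypothetical helper: assumes statuses are strings stored under keys
    named `delta_<axis>_status`; the real return schema may differ.
    """
    axes = []
    for key, status in result.items():
        if key.startswith("delta_") and key.endswith("_status"):
            if any(marker in str(status) for marker in UNRELIABLE_MARKERS):
                axes.append(key[len("delta_"):-len("_status")])
    return sorted(axes)


# Illustrative input only: every status except 'exploratory_n1_disabled'
# is invented for this example.
example = {
    "delta_GQA_status": "replicated",
    "delta_SWA_status": "exploratory_n1_disabled",
    "delta_post_IH_status": "exploratory",
    "calibration_warning": "some corrections are exploratory",
}

if example.get("calibration_warning"):
    print("calibration warning; unreliable axes:", unreliable_axes(example))
```

Keying on the top-level `calibration_warning` first gives a cheap "anything to worry about?" check. The per-axis scan then says which corrections to drop or down-weight.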
The TAF Card UI now displays a collapsible **"v0.5.3 — Calibration audit"
banner** in all four supported languages (EN/ES/FR/ZH) explaining this.

### Paper §5.2 erratum

The framework's **own self-audit** found that paper §5.2, Theorem 5.2, claims
`C_V(γ=1, N) = (log N)²/4`. Sócrates triangulation (numerical Python, exact
rational arithmetic in Sage, and a symbolic integral in SymPy) confirms that the
correct asymptotic is `(log N)²/12`. The factor-3 error comes from the truncated
Z-expansion in the paper's proof. The agent's `heat_capacity_Cv` already computes
the correct value via a numerical derivative of U; **only the paper's analytic
formula is wrong, not the tool**. A formal erratum will be published as a
separate document.

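The tool-versus-paper discrepancy can be reproduced independently. The sketch below makes one modeling assumption: attention weights p(d) ∝ d^(−γ) for d = 1..N, which is consistent with `partition_Z(γ=1, N)` being H_N. Under that assumption, U = ⟨log d⟩ and C_V = −γ²·dU/dγ. The code takes a central finite difference of U at γ = 1 and prints both closed forms for comparison. This follows the numerical-derivative approach described above but is not the agent's actual code, and the function names are illustrative.

```python
import numpy as np


def mean_log_d(gamma: float, n: int) -> float:
    """U(γ) = <log d> under the assumed weights p(d) ∝ d^(-γ), d = 1..n."""
    log_d = np.log(np.arange(1, n + 1, dtype=np.float64))
    weights = np.exp(-gamma * log_d)
    return float(np.dot(weights, log_d) / weights.sum())


def heat_capacity(gamma: float, n: int, h: float = 1e-4) -> float:
    """C_V = -γ² dU/dγ, estimated with a central finite difference of step h."""
    du = (mean_log_d(gamma + h, n) - mean_log_d(gamma - h, n)) / (2 * h)
    return -gamma**2 * du


if __name__ == "__main__":
    n = 100_000
    L = np.log(n)
    print(f"numerical C_V  = {heat_capacity(1.0, n):.3f}")
    print(f"(log N)^2 / 12 = {L**2 / 12:.3f}")
    print(f"(log N)^2 / 4  = {L**2 / 4:.3f}")
```

With N = 10⁵ the numerical value lands within about 10 % of (log N)²/12 and far below (log N)²/4. The remaining gap is a finite-size effect. Its relative size shrinks roughly like 1/log N, so the ratio converges slowly.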
### Tests

22/22 unit tests pass (`tests/test_taf_formulas.py`), including regression
tests for D_f in Phase B, the γ_E term in partition_Z, the free_energy_F
convention, and the disabled δ_SWA correction.

### Why this happened

These bugs survived prior reviews because the affected code paths were
exercised mainly on Phase A models (γ < 0.95), where the asymptotic
approximation is close enough. Phase B (γ > 1) and the region near
Hagedorn (|γ − 1| < 0.05) were under-tested. The agent now uses direct
discrete computation, so accuracy is uniform across all γ.

---

## What's new in v0.5 (2026-05-01) — 🔬 Machine-verified consistency

**First transformer-attention framework with formal machine-proof backing.**

docs/hf-post-v053-fix.md (ADDED)

@@ -0,0 +1,160 @@
# 🔧 TAF Agent v0.5.3 — Audit-driven bug fixes

**TL;DR**: If you ran the TAF Agent before today on a model with γ > 1
(LLaMA-2/3, Mistral, Gemma) or near the Hagedorn point (e.g. Qwen2.5-7B),
the KV-compression recommendation (`D_f`) was probably wrong. The agent
has been corrected end-to-end. Re-run your diagnostics.

---

## What was wrong

The agent's audit had been applied to its author's own paper. Yesterday I
turned the audit on the agent itself. It found six issues; three are
critical enough that I'm posting publicly.

### 1. `D_f_closed` Phase B (γ>1) — wrong by 30–95 %

For models with γ > 1 (Phase B, where attention is locally concentrated),
the old asymptotic formula clamped `D_f` to the full context length N, even
though the true compression target is well below N. In the table below it
is roughly 30–40 % of N for the real models and ~2 % at γ = 1.5.

| Model | γ | Old `D_f` (N=2000, f=0.9) | Correct `D_f` |
|---|---|---|---|
| LLaMA-2-7B | 1.026 | 2000 (clamped) | ~830 |
| LLaMA-3-8B | 1.046 | 2000 (clamped) | ~750 |
| Gemma-2-9B random | 1.135 | 2000 (clamped) | ~610 |
| (γ = 1.5 stress test) | 1.500 | 2000 (clamped) | ~44 |

If you used the agent's compression suggestion for any of these, you
were leaving real memory savings on the table.

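The fix replaces the closed-form branches with a direct discrete sum. A minimal sketch of that computation follows. It assumes `D_f` is the smallest window D whose share of the normalized attention mass, Σ_{d≤D} d^(−γ) / Σ_{d≤N} d^(−γ), reaches f. The helper name is hypothetical, and this is not the agent's `D_f_closed`.

```python
import numpy as np


def d_f_discrete(gamma: float, n: int, f: float) -> int:
    """Smallest window D (1 <= D <= n) whose share of attention mass reaches f.

    Assumes attention weights proportional to d**(-gamma) for distances
    d = 1..n. Hypothetical helper, not the agent's D_f_closed.
    """
    if not 0.0 < f <= 1.0:
        raise ValueError("f must be in (0, 1]")
    d = np.arange(1, n + 1, dtype=np.float64)
    weights = d ** (-gamma)
    cdf = np.cumsum(weights) / weights.sum()
    # First index whose cumulative share is >= f; +1 turns an index into a window size.
    idx = int(np.searchsorted(cdf, f, side="left"))
    return min(idx + 1, n)  # guard against round-off when f == 1.0


if __name__ == "__main__":
    for gamma in (1.046, 1.5):
        print(gamma, d_f_discrete(gamma, n=2000, f=0.9))
```

Under that assumption, the sketch returns 44 for the γ = 1.5 stress-test row and roughly 760 for γ = 1.046, in line with the table. Because it never leaves the discrete sum, there is no separate near-γ = 1 branch to get wrong. The cost is one O(N) pass over the distances.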
### 2. Hagedorn buffer (|γ − 1| < 0.01) — factor 2× off

Models living right at the phase boundary (Qwen2.5-7B at γ=0.997, etc.)
hit a hardcoded special case, `N · f^(1/log N)`, instead of the correct
leading-order result `N^f`. The old value was off by ~2×.

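At γ = 1 the cumulative mass is H_D / H_N. Since H_D ≈ log D + γ_E, the condition H_D = f·H_N becomes D ≈ N^f·e^(−(1−f)·γ_E), which is why `N^f` is the right leading-order answer. The sketch below compares the removed special case, `N^f`, and the exact discrete window. It uses an assumed definition of the window as a cumulative share of d^(−γ) attention mass, and its names are illustrative rather than the agent's code.

```python
import math

import numpy as np


def exact_window_gamma1(n: int, f: float) -> int:
    """Smallest D with H_D / H_N >= f, i.e. the γ = 1 window with weights 1/d."""
    cdf = np.cumsum(1.0 / np.arange(1, n + 1, dtype=np.float64))
    cdf /= cdf[-1]  # normalize by H_N so the last entry is exactly 1
    return min(int(np.searchsorted(cdf, f)) + 1, n)


n, f = 2000, 0.9
old_special_case = n * f ** (1 / math.log(n))  # removed Hagedorn-buffer branch
leading_order = n ** f                          # N^f
exact = exact_window_gamma1(n, f)

print(f"old: {old_special_case:.0f}  N^f: {leading_order:.0f}  exact: {exact}")
print(f"old / exact = {old_special_case / exact:.2f}")
```

With N = 2000 and f = 0.9, the removed branch comes out roughly 2.2 times larger than the exact window, consistent with the ~2× error above. `N^f` overshoots the exact window only by about e^((1−f)·γ_E) ≈ 1.06.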
### 3. `δ_SWA = −0.21` calibration — fit on n = 1 model

The architectural decomposition `gamma_decompose` carried an SWA
correction of −0.21 derived from a single Sliding-Window-Attention
model in the panel. With n = 1 you cannot estimate a coefficient; the
constant was effectively arbitrary. **Now disabled**, with the explicit
status flag `delta_SWA_status: 'exploratory_n1_disabled'`.

`δ_post_IH = −0.15` and `δ_instruct = −0.10` did not replicate cleanly
on the panel re-audit either; both now carry `exploratory` flags.
Only `δ_GQA = +0.11` (panel mean +0.115) replicates. **The most
reliable axes are now `δ_GQA` and the `ν_imprint` slope.**

---

## What was fixed

- `D_f_closed` rewritten to use a **direct discrete cumulative sum**:
exact for any γ, with no asymptotics and no buffers; ~10 ms per call for
N ≤ 10⁶.

- `partition_Z(γ=1, N)` now adds the Euler-Mascheroni constant
(fixes a ~7 % underestimate of H_N at N = 2000).

- `free_energy_F` switched to the physics convention `F = −log(Z)/γ`,
consistent with `S = γ·(U − F)`.

- `γ_pred` now uses `γ_Padé(θ, T_eval)` instead of the obsolete
`C/lnθ` heuristic.

- `gamma_decompose` and `gamma_decompose_v2` return per-axis
reliability flags plus a top-level `calibration_warning`.

- TAF Card UI shows a **collapsible "v0.5.3 — Calibration audit"
banner in all four supported languages** (EN / ES / FR / ZH).

- 22 unit tests added (`tests/test_taf_formulas.py`), all passing.

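The new `free_energy_F` convention can be sanity-checked numerically. Treat log d as the energy and γ as the inverse temperature (T = 1/γ). Then the entropy of p(d) ∝ d^(−γ) is S = γ·U + log Z, so S = γ·(U − F) holds exactly for F = −log(Z)/γ. The old F = −log(Z) leaves a residual of (1 − γ)·log Z instead. The sketch assumes this power-law attention model, and the function name is illustrative.

```python
import numpy as np


def thermodynamics(gamma: float, n: int) -> dict[str, float]:
    """Z, U, S and both F conventions for p(d) ∝ d^(-gamma), d = 1..n.

    Energy is log d and temperature is T = 1/gamma (assumed model).
    """
    log_d = np.log(np.arange(1, n + 1, dtype=np.float64))
    weights = np.exp(-gamma * log_d)
    z = weights.sum()
    p = weights / z
    return {
        "Z": float(z),
        "U": float(np.dot(p, log_d)),         # internal energy U = <log d>
        "S": float(-np.dot(p, np.log(p))),    # Shannon entropy
        "F_new": float(-np.log(z) / gamma),   # v0.5.3: F = -T log Z
        "F_old": float(-np.log(z)),           # pre-v0.5.3: β·F
    }


if __name__ == "__main__":
    gamma = 1.046
    t = thermodynamics(gamma, n=2000)
    print("S - γ(U - F_new) =", t["S"] - gamma * (t["U"] - t["F_new"]))
    print("S - γ(U - F_old) =", t["S"] - gamma * (t["U"] - t["F_old"]))
```

For γ = 1.046 and N = 2000, the first residual is at floating-point round-off level, and the second equals (1 − γ)·log Z. The two conventions coincide only at γ = 1.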
---

## What was *not* affected

These formulas were verified independently and remain correct:

- `gamma_pade`, `theta_design`, `alpha_opt`, `theta_eff_pade`
- `mean_log_d`, `entropy_S` (the `F` convention changed, but the
identity `S = γ·(U − F)` still holds)
- `heat_capacity_Cv`: a numerical derivative of `mean_log_d`, so it
computes the correct value automatically. The **paper §5.2 analytic
formula `(log N)²/4` is wrong**, but the agent never used it; the finite
difference gives the correct asymptotic `(log N)²/12`.
- `d_horizon`, `L_NIAH^c`, `χ`, `T_attn`
- `gamma_random_predict`, `compute_invariant_K`, `ih_phase_check`
- All the verified algebraic identities (D-SAGE-1 through 7)

---

## Paper §5.2 erratum (separate)

While auditing the agent, the framework also caught an algebraic error
in the companion paper. Paper §5.2 Theorem 5.2 claims:

```
C_V(γ = 1, N) = (log N)² / 4
```

Triple triangulation (Sócrates numerical + Sage exact rational + SymPy
symbolic integration) shows the correct asymptotic is:

```
C_V(γ = 1, N) → (log N)² / 12 (large N)
```

The proof in the paper truncated Z(γ, N) at first order in (1−γ),
missing a (1−γ)²·(log N)²/6 term. A formal erratum is in preparation
and will be published as a separate document.

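A back-of-the-envelope continuum argument shows where both constants come from. It is a sketch, not the paper's proof. It makes these simplifications:

- replace the sum over d by an integral in u = log d, dropping O(1) terms such as γ_E;
- set γ = 1 − ε and L = log N;
- use C_V = γ²·∂²log Z/∂γ², which at γ = 1 is the second ε-derivative of log Z.

$$
Z(1-\varepsilon, N) = \sum_{d=1}^{N} \frac{e^{\varepsilon \log d}}{d}
\approx \int_{0}^{L} e^{\varepsilon u}\,du
= L\left(1 + \frac{\varepsilon L}{2} + \frac{\varepsilon^{2} L^{2}}{6} + O(\varepsilon^{3})\right)
$$

$$
\log Z = \log L + \frac{\varepsilon L}{2} + \left(\frac{1}{6} - \frac{1}{8}\right)\varepsilon^{2} L^{2} + O(\varepsilon^{3})
\quad\Longrightarrow\quad
C_V(1, N) = \left.\frac{\partial^{2} \log Z}{\partial \varepsilon^{2}}\right|_{\varepsilon = 0} = \frac{L^{2}}{12}
$$

Equivalently, at γ = 1 the variable u = log d is approximately uniform on [0, L], whose variance is L²/12. Keeping only the first-order term, log Z ≈ log L + log(1 + εL/2), gives a second derivative of −L²/4 at ε = 0. Its magnitude is exactly the (log N)²/4 of Theorem 5.2.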
This does not affect any of the agent's numerical outputs — the agent
computes `C_V` via numerical derivative, not the buggy analytic form.
It only affects the analytic claim in the paper.

---

## How to verify

```bash
git clone https://github.com/karlesmarin/tafagent
cd tafagent
pytest tests/test_taf_formulas.py # 22/22 should pass
```

Or just open the live Space — the calibration banner will show up
immediately at the top of any TAF Card output.

---

## Why I'm telling you this

If you used a tool's recommendation to change a real production setup
(KV cache size, RoPE scaling, model selection) and the tool was wrong,
you deserve to know. That's the point of "auditable, deterministic,
in-browser" — not just that it's transparent in the abstract, but that
when a bug is found it gets reported. Today there's a bug to report.

The audit framework that found these is itself in early development
(Sócrates v0.1, internal use). The fact that it caught real issues in
its own author's published paper and shipped tool is, honestly, the
strongest validation it has so far.

If you spot anything else wrong, please open an issue.

— Carles Marín
*Independent researcher*
*2026-05-02*

---

**Links**:
- Live: https://huggingface.co/spaces/karlexmarin/taf-agent
- Source: https://github.com/karlesmarin/tafagent
- Paper: https://zenodo.org/records/19826343
- Dataset: https://huggingface.co/datasets/karlexmarin/taf-attention-decay