karlexmarin Claude Opus 4.7 (1M context) committed 10 days ago

Commit

cb542c8

1 Parent(s): 77f164d

docs(v0.5.3): README changelog + HF Post communicating audit-driven fixes

- README.md: new "What's new in v0.5.3" section listing all fixes,
calibration audit table (panel n=22), and paper §5.2 erratum reference.
- docs/hf-post-v053-fix.md: public-facing post for HF Space community,
with model-by-model impact table, what was fixed / not affected,
and verification instructions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Files changed (2)

README.md +72 -0
docs/hf-post-v053-fix.md +160 -0

README.md CHANGED

@@ -211,6 +211,78 @@ detect anomalous checkpoints.
 ---
 ## What's new in v0.5 (2026-05-01) — 🔬 Machine-verified consistency
 **First transformer-attention framework with formal machine-proof backing.**

 ---
+## What's new in v0.5.3 (2026-05-02) — 🔧 Audit-driven bug fixes
+The TAF Agent was **applied to its own author's paper** (recursive Sócrates audit)
+and to its own formula implementations. The audit detected several real bugs,
+all corrected in this release. **All v0.5.0–v0.5.2 users who ran diagnostics on
+Phase B models (γ > 1: LLaMA-2/3, Mistral, Gemma) or on near-Hagedorn models
+(Qwen2.5-7B) received incorrect KV-compression recommendations.**
+### Critical fixes
+- **`D_f_closed` (KV compression window)**: replaced asymptotic / Hagedorn-buffer
+  branches with a **discrete cumulative sum**. Old code clamped Phase B (γ>1) to
+  N when the true window was far smaller (LLaMA-3-8B at γ=1.046 with N=2000
+  should compress to ~750 tokens, about 38 % of N; old code returned 2000).
+  At the boundary γ ∈ [0.99, 1.01] the result was off by a factor of ~2.
+  Now exact for any γ.
+- **`partition_Z(γ=1, N)`**: was `log(N + 0.5)`, missing Euler-Mascheroni
+  constant γ_E ≈ 0.577 (~7 % underestimate of H_N). Now `log(N) + γ_E`.
+- **`free_energy_F`**: returned `−log(Z)` (β·F convention). Now `−log(Z)/γ`,
+  consistent with the Helmholtz definition F = −T·log(Z) and the
+  thermodynamic identity S = γ·(U − F).
+- **`γ_pred`**: replaced obsolete `C/lnθ` heuristic with `γ_Padé(θ, T_eval)`
+  (paper §3.3).
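A minimal sketch of the discrete cumulative-sum idea behind the `D_f_closed` fix, not the agent's actual code. It assumes the attention weight at distance d is proportional to d^(−γ) and that D_f is the smallest window holding a fraction f of the total weight over 1..N. The function name `d_f_discrete` is hypothetical.

```python
from itertools import accumulate


def d_f_discrete(gamma: float, n: int, f: float = 0.9) -> int:
    """Smallest window D whose cumulative weight reaches f of the total.

    Assumption: weight at distance d is d**(-gamma), for d = 1..n.
    """
    weights = [d ** -gamma for d in range(1, n + 1)]
    target = f * sum(weights)
    for d, running in enumerate(accumulate(weights), start=1):
        if running >= target:
            return d
    # Only reached if rounding keeps the running sum just below the target.
    return n


if __name__ == "__main__":
    for gamma in (1.026, 1.046, 1.135, 1.5):
        print(gamma, d_f_discrete(gamma, 2000, 0.9))
```

Under these assumptions, γ = 1.046 with N = 2000 and f = 0.9 yields a window well below N, in line with the ~750 tokens quoted above, rather than the clamped 2000. No asymptotic branch is involved, so the same loop covers γ < 1, γ = 1, and γ > 1.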
+### Calibration audit (cross-panel re-check, n=22)
+Re-running the empirical δ corrections of `gamma_decompose` against the
+panel revealed:
+| Constant | Hardcoded | Panel re-audit | Verdict |
+|---|---|---|---|
+| δ_GQA | +0.11 | +0.115 | ✓ replicates |
+| δ_SWA | −0.21 | originally fit on **n=1 model** | ✗ disabled (insufficient data) |
+| δ_post_IH | −0.15 | group-mean ≈ 0 (n=16 yes / 6 no) | ⚠ flagged exploratory |
+| δ_instruct (v2) | −0.10 | n=3, p=0.06 (already noted) | ⚠ flagged exploratory |
+`gamma_decompose` and `gamma_decompose_v2` now return per-axis status fields
+(`delta_SWA_status`, `delta_post_IH_status`, etc.) and a top-level
+`calibration_warning` so consumers can detect which corrections are reliable.
+The TAF Card UI now displays a collapsible **"v0.5.3 — Calibration audit"
+banner** in all four supported languages (EN/ES/FR/ZH) explaining this.
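A hedged sketch of how a consumer might use the per-axis status fields. The exact return shape of `gamma_decompose` is not shown in this changelog, so the dictionary below is hypothetical: only the key names `delta_SWA_status`, `delta_post_IH_status`, `calibration_warning`, and the value `'exploratory_n1_disabled'` appear in the commit; `delta_GQA_status` and the other values are assumed for illustration.

```python
def reliable_axes(result: dict) -> list:
    """Return axis names whose *_status field is not flagged.

    Assumption: a status containing 'exploratory' or 'disabled' marks an
    unreliable correction; anything else is treated as reliable.
    """
    suffix = "_status"
    reliable = []
    for key, status in result.items():
        if not key.endswith(suffix):
            continue  # skips non-status fields such as calibration_warning
        text = str(status).lower()
        if "exploratory" in text or "disabled" in text:
            continue
        reliable.append(key[: -len(suffix)])
    return reliable


# Hypothetical payload, shaped after the field names mentioned above.
sample = {
    "delta_GQA_status": "replicated",               # assumed key and label
    "delta_SWA_status": "exploratory_n1_disabled",  # value quoted in the post
    "delta_post_IH_status": "exploratory",          # assumed label
    "calibration_warning": "some corrections are exploratory",  # assumed text
}

if __name__ == "__main__":
    print(reliable_axes(sample))
```

Matching on substrings rather than exact strings is a deliberate choice here, since the full set of status values is not documented in this commit.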
+### Paper §5.2 erratum
+The framework's **own self-audit** found that paper §5.2 Theorem 5.2 claims
+`C_V(γ=1, N) = (log N)²/4`. Sócrates triangulation (numerical Python +
+Sage exact rational + SymPy symbolic integral) confirms the correct
+asymptotic is `(log N)²/12` — a factor-3 error in the paper's truncated
+Z-expansion proof. The agent's `heat_capacity_Cv` already computes the
+correct value via numerical derivative of U; **only the paper's analytic
+formula is wrong, not the tool**. A formal erratum will be published as a
+separate document.
+### Tests
+22/22 unit tests pass (`tests/test_taf_formulas.py`), including regression
+tests for D_f Phase B, partition_Z γ_E, free_energy_F convention, and
+δ_SWA disabled.
+### Why this happened
+These bugs survived prior reviews because the affected code paths were
+exercised mainly on Phase A models (γ < 0.95) where the asymptotic
+approximation is close enough. Phase B (γ > 1) and the boundary near
+Hagedorn (|γ−1| < 0.05) were under-tested. The agent now uses direct
+discrete computation, so accuracy is uniform across all γ.
+---
 ## What's new in v0.5 (2026-05-01) — 🔬 Machine-verified consistency
 **First transformer-attention framework with formal machine-proof backing.**

docs/hf-post-v053-fix.md ADDED

	@@ -0,0 +1,160 @@

+# 🔧 TAF Agent v0.5.3 — Audit-driven bug fixes
+**TL;DR:** If you ran the TAF Agent before today on a model with γ > 1
+(LLaMA-2/3, Mistral, Gemma) or near the Hagedorn boundary (Qwen2.5-7B),
+the KV-compression recommendation (`D_f`) was probably wrong. The agent
+has been corrected end-to-end. Re-run your diagnostics.
+---
+## What was wrong
+The agent applies a self-audit to its own paper. Yesterday I turned the
+audit on the agent itself. It found six issues. Three are critical
+enough that I'm posting publicly.
+### 1. `D_f_closed` Phase B (γ>1) — wrong by 30–95 %
+For models with γ > 1 (Phase B, where attention is locally concentrated)
+the old asymptotic formula clamped to the full context length N, when
+the true compression target is far smaller: roughly 30–42 % of N for the
+models below, and ~2 % of N in the γ = 1.5 stress test.
+| Model | γ | Old `D_f` (N=2000, f=0.9) | Correct `D_f` |
+|---|---|---|---|
+| LLaMA-2-7B | 1.026 | 2000 (clamped) | ~830 |
+| LLaMA-3-8B | 1.046 | 2000 (clamped) | ~750 |
+| Gemma-2-9B random | 1.135 | 2000 (clamped) | ~610 |
+| (γ = 1.5 stress test) | 1.500 | 2000 (clamped) | ~44 |
+If you used the agent's compression suggestion for any of these, you
+were leaving real memory savings on the table.
+### 2. Hagedorn buffer (|γ − 1| < 0.01) — factor 2× off
+Models living right at the phase boundary (Qwen2.5-7B at γ=0.997, etc.)
+hit a hardcoded special case `N · f^(1/log N)` instead of the correct
+`N^f`. Off by ~2×.
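A short worked check of the two boundary formulas, as a sketch. The reasoning for `N^f` is a continuum approximation: at γ = 1 the cumulative weight up to D grows like log D, so requiring log D / log N = f gives D = N^f. The helper names are hypothetical.

```python
import math


def old_buffer(n: int, f: float) -> float:
    """Hardcoded special case quoted in the post: N * f**(1/log N)."""
    return n * f ** (1.0 / math.log(n))


def boundary_window(n: int, f: float) -> float:
    """Continuum result at gamma = 1: log D / log N = f, so D = N**f."""
    return n ** f


n, f = 2000, 0.9
ratio = old_buffer(n, f) / boundary_window(n, f)

if __name__ == "__main__":
    print(f"old={old_buffer(n, f):.0f} new={boundary_window(n, f):.0f} ratio={ratio:.2f}")
```

For N = 2000 and f = 0.9 the old expression stays close to N while `N^f` is roughly half of it, which is where the factor of about 2 comes from.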
+### 3. `δ_SWA = −0.21` calibration — fit on n = 1 model
+The architectural decomposition `gamma_decompose` carried a SWA
+correction of −0.21 derived from a single Sliding-Window-Attention
+model in the panel. With n = 1 you cannot estimate a coefficient; the
+constant was effectively arbitrary. **Now disabled** with explicit
+status flag `delta_SWA_status: 'exploratory_n1_disabled'`.
+`δ_post_IH = −0.15` and `δ_instruct = −0.10` did not replicate cleanly
+on the panel re-audit either; both now carry `exploratory` flags.
+Only `δ_GQA = +0.11` (panel-mean +0.115) replicates. **The most
+reliable axes are now `δ_GQA` and the `ν_imprint` slope.**
+---
+## What was fixed
+- `D_f_closed` rewritten to use **direct discrete cumulative sum** —
+  exact for any γ, no asymptotics, no buffers. ~10 ms per call for
+  N ≤ 10⁶.
+- `partition_Z(γ=1, N)` now adds the Euler-Mascheroni constant
+  (~7 % accuracy fix on H_N).
+- `free_energy_F` switched to physics convention `F = −log(Z)/γ`,
+  consistent with `S = γ·(U − F)`.
+- `γ_pred` now uses `γ_Padé(θ, T_eval)` instead of the obsolete
+  `C/lnθ` heuristic.
+- `gamma_decompose` and `gamma_decompose_v2` return per-axis
+  reliability flags + a top-level `calibration_warning`.
+- TAF Card UI shows a **collapsible "v0.5.3 — Calibration audit"
+  banner in all four supported languages** (EN / ES / FR / ZH).
+- 22 unit tests added (`tests/test_taf_formulas.py`), all passing.
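To make the `partition_Z(γ=1, N)` item concrete, the sketch below compares the exact harmonic number H_N with the old and new approximations named in the list. It is an illustration only; the variable names are not taken from the repository.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def harmonic(n: int) -> float:
    """Exact partition sum at gamma = 1: H_N = sum_{d=1}^{N} 1/d."""
    return sum(1.0 / d for d in range(1, n + 1))


n = 2000
exact = harmonic(n)
old = math.log(n + 0.5)          # pre-fix expression
new = math.log(n) + EULER_GAMMA  # post-fix expression

old_rel_err = (exact - old) / exact
new_rel_err = abs(exact - new) / exact

if __name__ == "__main__":
    print(f"H_N={exact:.4f} old_err={old_rel_err:.2%} new_err={new_rel_err:.2e}")
```

At N = 2000 the old expression undershoots H_N by about 7 %, matching the figure quoted above, while the corrected one agrees to well under 0.1 %.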
+---
+## What was *not* affected
+These formulas were verified independently and remain correct:
+- `gamma_pade`, `theta_design`, `alpha_opt`, `theta_eff_pade`
+- `mean_log_d`, `entropy_S` (the new `F` convention adjusts but the
+  identity `S = γ·(U − F)` is preserved)
+- `heat_capacity_Cv` — numerical derivative of `mean_log_d`,
+  computes the correct value automatically (the **paper §5.2 analytic
+  formula `(log N)²/4` is wrong** but the agent never used it; agent
+  computes via finite difference and gets the correct asymptotic
+  `(log N)²/12`)
+- `d_horizon`, `L_NIAH^c`, `χ`, `T_attn`
+- `gamma_random_predict`, `compute_invariant_K`, `ih_phase_check`
+- All the verified algebraic identities (D-SAGE-1 through 7)
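The identity `S = γ·(U − F)` can be checked numerically. The sketch below assumes a distribution p_d = d^(−γ)/Z over d = 1..N, with U the mean of log d (the post's `mean_log_d`), S the Shannon entropy, and F = −log(Z)/γ. These definitions are inferred from the text, not copied from the agent's source, and the function name `thermo` is hypothetical.

```python
import math


def thermo(gamma: float, n: int):
    """Return (U, S, F) for p_d proportional to d**(-gamma), d = 1..n."""
    weights = [d ** -gamma for d in range(1, n + 1)]
    z = sum(weights)
    probs = [w / z for w in weights]
    u = sum(p * math.log(d) for d, p in enumerate(probs, start=1))
    s = -sum(p * math.log(p) for p in probs)
    f = -math.log(z) / gamma  # Helmholtz convention with T = 1/gamma
    return u, s, f


if __name__ == "__main__":
    for gamma in (0.8, 1.0, 1.046):
        u, s, f = thermo(gamma, 2000)
        print(gamma, s, gamma * (u - f))
```

The identity holds exactly here because log p_d = −γ·log d − log Z, so S = γ·U + log Z. With the earlier `−log(Z)` convention the factor γ on F would be missing.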
+---
+## Paper §5.2 erratum (separate)
+While auditing the agent, the framework also caught an algebraic error
+in the companion paper. Paper §5.2 Theorem 5.2 claims:
+```
+C_V(γ = 1, N) = (log N)² / 4
+```
+Triple triangulation (Sócrates numerical + Sage exact rational + SymPy
+symbolic integration) shows the correct asymptotic is:
+```
+C_V(γ = 1, N) → (log N)² / 12   (large N)
+```
+The proof in the paper truncated Z(γ, N) at first order in (1−γ),
+missing a (1−γ)²·(log N)²/6 term. A formal erratum is in preparation
+and will be published as a separate document.
+This does not affect any of the agent's numerical outputs — the agent
+computes `C_V` via numerical derivative, not the buggy analytic form.
+It only affects the analytic claim in the paper.
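A rough numerical cross-check of the erratum, as a sketch rather than the Sócrates procedure. It assumes p_d proportional to d^(−γ), U = mean of log d, and C_V = dU/dT with T = 1/γ, that is C_V = −γ²·dU/dγ, evaluated by central finite difference. Function names are hypothetical.

```python
import math


def mean_log_d(gamma: float, n: int) -> float:
    """U(gamma, N): mean of log d under weights d**(-gamma)."""
    num = 0.0
    z = 0.0
    for d in range(1, n + 1):
        w = d ** -gamma
        z += w
        num += w * math.log(d)
    return num / z


def heat_capacity(gamma: float, n: int, h: float = 1e-4) -> float:
    """C_V = -gamma**2 * dU/dgamma, via central finite difference."""
    du = (mean_log_d(gamma + h, n) - mean_log_d(gamma - h, n)) / (2 * h)
    return -gamma ** 2 * du


n = 100_000
cv = heat_capacity(1.0, n)
log_n_sq = math.log(n) ** 2
ratio_12 = cv / (log_n_sq / 12)
ratio_4 = cv / (log_n_sq / 4)

if __name__ == "__main__":
    print(f"C_V={cv:.3f} ratio to /12: {ratio_12:.3f} ratio to /4: {ratio_4:.3f}")
```

Convergence in N is slow, since the corrections shrink only like 1/log N. Even so, at N = 10⁵ the finite-difference value is close to, and slightly above, (log N)²/12, and far from (log N)²/4.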
+---
+## How to verify
+```bash
+git clone https://github.com/karlesmarin/tafagent
+cd tafagent
+pytest tests/test_taf_formulas.py    # 22/22 should pass
+```
+Or just open the live Space — the calibration banner will show up
+immediately at the top of any TAF Card output.
+---
+## Why I'm telling you this
+If you used a tool's recommendation to change a real production setup
+(KV cache size, RoPE scaling, model selection) and the tool was wrong,
+you deserve to know. That's the point of "auditable, deterministic,
+in-browser" — not just that it's transparent in the abstract, but that
+when a bug is found it gets reported. Today there's a bug to report.
+The audit framework that found these is itself in early development
+(Sócrates v0.1, internal use). The fact that it caught real issues in
+its own author's published paper and shipped tool is, honestly, the
+strongest validation it has so far.
+If you spot anything else wrong — please open an issue.
+— Carles Marín
+*Independent researcher*
+*2026-05-02*
+---
+**Links**:
+- Live: https://huggingface.co/spaces/karlexmarin/taf-agent
+- Source: https://github.com/karlesmarin/tafagent
+- Paper: https://zenodo.org/records/19826343
+- Dataset: https://huggingface.co/datasets/karlexmarin/taf-attention-decay