# ARIA: Closed-Loop Reliability Control for Autoregressive Decoding
> **Runtime reliability middleware for LLM inference.**

ARIA is a lightweight, training-free control system that hooks into any HuggingFace Transformers model. It observes hidden states and logit distributions, detects anomalous behavior via calibrated statistical observers, and applies minimal corrective control inputs through proportional feedback — all via standard PyTorch forward hooks.
## v0.4 — What Changed (and Why)
v0.3 benchmark results were honest but damning: **ARIA cut GSM8K accuracy by 5 points** (90% → 85%) on Qwen3-8B-AWQ. The loop detector triggered on 65.6% of normal reasoning steps, and the trajectory diverger perturbed correct chains into incorrect ones.

v0.4 fixes every identified issue:
| Problem (v0.3) | Root Cause | Fix (v0.4) |
|---|---|---|
| Loop detector over-fires on reasoning | Entropy variance collapse ≠ content repetition | **Content-aware detection**: token trigram repetition ratio |
| Trigger threshold too low | severity > 0.5 caught normal variation | **Raised to 0.7** across all observers |
| No ablation evidence | Couldn't prove each observer matters | **Built-in ablation mode**: disable any observer independently |
| No stability evidence | No proof ARIA preserves output distribution | **Perplexity tracking**: log P(top-1) with/without corrections |
| "Heuristic soup" criticism | No unifying framework | **Control-theoretic framing**: observer → controller → plant |
| No orthogonality evidence | Failure modes might be correlated | **Signal vector logging + PCA correlation matrix** |
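The last two rows are cheap to check offline. A minimal sketch of the orthogonality analysis, assuming the per-step severity vectors have been logged as a `(T, 4)` array; the file name and logging format here are placeholders, not the repo's actual output:

```python
import numpy as np

# Placeholder path: assumes severity vectors [σ_compound, σ_drift, σ_loop,
# σ_median] were logged once per decode step, shape (T, 4).
S = np.load("aria_signals.npy")

# Pairwise correlations: a near-diagonal matrix means the four failure
# modes fire on largely independent steps, i.e. they are not redundant.
C = np.corrcoef(S, rowvar=False)
print(np.round(C, 2))

# PCA on the correlation matrix: if no single component dominates the
# explained variance, no one latent factor subsumes all four observers.
eigvals = np.linalg.eigvalsh(C)[::-1]
print("explained variance:", np.round(eigvals / eigvals.sum(), 2))
```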
## Framing: Proportional Feedback Control

Following [A-LQR (arxiv:2604.19018)](https://arxiv.org/abs/2604.19018), we model the LLM as a dynamical system:
| ``` | |
| Plant: z_{k+1} = φ_k(z_k) + u_k (transformer dynamics + control input) | |
| Observer: s_k = [σ_compound, σ_drift, σ_loop, σ_median] (4-output state estimator) | |
| Controller: u_k = -K · max(σ) · v_k (proportional feedback) | |
| ``` | |
Where:

- `z_k ∈ ℝ^d` = hidden state at layer k (intervention at layer ℓ/2)
- `φ_k` = frozen transformer block
- `s_k` = observer output (calibrated severity per failure mode)
- `K` = auto-tuned proportional gain (from calibration variance)
- `v_k` = correction direction (EMA/anchor/orthogonal depending on failure mode)

**Honest difference from A-LQR**: We use proportional control (P-controller) instead of LQR because we don't compute per-layer Jacobians (too expensive for middleware). A-LQR is optimal for single objectives; ARIA trades optimality for multi-objective coverage with zero setup cost.
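In code, the whole loop reduces to a single forward hook at the intervention layer. A minimal sketch of the proportional step; `severities` and `direction` are hypothetical stand-ins for ARIA's observer and controller internals, not the repo's API:

```python
import torch

K = 0.1  # proportional gain; ARIA derives this from calibration variance

# Placeholder observer/controller: stand-ins for ARIA's internals.
def severities(h):                  # 4 calibrated severities in [0, 1]
    return torch.zeros(4)           # placeholder: a real observer scores h
def direction(mode, h):             # correction direction v_k for the mode
    return torch.zeros_like(h)      # placeholder: EMA / anchor / orthogonal

def aria_hook(module, inputs, output):
    """One P-controller step: z_{k+1} += u_k, with u_k = -K · max(σ) · v_k."""
    h = output[0] if isinstance(output, tuple) else output   # (B, T, d)
    s = severities(h)
    s_max, mode = s.max(dim=-1)          # highest-severity failure mode
    if s_max.item() > 0.7:               # v0.4 trigger threshold
        h = h - K * s_max * direction(mode, h)   # apply control input u_k
    return (h,) + output[1:] if isinstance(output, tuple) else h

# Hook at layer ℓ/2 (the intervention point above); `model` is any HF
# decoder exposing a .model.layers list.
mid = len(model.model.layers) // 2
handle = model.model.layers[mid].register_forward_hook(aria_hook)
```

Everything else, calibration included, lives behind the two stand-ins; the hook itself either passes the hidden state through untouched or nudges it by `u_k`.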
## Observers (Detectors)

| Observer | Signal | Calibration | Trigger | Paper Basis |
|---|---|---|---|---|
| Compound Error | JSD(p_t, p_{t-1}) + H_norm(p_t) | mean + 2.5σ | severity > 0.7 | [arxiv:2602.02863](https://arxiv.org/abs/2602.02863) |
| Semantic Drift | 1 - cos(h_t, h_0) | mean + 2.5σ | severity > 0.7 | [CAST arxiv:2409.05907](https://arxiv.org/abs/2409.05907) |
| Logic Loop | **Trigram repetition ratio** (v0.4) | mean + 2.5σ | severity > 0.7 | Content-aware, not entropy-based |
| Median Trap | top-1 prob + inverse top-K entropy + TTR | mean + 2.5σ | severity > 0.7, 2/3 agree | [ITI](https://arxiv.org/abs/2306.03341) |
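The v0.4 loop signal is deliberately content-based: it counts repeated token trigrams rather than watching entropy variance. A minimal sketch of the signal and its mean + 2.5σ calibration; the window size and helper names are illustrative, not the repo's:

```python
from collections import Counter
from statistics import mean, stdev

def trigram_repetition_ratio(token_ids, window=64):
    """Fraction of trigrams in the recent window that are repeats."""
    ids = list(token_ids)[-window:]
    trigrams = [tuple(ids[i:i + 3]) for i in range(len(ids) - 2)]
    if not trigrams:
        return 0.0
    repeats = sum(c - 1 for c in Counter(trigrams).values())
    return repeats / len(trigrams)

# Calibration on clean generations: trigger only when the live signal
# exceeds mean + 2.5σ of the calibration distribution.
# calibration_sequences: token-id lists from normal decoding (assumed available)
clean = [trigram_repetition_ratio(seq) for seq in calibration_sequences]
threshold = mean(clean) + 2.5 * stdev(clean)
```

A degenerate loop ("The answer is 5. The answer is 5. ...") pushes the ratio toward 1, while low-entropy but non-repetitive reasoning stays near 0, which is exactly the v0.3 false-positive case this replaces.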
## Controllers (Correctors)

| Controller | Control Law | When |
|---|---|---|
| Steering | u = K · σ · (EMA - h) / ‖EMA - h‖ · ‖h‖ | Compound error detected |
| Goal Anchor | u = K · σ · (h_0 - h) | Semantic drift detected |
| Divergence | u = K · σ · v_⊥ · ‖h‖ (Gram-Schmidt orthogonal) | Logic loop detected |
| Logit Temp | logits /= (1 + 0.15·K·σ), top-3 suppressed | Median trap detected |

Budget: max 1 correction per step. Highest severity wins.
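Concretely, the budget rule and the first control law fit in a few lines. A sketch under the same caveats as above; `ema` is assumed to be a running mean of recent hidden states, maintained elsewhere:

```python
import torch

def select_correction(severities: dict):
    """Budget: at most one correction per step; highest severity wins."""
    mode = max(severities, key=severities.get)
    if severities[mode] <= 0.7:          # global v0.4 trigger threshold
        return None, 0.0
    return mode, severities[mode]

def steering(h, ema, K, sigma):
    """Compound-error law: u = K * sigma * (ema - h) / ||ema - h|| * ||h||."""
    d = ema - h
    u = K * sigma * d / (d.norm() + 1e-8) * h.norm()   # norm-matched step
    return h + u
```

Scaling the unit direction by ‖h‖ keeps the correction proportional to the hidden state's own magnitude, so severity alone decides how far to move.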
## Benchmark Results

### Qwen3-8B-AWQ on T4 (v0.3 results; v0.4 Colab run pending)
| Config | GSM8K (20 problems) | Code (10 tasks) | Corrections | Loop triggers |
|---|---|---|---|---|
| Baseline | **90.0%** | **100.0%** | 0 | — |
| ARIA v0.3 | 85.0% ❌ | 100.0% | 6,724 | 336/512 (65.6%) |
| ARIA v0.4 | **TBD** | **TBD** | **TBD** | **TBD** |
v0.4's trigram-based loop detector should dramatically reduce false positives.
### Ablation Study (built into v0.4 script)

The script automatically runs six configs:
1. Full ARIA (all observers)
2. No compound error observer
3. No semantic drift observer
4. No logic loop observer
5. No median trap observer
6. Observe-only (no corrections)
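Config 6 (observe-only) doubles as the reference for the perplexity check: generate the same prompts with corrections on and off and compare per-token log P(top-1). A minimal sketch using only stock `generate` outputs, no ARIA-specific API assumed:

```python
import torch

@torch.no_grad()
def mean_top1_logprob(model, input_ids, max_new_tokens=128):
    """Mean log P(top-1) over a greedy continuation. Call once with ARIA
    attached and once detached; a small gap means the output distribution
    is largely preserved."""
    out = model.generate(
        input_ids,
        max_new_tokens=max_new_tokens,
        do_sample=False,
        output_scores=True,
        return_dict_in_generate=True,
    )
    logps = [torch.log_softmax(s, dim=-1).max(dim=-1).values for s in out.scores]
    return torch.stack(logps).mean().item()
```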
## What This Is (Honestly)

**ARIA is runtime reliability middleware.** Not SOTA on any single task. Not a replacement for better training. Not magic.

It's the LLM equivalent of TCP checksums or PID controllers — imperfect components + a correction layer = better compound reliability. The math is `P_s = ∏(R_base + ΔR_i)` instead of `P_s = R^n`.
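The gap compounds quickly over long generations. A toy calculation with purely illustrative numbers (neither `R` nor `dR` is a measured value):

```python
n, R, dR = 500, 0.999, 0.0005      # 500 decode steps, per-step reliability
print(f"unassisted:  {R ** n:.3f}")         # ≈ 0.606
print(f"with ARIA:   {(R + dR) ** n:.3f}")  # ≈ 0.779
```

At 500 tokens, half a permille of per-step reliability buys roughly 17 points of end-to-end success probability.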
### Current status (April 2026)

| Aspect | Rating | Evidence |
|---|---|---|
| Concept novelty | 8/10 | No other paper combines 4-mode detection + budget-limited correction |
| Current evidence | 5.5/10 → **TBD** | v0.3 hurt GSM8K; v0.4 fixes identified; awaiting Colab results |
| As systems paper | 7.5/10 | Control-theoretic framing + ablations would strengthen |
| "SOTA" claim | **No** | A-LQR beats us on single objectives |
### What would make it strong

1. ✅ Control-theoretic framing (v0.4)
2. ✅ Ablation study (v0.4)
3. ✅ Perplexity preservation measurement (v0.4)
4. ✅ PCA orthogonality analysis (v0.4)
5. ⬜ Full benchmark suite (GSM8K-1319, MATH-500, HumanEval, TruthfulQA)
6. ⬜ Head-to-head vs ITI, CAA, A-LQR
7. ⬜ Formal Lyapunov stability proof
## Install

```bash
pip install torch transformers
git clone https://huggingface.co/SofiTesfay2010/aria-llm
cd aria-llm
pip install -e .
```
## Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from aria_llm import ARIA, ARIAConfig

# Any HF causal LM works; Qwen3-8B-AWQ is the model benchmarked above.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B-AWQ", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B-AWQ")
# sk=2.5 mirrors the mean + 2.5σ calibration used by every observer
aria = ARIA.attach(model, tokenizer, cs=20, sk=2.5, auto=True, verbose=True)
input_ids = tokenizer("2 + 2 =", return_tensors="pt").input_ids.to(model.device)
output = model.generate(input_ids, max_new_tokens=500)
print(aria.report_text())  # per-observer triggers and applied corrections
aria.detach()              # removes the forward hooks
```
## License

Apache 2.0