
ARIA: Closed-Loop Reliability Control for Autoregressive Decoding

Runtime reliability middleware for LLM inference.

ARIA is a lightweight, training-free control system that hooks into any HuggingFace Transformers model. It observes hidden states and logit distributions, detects anomalous behavior via calibrated statistical observers, and applies minimal corrective control inputs through proportional feedback, all via standard PyTorch forward hooks.
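To make the mechanism concrete, here is a minimal sketch of the hook pattern (illustrative only: `make_reliability_hook`, the no-op controller, and the Llama/Qwen-style `model.model.layers` path are assumptions, not the aria_llm API):

```python
import torch

def make_reliability_hook(controller):
    """Observe a block's hidden state and add a corrective input in place."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        u = controller(hidden)  # control input; zeros when nothing is anomalous
        corrected = hidden + u
        return (corrected,) + output[1:] if isinstance(output, tuple) else corrected
    return hook

def no_op_controller(hidden):
    return torch.zeros_like(hidden)  # stand-in for ARIA's observer/controller stack

# Intervene at the model's midpoint block, matching the ℓ/2 choice described below.
mid = model.model.layers[len(model.model.layers) // 2]
handle = mid.register_forward_hook(make_reliability_hook(no_op_controller))
# ... model.generate(...) now runs with the control loop attached ...
handle.remove()  # detach cleanly; the model is unmodified afterwards
```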

v0.4: What Changed (and Why)

v0.3 benchmark results were honest but damning: ARIA cut GSM8K accuracy by 5 points (90% → 85%) on Qwen3-8B-AWQ. The loop detector triggered on 65.6% of normal reasoning steps, and the trajectory diverger perturbed correct chains into incorrect ones.

v0.4 fixes every identified issue:

| Problem (v0.3) | Root Cause | Fix (v0.4) |
|---|---|---|
| Loop detector over-fires on reasoning | Entropy variance collapse ≠ content repetition | Content-aware detection: token trigram repetition ratio |
| Trigger threshold too low | severity > 0.5 caught normal variation | Raised to 0.7 across all observers |
| No ablation evidence | Couldn't prove each observer matters | Built-in ablation mode: disable any observer independently |
| No stability evidence | No proof ARIA preserves output distribution | Perplexity tracking: log P(top-1) with/without corrections |
| "Heuristic soup" criticism | No unifying framework | Control-theoretic framing: observer → controller → plant |
| No orthogonality evidence | Failure modes might be correlated | Signal vector logging + PCA correlation matrix |
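One of those fixes, perplexity tracking, reduces to comparing the mean log-probability of the greedy token across matched runs with and without corrections. A minimal sketch, assuming per-step logits are logged:

```python
import torch.nn.functional as F

def mean_top1_logprob(logits_per_step):
    """Mean log P(top-1) over a generation; a drop vs. the uncorrected run
    indicates the corrections are distorting the output distribution."""
    vals = [F.log_softmax(step, dim=-1).max().item() for step in logits_per_step]
    return sum(vals) / len(vals)
```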

Framing: Proportional Feedback Control

Following A-LQR (arxiv:2604.19018), we model the LLM as a dynamical system:

Plant:      z_{k+1} = φ_k(z_k) + u_k        (transformer dynamics + control input)
Observer:   s_k = [σ_compound, σ_drift, σ_loop, σ_median]  (4-output state estimator)
Controller: u_k = -K · max(σ) · v_k         (proportional feedback)

Where:

  • z_k ∈ ℝ^d = hidden state at layer k (intervention at layer ℓ/2)
  • φ_k = frozen transformer block
  • s_k = observer output (calibrated severity per failure mode)
  • K = auto-tuned proportional gain (from calibration variance)
  • v_k = correction direction (EMA/anchor/orthogonal depending on failure mode)

Honest difference from A-LQR: We use proportional control (P-controller) instead of LQR because we don't compute per-layer Jacobians (too expensive for middleware). A-LQR is optimal for single objectives; ARIA trades optimality for multi-objective coverage with zero setup cost.
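In code, the control law is a few lines. A minimal sketch with illustrative names (the steering and divergence controllers below additionally rescale by the hidden-state norm):

```python
import torch

def proportional_control(hidden, severities, direction, K, trigger=0.7):
    """u_k = -K · max(σ) · v_k, applied only past the calibrated trigger."""
    sigma = max(severities)              # worst of the four observer severities
    if sigma <= trigger:
        return torch.zeros_like(hidden)  # below threshold: leave the plant alone
    v = direction / direction.norm()     # unit correction direction v_k
    return -K * sigma * v
```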

Observers (Detectors)

| Observer | Signal | Calibration | Trigger | Paper Basis |
|---|---|---|---|---|
| Compound Error | JSD(p_t, p_{t-1}) + H_norm(p_t) | mean + 2.5σ | severity > 0.7 | arxiv:2602.02863 |
| Semantic Drift | 1 - cos(h_t, h_0) | mean + 2.5σ | severity > 0.7 | CAST (arxiv:2409.05907) |
| Logic Loop | Trigram repetition ratio (v0.4) | mean + 2.5σ | severity > 0.7 | Content-aware, not entropy-based |
| Median Trap | top-1 prob + inverse top-K entropy + TTR | mean + 2.5σ | severity > 0.7, 2/3 agree | ITI |
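To make the v0.4 loop signal concrete, a minimal sketch assuming a sliding window over recent token ids (the window size is an illustrative choice):

```python
def trigram_repetition_ratio(token_ids, window=64):
    """Fraction of repeated token trigrams in the recent window:
    ~0 on normal reasoning (mostly unique trigrams), ->1 on a degenerate loop."""
    recent = token_ids[-window:]
    trigrams = [tuple(recent[i:i + 3]) for i in range(len(recent) - 2)]
    if not trigrams:
        return 0.0
    return 1.0 - len(set(trigrams)) / len(trigrams)
```

Because this fires only on literal content repetition, low-entropy but non-repetitive reasoning no longer trips the observer, which was the dominant v0.3 false-positive mode.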

Controllers (Correctors)

| Controller | Control Law | When |
|---|---|---|
| Steering | u = K · σ · (EMA - h) / ‖EMA - h‖ · ‖h‖ | Compound error detected |
| Goal Anchor | u = K · σ · (h_0 - h) | Semantic drift detected |
| Divergence | u = K · σ · v_⊥ · ‖h‖ (Gram-Schmidt orthogonal) | Logic loop detected |
| Logit Temp | logits /= (1 + 0.15·K·σ), top-3 suppressed | Median trap detected |

Budget: max 1 correction per step. Highest severity wins.
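Under the hood this is a one-line arbitration. A minimal sketch with illustrative names:

```python
def select_correction(severities, controllers, trigger=0.7):
    """severities: {observer name -> σ}. Returns at most one control input."""
    name, sigma = max(severities.items(), key=lambda kv: kv[1])
    if sigma <= trigger:
        return None  # budget preserved: no correction this step
    return name, controllers[name](sigma)  # e.g. ("loop", divergence input)
```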

Benchmark Results

Qwen3-8B-AWQ on a T4 (v0.3 results; v0.4 results pending)

| Config | GSM8K (20) | Code (10) | Corrections | Loop triggers |
|---|---|---|---|---|
| Baseline | 90.0% | 100.0% | 0 | — |
| ARIA v0.3 | 85.0% ❌ | 100.0% | 6,724 | 336/512 (65.6%) |
| ARIA v0.4 | TBD | TBD | TBD | TBD |

v0.4's trigram-based loop detector should dramatically reduce false positives.

Ablation Study (built into v0.4 script)

The script automatically runs six ablation configs (a sketch of the sweep follows the list):

  1. Full ARIA (all observers)
  2. No compound error observer
  3. No semantic drift observer
  4. No logic loop observer
  5. No median trap observer
  6. Observe-only (no corrections)
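A hypothetical sketch of that sweep; the `disable` keyword and `run_benchmark` harness are assumptions for illustration, not documented aria_llm arguments:

```python
# Sweep: full stack, then each observer knocked out independently.
for disabled in ([], ["compound"], ["drift"], ["loop"], ["median"]):
    aria = ARIA.attach(model, tokenizer, auto=True, disable=disabled)  # hypothetical kwarg
    run_benchmark(model)  # hypothetical eval harness (e.g. the GSM8K subset)
    aria.detach()
# The sixth config, observe-only, logs severities but never applies u_k.
```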

What This Is (Honestly)

ARIA is runtime reliability middleware. Not SOTA on any single task. Not a replacement for better training. Not magic.

It's the LLM equivalent of TCP checksums or PID controllers: imperfect components + a correction layer = better compound reliability. The math is P_s = ∏(R_base + ΔR_i) instead of P_s = R^n.
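A worked example with made-up numbers shows why small per-step gains compound:

```python
R_base, dR, n = 0.98, 0.005, 50     # per-step reliability, per-step gain, chain length
baseline = R_base ** n              # P_s = R^n            ≈ 0.364
corrected = (R_base + dR) ** n      # P_s = ∏(R_base + ΔR) ≈ 0.470
print(f"baseline {baseline:.3f} vs corrected {corrected:.3f}")
```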

Current status (April 2026)

| Aspect | Rating | Evidence |
|---|---|---|
| Concept novelty | 8/10 | No other paper combines 4-mode detection + budget-limited correction |
| Current evidence | 5.5/10 → TBD | v0.3 hurt GSM8K; v0.4 fixes identified; awaiting Colab results |
| As systems paper | 7.5/10 | Control-theoretic framing + ablations would strengthen |
| "SOTA" claim | No | A-LQR beats us on single objectives |

What would make it strong

  1. ✅ Control-theoretic framing (v0.4)
  2. ✅ Ablation study (v0.4)
  3. ✅ Perplexity preservation measurement (v0.4)
  4. ✅ PCA orthogonality analysis (v0.4)
  5. ⬜ Full benchmark suite (GSM8K-1319, MATH-500, HumanEval, TruthfulQA)
  6. ⬜ Head-to-head vs ITI, CAA, A-LQR
  7. ⬜ Formal Lyapunov stability proof

Install

pip install torch transformers
git clone https://huggingface.co/SofiTesfay2010/aria-llm
cd aria-llm
pip install -e .

Quick Start

from transformers import AutoModelForCausalLM, AutoTokenizer
from aria_llm import ARIA, ARIAConfig

# Any HF causal LM works; the repo id here matches the model benchmarked above.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B-AWQ", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B-AWQ")
input_ids = tokenizer("Janet has 3 apples and buys 5 more. How many now?", return_tensors="pt").input_ids.to(model.device)

aria = ARIA.attach(model, tokenizer, cs=20, sk=2.5, auto=True, verbose=True)
output = model.generate(input_ids, max_new_tokens=500)
print(aria.report_text())  # per-observer trigger counts and applied corrections
aria.detach()              # removes all hooks, restoring the vanilla model

License

Apache 2.0
