Stop Wasting LoRA Capacity: Spectral Auditing and Evidence-Based Compression

Community Article Published February 23, 2026

Claim: Spectral auditing reveals massive unused capacity in LoRA adapters, predicts safe compression targets, and validates them before deployment.

Evidence:

  • Mistral-7B / GSM8K: 50% rank reduction passed worst-seed safety policy across 3 seeds.
  • Compressed adapters outperformed the probe by 1.5–3.5 accuracy points (regularization effect).
  • DistilBERT / SST-2: 61% compression validated, confirming cross-scale transfer.

Artifact: gradience bench produces a PASS/FAIL report with per-seed accuracy, worst-case delta, and policy status — an auditable record of the compression decision.


You trained a LoRA adapter. It works. You shipped it.

But how much of that adapter is actually doing anything?

Most LoRA configurations are chosen by convention: r=16 because the tutorial used it, r=64 because "bigger is safer." The adapter trains, loss goes down, eval looks fine. Nobody checks whether the rank was right — because until recently, there was no fast way to check.

LoRA fine-tuning has a measurement problem. Loss curves tell you the model is learning something. They don't tell you how it's learning, whether the capacity you allocated is being used, or what the structural relationship is between what your adapter learned and what the base model already knows.

Gradience checks this. One command, no dataset required, deterministic from adapter weights alone.

pip install gradience
gradience audit --peft-dir ./your-adapter --suggest-ranks

What You Get

GRADIENCE LoRA AUDIT
────────────────────────────────────────
PEFT dir: ./your-adapter
Allocated rank: 64

Summary:
  LoRA params:              589.8K
  Layers:                   32
  Stable rank (mean):       11.7
  Utilization (mean):       18.3%
  Energy rank k@90% (p50):  6
  Energy rank k@90% (p90):  14

Suggested ranks:
  Global median:  8
  Global p90:     16

That 18.3% utilization number is the headline. You allocated rank 64. The adapter is using the equivalent of rank 12. Over 80% of allocated rank appears unused.

What These Numbers Mean

  • Stable rank measures effective dimensionality — how many singular value directions carry meaningful energy. Formally: ||ΔW||²_F / ||ΔW||²_2. A rank-64 adapter with stable rank 12 has concentrated its learning into about 12 directions.
  • Utilization is stable rank divided by allocated rank. Below 0.25, you have substantial compression headroom. Above 0.6, the adapter is using most of what you gave it.
  • Energy rank k@90% counts how many singular values capture 90% of the update's total energy. The gap between the median layer's energy rank and the 90th-percentile layer's tells you whether compression should be uniform or per-layer.

The Evidence: Why Geometry Matters

Gradience grew out of a research program studying geometric signatures of training dynamics. In controlled experiments across architectures and scales, geometric features classified healthy versus pathological training regimes without error, while loss-based features reached only about 65% accuracy. The geometric signal isn't slightly better; it's categorically more informative.

If utilization were just noise—if stable rank and energy rank were numerical artifacts—compression based on those numbers would fail randomly. It doesn't.

We tested the audit's suggested compression targets on a Mistral-7B-v0.1 model trained on GSM8K (exact-match math reasoning) across three seeds.

Probe baseline (r=64): 28.5% ± 1.2% accuracy (range: 27.0%–30.0%)

| Variant | Compression | Mean Accuracy | Worst Δ | Status |
|---|---|---|---|---|
| uniform_median (r=32) | 50% | 28.7% | -2.5% | ✅ PASS |
| uniform_p90 (r=32) | 50% | 30.0% | -1.5% | ✅ PASS |
| per_layer (audit-guided) | 2.8% | 32.0% | -2.0% | ✅ PASS |

Three crucial observations:

  1. 50% parameter reduction held across all seeds. The audit's hypothesis was correct: half the allocated rank was unused, and removing it didn't hurt.
  2. Compressed adapters were more accurate than the probe. uniform_p90 gained 1.5 points; per_layer gained 3.5. Over-provisioned adapters have the capacity to fit noise in the training data. Constraining rank removes that capacity, and generalization improves.
  3. Cross-scale validation holds. The same methodology produced 61% compression on DistilBERT/SST-2, spanning two orders of magnitude in model size, different architectures, and different tasks.
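For intuition on what a reduction like r=64 → r=32 does mechanically, here is one standard way to realize it: truncate ΔW = B·A to its best rank-r approximation via SVD and refactor into new B and A matrices. This is a sketch of the general technique; gradience's compression variants may differ in detail, and `compress_lora` is an illustrative name.

```python
import numpy as np

def compress_lora(A: np.ndarray, B: np.ndarray, r_new: int):
    """Truncate a LoRA update dW = B @ A to rank r_new via SVD.

    Returns new factors (A_new, B_new) whose product is the best
    rank-r_new approximation of the original update (Eckart-Young).
    """
    dW = B @ A                                    # (out_dim, in_dim)
    U, s, Vt = np.linalg.svd(dW, full_matrices=False)
    B_new = U[:, :r_new] * s[:r_new]              # absorb singular values into B
    A_new = Vt[:r_new, :]
    return A_new, B_new

rng = np.random.default_rng(0)
A = rng.normal(size=(64, 512)) / 64               # lora_A: (rank, in_dim)
B = rng.normal(size=(512, 64)) / 64               # lora_B: (out_dim, rank)
A32, B32 = compress_lora(A, B, r_new=32)
err = np.linalg.norm(B @ A - B32 @ A32) / np.linalg.norm(B @ A)
print(f"relative Frobenius error at r=32: {err:.3f}")
```

When utilization is low, the discarded tail singular values carry little energy, so this error is small for real adapters; the random factors here are only a shape check.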

Validating Before You Ship

Suggestions are hypotheses. Gradience Bench tests them:

gradience bench --config bench_config.yaml

The protocol retrains at suggested ranks across multiple seeds and applies a safety policy: worst-seed accuracy drop must stay within your tolerance (default: -2.5%). The output is a machine-readable artifact (bench.json) you can attach to a PR or model card.
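The worst-seed rule is simple enough to sketch. The following illustrates the decision logic described above; it is not bench's actual code, and `policy_status` and its field names are hypothetical, not bench.json's real schema.

```python
def policy_status(probe_acc, variant_acc, tolerance=-2.5):
    """Apply a worst-seed safety policy.

    probe_acc, variant_acc: per-seed accuracies (%), in the same seed order.
    The variant passes only if its worst per-seed drop vs. the probe
    stays within the tolerance (default -2.5 points).
    """
    deltas = [v - p for p, v in zip(probe_acc, variant_acc)]
    worst = min(deltas)
    return {"worst_delta": worst, "status": "PASS" if worst >= tolerance else "FAIL"}

probe   = [27.0, 28.5, 30.0]   # r=64 baseline across 3 seeds
variant = [27.5, 28.0, 30.5]   # r=32 candidate
print(policy_status(probe, variant))
```

The key design choice is comparing worst-seed deltas rather than means: a variant that wins on average but collapses on one seed should not ship.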

If you want telemetry during training rather than just post-hoc:

from transformers import Trainer

from gradience.vnext.integrations.hf import GradienceCallback

callback = GradienceCallback(out_dir="./gradience_logs")
trainer = Trainer(..., callbacks=[callback])
trainer.train()

What Gradience Is Not

  • Not a predictor. Gradience measures structure and generates testable hypotheses. It doesn't predict absolute accuracy.
  • Not universal. 50% compression on GSM8K is a result, not a law. Always validate on your task.
  • Not a substitute for evaluation. Every compression recommendation passes through multi-seed evaluation before earning a PASS label. Seed variance is real, and single-seed results are unreliable for compression decisions.

What Comes Next

The spectral metrics that reveal capacity waste turn out to reveal more than just capacity. The exact same geometric lens that tells you "this adapter is using 12 of its 64 dimensions" can tell you things about how that adapter relates to other adapters.

When you merge two LoRA adapters, some merges work flawlessly, while others produce a model that has completely forgotten one of its tasks. In the next post, we'll show how computing subspace overlap before merging predicts whether one adapter will dominate the other—allowing you to audit your merges before you ever run them.
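The overlap measure in question (mean cos² of the principal angles between the top-k subspaces of two adapters, per the glossary) can be sketched in a few lines. This is an illustrative computation under that definition, not gradience's API; `subspace_overlap` is a made-up name.

```python
import numpy as np

def subspace_overlap(dW1: np.ndarray, dW2: np.ndarray, k: int = 8) -> float:
    """Mean cos^2 of principal angles between the top-k left singular
    subspaces of two adapter updates (1.0 = identical, ~k/dim = unrelated)."""
    U1 = np.linalg.svd(dW1, full_matrices=False)[0][:, :k]
    U2 = np.linalg.svd(dW2, full_matrices=False)[0][:, :k]
    cos = np.linalg.svd(U1.T @ U2, compute_uv=False)   # cosines of principal angles
    return float(np.mean(cos**2))

rng = np.random.default_rng(0)
shared = rng.normal(size=(256, 8))
a = shared @ rng.normal(size=(8, 256))                 # two updates sharing a subspace
b = shared @ rng.normal(size=(8, 256))
c = rng.normal(size=(256, 8)) @ rng.normal(size=(8, 256))  # unrelated update
print(subspace_overlap(a, b), subspace_overlap(a, c))
```

High overlap means the adapters compete for the same directions when merged; low overlap means they can coexist. The next post tests that hypothesis.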


Glossary
| Term | Definition |
|---|---|
| Stable rank | ‖ΔW‖²_F / ‖ΔW‖²_2 (effective dimensionality of the update) |
| Utilization | stable rank / allocated rank — fraction of capacity in use |
| Energy rank k@90% | number of singular values capturing 90% of update energy |
| Subspace overlap | mean cos²(principal angles) between top-k subspaces of two adapters |
| Dominance (D) | post-merge imbalance |
Reproducibility & Links
| Item | Value |
|---|---|
| Base model | mistralai/Mistral-7B-v0.1 |
| Task | gsm8k/main (exact match) |
| Cross-scale | distilbert-base-uncased / glue/sst2 (accuracy) |
| LoRA config | r=64, alpha=64, target_modules=[q_proj, k_proj, v_proj, o_proj] |
| Seeds | 42, 123, 456 |
| Training | 1200 steps, lr=5e-5, response-only masking |
| Code | gradience bench --config bench_config.yaml |
| PyPI | pypi.org/project/gradience/0.11.0 |
| Repo | github.com/johntnanney/gradience |
| License | Apache 2.0 |
@software{gradience2026,
  title  = {Gradience: Spectral Analysis of Low-Rank Adaptation Dynamics},
  author = {Nanney, John T.},
  year   = {2026},
  url    = {https://github.com/johntnanney/gradience}
}
