# Livnium: Energy-Guided Attractor Network (EGAN) for NLI
An NLI classifier for SNLI in which inference is not a single forward pass but a sequence of geometry-aware state updates before the final readout.
🤗 Model on HuggingFace: chetanxpatil/livnium-snli
## Directory Structure
```
livnium/
├── pretrained/
│   └── collapse4/
│       └── quantum_embeddings_final.pt   ← pre-trained word embeddings
├── runs/
│   └── triple_crown_slow/
│       ├── best_model.pt        ← ★ BEST MODEL: 76.32% dev acc
│       └── test_errors.jsonl    ← misclassified examples
└── system/
    └── snli/
        ├── model/               ← main model: train + eval
        │   ├── train.py         ← TRAIN
        │   ├── eval.py          ← EVAL
        │   ├── core/            ← VectorCollapseEngine, BasinField, physics_laws
        │   ├── tasks/snli/      ← PretrainedSNLIEncoder, SNLIHead
        │   ├── text/            ← text encoders (vocab-based)
        │   └── utils/           ← vocab helpers
        └── embed/               ← pretrained embedding module
            ├── text_encoder.py     ← PretrainedTextEncoder
            ├── collapse_engine.py  ← lightweight collapse for embed module
            └── basin_field.py      ← basin field dynamics
```
## The Update Rule
At each collapse step t = 0, …, L−1:

```
h_{t+1} = h_t
          + δ_θ(h_t)                          ← learned residual
          − s_y · D(h_t, A_y) · n̂(h_t, A_y)   ← anchor force (training: correct label only)
          − β · B(h_t) · n̂(h_t, A_N)          ← neutral boundary force

D(h, A) = 0.38 − cos(h, A)                    ← divergence from the equilibrium cosine
n̂(h, A) = (h − A) / ‖h − A‖                   ← Euclidean radial direction
B(h)    = 1 − |cos(h, A_E) − cos(h, A_C)|     ← E–C boundary proximity
```
Three learned anchor vectors A_E, A_C, A_N define the label geometry. The attractor is a ring at cos(h, A_y) = 0.38, not the anchor itself.
At inference all three anchors compete simultaneously; whichever basin has the strongest geometric pull wins.
Force magnitudes are cosine-based; force directions are Euclidean radial. These are geometrically inconsistent (the true cosine gradient is tangential). The correct description is discrete-time attractor dynamics with anchor-directed forces: energy-like, but not an exact gradient flow.
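The update rule can be sketched in a few lines of PyTorch. This is an illustrative reimplementation from the formulas above, not the repository's actual API: the function name, the `anchors` dict layout, and the force scales `s` and `beta` are all assumptions.

```python
import torch
import torch.nn.functional as F

def collapse_step(h, anchors, delta, s=1.0, beta=0.1, eq_cos=0.38):
    """One collapse step h_t -> h_{t+1} in inference mode (all anchors active).

    h: (B, D) state batch; anchors: dict {'E','C','N'} -> (D,) anchor vectors;
    delta: the learned residual (any callable); s, beta: illustrative force scales.
    """
    h_next = h + delta(h)  # learned residual delta_theta(h_t)
    for A in anchors.values():  # at inference, every label's anchor exerts a force
        cos = F.cosine_similarity(h, A.expand_as(h), dim=-1)  # (B,)
        D = eq_cos - cos                                      # magnitude: cosine divergence
        n_hat = F.normalize(h - A, dim=-1)                    # direction: Euclidean radial
        h_next = h_next - s * D.unsqueeze(-1) * n_hat
    # neutral boundary force: strongest where the E and C cosines are nearly equal
    cos_E = F.cosine_similarity(h, anchors["E"].expand_as(h), dim=-1)
    cos_C = F.cosine_similarity(h, anchors["C"].expand_as(h), dim=-1)
    B = 1.0 - (cos_E - cos_C).abs()
    n_N = F.normalize(h - anchors["N"], dim=-1)
    return h_next - beta * B.unsqueeze(-1) * n_N
```

Note that all forces are evaluated at h_t and summed, matching the additive form of the update rule.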
## Data
SNLI (for training/eval the classifier):

```bash
wget https://nlp.stanford.edu/projects/snli/snli_1.0.zip
unzip snli_1.0.zip -d data/snli/
```

Or via HuggingFace: `datasets.load_dataset("snli")`
Homepage: https://nlp.stanford.edu/projects/snli/
WikiText-103 (for training the pretrained embeddings):

```bash
wget https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-v1.zip
unzip wikitext-103-v1.zip -d data/wikitext/
```

Or via HuggingFace: `datasets.load_dataset("wikitext", "wikitext-103-v1")`
Or via Kaggle: https://www.kaggle.com/datasets/vadimkurochkin/wikitext-103
Homepage: https://huggingface.co/datasets/Salesforce/wikitext
## How to Train

```bash
cd system/snli/model/
python3 train.py \
    --snli-train ../../../data/snli/snli_1.0_train.jsonl \
    --snli-dev ../../../data/snli/snli_1.0_dev.jsonl \
    --encoder-type pretrained \
    --embed-ckpt ../../../pretrained/collapse4/quantum_embeddings_final.pt \
    --output-dir ../../../runs/my_run \
    --dim 256 --num-layers 6 \
    --epochs 10 --batch-size 32 --lr 0.001 \
    --lambda-traj 0.1 --lambda-fn 0.15 --lambda-rep 0.1 --margin-rep 0.3 \
    --adaptive-metric
```
## How to Eval

```bash
cd system/snli/model/
python3 eval.py \
    --model-dir ../../../runs/triple_crown_slow \
    --snli-test ../../../data/snli/snli_1.0_dev.jsonl \
    --batch-size 256
```
## What's Novel
Most classifiers do: h → linear layer → logits. One step, no dynamics.
Livnium does: h₀ → L steps of geometry-aware state evolution → logits. The final state h_L is dynamically shaped before readout; it isn't just a linear projection of h₀.
The specific things that are different:
1. Classification as attractor dynamics, not a lookup.
The state h moves through space across L steps under anchor forces before the classifier reads it. The label isn't computed from the raw embedding; it's read from where the state settled.
2. The force geometry is deliberately inconsistent, and that inconsistency is measured.
Force magnitudes follow the cosine divergence D(h, A) = 0.38 − cos(h, A). Force directions are Euclidean radial, n̂ = (h − A) / ‖h − A‖. These are not the same thing: the true gradient of a cosine energy is tangential on the sphere, not radial. The mean angle between the two directions is 135.2° ± 2.5° (measured, n=1000). The system is therefore running explicit physical forces, not gradient descent on the written energy.
3. The attractor is a ring, not a point.
The equilibrium condition cos(h, A_y) = 0.38 defines a ring on the hypersphere, not the anchor itself. The system settles into a proximity zone rather than at a target location. Standard energy minimisation would push all the way to the anchor; this stops at the ring.
4. Proven local contraction.
V(h) = (0.38 − cos(h, A_y))² is a Lyapunov function that decreases at every step when δ_θ = 0 (proven analytically, confirmed empirically on 5000 samples). Livnium is a provably locally-contracting pseudo-gradient flow. Most residual classifiers carry no such stability guarantee.
5. Inference is a single unsupervised collapse.
Training uses s_y · D(h, A_y): only the correct anchor pulls. At inference, all three anchors compete with no label; the label is implicit in which basin wins. Cost: one forward pass through a small MLP, 428× faster than BERT on CPU.
What it isn't: global convergence is not proven (finite step size plus the learned residual δ_θ can escape the basin). The geometric inconsistency is not fixed. It isn't yet competitive with fine-tuned transformers on accuracy. Whether iterated attractor dynamics outperform a standard deep residual block at equivalent parameter count is an open question.
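The label-free inference collapse in point 5 can be sketched numerically. Everything here is illustrative: the step count, step scale, and in particular the argmax-cosine readout, which stands in for the trained classifier head.

```python
import numpy as np

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def collapse_predict(h0, anchors, n_steps=6, eq_cos=0.38, step=0.3):
    """Label-free collapse: every anchor exerts its force simultaneously;
    the basin that captures the state wins."""
    h = h0.astype(float).copy()
    for _ in range(n_steps):
        f = np.zeros_like(h)
        for A in anchors:
            D = eq_cos - cos(h, A)                   # cosine-based magnitude
            n_hat = (h - A) / np.linalg.norm(h - A)  # Euclidean radial direction
            f -= step * D * n_hat
        h = h + f
    # illustrative readout: the anchor whose basin holds the settled state
    return int(np.argmax([cos(h, A) for A in anchors]))

rng = np.random.default_rng(0)
anchors = [rng.standard_normal(32) for _ in range(3)]
h0 = anchors[0] + 0.1 * rng.standard_normal(32)  # state starting deep in basin 0
print(collapse_predict(h0, anchors))
```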
## Results: SNLI NLI Classification

### Accuracy (SNLI dev set)
| Class | Accuracy |
|---|---|
| Overall | 76.32% |
| Entailment | 87.5% |
| Contradiction | 81.2% |
| Neutral | 70.9% |
### Model Config
| Parameter | Value |
|---|---|
| Dim | 256 |
| Collapse layers | 6 |
| Encoder | Pretrained bag-of-words embeddings (frozen) |
| Parameters | ~2M |
### Speed vs BERT (CPU, batch size 32)

| Model | ms / batch | Samples / sec | Full SNLI train (549k) |
|---|---|---|---|
| Livnium | 0.4 ms | 85,335 / sec | ~6 sec |
| BERT-base | 171 ms | 187 / sec | ~49 min |

428× faster than BERT-base on CPU.
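The numbers above are self-reported; the repository's benchmark script isn't shown here. For reference, a generic CPU timing harness of the kind that produces such figures might look like this (the stand-in MLP, thread pinning, and repeat counts are all illustrative assumptions):

```python
import time
import torch

torch.set_num_threads(1)  # illustrative: pin threads for more stable CPU timing
model = torch.nn.Sequential(              # stand-in small MLP, not the Livnium model
    torch.nn.Linear(256, 256), torch.nn.ReLU(), torch.nn.Linear(256, 3)
)
x = torch.randn(32, 256)                  # one batch of 32 encoded pairs

with torch.no_grad():
    for _ in range(10):                   # warm-up before timing
        model(x)
    n = 200
    t0 = time.perf_counter()
    for _ in range(n):
        model(x)
    dt = (time.perf_counter() - t0) / n   # mean seconds per batch

print(f"{dt * 1e3:.3f} ms/batch, {32 / dt:,.0f} samples/sec")
```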
## Lyapunov Analysis
Define V(h) = D(h, A_y)² = (0.38 − cos(h, A_y))².
V = 0 on the attractor ring. When δ_θ = 0, V decreases at every step (mean ΔV = −0.00131). Analytically,

∇_h cos(h, A_y) · n̂(h, A_y) = −(‖A_y‖ · sin²θ) / (‖h‖ · ‖h − A_y‖) ≤ 0,

so a radial step signed by D always moves cos(h, A_y) toward 0.38. Livnium is a provably locally-contracting pseudo-gradient flow.
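The empirical half of this claim is easy to re-run in isolation: iterate the pure anchor force (δ_θ = 0) from random states and check that V never increases. A minimal sketch; the dimension, step scale, and trajectory counts are illustrative, not the repository's 5000-sample setup:

```python
import numpy as np

EQ_COS = 0.38

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

rng = np.random.default_rng(0)
non_increasing = total = 0
for _ in range(200):                       # 200 random (h, A) trajectories
    h = rng.standard_normal(64)
    A = rng.standard_normal(64)
    for _ in range(20):                    # pure anchor force, delta_theta = 0
        D = EQ_COS - cos(h, A)
        n_hat = (h - A) / np.linalg.norm(h - A)
        h_new = h - 0.1 * D * n_hat        # small illustrative step size
        V_old, V_new = D**2, (EQ_COS - cos(h_new, A))**2
        non_increasing += V_new <= V_old + 1e-12
        total += 1
        h = h_new
print(non_increasing / total)  # fraction of steps where V did not increase
```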
See runs/livnium_collapse_equation.md for the full derivation and the empirical direction-mismatch analysis (135.2° ± 2.5° between the Euclidean and cosine gradient directions).
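The 135.2° figure itself comes from the repository's analysis over trained-model states, so random vectors will not reproduce it exactly, but computing such an angle is straightforward: compare the radial force direction with the true gradient of cos(h, A), which is tangential (orthogonal to h). A sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
angles = []
for _ in range(1000):
    h = rng.standard_normal(256)
    A = rng.standard_normal(256)
    c = h @ A / (np.linalg.norm(h) * np.linalg.norm(A))
    # true gradient of cos(h, A) w.r.t. h; tangential, since grad @ h == 0
    grad = A / (np.linalg.norm(h) * np.linalg.norm(A)) - c * h / (h @ h)
    # Euclidean radial force direction used by the update rule
    n_hat = (h - A) / np.linalg.norm(h - A)
    cos_angle = grad @ n_hat / (np.linalg.norm(grad) * np.linalg.norm(n_hat))
    angles.append(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))
print(f"mean angle: {np.mean(angles):.1f} deg")
```

Because ∇cos · n̂ ≤ 0 (the inequality above), every sampled angle is at least 90°.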
## Citation

If you use this work in your research, please cite:

```bibtex
@misc{patil2026livnium,
  author       = {Patil, Chetan},
  title        = {Livnium: Energy-Guided Attractor Network (EGAN) for Natural Language Inference},
  year         = {2026},
  publisher    = {GitHub},
  howpublished = {\url{https://github.com/chetanxpatil/livnium}},
  note         = {Model available at \url{https://huggingface.co/chetanxpatil/livnium-snli}}
}
```
For questions or collaboration: GitHub · HuggingFace