Livnium β€” Energy-Guided Attractor Network (EGAN) for NLI

An NLI classifier on SNLI where inference is not a single forward pass but a sequence of geometry-aware state updates before the final readout.

πŸ€— Model on HuggingFace: chetanxpatil/livnium-snli

Directory Structure

livnium/
β”œβ”€β”€ pretrained/
β”‚   └── collapse4/
β”‚       └── quantum_embeddings_final.pt  ← pre-trained word embeddings
β”œβ”€β”€ runs/
β”‚   └── triple_crown_slow/
β”‚       β”œβ”€β”€ best_model.pt                ← ⭐ BEST MODEL: 76.32% dev acc
β”‚       └── test_errors.jsonl            ← misclassified examples
└── system/
    └── snli/
        β”œβ”€β”€ model/                        ← main model: train + eval
        β”‚   β”œβ”€β”€ train.py                  ← TRAIN
        β”‚   β”œβ”€β”€ eval.py                   ← EVAL
        β”‚   β”œβ”€β”€ core/                     ← VectorCollapseEngine, BasinField, physics_laws
        β”‚   β”œβ”€β”€ tasks/snli/               ← PretrainedSNLIEncoder, SNLIHead
        β”‚   β”œβ”€β”€ text/                     ← text encoders (vocab-based)
        β”‚   └── utils/                    ← vocab helpers
        └── embed/                        ← pretrained embedding module
            β”œβ”€β”€ text_encoder.py           ← PretrainedTextEncoder
            β”œβ”€β”€ collapse_engine.py        ← lightweight collapse for embed module
            └── basin_field.py            ← basin field dynamics

The Update Rule

At each collapse step t = 0…L-1:

h_{t+1} = h_t
         + Ξ΄_ΞΈ(h_t)                              ← learned residual
         - s_y Β· D(h_t, A_y) Β· nΜ‚(h_t, A_y)      ← anchor force (training: correct label only)
         - Ξ² Β· B(h_t) Β· nΜ‚(h_t, A_N)              ← neutral boundary force
D(h, A)  = 0.38 βˆ’ cos(h, A)               ← divergence from equilibrium cosine
nΜ‚(h, A) = (h βˆ’ A) / β€–h βˆ’ Aβ€–              ← Euclidean radial direction
B(h)     = 1 βˆ’ |cos(h,A_E) βˆ’ cos(h,A_C)|  ← E–C boundary proximity

Three learned anchor vectors A_E, A_C, A_N define the label geometry. The attractor is a ring at cos(h, A_y) = 0.38, not the anchor itself.
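The ring behaviour is visible directly in the sign of D. A minimal sketch (the 0.38 equilibrium cosine is taken from the update rule above):

```python
EQ_COS = 0.38  # equilibrium cosine from the update rule

def D(cos_hA):
    """Divergence from the attractor ring for a given cosine to the anchor."""
    return EQ_COS - cos_hA

print(D(0.10))   # > 0: h is inside the basin but short of the ring, pulled toward the anchor
print(D(0.38))   # = 0: h sits on the ring, the anchor force vanishes
print(D(0.90))   # < 0: h is past the ring, pushed back out toward it
```

Because the force magnitude is D itself, the anchor is never the fixed point; the ring cos(h, A_y) = 0.38 is.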

At inference all three anchors compete simultaneously β€” whichever basin has the strongest geometric pull wins.

Force magnitudes are cosine-based, while force directions are Euclidean radial. These two are geometrically inconsistent: the true gradient of a cosine energy is tangential, not radial. The accurate description is therefore discrete-time attractor dynamics with anchor-directed forces. It is energy-like, but not an exact gradient flow.
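The training-time rule above can be sketched in a few lines of NumPy. This is illustrative only: names like delta_theta are placeholders for the learned residual Ξ΄_ΞΈ, and the real implementation lives in core/VectorCollapseEngine.

```python
import numpy as np

EQ_COS = 0.38  # equilibrium cosine: the attractor is a ring, not the anchor

def cos_sim(h, a):
    return float(h @ a) / (np.linalg.norm(h) * np.linalg.norm(a))

def radial(h, a):
    """Euclidean radial unit direction nΜ‚(h, a) = (h - a) / β€–h - aβ€–."""
    d = h - a
    return d / np.linalg.norm(d)

def collapse_step(h, A_y, A_E, A_C, A_N, s_y=1.0, beta=0.1, delta_theta=None):
    """One training-time update: learned residual + anchor force + neutral boundary force."""
    out = h + (delta_theta(h) if delta_theta is not None else 0.0)
    # anchor force: cosine-based magnitude D, Euclidean radial direction nΜ‚
    out = out - s_y * (EQ_COS - cos_sim(h, A_y)) * radial(h, A_y)
    # neutral boundary force, strongest near the E-C decision boundary
    B = 1.0 - abs(cos_sim(h, A_E) - cos_sim(h, A_C))
    out = out - beta * B * radial(h, A_N)
    return out
```

At inference the anchor-force line would run for all three anchors with no s_y gating; whichever basin pulls hardest determines the label.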


Data

SNLI (for training/eval the classifier):

wget https://nlp.stanford.edu/projects/snli/snli_1.0.zip
unzip snli_1.0.zip -d data/snli/

Or via HuggingFace: datasets.load_dataset("snli")
Homepage: https://nlp.stanford.edu/projects/snli/
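For the raw jsonl files, a minimal reader sketch (field names follow the official SNLI release; examples with gold_label "-" have no annotator consensus and are conventionally dropped):

```python
import json

LABELS = {"entailment": 0, "contradiction": 1, "neutral": 2}

def read_snli(path):
    """Yield (premise, hypothesis, label_id) triples from an SNLI jsonl file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            ex = json.loads(line)
            if ex["gold_label"] == "-":   # no annotator consensus: skip
                continue
            yield ex["sentence1"], ex["sentence2"], LABELS[ex["gold_label"]]
```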

WikiText-103 (for training the pretrained embeddings):

wget https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-v1.zip
unzip wikitext-103-v1.zip -d data/wikitext/

Or via HuggingFace: datasets.load_dataset("wikitext", "wikitext-103-v1")
Or via Kaggle: https://www.kaggle.com/datasets/vadimkurochkin/wikitext-103
Homepage: https://huggingface.co/datasets/Salesforce/wikitext


How to Train

cd system/snli/model/

python3 train.py \
  --snli-train ../../../data/snli/snli_1.0_train.jsonl \
  --snli-dev   ../../../data/snli/snli_1.0_dev.jsonl \
  --encoder-type pretrained \
  --embed-ckpt ../../../pretrained/collapse4/quantum_embeddings_final.pt \
  --output-dir ../../../runs/my_run \
  --dim 256 --num-layers 6 \
  --epochs 10 --batch-size 32 --lr 0.001 \
  --lambda-traj 0.1 --lambda-fn 0.15 --lambda-rep 0.1 --margin-rep 0.3 \
  --adaptive-metric

How to Eval

cd system/snli/model/

python3 eval.py \
  --model-dir ../../../runs/triple_crown_slow \
  --snli-test ../../../data/snli/snli_1.0_dev.jsonl \
  --batch-size 256

What's Novel

Most classifiers do: h β†’ linear layer β†’ logits. One step, no dynamics.

Livnium does: hβ‚€ β†’ L steps of geometry-aware state evolution β†’ logits. The final state h_L is dynamically shaped before readout β€” it isn't just a linear projection of hβ‚€.

The specific things that are different:

1. Classification as attractor dynamics, not a lookup. The state h moves through space across L steps under anchor forces before the classifier reads it. The label isn't computed from the raw embedding β€” it's read from where the state settled.

2. The force geometry is deliberately inconsistent β€” and that's measured. Force magnitudes follow cosine divergence D(h, A) = 0.38 βˆ’ cos(h, A). Force directions are Euclidean radial nΜ‚ = (h βˆ’ A) / β€–h βˆ’ Aβ€–. These are not the same thing β€” the true gradient of a cosine energy is tangential on the sphere, not radial. The mean angle between these two directions is 135.2Β° Β± 2.5Β° (measured, n=1000). This means the system is running explicit physical forces, not gradient descent on the written energy.

3. The attractor is a ring, not a point. The equilibrium condition is cos(h, A_y) = 0.38, which defines a ring on the hypersphere β€” not the anchor itself. The system settles to a proximity zone, not a target location. Standard energy minimisation would push to the anchor; this stops at the ring.

4. Proven local contraction. V(h) = (0.38 βˆ’ cos(h, A_y))Β² is a Lyapunov function that decreases at every step when Ξ΄_ΞΈ = 0 (proven analytically, confirmed empirically on 5000 samples). Livnium is a provably locally-contracting pseudo-gradient flow. Most residual classifiers have no such stability guarantee.

5. Inference is a single unsupervised collapse. Training uses s_y Β· D(h, A_y) β€” only the correct anchor pulls. At inference, all three anchors compete with no label. The label is implicit in which basin wins. Cost: 1Γ— forward pass through a small MLP, 428Γ— faster than BERT on CPU.

What it isn't: global convergence is not proven (finite step size + learned residual Ξ΄_ΞΈ can escape the basin). The geometric inconsistency is not fixed. It isn't yet competitive with fine-tuned transformers on accuracy. Whether iterated attractor dynamics outperform a standard deep residual block at equivalent parameter count is an open question.
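Point 2 is easy to verify numerically: the exact gradient of cos(h, A) is tangential (orthogonal to h), so it can never align with the radial force direction. A small sketch with illustrative dimensions (this checks the orthogonality property, not the paper's specific 135.2Β° figure):

```python
import numpy as np

def cos_grad(h, A):
    """Exact gradient of cos(h, A) with respect to h."""
    nh, nA = np.linalg.norm(h), np.linalg.norm(A)
    return A / (nh * nA) - (h @ A) * h / (nh ** 3 * nA)

rng = np.random.default_rng(0)
h = rng.standard_normal(256); h /= np.linalg.norm(h)
A = rng.standard_normal(256); A /= np.linalg.norm(A)

g = cos_grad(h, A)
n_hat = (h - A) / np.linalg.norm(h - A)   # the radial force direction

print(abs(g @ h))                          # ~0: the gradient is tangential to the sphere
angle = np.degrees(np.arccos(g @ n_hat / np.linalg.norm(g)))
print(angle)                               # obtuse: radial direction is not the gradient
```

Since g Β· nΜ‚ = βˆ’β€–Aβ€– sinΒ²ΞΈ / (β€–hβ€– β€–h βˆ’ Aβ€–) < 0, the angle is always above 90Β°, never 0Β° or 180Β°.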


Results β€” SNLI NLI Classification

Accuracy (SNLI dev set)

Class           Accuracy
Overall         76.32%
Entailment      87.5%
Contradiction   81.2%
Neutral         70.9%

Model Config

Parameter        Value
Dim              256
Collapse layers  6
Encoder          Pretrained bag-of-words embeddings (frozen)
Parameters       ~2M

Speed vs BERT (CPU, batch size 32)

Model       ms / batch   Samples / sec   Full SNLI train (549k)
Livnium     0.4 ms       85,335 / sec    ~6 sec
BERT-base   171 ms       187 / sec       ~49 min

428Γ— faster than BERT-base on CPU.


Lyapunov Analysis

Define V(h) = D(h, A_y)Β² = (0.38 βˆ’ cos(h, A_y))Β²

V = 0 at the attractor ring. When Ξ΄_ΞΈ = 0, V decreases at every step (mean Ξ”V = βˆ’0.00131). Analytically:

βˆ‡_h cos(h, A_y) Β· nΜ‚(h, A_y) = βˆ’(β€–A_yβ€– Β· sinΒ²ΞΈ) / (β€–hβ€– Β· β€–h βˆ’ A_yβ€–)  ≀ 0

where ΞΈ is the angle between h and A_y. The radial anchor step βˆ’s_y Β· D Β· nΜ‚ therefore changes cos(h, A_y) with the same sign as D, so V = DΒ² shrinks toward the ring.

Livnium is a provably locally-contracting pseudo-gradient flow.

See runs/livnium_collapse_equation.md for the full derivation and empirical direction mismatch analysis (135.2Β° Β± 2.5Β° between Euclidean and cosine gradients).
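The Ξ΄_ΞΈ = 0 contraction claim is easy to spot-check. A sketch with illustrative step scale and dimensions, not the trained model's values:

```python
import numpy as np

EQ_COS = 0.38
rng = np.random.default_rng(1)

def lyapunov(h, A):
    """V(h) = (0.38 - cos(h, A))^2, zero on the attractor ring."""
    c = float(h @ A) / (np.linalg.norm(h) * np.linalg.norm(A))
    return (EQ_COS - c) ** 2

shrunk = 0
for _ in range(1000):
    h = rng.standard_normal(64); h /= np.linalg.norm(h)
    A = rng.standard_normal(64); A /= np.linalg.norm(A)
    c = float(h @ A)                          # cosine, since both are unit vectors
    n_hat = (h - A) / np.linalg.norm(h - A)
    h_next = h - 0.1 * (EQ_COS - c) * n_hat   # anchor force only, Ξ΄_ΞΈ = 0
    shrunk += lyapunov(h_next, A) < lyapunov(h, A)

print(shrunk, "of 1000 random steps decreased V")
```

With a small enough step the radial force always moves the cosine toward 0.38, so V decreases at every sampled step, matching the local-contraction proof.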


Citation

If you use this work in your research, please cite:

@misc{patil2026livnium,
  author       = {Patil, Chetan},
  title        = {Livnium: Energy-Guided Attractor Network (EGAN) for Natural Language Inference},
  year         = {2026},
  publisher    = {GitHub},
  howpublished = {\url{https://github.com/chetanxpatil/livnium}},
  note         = {Model available at \url{https://huggingface.co/chetanxpatil/livnium-snli}}
}

For questions or collaboration: GitHub Β· HuggingFace
