MESA: Macular Ephemeral State-space Architecture (Hypernetwork Weights)

These are the official trained hypernetwork weights for the MESA architecture, designed as a successor to Doc-to-LoRA that scales with $O(1)$ KV-cache cost.

Instead of storing context in memory activations (KV-cache), MESA uses a System-1 linear skimmer and a Deep Weight Programmer (Hypernetwork) to instantly compile textual context into transient LoRA matrices in a single forward pass.
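For intuition, here is a minimal sketch of the hypernetwork step described above: a learned map from a document embedding to flattened LoRA factors $A$ and $B$, produced in a single forward pass. All shapes, names, and the zero-initialization of the $B$ head are illustrative assumptions, not the repository's actual code.

```python
import random

D_EMB, D_MODEL, RANK = 8, 16, 2  # toy sizes for illustration

random.seed(0)

def linear(x, w):
    # plain matrix-vector product: w is (out x in)
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

# hypernetwork parameters: one linear head per LoRA factor
W_a = [[random.gauss(0, 0.02) for _ in range(D_EMB)] for _ in range(RANK * D_MODEL)]
W_b = [[0.0] * D_EMB for _ in range(D_MODEL * RANK)]  # B head starts at zero, as in LoRA

def hypernet(doc_emb):
    # one forward pass compiles the context embedding into LoRA factors
    flat_a = linear(doc_emb, W_a)
    flat_b = linear(doc_emb, W_b)
    A = [flat_a[i * D_MODEL:(i + 1) * D_MODEL] for i in range(RANK)]  # (r x d)
    B = [flat_b[i * RANK:(i + 1) * RANK] for i in range(D_MODEL)]     # (d x r)
    return A, B

doc_emb = [0.1] * D_EMB  # stand-in for a real document embedding
A, B = hypernet(doc_emb)
print(len(A), len(A[0]), len(B), len(B[0]))  # 2 16 16 2
```

Because the factors are functions of the document embedding, no KV-cache entries accumulate with context length; the context lives in the generated weights instead.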

This specific checkpoint (mesa_kl_ep2.pt) was trained with a dual-objective KL-divergence generative-regularization loss to address the stability-plasticity dilemma. It brings $D_{KL}$ down to $0.0588$, allowing the base model to preserve fluent English generation during zero-shot weight injection.
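The dual-objective idea can be sketched as a task loss plus a KL regularizer that keeps the LoRA-injected model's next-token distribution close to the frozen base model's. The function names and the weighting term `beta` are assumptions for illustration, not the checkpoint's exact training code.

```python
import math

def kl_divergence(p, q):
    # D_KL(p || q) for two discrete probability distributions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def dual_objective(task_nll, base_probs, adapted_probs, beta=1.0):
    # plasticity term (fit the context) + stability term (stay near the base model)
    return task_nll + beta * kl_divergence(base_probs, adapted_probs)

base = [0.7, 0.2, 0.1]     # frozen base model's next-token distribution
adapted = [0.6, 0.3, 0.1]  # distribution after LoRA injection
loss = dual_objective(task_nll=1.5, base_probs=base, adapted_probs=adapted)
print(round(kl_divergence(base, adapted), 4))
```

Driving the KL term toward small values (the card reports $0.0588$) is what prevents the injected weights from degrading the base model's fluent generation.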


βš™οΈ How to Use

These are not standard LLM weights. These are the weights for the MESA DeepWeightProgrammer module, which dynamically generates LoRA matrices for Qwen/Qwen2.5-0.5B-Instruct.

To use these weights, you must run the MESA pipeline from our GitHub repository:

```python
import torch

from mesa_pipeline import MESAPipeline

mesa = MESAPipeline()
mesa.hypernet.load_state_dict(torch.load("mesa_kl_ep2.pt"))

# Generate dynamic LoRA weights from context
doc_emb = mesa.get_document_embedding("Your massive context here...")
Wa, Wb = mesa.hypernet(doc_emb)

# Inject into base LLM
mesa.inject(Wa, Wb, scaling_factor=2.0)
```
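For intuition, here is a hedged sketch of what an `inject`-style call plausibly does under standard LoRA semantics: add the scaled low-rank update $\text{scaling} \cdot (B A)$ to a frozen weight matrix. The shapes and the pure-Python `matmul` helper are illustrative assumptions, not the pipeline's actual implementation.

```python
def matmul(X, Y):
    # naive matrix product for illustration
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def inject(W, A, B, scaling_factor=2.0):
    # W: (d x d) frozen base weight, A: (r x d), B: (d x r)
    delta = matmul(B, A)  # (d x d) low-rank update
    return [[w + scaling_factor * d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]  # toy frozen weight
A = [[0.1, 0.2]]              # rank-1 LoRA factors
B = [[0.5], [0.25]]
W_new = inject(W, A, B)
print(W_new)
```

Because the update is additive and the factors are transient, the base weights stay frozen; dropping `Wa`/`Wb` after generation restores the original model.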
Model tree for ElvianElvy/MESA-Qwen2.5-0.5B-KL-Regularized: adapter of Qwen/Qwen2.5-0.5B-Instruct.