MESA: Macular Ephemeral State-space Architecture (Hypernetwork Weights)
These are the officially released, trained Hypernetwork weights for the MESA architecture, designed as a successor to Doc-to-LoRA that scales with $O(1)$ KV-cache memory.
Instead of storing context in memory activations (KV-cache), MESA uses a System-1 linear skimmer and a Deep Weight Programmer (Hypernetwork) to instantly compile textual context into transient LoRA matrices in a single forward pass.
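To make the "compile context into weights" idea concrete, here is a minimal sketch of a hypernetwork that maps a pooled document embedding to a pair of low-rank LoRA factors. All names and dimensions (`DeepWeightProgrammer`, `emb_dim`, `rank`, the two-head layout) are illustrative assumptions, not the trained MESA architecture:

```python
import torch
import torch.nn as nn

# Hypothetical sketch only: a hypernetwork that emits LoRA factors for one
# target linear layer. The real MESA module is defined in the GitHub repo.
class DeepWeightProgrammer(nn.Module):
    def __init__(self, emb_dim=896, hidden=512, rank=8, d_model=896):
        super().__init__()
        self.rank, self.d_model = rank, d_model
        self.trunk = nn.Sequential(nn.Linear(emb_dim, hidden), nn.GELU())
        # Two heads emit the flattened low-rank factors A (r x d) and B (d x r).
        self.head_a = nn.Linear(hidden, rank * d_model)
        self.head_b = nn.Linear(hidden, d_model * rank)

    def forward(self, doc_emb):
        h = self.trunk(doc_emb)
        Wa = self.head_a(h).view(self.rank, self.d_model)
        Wb = self.head_b(h).view(self.d_model, self.rank)
        return Wa, Wb

hypernet = DeepWeightProgrammer()
doc_emb = torch.randn(896)   # stand-in for a pooled document embedding
Wa, Wb = hypernet(doc_emb)   # one forward pass -> transient LoRA matrices
```

The key property this illustrates is that producing `(Wa, Wb)` is a single forward pass whose cost is independent of how the context would have grown a KV-cache.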
This specific checkpoint (mesa_kl_ep2.pt) was trained with a Dual-Objective KL-Divergence Generative Regularization loss that addresses the stability-plasticity dilemma. It drives $D_{KL}$ down to $0.0588$, allowing the base model to retain fluent, grammatical English during zero-shot weight injection.
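As a rough illustration of what a dual-objective loss of this kind looks like (the exact MESA formulation is in the paper; the function name, `beta` weighting, and KL direction below are assumptions), one term fits the context-conditioned targets while a KL term keeps the injected model's output distribution close to the frozen base model's:

```python
import torch
import torch.nn.functional as F

# Illustrative only: plasticity (task fit) plus stability (KL regularizer).
def dual_objective_loss(logits_injected, logits_base, target_ids, beta=0.1):
    # Objective 1: plasticity -- fit the context-conditioned next tokens.
    task = F.cross_entropy(
        logits_injected.view(-1, logits_injected.size(-1)),
        target_ids.view(-1),
    )
    # Objective 2: stability -- KL(base || injected), keeping generation
    # close to the frozen base model so grammar does not degrade.
    kl = F.kl_div(
        F.log_softmax(logits_injected, dim=-1),
        F.log_softmax(logits_base, dim=-1),
        log_target=True,
        reduction="batchmean",
    )
    return task + beta * kl, kl

logits_injected = torch.randn(2, 5, 100)
logits_base = torch.randn(2, 5, 100)
target_ids = torch.randint(0, 100, (2, 5))
loss, kl = dual_objective_loss(logits_injected, logits_base, target_ids)
```

Driving the KL term toward a small value (the reported $0.0588$) is what lets the hypernetwork reshape behavior without destabilizing the base model's language modeling.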
Links & Resources
- Official Academic Paper (Zenodo): https://doi.org/10.5281/zenodo.18973735
- Source Code & Pipeline (GitHub): https://github.com/ElvianElvy/MESA-Doc-to-LoRA-Successor.git
How to Use
These are not standard LLM weights: they are the weights for the MESA DeepWeightProgrammer module, which dynamically generates LoRA matrices for Qwen/Qwen2.5-0.5B-Instruct.
To use these weights, you must run the MESA pipeline from our GitHub repository:
```python
import torch
from mesa_pipeline import MESAPipeline

# Load the pipeline and the trained hypernetwork checkpoint
mesa = MESAPipeline()
mesa.hypernet.load_state_dict(torch.load("mesa_kl_ep2.pt"))

# Generate dynamic LoRA weights from context
doc_emb = mesa.get_document_embedding("Your massive context here...")
Wa, Wb = mesa.hypernet(doc_emb)

# Inject into base LLM
mesa.inject(Wa, Wb, scaling_factor=2.0)
```
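For intuition about what the injection step amounts to, here is the standard LoRA update applied transiently to a single linear layer's weight. The shapes and the scaling factor are illustrative assumptions; `mesa.inject` applies this across the base model's target layers:

```python
import torch

# Illustrative: standard LoRA update W_eff = W + s * (Wb @ Wa), applied
# transiently so that restoring W reverts the model after generation.
d, r, s = 896, 8, 2.0
W = torch.randn(d, d)      # frozen base weight (stand-in)
Wa = torch.randn(r, d)     # hypernetwork output, low-rank factor A
Wb = torch.randn(d, r)     # hypernetwork output, low-rank factor B
W_eff = W + s * (Wb @ Wa)  # ephemeral, context-compiled weight

x = torch.randn(d)
y = x @ W_eff.T            # forward pass with the injected update
```

Because the update is rank-$r$, the hypernetwork only has to emit $2rd$ numbers per layer rather than a full $d \times d$ matrix, which is what makes single-pass weight compilation tractable.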