MESA: Macular Ephemeral State-space Architecture (Hypernetwork Weights)

These are the official trained hypernetwork weights for the MESA architecture, designed as a successor to Doc-to-LoRA that scales with $O(1)$ KV-cache cost.

Instead of storing context in memory activations (KV-cache), MESA uses a System-1 linear skimmer and a Deep Weight Programmer (Hypernetwork) to instantly compile textual context into transient LoRA matrices in a single forward pass.
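For intuition, here is a minimal sketch of the hypernetwork step described above: a learned map from a document embedding to flattened LoRA factors $A$ and $B$, produced in a single forward pass. All shapes, names, and the zero-initialization of the $B$ head are illustrative assumptions, not the repository's actual code.

```python
import random

D_EMB, D_MODEL, RANK = 8, 16, 2  # toy sizes for illustration

random.seed(0)

def linear(x, w):
    # plain matrix-vector product: w is (out x in)
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

# hypernetwork parameters: one linear head per LoRA factor
W_a = [[random.gauss(0, 0.02) for _ in range(D_EMB)] for _ in range(RANK * D_MODEL)]
W_b = [[0.0] * D_EMB for _ in range(D_MODEL * RANK)]  # B head starts at zero, as in LoRA

def hypernet(doc_emb):
    # one forward pass compiles the context embedding into LoRA factors
    flat_a = linear(doc_emb, W_a)
    flat_b = linear(doc_emb, W_b)
    A = [flat_a[i * D_MODEL:(i + 1) * D_MODEL] for i in range(RANK)]  # (r x d)
    B = [flat_b[i * RANK:(i + 1) * RANK] for i in range(D_MODEL)]     # (d x r)
    return A, B

doc_emb = [0.1] * D_EMB  # stand-in for a real document embedding
A, B = hypernet(doc_emb)
print(len(A), len(A[0]), len(B), len(B[0]))  # 2 16 16 2
```

Because the factors are functions of the document embedding, no KV-cache entries accumulate with context length; the context lives in the generated weights instead.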

This specific checkpoint (mesa_kl_ep2.pt) was trained with a dual-objective KL-divergence generative-regularization loss to address the stability-plasticity dilemma. It brings $D_{KL}$ down to $0.0588$, allowing the base model to preserve fluent English generation during zero-shot weight injection.
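The dual-objective idea can be sketched as a task loss plus a KL regularizer that keeps the LoRA-injected model's next-token distribution close to the frozen base model's. The function names and the weighting term `beta` are assumptions for illustration, not the checkpoint's exact training code.

```python
import math

def kl_divergence(p, q):
    # D_KL(p || q) for two discrete probability distributions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def dual_objective(task_nll, base_probs, adapted_probs, beta=1.0):
    # plasticity term (fit the context) + stability term (stay near the base model)
    return task_nll + beta * kl_divergence(base_probs, adapted_probs)

base = [0.7, 0.2, 0.1]     # frozen base model's next-token distribution
adapted = [0.6, 0.3, 0.1]  # distribution after LoRA injection
loss = dual_objective(task_nll=1.5, base_probs=base, adapted_probs=adapted)
print(round(kl_divergence(base, adapted), 4))
```

Driving the KL term toward small values (the card reports $0.0588$) is what prevents the injected weights from degrading the base model's fluent generation.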


βš™οΈ How to Use

These are not standard LLM weights. These are the weights for the MESA DeepWeightProgrammer module, which dynamically generates LoRA matrices for Qwen/Qwen2.5-0.5B-Instruct.

To use these weights, you must run the MESA pipeline from our GitHub repository:

```python
import torch

from mesa_pipeline import MESAPipeline

mesa = MESAPipeline()
mesa.hypernet.load_state_dict(torch.load("mesa_kl_ep2.pt"))

# Generate dynamic LoRA weights from context
doc_emb = mesa.get_document_embedding("Your massive context here...")
Wa, Wb = mesa.hypernet(doc_emb)

# Inject into base LLM
mesa.inject(Wa, Wb, scaling_factor=2.0)
```
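For intuition, here is a hedged sketch of what an `inject`-style call plausibly does under standard LoRA semantics: add the scaled low-rank update $\text{scaling} \cdot (B A)$ to a frozen weight matrix. The shapes and the pure-Python `matmul` helper are illustrative assumptions, not the pipeline's actual implementation.

```python
def matmul(X, Y):
    # naive matrix product for illustration
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def inject(W, A, B, scaling_factor=2.0):
    # W: (d x d) frozen base weight, A: (r x d), B: (d x r)
    delta = matmul(B, A)  # (d x d) low-rank update
    return [[w + scaling_factor * d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]  # toy frozen weight
A = [[0.1, 0.2]]              # rank-1 LoRA factors
B = [[0.5], [0.25]]
W_new = inject(W, A, B)
print(W_new)
```

Because the update is additive and the factors are transient, the base weights stay frozen; dropping `Wa`/`Wb` after generation restores the original model.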
Model tree for ElvianElvy/MESA-Qwen2.5-0.5B-KL-Regularized: adapter of Qwen/Qwen2.5-0.5B-Instruct.