---
library_name: transformers
pipeline_tag: text-generation
base_model: Qwen/Qwen3-4B
tags:
- interpretability
- massive-activations
- me-layer
---
<div align="center">
<h1 style="font-size: 32px; font-weight: bold;"> A Single Layer to Explain Them All: Understanding Massive Activations in Large Language Models </h1>
<br>
<a href="https://arxiv.org/abs/2605.08504">
<img src="https://img.shields.io/badge/ArXiv-2605.08504-brown?logo=arxiv" alt="Paper">
</a>
<a href="https://huggingface.co/DarkBluee/WeMask">
<img src="https://img.shields.io/badge/🤗 huggingface-Model-purple" alt="checkpoint">
</a>
<a href="https://vanpe20.github.io/ME-Layer.github.io/">
<img src="https://img.shields.io/badge/-HomePage-black?logo=github" alt="homepage">
</a>
</div>
## Description
**WeMask** is the implementation of the research paper "[A Single Layer to Explain Them All: Understanding Massive Activations in Large Language Models](https://huggingface.co/papers/2605.08504)".
The research investigates the origins of "massive activations" in Large Language Models (LLMs) and identifies a specific **Massive Emergence Layer (ME Layer)** where these activations first appear. This checkpoint is a fine-tuned version of [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B), trained with supervised fine-tuning (SFT) and reinforcement learning (RL) to improve model performance and mitigate attention sinks by reducing the rigidity of massive-activation tokens.
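To make the "Massive Emergence Layer" idea concrete, here is a minimal sketch (not the paper's actual code) of how one might locate the first layer whose hidden states contain a massive activation. The outlier criterion, the `threshold` value, and the helper names are illustrative assumptions; the demo uses synthetic hidden states rather than real model outputs:

```python
import numpy as np

def massive_activation_ratio(hidden: np.ndarray) -> float:
    """Ratio of the largest absolute activation to the median absolute activation."""
    mags = np.abs(hidden)
    return float(mags.max() / np.median(mags))

def find_me_layer(layer_states: list, threshold: float = 100.0) -> int:
    """Index of the first layer whose top activation dwarfs the layer median.

    `threshold` is an illustrative cutoff, not a value from the paper.
    Returns -1 if no layer qualifies.
    """
    for i, hidden in enumerate(layer_states):
        if massive_activation_ratio(hidden) >= threshold:
            return i
    return -1

# Synthetic demo: layers 0-2 are well-behaved Gaussians; layer 3 contains
# one planted massive value, so it is flagged as the "ME Layer".
rng = np.random.default_rng(0)
states = [rng.normal(0.0, 1.0, size=(8, 16)) for _ in range(6)]
states[3][0, 0] = 1000.0
print(find_me_layer(states))  # 3
```

In practice the per-layer hidden states would come from a forward pass with `output_hidden_states=True`; the synthetic tensors above only stand in for them.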
## Resources
- **Paper:** [ArXiv:2605.08504](https://arxiv.org/abs/2605.08504)
- **Repository:** [GitHub - ME_Layer](https://github.com/vanpe20/ME_Layer)
- **Project Page:** [ME-Layer Homepage](https://vanpe20.github.io/ME-Layer.github.io/)
## Getting Started
Follow the guidelines in the [official repository](https://github.com/vanpe20/ME_Layer) to get started with testing and training.
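For quick inference, the checkpoint should load through the standard `transformers` causal-LM API declared in the card metadata. This is a hedged sketch, not the repository's official script; defer to the official repository for the exact setup, and note that `device_map="auto"` additionally requires `accelerate`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "DarkBluee/WeMask"  # this checkpoint on the Hugging Face Hub

def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the checkpoint and return a completion for `prompt`."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    new_tokens = output[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

# Example call (downloads the ~4B-parameter weights on first use):
# print(generate("Explain massive activations in one sentence."))
```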
## Citation
If you find this research helpful, please cite:
```bibtex
@misc{shi2026singlelayerexplainallunderstanding,
      title={A Single Layer to Explain Them All: Understanding Massive Activations in Large Language Models},
      author={Zeru Shi and Zhenting Wang and Fan Yang and Qifan Wang and Ruixiang Tang},
      year={2026},
      eprint={2605.08504},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2605.08504},
}
```