---
library_name: transformers
pipeline_tag: text-generation
base_model: Qwen/Qwen3-4B
tags:
- interpretability
- massive-activations
- me-layer
---

<div align="center">
<h1 style="font-size: 32px; font-weight: bold;"> A Single Layer to Explain Them All: Understanding Massive Activations in Large Language Models </h1>

<br>
<a href="https://arxiv.org/abs/2605.08504">
<img src="https://img.shields.io/badge/ArXiv-2605.08504-brown?logo=arxiv" alt="Paper">
</a>
<a href="https://huggingface.co/DarkBluee/WeMask">
<img src="https://img.shields.io/badge/🤗 huggingface-Model-purple" alt="Model">
</a>
<a href="https://vanpe20.github.io/ME-Layer.github.io/">
<img src="https://img.shields.io/badge/-HomePage-black?logo=github" alt="Homepage">
</a>
</div>

## Description

**WeMask** is the model checkpoint accompanying the research paper "[A Single Layer to Explain Them All: Understanding Massive Activations in Large Language Models](https://huggingface.co/papers/2605.08504)".
|
|
The paper investigates the origins of "massive activations" in large language models (LLMs) and identifies a specific **Massive Emergence Layer (ME Layer)** where these activations first appear. This checkpoint is a version of [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) fine-tuned with supervised fine-tuning (SFT) and reinforcement learning (RL) to improve model performance and mitigate attention sinks by reducing the rigidity of massive-activation tokens.
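
To see the phenomenon the paper studies, you can probe where massive activations first emerge by inspecting hidden-state magnitudes layer by layer. The sketch below is a minimal illustration using the standard `transformers` forward API, not code from the official repository; the prompt and the max/median statistics are illustrative choices:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# This checkpoint; swap in "Qwen/Qwen3-4B" to compare against the base model.
model_id = "DarkBluee/WeMask"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("Summer is warm. Winter is cold.", return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states is a tuple: (embedding output, layer 1, ..., layer N).
# Massive activations show up as a sudden jump in a layer's max-magnitude
# value relative to the typical (median) magnitude.
for i, h in enumerate(out.hidden_states):
    a = h.abs().float()
    print(f"layer {i:2d}  max|h| = {a.max().item():10.2f}  median|h| = {a.median().item():.4f}")
```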
|
|
## Resources

- **Paper:** [arXiv:2605.08504](https://arxiv.org/abs/2605.08504)
- **Repository:** [GitHub - ME_Layer](https://github.com/vanpe20/ME_Layer)
- **Project Page:** [ME-Layer Homepage](https://vanpe20.github.io/ME-Layer.github.io/)
|
|
## Getting Started

Follow the guidelines in the [official repository](https://github.com/vanpe20/ME_Layer) to get started with testing and training.
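
For quick inference outside the repository, the checkpoint loads like any `transformers` causal LM. Below is a minimal sketch, assuming the checkpoint ships the base model's chat template and default generation settings; consult the repository for the recommended setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("DarkBluee/WeMask")
model = AutoModelForCausalLM.from_pretrained(
    "DarkBluee/WeMask", torch_dtype=torch.bfloat16, device_map="auto"
)

# Qwen3-style chat formatting, assuming the tokenizer carries a chat template.
messages = [{"role": "user", "content": "What is an attention sink in a transformer?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```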
|
|
## Citation

If you find this research helpful, please cite:

```bibtex
@misc{shi2026singlelayerexplainallunderstanding,
      title={A Single Layer to Explain Them All: Understanding Massive Activations in Large Language Models},
      author={Zeru Shi and Zhenting Wang and Fan Yang and Qifan Wang and Ruixiang Tang},
      year={2026},
      eprint={2605.08504},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2605.08504},
}
```