---
library_name: transformers
pipeline_tag: text-generation
base_model: Qwen/Qwen3-4B
tags:
- interpretability
- massive-activations
- me-layer
---
<div align="center">
<h1 style="font-size: 32px; font-weight: bold;"> A Single Layer to Explain Them All: Understanding Massive Activations in Large Language Models </h1>
<br>
<a href="https://arxiv.org/abs/2605.08504">
<img src="https://img.shields.io/badge/ArXiv-2605.08504-brown?logo=arxiv" alt="Paper">
</a>
<a href="https://huggingface.co/DarkBluee/WeMask">
<img src="https://img.shields.io/badge/🤗 huggingface-Model-purple" alt="checkpoint">
</a>
<a href="https://vanpe20.github.io/ME-Layer.github.io/">
<img src="https://img.shields.io/badge/-HomePage-black?logo=github" alt="homepage">
</a>
</div>
## Description
**WeMask** is the implementation of the research paper "[A Single Layer to Explain Them All: Understanding Massive Activations in Large Language Models](https://huggingface.co/papers/2605.08504)".
The research investigates the origins of "massive activations" in Large Language Models (LLMs) and identifies a specific **Massive Emergence Layer (ME Layer)** where these activations first appear. This checkpoint is a fine-tuned version of [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B), trained with supervised fine-tuning (SFT) and reinforcement learning (RL) to improve model performance and mitigate attention sinks by reducing the rigidity of massive-activation tokens.
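To make the "Massive Emergence Layer" idea concrete, here is a minimal sketch (not the paper's actual code) of how one might locate the first layer whose hidden states contain a massive activation. The outlier criterion, the `threshold` value, and the helper names are illustrative assumptions; the demo uses synthetic hidden states rather than real model outputs:

```python
import numpy as np

def massive_activation_ratio(hidden: np.ndarray) -> float:
    """Ratio of the largest absolute activation to the median absolute activation."""
    mags = np.abs(hidden)
    return float(mags.max() / np.median(mags))

def find_me_layer(layer_states: list, threshold: float = 100.0) -> int:
    """Index of the first layer whose top activation dwarfs the layer median.

    `threshold` is an illustrative cutoff, not a value from the paper.
    Returns -1 if no layer qualifies.
    """
    for i, hidden in enumerate(layer_states):
        if massive_activation_ratio(hidden) >= threshold:
            return i
    return -1

# Synthetic demo: layers 0-2 are well-behaved Gaussians; layer 3 contains
# one planted massive value, so it is flagged as the "ME Layer".
rng = np.random.default_rng(0)
states = [rng.normal(0.0, 1.0, size=(8, 16)) for _ in range(6)]
states[3][0, 0] = 1000.0
print(find_me_layer(states))  # 3
```

In practice the per-layer hidden states would come from a forward pass with `output_hidden_states=True`; the synthetic tensors above only stand in for them.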
## Resources
- **Paper:** [ArXiv:2605.08504](https://arxiv.org/abs/2605.08504)
- **Repository:** [GitHub - ME_Layer](https://github.com/vanpe20/ME_Layer)
- **Project Page:** [ME-Layer Homepage](https://vanpe20.github.io/ME-Layer.github.io/)
## Getting Started
Follow the guidelines in the [official repository](https://github.com/vanpe20/ME_Layer) to get started with testing and training.
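For quick inference, the checkpoint should load through the standard `transformers` causal-LM API declared in the card metadata. This is a hedged sketch, not the repository's official script; defer to the official repository for the exact setup, and note that `device_map="auto"` additionally requires `accelerate`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "DarkBluee/WeMask"  # this checkpoint on the Hugging Face Hub

def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the checkpoint and return a completion for `prompt`."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    new_tokens = output[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)

# Example call (downloads the ~4B-parameter weights on first use):
# print(generate("Explain massive activations in one sentence."))
```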
## Citation
If you find this research helpful, please cite:
```bibtex
@misc{shi2026singlelayerexplainallunderstanding,
      title={A Single Layer to Explain Them All: Understanding Massive Activations in Large Language Models},
      author={Zeru Shi and Zhenting Wang and Fan Yang and Qifan Wang and Ruixiang Tang},
      year={2026},
      eprint={2605.08504},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2605.08504},
}
```