---
library_name: transformers
pipeline_tag: text-generation
base_model: Qwen/Qwen3-4B
tags:
- interpretability
- massive-activations
- me-layer
---
## Description
WeMask is the implementation of the research paper *A Single Layer to Explain Them All: Understanding Massive Activations in Large Language Models*.

The paper investigates the origins of "massive activations" in large language models (LLMs) and identifies a specific Massive Emergence Layer (ME Layer) where these activations first appear. This checkpoint is a fine-tuned version of Qwen3-4B (specifically Qwen3-VL-4B), trained with supervised fine-tuning (SFT) and reinforcement learning (RL) to improve model performance and mitigate attention sinks by reducing the rigidity of massive-activation tokens.
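To make the ME Layer idea concrete, here is a minimal sketch (not the paper's code) that scans per-layer hidden states of the base model and reports the first layer whose activations cross a large-magnitude cutoff. The `threshold` value is an illustrative assumption, not the paper's criterion.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Base model from the metadata above; swap in this checkpoint's repo id to
# compare activation behavior before and after fine-tuning.
model_id = "Qwen/Qwen3-4B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

inputs = tokenizer("The quick brown fox jumps over the lazy dog.", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# hidden_states[0] is the embedding output; hidden_states[i] is the output of layer i.
threshold = 100.0  # illustrative magnitude cutoff, not the paper's criterion
for i, h in enumerate(out.hidden_states):
    peak = h.abs().max().item()
    if peak > threshold:
        print(f"Activations first exceed {threshold} at layer {i} (peak magnitude {peak:.1f})")
        break
```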
## Resources
- Paper: [arXiv:2605.08504](https://arxiv.org/abs/2605.08504)
- Repository: GitHub - ME_Layer
- Project Page: ME-Layer Homepage
## Getting Started
You can follow the guidelines in the official repository to start testing and training, or use the minimal inference example below.
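The sketch below assumes the standard transformers text-generation API; the repo id is a placeholder, so replace it with this checkpoint's actual model id.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/wemask-qwen3-4b"  # placeholder: use this checkpoint's actual repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

prompt = "Explain what massive activations are in large language models."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```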
## Citation
If you find this research helpful, please cite:
```bibtex
@misc{shi2026singlelayerexplainallunderstanding,
      title={A Single Layer to Explain Them All: Understanding Massive Activations in Large Language Models},
      author={Zeru Shi and Zhenting Wang and Fan Yang and Qifan Wang and Ruixiang Tang},
      year={2026},
      eprint={2605.08504},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2605.08504},
}
```