---
library_name: transformers
pipeline_tag: text-generation
base_model: Qwen/Qwen3-4B
tags:
- interpretability
- massive-activations
- me-layer
---

# A Single Layer to Explain Them All: Understanding Massive Activations in Large Language Models


## Description

**WeMask** is the implementation of the research paper "[A Single Layer to Explain Them All: Understanding Massive Activations in Large Language Models](https://huggingface.co/papers/2605.08504)". The paper investigates the origins of "massive activations" in Large Language Models (LLMs) and identifies a specific **Massive Emergence Layer (ME Layer)** where these activations first appear.

This checkpoint is a fine-tuned version of [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B), trained with supervised fine-tuning (SFT) and reinforcement learning (RL) to improve model performance and mitigate attention sinks by reducing the rigidity of massive-activation tokens.

## Resources

- **Paper:** [arXiv:2605.08504](https://arxiv.org/abs/2605.08504)
- **Repository:** [GitHub - ME_Layer](https://github.com/vanpe20/ME_Layer)
- **Project Page:** [ME-Layer Homepage](https://vanpe20.github.io/ME-Layer.github.io/)

## Start

You can follow the guidelines in the [official repository](https://github.com/vanpe20/ME_Layer) to start testing and training. Minimal sketches for inference and for probing the ME Layer follow the citation below.

## Citation

If you find this research helpful, please cite:

```bibtex
@misc{shi2026singlelayerexplainallunderstanding,
  title={A Single Layer to Explain Them All: Understanding Massive Activations in Large Language Models},
  author={Zeru Shi and Zhenting Wang and Fan Yang and Qifan Wang and Ruixiang Tang},
  year={2026},
  eprint={2605.08504},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2605.08504},
}
```
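
## Example: inference

A minimal text-generation sketch, assuming this checkpoint loads like any `transformers` causal LM (its `pipeline_tag` is `text-generation`). The model id below is a placeholder pointing at the base model, since this card does not spell out the checkpoint's repository id; substitute it accordingly.

```python
from transformers import pipeline

# "Qwen/Qwen3-4B" is a placeholder for this checkpoint's repository id.
generator = pipeline(
    "text-generation",
    model="Qwen/Qwen3-4B",
    torch_dtype="auto",
    device_map="auto",
)

out = generator("What is a massive activation in a transformer?", max_new_tokens=64)
print(out[0]["generated_text"])
```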
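
## Example: probing for the ME Layer

The sketch below is not the authors' code; it is one assumption-laden way to eyeball where massive activations first emerge using plain `transformers`: record the largest absolute hidden-state value at each layer and flag the first layer whose peak jumps far above the median. The 10x cutoff, the prompt, and the model id are illustrative choices, not values from the paper.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-4B"  # placeholder; swap in this checkpoint's id to probe it instead
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Massive activations appear in a small set of hidden dimensions."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states[0] is the embedding output; entries 1..N follow the N decoder layers.
peaks = [h.abs().max().item() for h in out.hidden_states]
median = torch.tensor(peaks).median().item()
threshold = 10.0  # illustrative cutoff, not a value from the paper
me_layer = next((i for i, p in enumerate(peaks) if p > threshold * median), None)

for i, p in enumerate(peaks):
    print(f"layer {i:2d}: max |hidden state| = {p:8.1f}")
print(f"First layer whose peak exceeds {threshold}x the median: {me_layer}")
```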