---
library_name: transformers
pipeline_tag: text-generation
base_model: Qwen/Qwen3-4B
tags:
  - interpretability
  - massive-activations
  - me-layer
---

# A Single Layer to Explain Them All: Understanding Massive Activations in Large Language Models



## Description

WeMask is the implementation accompanying the research paper "A Single Layer to Explain Them All: Understanding Massive Activations in Large Language Models".

The paper investigates the origins of "massive activations" in large language models (LLMs) and identifies the specific Massive Emergence Layer (ME Layer) where these activations first appear. This checkpoint is a version of Qwen3-4B fine-tuned with supervised fine-tuning (SFT) and reinforcement learning (RL) to improve model performance and mitigate attention sinks by reducing the rigidity of massive-activation tokens.
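
As a rough illustration of the phenomenon (a heuristic sketch, not the paper's detection procedure), the snippet below loads the stated base model and reports the first decoder layer whose peak absolute hidden-state value is orders of magnitude larger than the median. The probe sentence and the 100x threshold are arbitrary assumptions chosen for demonstration.

```python
# Illustrative sketch only -- not the paper's ME Layer detection procedure.
# Scans the base model's hidden states for the first layer whose peak
# absolute activation dwarfs the median, a crude heuristic for where
# massive activations emerge.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-4B"  # the stated base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
model.eval()

inputs = tokenizer("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# outputs.hidden_states holds the embedding output plus one tensor per layer.
for layer_idx, hidden in enumerate(outputs.hidden_states):
    values = hidden.detach().float().abs()
    peak, median = values.max().item(), values.median().item()
    if peak > 100 * median:  # 100x is an arbitrary demo threshold
        print(f"Layer {layer_idx}: peak |activation| = {peak:.1f}, "
              f"median = {median:.4f} -> candidate massive-emergence layer")
        break
```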

## Resources

- Paper: [A Single Layer to Explain Them All (arXiv:2605.08504)](https://arxiv.org/abs/2605.08504)

## Start

You can follow the guidelines in the official repository to start testing and training.
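
For a quick smoke test, the sketch below runs the checkpoint through the standard `transformers` text-generation pipeline. Note that `WeMask/WeMask` is a placeholder repo id (the actual id is not stated in this card); replace it with the checkpoint's Hugging Face id or a local path.

```python
# Quick-start sketch using the standard transformers pipeline.
# "WeMask/WeMask" is a PLACEHOLDER repo id -- substitute the actual
# Hugging Face id (or local path) of this checkpoint.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="WeMask/WeMask",  # placeholder id
    torch_dtype="auto",
    device_map="auto",  # requires the `accelerate` package
)

result = generator(
    "Explain the role of attention sinks in large language models.",
    max_new_tokens=128,
)
print(result[0]["generated_text"])
```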

## Citation

If you find this research helpful, please cite:

```bibtex
@misc{shi2026singlelayerexplainallunderstanding,
    title={A Single Layer to Explain Them All: Understanding Massive Activations in Large Language Models},
    author={Zeru Shi and Zhenting Wang and Fan Yang and Qifan Wang and Ruixiang Tang},
    year={2026},
    eprint={2605.08504},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2605.08504},
}
```