---
library_name: transformers
pipeline_tag: text-generation
base_model: Qwen/Qwen3-4B
tags:
  - interpretability
  - massive-activations
  - me-layer
---

# A Single Layer to Explain Them All: Understanding Massive Activations in Large Language Models



## Description

WeMask is the implementation accompanying the research paper "A Single Layer to Explain Them All: Understanding Massive Activations in Large Language Models".

The paper investigates the origins of "massive activations" in large language models (LLMs) and identifies the specific Massive Emergence Layer (ME Layer) where these activations first appear. This checkpoint is a version of Qwen3-4B fine-tuned with supervised fine-tuning (SFT) and reinforcement learning (RL) to improve model performance and mitigate attention sinks by reducing the rigidity of massive-activation tokens.
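
As a rough illustration of the phenomenon (a heuristic sketch, not the paper's detection procedure), the snippet below loads the stated base model and reports the first decoder layer whose peak absolute hidden-state value is orders of magnitude larger than the median. The probe sentence and the 100x threshold are arbitrary assumptions chosen for demonstration.

```python
# Illustrative sketch only -- not the paper's ME Layer detection procedure.
# Scans the base model's hidden states for the first layer whose peak
# absolute activation dwarfs the median, a crude heuristic for where
# massive activations emerge.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-4B"  # the stated base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
model.eval()

inputs = tokenizer("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# outputs.hidden_states holds the embedding output plus one tensor per layer.
for layer_idx, hidden in enumerate(outputs.hidden_states):
    values = hidden.detach().float().abs()
    peak, median = values.max().item(), values.median().item()
    if peak > 100 * median:  # 100x is an arbitrary demo threshold
        print(f"Layer {layer_idx}: peak |activation| = {peak:.1f}, "
              f"median = {median:.4f} -> candidate massive-emergence layer")
        break
```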

## Resources

- Paper: [A Single Layer to Explain Them All (arXiv:2605.08504)](https://arxiv.org/abs/2605.08504)

## Start

You can follow the guidelines in the official repository to start testing and training.
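
For a quick smoke test, the sketch below runs the checkpoint through the standard `transformers` text-generation pipeline. Note that `WeMask/WeMask` is a placeholder repo id (the actual id is not stated in this card); replace it with the checkpoint's Hugging Face id or a local path.

```python
# Quick-start sketch using the standard transformers pipeline.
# "WeMask/WeMask" is a PLACEHOLDER repo id -- substitute the actual
# Hugging Face id (or local path) of this checkpoint.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="WeMask/WeMask",  # placeholder id
    torch_dtype="auto",
    device_map="auto",  # requires the `accelerate` package
)

result = generator(
    "Explain the role of attention sinks in large language models.",
    max_new_tokens=128,
)
print(result[0]["generated_text"])
```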

## Citation

If you find this research helpful, please cite:

```bibtex
@misc{shi2026singlelayerexplainallunderstanding,
    title={A Single Layer to Explain Them All: Understanding Massive Activations in Large Language Models},
    author={Zeru Shi and Zhenting Wang and Fan Yang and Qifan Wang and Ruixiang Tang},
    year={2026},
    eprint={2605.08504},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2605.08504},
}
```