nielsr (HF Staff) committed
Commit a24e534 · verified · 1 Parent(s): 00e71de

Improve model card and add metadata


Hi! I'm Niels, part of the community science team at Hugging Face. I noticed this model card could benefit from additional metadata and structured information to make it more discoverable.

This PR adds:
- Metadata for `pipeline_tag` and `library_name` to enable built-in code snippets (see the sketch after this list) and better categorization.
- `base_model` information to link it to the Qwen foundation model.
- An improved description of the research ("WeMask") and the discovery of the Massive Emergence Layer (ME Layer).
- A cleaned-up citation section using the official BibTeX from the paper.
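
For context, the `pipeline_tag: text-generation` and `library_name: transformers` metadata are what let the Hub surface a loading snippet along these lines. This is a minimal sketch, assuming the checkpoint loads as a standard causal LM through `transformers`; the prompt and generation parameters are illustrative only:

```python
# Minimal sketch of the kind of snippet the new metadata enables on the Hub.
# Assumes DarkBluee/WeMask loads as a standard causal LM via transformers.
from transformers import pipeline

generator = pipeline("text-generation", model="DarkBluee/WeMask")
out = generator("Massive activations in large language models", max_new_tokens=64)
print(out[0]["generated_text"])
```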

Files changed (1)
README.md +35 -16
README.md CHANGED
@@ -1,10 +1,19 @@
  <div align="center">
  <h1 style="font-size: 32px; font-weight: bold;"> A Single Layer to Explain Them All: Understanding Massive Values in Large Language Models </h1>

  <br>
- <br>
- <a href="https://arxiv.org/pdf/2605.08504">
- <img src="https://img.shields.io/badge/ArXiv-WeMask-brown?logo=arxiv" alt="Paper">
  </a>
  <a href="https://huggingface.co/DarkBluee/WeMask">
  <img src="https://img.shields.io/badge/🤗 huggingface-Model-purple" alt="checkpoint">
@@ -13,25 +22,35 @@
  <img src="https://img.shields.io/badge/-HomePage-black?logo=github" alt="checkpoint">
  </a>
  </div>
- </div>

- ## Start

- This page is the model of WeMask, the github link is [ME-Layer](https://github.com/vanpe20/ME_Layer). We use [Qwen-3-VL-4B](https://huggingface.co/Qwen/Qwen3-4B) as our foundation model for SFT and RL training. You can follow the guideline in this repo to start testing and training.

- ## Citation

- If you think our research is helpful, please cite with

- ```bibtex
- @article{me_layer_2026,
- title={A Single Layer to Explain Them All: Understanding Massive Values in Large Language Models},
- author={Your Name and Co-authors},
- journal={Proceedings of the 43rd International Conference on Machine Learning (ICML)},
- year={2026}
- }
- ```
+ ---
+ library_name: transformers
+ pipeline_tag: text-generation
+ base_model: Qwen/Qwen3-4B
+ tags:
+ - interpretability
+ - massive-activations
+ - me-layer
+ ---
+
  <div align="center">
  <h1 style="font-size: 32px; font-weight: bold;"> A Single Layer to Explain Them All: Understanding Massive Values in Large Language Models </h1>

  <br>
+ <a href="https://arxiv.org/abs/2605.08504">
+ <img src="https://img.shields.io/badge/ArXiv-2605.08504-brown?logo=arxiv" alt="Paper">
  </a>
  <a href="https://huggingface.co/DarkBluee/WeMask">
  <img src="https://img.shields.io/badge/🤗 huggingface-Model-purple" alt="checkpoint">

  <img src="https://img.shields.io/badge/-HomePage-black?logo=github" alt="checkpoint">
  </a>
  </div>

+ ## Description
+
+ **WeMask** is the implementation of the research paper "[A Single Layer to Explain Them All: Understanding Massive Activations in Large Language Models](https://huggingface.co/papers/2605.08504)".
+
+ The research investigates the origins of "massive activations" in Large Language Models (LLMs) and identifies a specific **Massive Emergence Layer (ME Layer)** where these activations first appear. This checkpoint is a fine-tuned version of [Qwen-3-4B](https://huggingface.co/Qwen/Qwen3-4B) (specifically Qwen-3-VL-4B) using SFT and Reinforcement Learning (RL) to improve model performance and mitigate attention sinks by reducing the rigidity of massive activation tokens.
+
+ ## Resources
+
+ - **Paper:** [ArXiv:2605.08504](https://arxiv.org/abs/2605.08504)
+ - **Repository:** [GitHub - ME_Layer](https://github.com/vanpe20/ME_Layer)
+ - **Project Page:** [ME-Layer Homepage](https://vanpe20.github.io/ME-Layer.github.io/)
+
+ ## Start
+
+ You can follow the guidelines in the [official repository](https://github.com/vanpe20/ME_Layer) to start testing and training.
+
+ ## Citation
+
+ If you find this research helpful, please cite:
+
+ ```bibtex
+ @misc{shi2026singlelayerexplainallunderstanding,
+ title={A Single Layer to Explain Them All: Understanding Massive Activations in Large Language Models},
+ author={Zeru Shi and Zhenting Wang and Fan Yang and Qifan Wang and Ruixiang Tang},
+ year={2026},
+ eprint={2605.08504},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL},
+ url={https://arxiv.org/abs/2605.08504},
+ }
+ ```
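
As a quick sanity check on the new front matter, a sketch like the following can confirm the metadata parses as intended. It uses `huggingface_hub`'s `ModelCard` API (a real API; the repo id is taken from this card) and assumes the PR has been merged so the hosted card already carries the new fields:

```python
# Sketch: verify the YAML front matter added in this PR parses as expected.
# Assumes the PR is merged, so DarkBluee/WeMask already hosts the new metadata.
from huggingface_hub import ModelCard

card = ModelCard.load("DarkBluee/WeMask")
print(card.data.pipeline_tag)  # expected: text-generation
print(card.data.library_name)  # expected: transformers
print(card.data.base_model)    # expected: Qwen/Qwen3-4B
print(card.data.tags)          # expected: ['interpretability', 'massive-activations', 'me-layer']
```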