DarkBluee/WeMask

Improve model card and add metadata

by nielsr HF Staff - opened about 9 hours ago

base: refs/heads/main ← from: refs/pr/1

Files changed (1)

README.md +35 -16

README.md CHANGED

@@ -1,10 +1,19 @@
 <div align="center">
   <h1 style="font-size: 32px; font-weight: bold;"> A Single Layer to Explain Them All: Understanding Massive Values in Large Language Models </h1>
   <br>
-  <br>
-  <a href="https://arxiv.org/pdf/2605.08504">
-    <img src="https://img.shields.io/badge/ArXiv-WeMask-brown?logo=arxiv" alt="Paper">
   </a>
   <a href="https://huggingface.co/DarkBluee/WeMask">
     <img src="https://img.shields.io/badge/🤗 huggingface-Model-purple" alt="checkpoint">
@@ -13,25 +22,35 @@
     <img src="https://img.shields.io/badge/-HomePage-black?logo=github" alt="checkpoint">
   </a>
 </div>
-</div>
-## Start
-This page is the model of WeMask, the github link is [ME-Layer](https://github.com/vanpe20/ME_Layer). We use [Qwen-3-VL-4B](https://huggingface.co/Qwen/Qwen3-4B) as our foundation model for SFT and RL training. You can follow the guideline in this repo to start testing and training.
-## Citation
-If you think our research is helpful, please cite with
-```bibtex
-@article{me_layer_2026,
-  title={A Single Layer to Explain Them All: Understanding Massive Values in Large Language Models},
-  author={Your Name and Co-authors},
-  journal={Proceedings of the 43rd International Conference on Machine Learning (ICML)},
-  year={2026}
-}
-```

+---
+library_name: transformers
+pipeline_tag: text-generation
+base_model: Qwen/Qwen3-4B
+tags:
+- interpretability
+- massive-activations
+- me-layer
+---
 <div align="center">
   <h1 style="font-size: 32px; font-weight: bold;"> A Single Layer to Explain Them All: Understanding Massive Values in Large Language Models </h1>
   <br>
+  <a href="https://arxiv.org/abs/2605.08504">
+    <img src="https://img.shields.io/badge/ArXiv-2605.08504-brown?logo=arxiv" alt="Paper">
   </a>
   <a href="https://huggingface.co/DarkBluee/WeMask">
     <img src="https://img.shields.io/badge/🤗 huggingface-Model-purple" alt="checkpoint">
     <img src="https://img.shields.io/badge/-HomePage-black?logo=github" alt="checkpoint">
   </a>
 </div>
+## Description
+**WeMask** is the implementation of the research paper "[A Single Layer to Explain Them All: Understanding Massive Activations in Large Language Models](https://huggingface.co/papers/2605.08504)".
+The research investigates the origins of "massive activations" in Large Language Models (LLMs) and identifies a specific **Massive Emergence Layer (ME Layer)** where these activations first appear. This checkpoint is a fine-tuned version of [Qwen-3-4B](https://huggingface.co/Qwen/Qwen3-4B) (specifically Qwen-3-VL-4B) using SFT and Reinforcement Learning (RL) to improve model performance and mitigate attention sinks by reducing the rigidity of massive activation tokens.
+## Resources
+- **Paper:** [ArXiv:2605.08504](https://arxiv.org/abs/2605.08504)
+- **Repository:** [GitHub - ME_Layer](https://github.com/vanpe20/ME_Layer)
+- **Project Page:** [ME-Layer Homepage](https://vanpe20.github.io/ME-Layer.github.io/)
+## Start
+You can follow the guidelines in the [official repository](https://github.com/vanpe20/ME_Layer) to start testing and training.
+## Citation
+If you find this research helpful, please cite:
+```bibtex
+@misc{shi2026singlelayerexplainallunderstanding,
+    title={A Single Layer to Explain Them All: Understanding Massive Activations in Large Language Models},
+    author={Zeru Shi and Zhenting Wang and Fan Yang and Qifan Wang and Ruixiang Tang},
+    year={2026},
+    eprint={2605.08504},
+    archivePrefix={arXiv},
+    primaryClass={cs.CL},
+    url={https://arxiv.org/abs/2605.08504},
+}
+```
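The Hub expects model card metadata in a YAML block at the very top of `README.md`, so the front matter this PR adds can be sanity-checked locally before merging. The following is a minimal sketch, not part of the PR or of any Hub tooling: `parse_front_matter` and `Metadata` are hypothetical names, and the parser deliberately handles only the flat `key: value` and `- item` subset used in this card, not full YAML.

```python
from typing import Dict, List, Optional, Union

# Parsed metadata: scalar values are strings, list values (e.g. tags) are lists.
Metadata = Dict[str, Union[str, List[str]]]


def parse_front_matter(readme: str) -> Metadata:
    """Parse the YAML front matter that must open on line 1 of README.md.

    Hypothetical helper: supports only flat `key: value` pairs and `key:`
    followed by `- item` entries, i.e. the subset this card uses. It is not a
    general YAML parser.
    """
    lines = readme.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("front matter must start with '---' on the first line")

    meta: Metadata = {}
    current_list: Optional[List[str]] = None  # list being filled by "- item" lines
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter: metadata block is complete
            return meta
        if not stripped:
            continue
        if stripped.startswith("- ") and current_list is not None:
            current_list.append(stripped[2:].strip())
            continue
        key, sep, value = stripped.partition(":")
        if not sep:
            raise ValueError(f"unsupported front-matter line: {line!r}")
        key, value = key.strip(), value.strip()
        if value:  # scalar entry, e.g. "base_model: Qwen/Qwen3-4B"
            meta[key] = value
            current_list = None
        else:  # "tags:" opens a list filled by the following "- item" lines
            current_list = []
            meta[key] = current_list
    raise ValueError("front matter is missing its closing '---'")


# The first lines of README.md as proposed in this PR.
CARD = """---
library_name: transformers
pipeline_tag: text-generation
base_model: Qwen/Qwen3-4B
tags:
- interpretability
- massive-activations
- me-layer
---
<div align="center">
"""

meta = parse_front_matter(CARD)
print(meta["base_model"])  # Qwen/Qwen3-4B
print(meta["tags"])        # ['interpretability', 'massive-activations', 'me-layer']
```

A README whose metadata block does not start on the first line is rejected with a `ValueError`, which is the failure mode to watch for when front matter is added to an existing card. For real validation, `huggingface_hub`'s `ModelCard` class parses the same block into `card.data`; the hand-rolled parser only avoids the dependency.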