---
pipeline_tag: image-segmentation
---
# AuralSAM2
This repository contains the weights for **AuralSAM2**, as presented in the paper [AuralSAM2: Enabling SAM2 Hear Through Pyramid Audio-Visual Feature Prompting](https://huggingface.co/papers/2506.01015).
AuralSAM2 integrates audio into the Segment Anything Model 2 (SAM2) while preserving its promptable segmentation capability. It introduces the **AuralFuser** module, which fuses audio and visual features to generate sparse and dense prompts. These prompts propagate auditory cues across SAM2's feature pyramid, enabling audio-guided object segmentation.
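The idea of turning audio-visual fusion into sparse and dense prompts at several pyramid levels can be illustrated with a toy, dependency-free sketch. This is **not** the paper's AuralFuser. Several choices here are assumptions made only for illustration: scaled dot-product attention between a single audio embedding and each visual location, an attention-pooled vector as the "sparse" token, and an attention-reweighted feature map as the "dense" prompt. The helper names (`fuse_level`, `fuse_pyramid`) are hypothetical, and the sketch assumes every pyramid level shares the audio embedding's channel size C. Real SAM2 feature levels need not share that size.

```python
import math
from typing import List, Tuple

# Hypothetical toy types: a feature vector and an H x W grid of C-dim vectors.
Vector = List[float]
FeatureMap = List[List[Vector]]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a flat list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_level(audio: Vector, fmap: FeatureMap) -> Tuple[Vector, FeatureMap]:
    """Hypothetical audio-visual fusion for ONE pyramid level (not the paper's AuralFuser).

    Returns:
        sparse: one C-dim token, the audio-attention-weighted average of visual features.
        dense:  an H x W x C map, visual features re-weighted by the audio attention.
    """
    h, w, c = len(fmap), len(fmap[0]), len(audio)
    flat = [fmap[i][j] for i in range(h) for j in range(w)]
    # Scaled dot-product attention of the audio query over every spatial location.
    attn = softmax([dot(audio, v) / math.sqrt(c) for v in flat])
    sparse = [sum(a * v[k] for a, v in zip(attn, flat)) for k in range(c)]
    # Scale by h*w so a uniform attention map leaves the features unchanged.
    n = h * w
    dense_flat = [[x * attn[idx] * n for x in v] for idx, v in enumerate(flat)]
    dense = [dense_flat[i * w:(i + 1) * w] for i in range(h)]
    return sparse, dense


def fuse_pyramid(audio: Vector, pyramid: List[FeatureMap]) -> List[Tuple[Vector, FeatureMap]]:
    """Apply the toy fusion at every level, assuming all levels share channel size C."""
    return [fuse_level(audio, fmap) for fmap in pyramid]


if __name__ == "__main__":
    audio = [1.0, 0.0]                 # toy audio embedding (C = 2)
    fine = [[[1.0, 0.0], [0.0, 1.0]],  # 2 x 2 level; top-left feature matches the audio
            [[0.0, 1.0], [0.0, 1.0]]]
    coarse = [[[0.5, 0.5]]]            # 1 x 1 level
    for sparse, dense in fuse_pyramid(audio, [fine, coarse]):
        print("sparse token:", [round(x, 3) for x in sparse])
```

In this toy setting, the location whose visual feature best aligns with the audio embedding receives the largest dense-prompt weight. That captures the intuition behind audio-guided segmentation. For the actual fusion and prompting design, refer to the paper and the official code.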
[**Paper**](https://huggingface.co/papers/2506.01015) | [**GitHub Code**](https://github.com/yyliu01/AuralSAM2)
<img src="./docs/overview.png" width="850" alt="AuralSAM2 overview" />
## Installation
Please install the dependencies and prepare the datasets by following the [***installation***](./docs/installation.md) document in the official repository.
## Getting started
Please follow the [***instructions***](./docs/before_start.md) document in the official repository to reproduce the results.
## Citation
If you find this work helpful for your research, please consider citing:
```bibtex
@article{liu2025auralsam2,
title={AuralSAM2: Enabling SAM2 Hear Through Pyramid Audio-Visual Feature Prompting},
author={Liu, Yuyuan and Chen, Yuanhong and Wang, Chong and Han, Junlin and Wu, Junde and Peng, Can and Chen, Jingkun and Tian, Yu and Carneiro, Gustavo},
journal={arXiv preprint arXiv:2506.01015},
year={2025}
}
```