---
pipeline_tag: image-segmentation
---

AuralSAM2

This repository contains the weights for AuralSAM2, as presented in the paper AuralSAM2: Enabling SAM2 Hear Through Pyramid Audio-Visual Feature Prompting.

AuralSAM2 integrates audio into the Segment Anything Model 2 (SAM2) while preserving its promptable segmentation capability. It introduces the AuralFuser module, which fuses audio and visual features to generate sparse and dense prompts. These prompts propagate auditory cues across SAM2's feature pyramid, enabling audio-guided object segmentation.
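The exact design of AuralFuser is specified in the paper and the official code. The snippet below is only a toy sketch of the general pattern described above: one audio embedding is turned into a dense prompt and a sparse prompt at every level of a visual feature pyramid. Everything here is an assumption made for illustration. Cosine similarity and softmax pooling stand in for the paper's fusion operators, plain nested lists stand in for feature tensors, and the function names (`audio_prompts_for_level`, `audio_prompts_for_pyramid`) are hypothetical, not part of the AuralSAM2 API.

```python
import math

# Hypothetical toy sketch, NOT the authors' AuralFuser.
# A feature map is a nested list: feature_map[y][x] -> list of C floats.

def cosine(a, b):
    # Cosine similarity between two C-dim vectors (0.0 if either is all zeros).
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def softmax(scores):
    # Numerically stable softmax over a flat list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def audio_prompts_for_level(feature_map, audio_emb):
    """Return (dense_prompt, sparse_prompt) for one pyramid level.

    dense_prompt: H x W map of audio-visual similarity, one value per location.
    sparse_prompt: one C-dim token, the softmax-weighted average of visual features.
    """
    h, w = len(feature_map), len(feature_map[0])
    flat = [feature_map[y][x] for y in range(h) for x in range(w)]
    sims = [cosine(v, audio_emb) for v in flat]
    dense = [sims[y * w:(y + 1) * w] for y in range(h)]
    weights = softmax(sims)
    sparse = [sum(wt * v[k] for wt, v in zip(weights, flat))
              for k in range(len(audio_emb))]
    return dense, sparse

def audio_prompts_for_pyramid(pyramid, audio_emb):
    # One (dense, sparse) prompt pair per pyramid level, fine and coarse alike.
    return [audio_prompts_for_level(level, audio_emb) for level in pyramid]

# Usage: a 2x2 fine level and a 1x1 coarse level with C = 2 channels.
audio = [1.0, 0.0]
fine = [[[1.0, 0.0], [0.0, 1.0]],
        [[0.0, 1.0], [0.0, 1.0]]]
coarse = [[[0.5, 0.5]]]
prompts = audio_prompts_for_pyramid([fine, coarse], audio)
dense_fine, sparse_fine = prompts[0]
print(dense_fine)  # [[1.0, 0.0], [0.0, 0.0]]
```

In this toy version the dense prompt keeps the spatial layout of each level, so it can be added to or concatenated with that level's features, while the sparse prompt compresses each level into a single token. That mirrors the sparse/dense split described above, but how AuralSAM2 actually computes and injects these prompts into SAM2 should be taken from the official repository.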

Paper | GitHub Code

AuralSAM2 overview

Installation

Please install the dependencies and prepare the dataset by following the installation document in the official repository.

Getting started

Please follow the instructions document in the official repository to reproduce the results.

Citation

If you find this work helpful for your research, please consider citing:

@article{liu2025auralsam2,
  title={AuralSAM2: Enabling SAM2 Hear Through Pyramid Audio-Visual Feature Prompting},
  author={Liu, Yuyuan and Chen, Yuanhong and Wang, Chong and Han, Junlin and Wu, Junde and Peng, Can and Chen, Jingkun and Tian, Yu and Carneiro, Gustavo},
  journal={arXiv preprint arXiv:2506.01015},
  year={2025}
}