	---
	pipeline_tag: image-segmentation
	---

	# AuralSAM2

	This repository contains the weights for AuralSAM2, as presented in the paper [AuralSAM2: Enabling SAM2 Hear Through Pyramid Audio-Visual Feature Prompting](https://huggingface.co/papers/2506.01015).

	AuralSAM2 integrates audio into the Segment Anything Model 2 (SAM2) while preserving its promptable segmentation capability. It introduces the AuralFuser module, which fuses audio and visual features to generate sparse and dense prompts. These prompts propagate auditory cues across SAM2's feature pyramid, enabling audio-guided object segmentation.

	[Paper](https://huggingface.co/papers/2506.01015) \| [GitHub Code](https://github.com/yyliu01/AuralSAM2)

	<img src="./docs/overview.png" width="850" alt="AuralSAM2 overview" />

	## Installation
	Install the dependencies and prepare the dataset by following the [*installation*](./docs/installation.md) document in the official repository.

	## Getting started
	To reproduce the results, follow the [*instructions*](./docs/before_start.md) document.
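
The weights hosted in this repository can also be fetched programmatically. The snippet below is a minimal sketch, assuming the `huggingface_hub` package is installed and that the Hub repo id is `yyliu01/AuralSAM2`. The `download_weights` helper and the `ckpts` folder name are hypothetical; where the official scripts expect the checkpoints to live is covered by the documents linked above.

```python
from pathlib import Path

# Assumed Hub repo id for this model card.
REPO_ID = "yyliu01/AuralSAM2"


def download_weights(local_dir: str = "ckpts") -> Path:
    """Download every file in the model repo and return the local folder.

    `local_dir` is a placeholder; adjust it to the layout the official
    scripts expect.
    """
    # Imported lazily so this module still loads without huggingface_hub.
    from huggingface_hub import snapshot_download

    path = snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
    return Path(path)
```

Calling `download_weights()` needs network access and returns the folder that holds the downloaded files.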

	## Citation
	If you find this work helpful for your research, please consider citing:

	```bibtex
	@article{liu2025auralsam2,
	title={AuralSAM2: Enabling SAM2 Hear Through Pyramid Audio-Visual Feature Prompting},
	author={Liu, Yuyuan and Chen, Yuanhong and Wang, Chong and Han, Junlin and Wu, Junde and Peng, Can and Chen, Jingkun and Tian, Yu and Carneiro, Gustavo},
	journal={arXiv preprint arXiv:2506.01015},
	year={2025}
	}
	```