# Installation

The project is based on Python and PyTorch. We usually run experiments with multi-GPU training.

Tested runtime:
- Python `3.12.3`
- PyTorch `2.8.0+cu128`

## 📥 Clone the Git repo

```shell
$ git clone https://github.com/yyliu01/AuralSAM2
$ cd AuralSAM2
```

## 🧩 Install dependencies

1) create the conda environment from the YAML file
```shell
$ conda env create -f docs/auralsam2.yml
```

2) activate the environment
```shell
$ conda activate auralsam2
```

3) install PyTorch (recommended: match the tested runtime)
```shell
# CUDA 12.8 (tested):
$ pip install torch==2.8.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
```

4) install Python packages (if needed)
```shell
$ pip install -r docs/requirements.txt
```
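
Once the steps above are done, it can help to compare the active environment against the tested runtime. The following is a minimal sketch, not part of the repository: `check_runtime` is a hypothetical helper name, and it only prints what it finds rather than enforcing any version.

```shell
# Hypothetical helper: print the Python and PyTorch versions of the active env.
check_runtime() {
  # Use whichever interpreter name is on PATH.
  py=$(command -v python || command -v python3) || { echo "python: not found"; return 1; }
  "$py" -c '
import sys
print("python", sys.version.split()[0])
try:
    import torch
except ImportError:
    print("torch: not installed")
else:
    print("torch", torch.__version__)
    print("cuda available:", torch.cuda.is_available(), "| GPUs:", torch.cuda.device_count())
'
}
```

Run `check_runtime` after `conda activate auralsam2`; with the tested setup the reported versions should read `3.12.3` and `2.8.0+cu128`.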

## 🗂️ Prepare dataset

### AVSBench (`avs.code`)

1) download and prepare AVSBench under the repository root.
2) ensure the dataset root is `AVSBench/` and contains:
   - `AVSBench/avss_index/metadata.csv`
   - the subset folders `AVSBench/v1s/`, `AVSBench/v1m/`, and `AVSBench/v2/`

### Ref-AVS (`ref-avs.code`)

1) download and prepare the Ref-AVS (REFAVS) dataset under the repository root.
2) ensure the dataset root is `REFAVS/` and contains:
   - `REFAVS/metadata.csv` (splits: `train`, `test_s`, `test_u`, `test_n`)
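
To confirm that all four splits are present in `REFAVS/metadata.csv`, a quick row count per split can be useful. The sketch below is an illustration under assumptions: `count_split` is a hypothetical helper, and it assumes the split name appears as a plain comma-separated field (no quoted commas and no Windows line endings), which the actual file may not satisfy.

```shell
# Hypothetical helper: count CSV rows in which some field equals the split name.
# Assumes simple comma-separated fields; quoted commas are not handled.
# Usage: count_split REFAVS/metadata.csv test_s
count_split() {
  awk -F, -v s="$2" '
    { for (i = 1; i <= NF; i++) if ($i == s) { n++; break } }
    END { print n + 0 }
  ' "$1"
}
```

A count of `0` for any of `train`, `test_s`, `test_u`, or `test_n` suggests either an incomplete download or that the file does not follow the assumed format.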


### Checkpoints (shared)

Place the following files under the repository root:

- `ckpts/sam_ckpts/sam2_hiera_large.pt`
- `ckpts/vggish-10086976.pth`

## 🏗️ Workspace structure

```text
AuralSAM2/
├── avs.code/
│   ├── v1s.code/
│   ├── v1m.code/
│   └── v2.code/
├── ref-avs.code/
├── scripts/
│   ├── run_avs_train.sh
│   └── run_ref_train.sh
├── AVSBench/
│   ├── avss_index
│   │   ├── metadata.csv
│   │   ├── metadata_v1m_man.csv
│   │   └── metadata_v2_man.csv
│   ├── v1m
│   │   ├── 01uIJMwnUvA_0
│   │   ├── 0WxgIKuetYI_0
│   │   ... (419 more)
│   ├── v1s
│   │   ├── --FenyW2i_4_5000_10000
│   │   ├── --ZHUMfueO0_5000_10000
│   │   ... (4927 more)
│   └── v2
│       ├── --KCIeTv6PM_14000_24000
│       ├── --iSerV5DbY_68000_78000
│       ... (5995 more)
├── REFAVS/
│   ├── gt_mask
│   │   ├── --KCIeTv6PM_14000_24000
│   │   ├── --iSerV5DbY_68000_78000
│   │   ... (~4000 more)
│   ├── media
│   │   ├── --KCIeTv6PM_14000_24000
│   │   ├── --iSerV5DbY_68000_78000
│   │   ... (~4300 more)
│   └── metadata.csv
├── ckpts/
│   ├── sam_ckpts/
│   │   └── sam2_hiera_large.pt
│   └── vggish-10086976.pth
└── docs/
    ├── installation.md
    ├── before_start.md
    ├── requirements.txt
    └── auralsam2.yml
```
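
The layout above can be checked before training with a small script. This is a sketch only: `check_workspace` is a hypothetical helper, not a script shipped in `scripts/`, and it tests only for the presence of the dataset and checkpoint paths named in this guide, not their contents.

```shell
# Hypothetical helper: list expected workspace paths missing under a repo root.
# Returns 0 when everything is present, 1 otherwise.
check_workspace() {
  root=${1:-.}
  missing=0
  for path in \
    AVSBench/avss_index/metadata.csv \
    AVSBench/v1s AVSBench/v1m AVSBench/v2 \
    REFAVS/metadata.csv REFAVS/gt_mask REFAVS/media \
    ckpts/sam_ckpts/sam2_hiera_large.pt \
    ckpts/vggish-10086976.pth
  do
    if [ ! -e "$root/$path" ]; then
      echo "missing: $path"
      missing=1
    fi
  done
  return "$missing"
}
```

Calling `check_workspace .` from the repository root prints one `missing:` line per absent path and prints nothing when the layout is complete.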

## 📝 Notes

- see `docs/before_start.md` for training and inference commands.
- if wandb logging is not needed, disable online logging in your config.
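
As an alternative to editing the config, the wandb client itself reads the `WANDB_MODE` environment variable. The snippet below is a sketch that assumes the training code does not set the wandb mode explicitly, since an explicit setting would take precedence; this guide does not confirm that.

```shell
# Keep wandb runs local (nothing is uploaded).
# Setting WANDB_MODE=disabled instead turns wandb logging into a no-op.
export WANDB_MODE=offline
```

Export the variable in the same shell session that launches training so the process inherits it.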