# Installation
The project is based on Python and PyTorch. We usually run experiments with multi-GPU training.
Tested runtime:
- Python `3.12.3`
- PyTorch `2.8.0+cu128`
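
To compare a local environment against the tested runtime above, a small check like the following can help. It is only a sketch: the helper names are hypothetical, and treating a matching `major.minor` as "close enough" is an assumption, not a rule stated by the project.

```python
import platform

# Versions listed under "Tested runtime" above.
TESTED_PYTHON = "3.12.3"
TESTED_TORCH = "2.8.0+cu128"


def major_minor(version: str) -> tuple[int, int]:
    """Return (major, minor) from a version string such as '2.8.0+cu128'."""
    base = version.split("+", 1)[0]  # drop a local tag like '+cu128'
    major, minor = base.split(".")[:2]
    return int(major), int(minor)


def matches_tested(installed: str, tested: str) -> bool:
    """True when major.minor agree (assumed tolerance: patch level may differ)."""
    return major_minor(installed) == major_minor(tested)


if __name__ == "__main__":
    py = platform.python_version()
    status = "ok" if matches_tested(py, TESTED_PYTHON) else "differs from tested"
    print(f"Python {py}: {status}")
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed yet")
    else:
        version = str(torch.__version__)
        status = "ok" if matches_tested(version, TESTED_TORCH) else "differs from tested"
        print(f"PyTorch {version}: {status}")
        # Multi-GPU training needs more than one visible device.
        print(f"CUDA available: {torch.cuda.is_available()}, GPUs: {torch.cuda.device_count()}")
```

Run it once before installing PyTorch and once after activating the environment; the PyTorch lines only appear when `torch` can be imported.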
## 📥 Clone the Git repo
``` shell
$ git clone https://github.com/yyliu01/AuralSAM2
$ cd AuralSAM2
```
## 🧩 Install dependencies
1) create conda env from yaml
```shell
$ conda env create -f docs/auralsam2.yml
```
2) activate env
```shell
$ conda activate auralsam2
```
3) install PyTorch (recommended: match tested runtime)
```shell
# CUDA 12.8 (tested):
$ pip install torch==2.8.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
```
4) install python packages (if needed)
```shell
$ pip install -r docs/requirements.txt
```
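
After the steps above, it can be useful to confirm which distributions are actually installed in the active environment. The sketch below uses the standard-library `importlib.metadata`; the function name is hypothetical, and the shortlist contains only the three packages named in step 3, so extend it with whatever `docs/requirements.txt` pins.

```python
from importlib import metadata


def missing_distributions(names):
    """Return the distribution names from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Shortlist from step 3; add the packages listed in docs/requirements.txt.
    wanted = ["torch", "torchvision", "torchaudio"]
    print("missing:", missing_distributions(wanted) or "none")
```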
## Prepare dataset
### AVSBench (`avs.code`)
1) download and prepare AVSBench under repository root.
2) ensure the following paths exist:
- `AVSBench/`
- `AVSBench/avss_index/metadata.csv` (and subset folders `v1s/`, `v1m/`, `v2/`)
### Ref-AVS (`ref-avs.code`)
1) download and prepare the Ref-AVS (REFAVS) dataset under repository root.
2) ensure the following paths exist:
- `REFAVS/`
- `REFAVS/metadata.csv` (splits: `train`, `test_s`, `test_u`, `test_n`)
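
One way to confirm that all four splits are present is to count rows per split in `REFAVS/metadata.csv`. This is a rough sketch: the column name `split` is an assumption about the CSV header, not something this guide specifies, so pass the real header name if it differs.

```python
import csv
from collections import Counter
from pathlib import Path

# Split names as listed above.
EXPECTED_SPLITS = {"train", "test_s", "test_u", "test_n"}


def count_splits(metadata_csv, column="split"):
    """Count rows per split; `column` is an assumed header name."""
    with Path(metadata_csv).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        if column not in (reader.fieldnames or []):
            raise KeyError(f"column {column!r} not found; available: {reader.fieldnames}")
        return Counter(row[column] for row in reader)


if __name__ == "__main__":
    path = Path("REFAVS/metadata.csv")
    if path.is_file():
        counts = count_splits(path)
        print(dict(counts))
        print("missing splits:", sorted(EXPECTED_SPLITS - set(counts)) or "none")
    else:
        print(f"{path} not found")
```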
### Checkpoints (shared)
Prepare under repository root:
- `ckpts/sam_ckpts/sam2_hiera_large.pt`
- `ckpts/vggish-10086976.pth`
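
Before launching training, the dataset and checkpoint paths listed in this section can be checked in one pass. The following is a minimal sketch using only `pathlib`; `missing_paths` is a hypothetical helper, and it only tests that each path exists, not that the files are complete or uncorrupted.

```python
from pathlib import Path

# Paths taken from the dataset and checkpoint lists above, relative to the repository root.
REQUIRED_PATHS = [
    "AVSBench/avss_index/metadata.csv",
    "AVSBench/v1s",
    "AVSBench/v1m",
    "AVSBench/v2",
    "REFAVS/metadata.csv",
    "ckpts/sam_ckpts/sam2_hiera_large.pt",
    "ckpts/vggish-10086976.pth",
]


def missing_paths(repo_root, required=REQUIRED_PATHS):
    """Return the entries of `required` that do not exist under `repo_root`."""
    root = Path(repo_root)
    return [rel for rel in required if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("missing paths:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("all expected paths found")
```

Run it from the repository root (`AuralSAM2/`) so the relative paths resolve correctly.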
## Workspace structure
```shell
AuralSAM2/
├── avs.code/
│   ├── v1s.code/
│   ├── v1m.code/
│   └── v2.code/
├── ref-avs.code/
├── scripts/
│   ├── run_avs_train.sh
│   └── run_ref_train.sh
├── AVSBench/
│   ├── avss_index
│   │   ├── metadata.csv
│   │   ├── metadata_v1m_man.csv
│   │   └── metadata_v2_man.csv
│   ├── v1m
│   │   ├── 01uIJMwnUvA_0
│   │   ├── 0WxgIKuetYI_0
│   │   ... (419 more)
│   ├── v1s
│   │   ├── --FenyW2i_4_5000_10000
│   │   ├── --ZHUMfueO0_5000_10000
│   │   ... (4927 more)
│   └── v2
│       ├── --KCIeTv6PM_14000_24000
│       ├── --iSerV5DbY_68000_78000
│       ... (5995 more)
├── REFAVS/
│   ├── gt_mask
│   │   ├── --KCIeTv6PM_14000_24000
│   │   ├── --iSerV5DbY_68000_78000
│   │   ... (~4000 more)
│   ├── media
│   │   ├── --KCIeTv6PM_14000_24000
│   │   ├── --iSerV5DbY_68000_78000
│   │   ... (~4300 more)
│   └── metadata.csv
├── ckpts/
│   ├── sam_ckpts/
│   │   └── sam2_hiera_large.pt
│   └── vggish-10086976.pth
└── docs/
    ├── installation.md
    ├── before_start.md
    ├── requirements.txt
    └── auralsam2.yml
```
## Notes
- use `docs/before_start.md` for training and inference commands.
- if wandb is not needed, disable online logging in your config.
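
Besides the project config, wandb itself reads the `WANDB_MODE` environment variable, which can serve as a fallback. The sketch below sets it from Python; the helper name is hypothetical, and whether the training scripts honor or override this variable is an assumption, so the config option remains the primary route.

```python
import os


def use_offline_wandb(mode="offline"):
    """Set WANDB_MODE: 'offline' logs locally without syncing, 'disabled' turns wandb into a no-op."""
    if mode not in {"offline", "disabled"}:
        raise ValueError(f"unsupported mode: {mode!r}")
    os.environ["WANDB_MODE"] = mode
    return mode


if __name__ == "__main__":
    # Call this before wandb.init() runs anywhere in the process.
    use_offline_wandb()
    print("WANDB_MODE =", os.environ["WANDB_MODE"])
```

The same effect can be had by exporting `WANDB_MODE=offline` in the shell before starting a run.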