# FineLAP Training & Fine-tuning Before training, make sure that all files from [here](https://huggingface.co/AndreasXi/FineLAP_Pytorch) have been downloaded to `./weights/`. ## Environmental Setup ```bash conda create -n finelap python=3.9 git clone https://github.com/facebookresearch/fairseq.git pip install "pip<24.1" -U; cd fairseq; pip install -e ./ pip install -r requirements_train.txt ``` ## Data Setup To train FineLAP, we format the data in a JSONL structure as follows: ```json { "audio_id": "Ycq6bqC_AsO4.flac", "audio_path": "path/to/audio.wav", "caption": "Birds are chirping with background noise.", "phrases": [ { "phrase": "Background noise", "segments": [ [0.498, 10.0] ] }, { "phrase": "Bird vocalization, bird call, bird song", "segments": [ [0.629, 4.114], [4.313, 10.0] ] } ] } ``` Each entry contains: - audio_id: Unique identifier of the audio sample. - audio_path: Path to the audio file. - caption: A clip-level description of the audio content. - phrases (optional): A list of sound events, where each includes: - phrase: Textual phrase of the event - segments: Time intervals (in seconds) indicating when the event occurs For data without frame-level annotations, the `phrases` field can be omitted. The dataset will automatically detect this and skip the frame-level loss for such samples. An example training metadata file with 10 samples is provided at `data/training_metadata_example.jsonl`. The current training pipeline uses the phrase bank `data/phrase_bank_new_with_FSDLabel_UrbanSED.jsonl`. Once the dataset metadata JSONL is ready, include it in the `train_data_args.metadata_files` list defined in `config/data_config/data_eat.yaml` or `config/data_config/data_htsat.yaml`. ## Start Training Run ```bash bash scripts/train.sh ``` to start training. This will use the config `config/finelap_eat_config.yaml`. The output will be saved in `exps/${exp_name}`. ## Fine-tuning From a FineLAP Checkpoint The training code now supports loading an existing FineLAP checkpoint before training starts. This is useful when you want to finetune from a previously trained model such as `weights/finelap_fixed.pt`. In `config/finelap_eat_config.yaml`, set: ```yaml model_args: ckpt_path: './weights/finelap_fixed.pt' ``` If `ckpt_path` is an empty string: ```yaml model_args: ckpt_path: '' ``` then no FineLAP checkpoint will be loaded, and training will start from the encoder initialization defined by `audio_encoder_ckpt` and `text_encoder_ckpt`. This finetuning path loads model weights only. It does not restore the optimizer state or resume the previous epoch count.