---
license: apache-2.0
tags:
- pytorch
---

# Gaming for Boundary: Elastic Localization for Frame-Supervised Video Moment Retrieval

Hao Liu<sup>1</sup>  Yupeng Hu<sup>1</sup>✉  Kun Wang<sup>1</sup>  Yinwei Wei<sup>1</sup>  Liqiang Nie<sup>2</sup>

<sup>1</sup>School of Software, Shandong University, Jinan, China

<sup>2</sup>School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, China

This is the official PyTorch implementation of **GOAL**, a frame-supervised Video Moment Retrieval (VMR) framework for elastic boundary localization via a game-based paradigm and Dynamic Updating Technique (DUT).

🔗 **Paper:** [SIGIR 2025](https://doi.org/10.1145/3726302.3729984)

🔗 **GitHub Repository:** [iLearn-Lab/SIGIR25-GOAL](https://github.com/iLearn-Lab/SIGIR25-GOAL)

---

## Model Information

### 1. Model Name

**GOAL** (**G**aming f**O**r el**A**stic **L**ocalization).

### 2. Task Type & Applicable Tasks

- **Task Type:** Frame-Supervised Video Moment Retrieval (VMR) / Temporal Localization / Vision-Language Learning
- **Applicable Tasks:** Retrieving the temporal moment in a video that matches a natural language query using a single annotated frame, with a focus on ambiguous temporal boundary localization.

### 3. Project Introduction

Frame-supervised Video Moment Retrieval (VMR) aims to retrieve the temporal moment in a video that matches a natural language query using only a single annotated frame. While this setting reduces annotation cost, it brings severe ambiguity in temporal boundary prediction.

**GOAL** addresses this challenge through a **game-based paradigm** with three players, namely **KFP**, **AFP**, and **BP**, together with a **Dynamic Updating Technique (DUT)** that progressively refines boundary decisions through unilateral and bilateral updates for more elastic localization.

### 4. Training Data Source

The model is trained and evaluated on standard frame-supervised VMR benchmarks:

- **ActivityNet Captions**
- **Charades-STA**
- **TACoS**

---

## Usage & Basic Inference

This codebase provides training and evaluation scripts for frame-supervised VMR, as well as checkpoints for quick reproduction.
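To make the evaluation target concrete: in the frame-supervised setting, training sees only one annotated timestamp per query, while a predicted moment is a `(start, end)` span. On these benchmarks, predicted spans are commonly scored by temporal IoU against the full ground-truth span. The sketch below is illustrative only and is not taken from this repository; the function names, the `sample` dictionary, and its field names are hypothetical.

```python
from typing import Dict, Sequence, Tuple

Span = Tuple[float, float]  # (start, end) in seconds


def temporal_iou(pred: Span, gt: Span) -> float:
    """Temporal intersection-over-union of two (start, end) spans."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_iou(preds: Sequence[Span], gts: Sequence[Span], threshold: float) -> float:
    """Fraction of queries whose top-1 prediction reaches the IoU threshold."""
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(gts)


# Hypothetical sample: field names are assumptions, not the repository's schema.
sample: Dict[str, object] = {
    "query": "person opens the door",
    "annotated_frame": 7.5,    # the single timestamp available during training
    "gt_span": (5.0, 10.0),    # full span, used only for evaluation
}

if __name__ == "__main__":
    pred = (6.0, 11.0)
    print(f"tIoU = {temporal_iou(pred, sample['gt_span']):.3f}")
```

The annotated frame only indicates a point inside the target moment, which is why both boundaries have to be inferred and why small boundary shifts change the IoU noticeably.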

### Step 1: Prepare the Environment

Clone the GitHub repository and install dependencies:

```bash
git clone https://github.com/iLearn-Lab/SIGIR25-GOAL.git
cd SIGIR25-GOAL  # git clone names the directory after the repository
python -m venv .venv
source .venv/bin/activate   # Linux / Mac
# .venv\Scripts\activate    # Windows
pip install numpy scipy pyyaml tqdm
```

The `pip install` line above does not include PyTorch; install a PyTorch build that matches your hardware separately.

### Step 2: Download Model Weights & Data

Prepare features and raw annotations following [ViGA](https://github.com/r-cui/ViGA)'s dataset preparation protocol.

Before running the code, check and replace the local dataset and feature paths in:

- `src/config.yaml`
- `src/utils/utils.py`

### Step 3: Run Inference

To evaluate a trained experiment folder, run:

```bash
python -m src.experiment.eval --exp path/to/your/experiment_folder
```

---

## Limitations & Notes

**Disclaimer:** This repository is intended for **academic research purposes only**.

- The model requires access to the original benchmark datasets and extracted video features for evaluation.
- Some configuration files currently contain local path settings and should be updated before use.

---

## Citation

If you find our work useful in your research, please consider citing our paper:

```bibtex
@inproceedings{liu2025gaming,
  title={Gaming for Boundary: Elastic Localization for Frame-Supervised Video Moment Retrieval},
  author={Liu, Hao and Hu, Yupeng and Wang, Kun and Wei, Yinwei and Nie, Liqiang},
  booktitle={Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year={2025},
  doi={10.1145/3726302.3729984}
}
```

---

## Contact

**If you have any questions, feel free to contact me at liuh90210@gmail.com**.