Upload 7 files
Browse files- Cha_ckpt/i3d/config.yaml +45 -0
- Cha_ckpt/i3d/model_best.pt +3 -0
- Cha_ckpt/vgg/config.yaml +49 -0
- Cha_ckpt/vgg/model_best.pt +3 -0
- README.md +114 -0
- TACoS_ckpt/config.yaml +51 -0
- TACoS_ckpt/model_best.pt +3 -0
Cha_ckpt/i3d/config.yaml
ADDED
|
@@ -0,0 +1,45 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
charadessta:
|
| 2 |
+
batch_size: 512
|
| 3 |
+
clip_frames:
|
| 4 |
+
- 8
|
| 5 |
+
epoch: 15
|
| 6 |
+
feature_dim: 1024
|
| 7 |
+
feature_dir: /data/charadessta/i3d
|
| 8 |
+
moment_length_factors:
|
| 9 |
+
- 0.25
|
| 10 |
+
- 0.3
|
| 11 |
+
- 0.35
|
| 12 |
+
overlapping_factors:
|
| 13 |
+
- 0.0
|
| 14 |
+
- 0.1
|
| 15 |
+
- 0.2
|
| 16 |
+
- 0.3
|
| 17 |
+
- 0.4
|
| 18 |
+
- 0.5
|
| 19 |
+
- 0.6
|
| 20 |
+
- 0.7
|
| 21 |
+
- 0.8
|
| 22 |
+
- 0.9
|
| 23 |
+
pooling_func: max_pooling
|
| 24 |
+
sigma_factor: 0.3
|
| 25 |
+
stride: 4
|
| 26 |
+
video_feature_len: 128
|
| 27 |
+
frac: 0.157
|
| 28 |
+
width: 20
|
| 29 |
+
alpha: 10
|
| 30 |
+
beta: 0.002
|
| 31 |
+
dataset_name: charadessta
|
| 32 |
+
exp_dir: log
|
| 33 |
+
gpu: '0'
|
| 34 |
+
model:
|
| 35 |
+
dim: 512
|
| 36 |
+
dropout: 0.1
|
| 37 |
+
glove_path: /data/glove.840B.300d.txt
|
| 38 |
+
n_layers: 2
|
| 39 |
+
temp: 0.07
|
| 40 |
+
topk: 1
|
| 41 |
+
seed: 1
|
| 42 |
+
train:
|
| 43 |
+
clip_norm: 1.0
|
| 44 |
+
dev: false
|
| 45 |
+
init_lr: 0.0001
|
Cha_ckpt/i3d/model_best.pt
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:3d97e2e75be754bbd0480ea11c5a3c01830e33c7fab1ba5cdb4adc89ab11f904
|
| 3 |
+
size 56988191
|
Cha_ckpt/vgg/config.yaml
ADDED
|
@@ -0,0 +1,49 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
charadessta:
|
| 2 |
+
batch_size: 256
|
| 3 |
+
clip_frames:
|
| 4 |
+
- 8
|
| 5 |
+
epoch: 15
|
| 6 |
+
feature_dim: 4096
|
| 7 |
+
feature_dir: /data/charadessta/vgg
|
| 8 |
+
moment_length_factors:
|
| 9 |
+
- 0.1
|
| 10 |
+
- 0.15
|
| 11 |
+
- 0.2
|
| 12 |
+
- 0.25
|
| 13 |
+
- 0.3
|
| 14 |
+
- 0.35
|
| 15 |
+
- 0.4
|
| 16 |
+
overlapping_factors:
|
| 17 |
+
- 0.0
|
| 18 |
+
- 0.1
|
| 19 |
+
- 0.2
|
| 20 |
+
- 0.3
|
| 21 |
+
- 0.4
|
| 22 |
+
- 0.5
|
| 23 |
+
- 0.6
|
| 24 |
+
- 0.7
|
| 25 |
+
- 0.8
|
| 26 |
+
- 0.9
|
| 27 |
+
pooling_func: max_pooling
|
| 28 |
+
sigma_factor: 0.3
|
| 29 |
+
stride: 4
|
| 30 |
+
video_feature_len: 256
|
| 31 |
+
frac: 0.115
|
| 32 |
+
width: 30
|
| 33 |
+
alpha: 10
|
| 34 |
+
beta: 0.005
|
| 35 |
+
dataset_name: charadessta
|
| 36 |
+
exp_dir: log
|
| 37 |
+
gpu: '0'
|
| 38 |
+
model:
|
| 39 |
+
dim: 512
|
| 40 |
+
dropout: 0.1
|
| 41 |
+
glove_path: /data/glove.840B.300d.txt
|
| 42 |
+
n_layers: 2
|
| 43 |
+
temp: 0.07
|
| 44 |
+
topk: 1
|
| 45 |
+
seed: 1
|
| 46 |
+
train:
|
| 47 |
+
clip_norm: 1.0
|
| 48 |
+
dev: false
|
| 49 |
+
init_lr: 0.0001
|
Cha_ckpt/vgg/model_best.pt
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:f89802dca0a9935c627a85e35291a7efd41e5107505db9862836f420e5591b5f
|
| 3 |
+
size 63908088
|
README.md
ADDED
|
@@ -0,0 +1,114 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
license: apache-2.0
|
| 3 |
+
tags:
|
| 4 |
+
- video-moment-retrieval
|
| 5 |
+
- frame-supervised
|
| 6 |
+
- temporal-localization
|
| 7 |
+
- vision-language
|
| 8 |
+
- multimodal
|
| 9 |
+
- pytorch
|
| 10 |
+
---
|
| 11 |
+
|
| 12 |
+
<a id="top"></a>
|
| 13 |
+
<div align="center">
|
| 14 |
+
<h1>Gaming for Boundary: Elastic Localization for Frame-Supervised Video Moment Retrieval</h1>
|
| 15 |
+
|
| 16 |
+
<p>
|
| 17 |
+
<b>Hao Liu</b><sup>1</sup>
|
| 18 |
+
<b>Yupeng Hu</b><sup>1✉</sup>
|
| 19 |
+
<b>Kun Wang</b><sup>1</sup>
|
| 20 |
+
<b>Yinwei Wei</b><sup>1</sup>
|
| 21 |
+
<b>Liqiang Nie</b><sup>2</sup>
|
| 22 |
+
</p>
|
| 23 |
+
|
| 24 |
+
<p>
|
| 25 |
+
<sup>1</sup>School of Software, Shandong University, Jinan, China<br>
|
| 26 |
+
<sup>2</sup>School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, China
|
| 27 |
+
</p>
|
| 28 |
+
</div>
|
| 29 |
+
|
| 30 |
+
This is the official PyTorch implementation of **GOAL**, a frame-supervised Video Moment Retrieval (VMR) framework for elastic boundary localization via a game-based paradigm and Dynamic Updating Technique (DUT).
|
| 31 |
+
|
| 32 |
+
🔗 **Paper:** [SIGIR 2025](https://doi.org/10.1145/3726302.3729984)
|
| 33 |
+
🔗 **GitHub Repository:** [iLearn-Lab/SIGIR25-GOAL](https://github.com/iLearn-Lab/SIGIR25-GOAL)
|
| 34 |
+
|
| 35 |
+
---
|
| 36 |
+
|
| 37 |
+
## Model Information
|
| 38 |
+
|
| 39 |
+
### 1. Model Name
|
| 40 |
+
**GOAL** (**G**aming f**O**r el**A**stic **L**ocalization).
|
| 41 |
+
|
| 42 |
+
### 2. Task Type & Applicable Tasks
|
| 43 |
+
- **Task Type:** Frame-Supervised Video Moment Retrieval (VMR) / Temporal Localization / Vision-Language Learning
|
| 44 |
+
- **Applicable Tasks:** Retrieving the temporal moment in a video that matches a natural language query using a single annotated frame, with a focus on ambiguous temporal boundary localization.
|
| 45 |
+
|
| 46 |
+
### 3. Project Introduction
|
| 47 |
+
Frame-supervised Video Moment Retrieval (VMR) aims to retrieve the temporal moment in a video that matches a natural language query using only a single annotated frame. While this setting reduces annotation cost, it brings severe ambiguity in temporal boundary prediction.
|
| 48 |
+
|
| 49 |
+
**GOAL** addresses this challenge through a **game-based paradigm** with three players, namely **KFP**, **AFP**, and **BP**, together with a **Dynamic Updating Technique (DUT)** that progressively refines boundary decisions through unilateral and bilateral updates for more elastic localization.
|
| 50 |
+
|
| 51 |
+
### 4. Training Data Source
|
| 52 |
+
The model is trained and evaluated on standard frame-supervised VMR benchmarks:
|
| 53 |
+
- **ActivityNet Captions**
|
| 54 |
+
- **Charades-STA**
|
| 55 |
+
- **TACoS**
|
| 56 |
+
|
| 57 |
+
---
|
| 58 |
+
|
| 59 |
+
## Usage & Basic Inference
|
| 60 |
+
|
| 61 |
+
This codebase provides training and evaluation scripts for frame-supervised VMR, as well as checkpoints for quick reproduction.
|
| 62 |
+
|
| 63 |
+
### Step 1: Prepare the Environment
|
| 64 |
+
Clone the GitHub repository and install dependencies:
|
| 65 |
+
```bash
|
| 66 |
+
git clone https://github.com/iLearn-Lab/SIGIR25-GOAL.git
|
| 67 |
+
cd SIGIR25-GOAL
|
| 68 |
+
python -m venv .venv
|
| 69 |
+
source .venv/bin/activate # Linux / Mac
|
| 70 |
+
# .venv\Scripts\activate # Windows
|
| 71 |
+
pip install numpy scipy pyyaml tqdm
|
| 72 |
+
```
|
| 73 |
+
|
| 74 |
+
### Step 2: Download Model Weights & Data
|
| 75 |
+
Prepare features and raw annotations following [ViGA](https://github.com/r-cui/ViGA)'s dataset preparation protocol.
|
| 76 |
+
|
| 77 |
+
Before running the code, please check and replace local dataset and feature paths in:
|
| 78 |
+
- `src/config.yaml`
|
| 79 |
+
- `src/utils/utils.py`
|
| 80 |
+
|
| 81 |
+
|
| 82 |
+
### Step 3: Run Inference
|
| 83 |
+
|
| 84 |
+
To evaluate a trained experiment folder, run:
|
| 85 |
+
```bash
|
| 86 |
+
python -m src.experiment.eval --exp path/to/your/experiment_folder
|
| 87 |
+
```
|
| 88 |
+
|
| 89 |
+
---
|
| 90 |
+
|
| 91 |
+
## Limitations & Notes
|
| 92 |
+
|
| 93 |
+
**Disclaimer:** This repository is intended for **academic research purposes only**.
|
| 94 |
+
- The model requires access to the original benchmark datasets and extracted video features for evaluation.
|
| 95 |
+
- Some configuration files currently contain local path settings and should be updated before use.
|
| 96 |
+
|
| 97 |
+
---
|
| 98 |
+
|
| 99 |
+
## Citation
|
| 100 |
+
|
| 101 |
+
If you find our work useful in your research, please consider citing our paper:
|
| 102 |
+
|
| 103 |
+
```bibtex
|
| 104 |
+
@inproceedings{liu2025gaming,
|
| 105 |
+
title={Gaming for Boundary: Elastic Localization for Frame-Supervised Video Moment Retrieval},
|
| 106 |
+
author={Liu, Hao and Hu, Yupeng and Wang, Kun and Wei, Yinwei and Nie, Liqiang},
|
| 107 |
+
booktitle={Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval},
|
| 108 |
+
year={2025},
|
| 109 |
+
doi={10.1145/3726302.3729984}
|
| 110 |
+
}
|
| 111 |
+
```
|
| 112 |
+
---
|
| 113 |
+
## Contact
|
| 114 |
+
**If you have any questions, feel free to contact me at liuh90210@gmail.com**.
|
TACoS_ckpt/config.yaml
ADDED
|
@@ -0,0 +1,51 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
dataset_name: tacos
|
| 2 |
+
exp_dir: log
|
| 3 |
+
gpu: '0'
|
| 4 |
+
model:
|
| 5 |
+
dim: 512
|
| 6 |
+
dropout: 0.1
|
| 7 |
+
glove_path: /data/glove.840B.300d.txt
|
| 8 |
+
n_layers: 2
|
| 9 |
+
temp: 0.07
|
| 10 |
+
topk: 1
|
| 11 |
+
seed: 1
|
| 12 |
+
tacos:
|
| 13 |
+
batch_size: 128
|
| 14 |
+
clip_frames:
|
| 15 |
+
- 32
|
| 16 |
+
epoch: 30
|
| 17 |
+
feature_dim: 4096
|
| 18 |
+
feature_dir: /data/tacos/c3d
|
| 19 |
+
moment_length_factors:
|
| 20 |
+
- 0.05
|
| 21 |
+
- 0.1
|
| 22 |
+
- 0.15
|
| 23 |
+
- 0.2
|
| 24 |
+
- 0.25
|
| 25 |
+
- 0.3
|
| 26 |
+
- 0.35
|
| 27 |
+
- 0.4
|
| 28 |
+
overlapping_factors:
|
| 29 |
+
- 0.0
|
| 30 |
+
- 0.1
|
| 31 |
+
- 0.2
|
| 32 |
+
- 0.3
|
| 33 |
+
- 0.4
|
| 34 |
+
- 0.5
|
| 35 |
+
- 0.6
|
| 36 |
+
- 0.7
|
| 37 |
+
- 0.8
|
| 38 |
+
- 0.9
|
| 39 |
+
pooling_func: max_pooling
|
| 40 |
+
sigma_factor: 1.0
|
| 41 |
+
stride: 16
|
| 42 |
+
video_feature_len: 512
|
| 43 |
+
frac: 0.016
|
| 44 |
+
width: 30
|
| 45 |
+
alpha: 10
|
| 46 |
+
beta: 0.002
|
| 47 |
+
|
| 48 |
+
train:
|
| 49 |
+
clip_norm: 1.0
|
| 50 |
+
dev: false
|
| 51 |
+
init_lr: 0.0001
|
TACoS_ckpt/model_best.pt
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:21d9db952aa6ef13ec0ca3472a969a482b8055a641ebf12f92c3509eadcddca8
|
| 3 |
+
size 65559288
|