BAH Dataset for Ambivalence/Hesitancy Recognition in Videos for Digital Behavioural Change (ICLR 2026)
by Manuela González-González3,4, Soufiane Belharbi1, Muhammad Osama Zeeshan1, Masoumeh Sharafi1, Muhammad Haseeb Aslam1, Alessandro Lameiras Koerich2, Marco Pedersoli1, Simon L. Bacon3,4, Eric Granger1
1 LIVIA, Dept. of Systems Engineering, ETS Montreal, Canada
2 LIVIA, Dept. of Software and IT Engineering, ETS Montreal, Canada
3 Dept. of Health, Kinesiology, & Applied Physiology, Concordia University, Montreal, Canada
4 Montreal Behavioural Medicine Centre, CIUSSS Nord-de-l’Ile-de-Montréal, Canada
Contact: 


Abstract
Ambivalence and hesitancy (A/H), closely related constructs, are the primary
reasons why individuals delay, avoid, or abandon health behaviour changes.
They are subtle and conflicting emotions that set a person in a state between
positive and negative orientations, or between acceptance and refusal to do
something. They manifest as a discord in affect between multiple modalities or
within a modality, such as facial and vocal expressions, and body language.
Although experts can be trained to recognize A/H, as is done for in-person
interactions, integrating them into digital health interventions is costly and
less effective. Automatic A/H recognition is therefore critical for the
personalization and cost-effectiveness of digital behaviour change
interventions. However, no datasets currently exist for the design of machine
learning models to recognize A/H.
This paper introduces the Behavioural Ambivalence/Hesitancy (BAH) dataset
collected for multimodal recognition of A/H in videos. It contains 1,427 videos
with a total duration of 10.60 hours, captured from 300 participants across
Canada, answering predefined questions to elicit A/H.
It is intended to mirror real-world digital behaviour change interventions
delivered online. BAH is annotated by three experts to provide timestamps that
indicate where A/H occurs, and frame- and video-level annotations with A/H
cues. Video transcripts, cropped and aligned faces, and participant metadata are
also provided.
Since A and H manifest similarly in practice, we provide a binary annotation
indicating the presence or absence of A/H.
Additionally, this paper includes benchmarking results using baseline models on
BAH for frame- and video-level recognition, zero-shot prediction, and
personalization with source-free domain adaptation methods. The limited
performance highlights the need for adapted multimodal and spatio-temporal
models for A/H recognition. Results obtained with specialized fusion methods
show that assessing conflicts between modalities, together with temporal
modelling of within-modality conflicts, is essential for more discriminant
A/H recognition.
The data, code, and pretrained weights are publicly available: github.com/sbelharbi/bah-dataset.
Code: PyTorch 2.2.2
Citation:
@inproceedings{gonzalez-26-bah,
  title={{BAH} Dataset for Ambivalence/Hesitancy Recognition in Videos for Digital Behavioural Change},
  author={González-González, M. and Belharbi, S. and Zeeshan, M. O. and
          Sharafi, M. and Aslam, M. H. and Pedersoli, M. and Koerich, A. L. and
          Bacon, S. L. and Granger, E.},
  booktitle={ICLR},
  year={2026}
}
Content:
BAH dataset: Download
To download the BAH dataset, please closely follow the instructions described here: BAH Download instructions.
Pretrained weights
The folder pretrained-models contains the weights of several pretrained models:
- Frame-level supervised learning: frame-level-supervised-learning contains facial expression vision models, BAH_DB vision models, and multimodal models with different fusion techniques (see the loading sketch after this list).
- Domain adaptation: coming soon.
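A minimal loading sketch, assuming a checkpoint saved as a plain PyTorch state_dict; the backbone (resnet18) and file name below are hypothetical, so adjust them to the actual contents of pretrained-models:

```python
import torch
from torchvision.models import resnet18

# Hypothetical checkpoint path; the actual file names inside
# pretrained-models/frame-level-supervised-learning may differ.
ckpt_path = "pretrained-models/frame-level-supervised-learning/vision_model.pth"

# Binary A/H head (presence vs. absence), matching the dataset's binary annotation.
model = resnet18(num_classes=2)

# Load the weights onto CPU and switch to inference mode.
state_dict = torch.load(ckpt_path, map_location="cpu")
model.load_state_dict(state_dict)
model.eval()
```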
BAH presentation
BAH: Capture & Annotation


BAH: Variability

BAH: Experimental Protocol


Experiments: Baselines
1) Frame-level supervised classification using multimodal data
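As an illustration of this setting, here is a minimal sketch of a frame-level multimodal classifier; the feature dimensions, concatenation fusion, and MLP head are assumptions for illustration, not the paper's exact baselines:

```python
import torch
import torch.nn as nn

class FrameLevelFusion(nn.Module):
    """Frame-level A/H classifier with simple concatenation fusion.

    Feature dimensions are illustrative; the benchmark's baselines may use
    different backbones and fusion techniques.
    """
    def __init__(self, vis_dim=512, aud_dim=128, txt_dim=768, hidden=256):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(vis_dim + aud_dim + txt_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),  # binary: A/H present vs. absent
        )

    def forward(self, vis_feat, aud_feat, txt_feat):
        # One precomputed feature vector per frame and per modality.
        fused = torch.cat([vis_feat, aud_feat, txt_feat], dim=-1)
        return self.classifier(fused)

# Example: a batch of 8 frames with random features and labels.
model = FrameLevelFusion()
logits = model(torch.randn(8, 512), torch.randn(8, 128), torch.randn(8, 768))
loss = nn.functional.cross_entropy(logits, torch.randint(0, 2, (8,)))
```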



2) Video-level supervised classification using multimodal data
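A minimal video-level sketch under the same assumptions, here aggregating per-frame fused features over time with a GRU before classifying the whole clip; the benchmark's video-level baselines may aggregate differently:

```python
import torch
import torch.nn as nn

class VideoLevelClassifier(nn.Module):
    """Video-level A/H classifier: a GRU summarizes per-frame features over
    time, and its last hidden state is classified. A sketch only."""
    def __init__(self, feat_dim=512, hidden=256):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)  # binary: A/H present vs. absent

    def forward(self, frame_feats):
        # frame_feats: (batch, time, feat_dim) per-frame fused features.
        _, h_n = self.gru(frame_feats)
        return self.head(h_n[-1])

# Example: 4 clips of 120 frames each.
model = VideoLevelClassifier()
logits = model(torch.randn(4, 120, 512))
```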

3) Zero-shot performance: Frame- & video-level
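For intuition, a zero-shot sketch using CLIP from Hugging Face transformers to score a frame against A/H text prompts; the model choice, prompts, and file name are illustrative assumptions, not necessarily those used in the benchmark:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompts = [
    "a photo of a person speaking confidently",       # A/H absent
    "a photo of a person looking hesitant and torn",  # A/H present
]
frame = Image.open("frame_0001.png")  # hypothetical cropped-face frame

inputs = processor(text=prompts, images=frame, return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)
print(f"P(A/H present) = {probs[0, 1].item():.3f}")
```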


4) Personalization using domain adaptation (frame-level)
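A Tent-style, source-free adaptation sketch: personalize a frame-level model on a target participant's unlabelled frames by minimizing prediction entropy, updating only normalization-layer parameters. This names one representative SFDA technique; the paper's personalization baselines may use different methods:

```python
import torch
import torch.nn as nn

def adapt_tent_style(model, target_loader, steps=1, lr=1e-4):
    """Source-free personalization: entropy minimization on unlabelled target
    frames, restricted to normalization-layer affine parameters."""
    params = []
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.LayerNorm)):
            params += [p for p in m.parameters() if p.requires_grad]
    optimizer = torch.optim.Adam(params, lr=lr)

    model.train()
    for _ in range(steps):
        for frames in target_loader:  # unlabelled frames of one participant
            probs = model(frames).softmax(dim=-1)
            entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()
            optimizer.zero_grad()
            entropy.backward()
            optimizer.step()
    return model
```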

Conclusion
This paper introduces BAH, a new multimodal and participant-based dataset for A/H recognition in videos. It contains videos of 300 recruited participants captured across 9 provinces in Canada. Participants recorded themselves using a webcam and a microphone through our web platform while answering 7 questions designed to elicit A/H.
The dataset amounts to 1,427 videos for a total duration of 10.60 hours, with 1.79 hours of A/H. It was annotated by our behavioural science team at the video and frame levels. Our initial benchmarking study yields limited performance, highlighting the difficulty of A/H recognition. Results also indicate that leveraging context, multimodality, domain adaptation, and adaptive feature fusion are promising directions to improve the accuracy and robustness of ML models on BAH. Our dataset and code are publicly available.
The appendix contains related work, more detailed statistics about the dataset and its diversity, dataset limitations, implementation details, and additional results.
Acknowledgments
This work was supported in part by the Fonds de recherche du Québec – Santé, the Natural Sciences and Engineering Research Council of Canada, the Canada Foundation for Innovation, and the Digital Research Alliance of Canada. We thank the interns who participated in the dataset annotation: Jessica Almeida (Concordia University, Université du Québec à Montréal) and Laura Lucia Ortiz (MBMC).