new

Get trending papers in your email inbox!

Subscribe

Daily Papers

byAK and the research community

Apr 23

Drive Video Analysis for the Detection of Traffic Near-Miss Incidents

Because of their recent introduction, self-driving cars and advanced driver assistance system (ADAS) equipped vehicles have had little opportunity to learn, the dangerous traffic (including near-miss incident) scenarios that provide normal drivers with strong motivation to drive safely. Accordingly, as a means of providing learning depth, this paper presents a novel traffic database that contains information on a large number of traffic near-miss incidents that were obtained by mounting driving recorders in more than 100 taxis over the course of a decade. The study makes the following two main contributions: (i) In order to assist automated systems in detecting near-miss incidents based on database instances, we created a large-scale traffic near-miss incident database (NIDB) that consists of video clip of dangerous events captured by monocular driving recorders. (ii) To illustrate the applicability of NIDB traffic near-miss incidents, we provide two primary database-related improvements: parameter fine-tuning using various near-miss scenes from NIDB, and foreground/background separation into motion representation. Then, using our new database in conjunction with a monocular driving recorder, we developed a near-miss recognition method that provides automated systems with a performance level that is comparable to a human-level understanding of near-miss incidents (64.5% vs. 68.4% at near-miss recognition, 61.3% vs. 78.7% at near-miss detection).

  • 5 authors
·
Apr 7, 2018

Simulating the Unseen: Crash Prediction Must Learn from What Did Not Happen

Traffic safety science has long been hindered by a fundamental data paradox: the crashes we most wish to prevent are precisely those events we rarely observe. Existing crash-frequency models and surrogate safety metrics rely heavily on sparse, noisy, and under-reported records, while even sophisticated, high-fidelity simulations undersample the long-tailed situations that trigger catastrophic outcomes such as fatalities. We argue that the path to achieving Vision Zero, i.e., the complete elimination of traffic fatalities and severe injuries, requires a paradigm shift from traditional crash-only learning to a new form of counterfactual safety learning: reasoning not only about what happened, but also about the vast set of plausible yet perilous scenarios that could have happened under slightly different circumstances. To operationalize this shift, our proposed agenda bridges macro to micro. Guided by crash-rate priors, generative scene engines, diverse driver models, and causal learning, near-miss events are synthesized and explained. A crash-focused digital twin testbed links micro scenes to macro patterns, while a multi-objective validator ensures that simulations maintain statistical realism. This pipeline transforms sparse crash data into rich signals for crash prediction, enabling the stress-testing of vehicles, roads, and policies before deployment. By learning from crashes that almost happened, we can shift traffic safety from reactive forensics to proactive prevention, advancing Vision Zero.

  • 15 authors
·
May 27, 2025

Advance Real-time Detection of Traffic Incidents in Highways using Vehicle Trajectory Data

A significant number of traffic crashes are secondary crashes that occur because of an earlier incident on the road. Thus, early detection of traffic incidents is crucial for road users from safety perspectives with a potential to reduce the risk of secondary crashes. The wide availability of GPS devices now-a-days gives an opportunity of tracking and recording vehicle trajectories. The objective of this study is to use vehicle trajectory data for advance real-time detection of traffic incidents on highways using machine learning-based algorithms. The study uses three days of unevenly sequenced vehicle trajectory data and traffic incident data on I-10, one of the most crash-prone highways in Louisiana. Vehicle trajectories are converted to trajectories based on virtual detector locations to maintain spatial uniformity as well as to generate historical traffic data for machine learning algorithms. Trips matched with traffic incidents on the way are separated and along with other trips with similar spatial attributes are used to build a database for modeling. Multiple machine learning algorithms such as Logistic Regression, Random Forest, Extreme Gradient Boost, and Artificial Neural Network models are used to detect a trajectory that is likely to face an incident in the downstream road section. Results suggest that the Random Forest model achieves the best performance for predicting an incident with reasonable recall value and discrimination capability.

  • 2 authors
·
Aug 14, 2024

Accident Risk Prediction based on Heterogeneous Sparse Data: New Dataset and Insights

Reducing traffic accidents is an important public safety challenge, therefore, accident analysis and prediction has been a topic of much research over the past few decades. Using small-scale datasets with limited coverage, being dependent on extensive set of data, and being not applicable for real-time purposes are the important shortcomings of the existing studies. To address these challenges, we propose a new solution for real-time traffic accident prediction using easy-to-obtain, but sparse data. Our solution relies on a deep-neural-network model (which we have named DAP, for Deep Accident Prediction); which utilizes a variety of data attributes such as traffic events, weather data, points-of-interest, and time. DAP incorporates multiple components including a recurrent (for time-sensitive data), a fully connected (for time-insensitive data), and a trainable embedding component (to capture spatial heterogeneity). To fill the data gap, we have - through a comprehensive process of data collection, integration, and augmentation - created a large-scale publicly available database of accident information named US-Accidents. By employing the US-Accidents dataset and through an extensive set of experiments across several large cities, we have evaluated our proposal against several baselines. Our analysis and results show significant improvements to predict rare accident events. Further, we have shown the impact of traffic information, time, and points-of-interest data for real-time accident prediction.

  • 5 authors
·
Sep 19, 2019

AccidentBench: Benchmarking Multimodal Understanding and Reasoning in Vehicle Accidents and Beyond

Rapid advances in multimodal models demand benchmarks that rigorously evaluate understanding and reasoning in safety-critical, dynamic real-world settings. We present AccidentBench, a large-scale benchmark that combines vehicle accident scenarios with Beyond domains, safety-critical settings in air and water that emphasize spatial and temporal reasoning (e.g., navigation, orientation, multi-vehicle motion). The benchmark contains approximately 2000 videos and over 19000 human-annotated question--answer pairs spanning multiple video lengths (short/medium/long) and difficulty levels (easy/medium/hard). Tasks systematically probe core capabilities: temporal, spatial, and intent understanding and reasoning. By unifying accident-centric traffic scenes with broader safety-critical scenarios in air and water, AccidentBench offers a comprehensive, physically grounded testbed for evaluating models under real-world variability. Evaluations of state-of-the-art models (e.g., Gemini-2.5 Pro and GPT-5) show that even the strongest models achieve only about 18% accuracy on the hardest tasks and longest videos, revealing substantial gaps in real-world temporal, spatial, and intent reasoning. AccidentBench is designed to expose these critical gaps and drive the development of multimodal models that are safer, more robust, and better aligned with real-world safety-critical challenges. The code and dataset are available at: https://github.com/SafeRL-Lab/AccidentBench

  • 12 authors
·
Sep 30, 2025

TAD: A Large-Scale Benchmark for Traffic Accidents Detection from Video Surveillance

Automatic traffic accidents detection has appealed to the machine vision community due to its implications on the development of autonomous intelligent transportation systems (ITS) and importance to traffic safety. Most previous studies on efficient analysis and prediction of traffic accidents, however, have used small-scale datasets with limited coverage, which limits their effect and applicability. Existing datasets in traffic accidents are either small-scale, not from surveillance cameras, not open-sourced, or not built for freeway scenes. Since accidents happened in freeways tend to cause serious damage and are too fast to catch the spot. An open-sourced datasets targeting on freeway traffic accidents collected from surveillance cameras is in great need and of practical importance. In order to help the vision community address these shortcomings, we endeavor to collect video data of real traffic accidents that covered abundant scenes. After integration and annotation by various dimensions, a large-scale traffic accidents dataset named TAD is proposed in this work. Various experiments on image classification, object detection, and video classification tasks, using public mainstream vision algorithms or frameworks are conducted in this work to demonstrate performance of different methods. The proposed dataset together with the experimental results are presented as a new benchmark to improve computer vision research, especially in ITS.

  • 4 authors
·
Sep 25, 2022

VegaEdge: Edge AI Confluence Anomaly Detection for Real-Time Highway IoT-Applications

Vehicle anomaly detection plays a vital role in highway safety applications such as accident prevention, rapid response, traffic flow optimization, and work zone safety. With the surge of the Internet of Things (IoT) in recent years, there has arisen a pressing demand for Artificial Intelligence (AI) based anomaly detection methods designed to meet the requirements of IoT devices. Catering to this futuristic vision, we introduce a lightweight approach to vehicle anomaly detection by utilizing the power of trajectory prediction. Our proposed design identifies vehicles deviating from expected paths, indicating highway risks from different camera-viewing angles from real-world highway datasets. On top of that, we present VegaEdge - a sophisticated AI confluence designed for real-time security and surveillance applications in modern highway settings through edge-centric IoT-embedded platforms equipped with our anomaly detection approach. Extensive testing across multiple platforms and traffic scenarios showcases the versatility and effectiveness of VegaEdge. This work also presents the Carolinas Anomaly Dataset (CAD), to bridge the existing gap in datasets tailored for highway anomalies. In real-world scenarios, our anomaly detection approach achieves an AUC-ROC of 0.94, and our proposed VegaEdge design, on an embedded IoT platform, processes 738 trajectories per second in a typical highway setting. The dataset is available at https://github.com/TeCSAR-UNCC/Carolinas_Dataset#chd-anomaly-test-set .

  • 5 authors
·
Nov 13, 2023

Search is All You Need for Few-shot Anomaly Detection

Few-shot anomaly detection (FSAD) has emerged as a crucial yet challenging task in industrial inspection, where normal distribution modeling must be accomplished with only a few normal images. While existing approaches typically employ multi-modal foundation models combining language and vision modalities for prompt-guided anomaly detection, these methods often demand sophisticated prompt engineering and extensive manual tuning. In this paper, we demonstrate that a straightforward nearest-neighbor search framework can surpass state-of-the-art performance in both single-class and multi-class FSAD scenarios. Our proposed method, VisionAD, consists of four simple yet essential components: (1) scalable vision foundation models that extract universal and discriminative features; (2) dual augmentation strategies - support augmentation to enhance feature matching adaptability and query augmentation to address the oversights of single-view prediction; (3) multi-layer feature integration that captures both low-frequency global context and high-frequency local details with minimal computational overhead; and (4) a class-aware visual memory bank enabling efficient one-for-all multi-class detection. Extensive evaluations across MVTec-AD, VisA, and Real-IAD benchmarks demonstrate VisionAD's exceptional performance. Using only 1 normal images as support, our method achieves remarkable image-level AUROC scores of 97.4%, 94.8%, and 70.8% respectively, outperforming current state-of-the-art approaches by significant margins (+1.6%, +3.2%, and +1.4%). The training-free nature and superior few-shot capabilities of VisionAD make it particularly appealing for real-world applications where samples are scarce or expensive to obtain. Code is available at https://github.com/Qiqigeww/VisionAD.

  • 8 authors
·
Apr 16, 2025

Query Carefully: Detecting the Unanswerables in Text-to-SQL Tasks

Text-to-SQL systems allow non-SQL experts to interact with relational databases using natural language. However, their tendency to generate executable SQL for ambiguous, out-of-scope, or unanswerable queries introduces a hidden risk, as outputs may be misinterpreted as correct. This risk is especially serious in biomedical contexts, where precision is critical. We therefore present Query Carefully, a pipeline that integrates LLM-based SQL generation with explicit detection and handling of unanswerable inputs. Building on the OncoMX component of ScienceBenchmark, we construct OncoMX-NAQ (No-Answer Questions), a set of 80 no-answer questions spanning 8 categories (non-SQL, out-of-schema/domain, and multiple ambiguity types). Our approach employs llama3.3:70b with schema-aware prompts, explicit No-Answer Rules (NAR), and few-shot examples drawn from both answerable and unanswerable questions. We evaluate SQL exact match, result accuracy, and unanswerable-detection accuracy. On the OncoMX dev split, few-shot prompting with answerable examples increases result accuracy, and adding unanswerable examples does not degrade performance. On OncoMX-NAQ, balanced prompting achieves the highest unanswerable-detection accuracy (0.8), with near-perfect results for structurally defined categories (non-SQL, missing columns, out-of-domain) but persistent challenges for missing-value queries (0.5) and column ambiguity (0.3). A lightweight user interface surfaces interim SQL, execution results, and abstentions, supporting transparent and reliable text-to-SQL in biomedical applications.

  • 5 authors
·
Dec 19, 2025

Enhancing Traffic Incident Management with Large Language Models: A Hybrid Machine Learning Approach for Severity Classification

This research showcases the innovative integration of Large Language Models into machine learning workflows for traffic incident management, focusing on the classification of incident severity using accident reports. By leveraging features generated by modern language models alongside conventional data extracted from incident reports, our research demonstrates improvements in the accuracy of severity classification across several machine learning algorithms. Our contributions are threefold. First, we present an extensive comparison of various machine learning models paired with multiple large language models for feature extraction, aiming to identify the optimal combinations for accurate incident severity classification. Second, we contrast traditional feature engineering pipelines with those enhanced by language models, showcasing the superiority of language-based feature engineering in processing unstructured text. Third, our study illustrates how merging baseline features from accident reports with language-based features can improve the severity classification accuracy. This comprehensive approach not only advances the field of incident management but also highlights the cross-domain application potential of our methodology, particularly in contexts requiring the prediction of event outcomes from unstructured textual data or features translated into textual representation. Specifically, our novel methodology was applied to three distinct datasets originating from the United States, the United Kingdom, and Queensland, Australia. This cross-continental application underlines the robustness of our approach, suggesting its potential for widespread adoption in improving incident management processes globally.

  • 4 authors
·
Mar 20, 2024

Learning Traffic Crashes as Language: Datasets, Benchmarks, and What-if Causal Analyses

The increasing rate of road accidents worldwide results not only in significant loss of life but also imposes billions financial burdens on societies. Current research in traffic crash frequency modeling and analysis has predominantly approached the problem as classification tasks, focusing mainly on learning-based classification or ensemble learning methods. These approaches often overlook the intricate relationships among the complex infrastructure, environmental, human and contextual factors related to traffic crashes and risky situations. In contrast, we initially propose a large-scale traffic crash language dataset, named CrashEvent, summarizing 19,340 real-world crash reports and incorporating infrastructure data, environmental and traffic textual and visual information in Washington State. Leveraging this rich dataset, we further formulate the crash event feature learning as a novel text reasoning problem and further fine-tune various large language models (LLMs) to predict detailed accident outcomes, such as crash types, severity and number of injuries, based on contextual and environmental factors. The proposed model, CrashLLM, distinguishes itself from existing solutions by leveraging the inherent text reasoning capabilities of LLMs to parse and learn from complex, unstructured data, thereby enabling a more nuanced analysis of contributing factors. Our experiments results shows that our LLM-based approach not only predicts the severity of accidents but also classifies different types of accidents and predicts injury outcomes, all with averaged F1 score boosted from 34.9% to 53.8%. Furthermore, CrashLLM can provide valuable insights for numerous open-world what-if situational-awareness traffic safety analyses with learned reasoning features, which existing models cannot offer. We make our benchmark, datasets, and model public available for further exploration.

  • 8 authors
·
Jun 15, 2024

STARN-GAT: A Multi-Modal Spatio-Temporal Graph Attention Network for Accident Severity Prediction

Accurate prediction of traffic accident severity is critical for improving road safety, optimizing emergency response strategies, and informing the design of safer transportation infrastructure. However, existing approaches often struggle to effectively model the intricate interdependencies among spatial, temporal, and contextual variables that govern accident outcomes. In this study, we introduce STARN-GAT, a Multi-Modal Spatio-Temporal Graph Attention Network, which leverages adaptive graph construction and modality-aware attention mechanisms to capture these complex relationships. Unlike conventional methods, STARN-GAT integrates road network topology, temporal traffic patterns, and environmental context within a unified attention-based framework. The model is evaluated on the Fatality Analysis Reporting System (FARS) dataset, achieving a Macro F1-score of 85 percent, ROC-AUC of 0.91, and recall of 81 percent for severe incidents. To ensure generalizability within the South Asian context, STARN-GAT is further validated on the ARI-BUET traffic accident dataset, where it attains a Macro F1-score of 0.84, recall of 0.78, and ROC-AUC of 0.89. These results demonstrate the model's effectiveness in identifying high-risk cases and its potential for deployment in real-time, safety-critical traffic management systems. Furthermore, the attention-based architecture enhances interpretability, offering insights into contributing factors and supporting trust in AI-assisted decision-making. Overall, STARN-GAT bridges the gap between advanced graph neural network techniques and practical applications in road safety analytics.

  • 2 authors
·
Jul 27, 2025

OOSTraj: Out-of-Sight Trajectory Prediction With Vision-Positioning Denoising

Trajectory prediction is fundamental in computer vision and autonomous driving, particularly for understanding pedestrian behavior and enabling proactive decision-making. Existing approaches in this field often assume precise and complete observational data, neglecting the challenges associated with out-of-view objects and the noise inherent in sensor data due to limited camera range, physical obstructions, and the absence of ground truth for denoised sensor data. Such oversights are critical safety concerns, as they can result in missing essential, non-visible objects. To bridge this gap, we present a novel method for out-of-sight trajectory prediction that leverages a vision-positioning technique. Our approach denoises noisy sensor observations in an unsupervised manner and precisely maps sensor-based trajectories of out-of-sight objects into visual trajectories. This method has demonstrated state-of-the-art performance in out-of-sight noisy sensor trajectory denoising and prediction on the Vi-Fi and JRDB datasets. By enhancing trajectory prediction accuracy and addressing the challenges of out-of-sight objects, our work significantly contributes to improving the safety and reliability of autonomous driving in complex environments. Our work represents the first initiative towards Out-Of-Sight Trajectory prediction (OOSTraj), setting a new benchmark for future research. The code is available at https://github.com/Hai-chao-Zhang/OOSTraj.

  • 5 authors
·
Apr 2, 2024

SafePLUG: Empowering Multimodal LLMs with Pixel-Level Insight and Temporal Grounding for Traffic Accident Understanding

Multimodal large language models (MLLMs) have achieved remarkable progress across a range of vision-language tasks and demonstrate strong potential for traffic accident understanding. However, existing MLLMs in this domain primarily focus on coarse-grained image-level or video-level comprehension and often struggle to handle fine-grained visual details or localized scene components, limiting their applicability in complex accident scenarios. To address these limitations, we propose SafePLUG, a novel framework that empowers MLLMs with both Pixel-Level Understanding and temporal Grounding for comprehensive traffic accident analysis. SafePLUG supports both arbitrary-shaped visual prompts for region-aware question answering and pixel-level segmentation based on language instructions, while also enabling the recognition of temporally anchored events in traffic accident scenarios. To advance the development of MLLMs for traffic accident understanding, we curate a new dataset containing multimodal question-answer pairs centered on diverse accident scenarios, with detailed pixel-level annotations and temporal event boundaries. Experimental results show that SafePLUG achieves strong performance on multiple tasks, including region-based question answering, pixel-level segmentation, temporal event localization, and accident event understanding. These capabilities lay a foundation for fine-grained understanding of complex traffic scenes, with the potential to improve driving safety and enhance situational awareness in smart transportation systems. The code, dataset, and model checkpoints will be made publicly available at: https://zihaosheng.github.io/SafePLUG

  • 7 authors
·
Aug 8, 2025

From Accidents to Insights: Leveraging Multimodal Data for Scenario-Driven ADS Testing

The rapid advancements in Autonomous Driving Systems (ADS) have necessitated robust software testing to ensure safety and reliability. However, automating the generation of scalable and concrete test scenarios remains a significant challenge. Current scenario-based test case generation methods often face limitations, such as unrealistic scenes and inaccurate vehicle trajectories. These challenges largely result from the loss of map information during data extraction and the lack of an effective verification mechanism to mitigate hallucinations in large language models (LLMs). This paper introduces TRACE, a scenario-based ADS Test case Generation framework for Critical Scenarios. By leveraging multimodal data to extract challenging scenarios from real-world car crash reports, TRACE constructs numerous critical test cases with less data, significantly enhancing ADS bug detection efficiency. Using in-context learning, chain-of-thought prompting, and self-validation approaches, we use LLMs to extract environmental and road network information from crash reports. For vehicle trajectory planning, data containing map information and vehicle coordinates serves as a knowledge base to build a ChatGPT-based LLM with path-planning capabilities, which we named TrackMate. Based on 50 existing crash reports, our approach successfully tested three ADS models across two simulation platforms, MetaDrive and BeamNG. Of the 290 constructed test scenarios, 127 are identified as critical, as they resulted in vehicle collisions. Additionally, user feedback reveals that TRACE demonstrates superior scenario reconstruction accuracy, with 77.5% of the scenarios being rated as 'mostly or 'totally' consistent, compared to only 27% for the most related SOTA, LCTGen.

  • 4 authors
·
Feb 4, 2025

An Edge Assisted Robust Smart Traffic Management and Signalling System for Guiding Emergency Vehicles During Peak Hours

Congestion in traffic is an unavoidable circumstance in many cities in India and other countries. It is an issue of major concern. The steep rise in the number of automobiles on the roads followed by old infrastructure, accidents, pedestrian traffic, and traffic rule violations all add to challenging traffic conditions. Given these poor conditions of traffic, there is a critical need for automatically detecting and signaling systems. There are already various technologies that are used for traffic management and signaling systems like video analysis, infrared sensors, and wireless sensors. The main issue with these methods is they are very costly and high maintenance is required. In this paper, we have proposed a three-phase system that can guide emergency vehicles and manage traffic based on the degree of congestion. In the first phase, the system processes the captured images and calculates the Index value which is used to discover the degree of congestion. The Index value of a particular road depends on its width and the length up to which the camera captures images of that road. We have to take input for the parameters (length and width) while setting up the system. In the second phase, the system checks whether there are any emergency vehicles present or not in any lane. In the third phase, the whole processing and decision-making part is performed at the edge server. The proposed model is robust and it takes into consideration adverse weather conditions such as hazy, foggy, and windy. It works very efficiently in low light conditions also. The edge server is a strategically placed server that provides us with low latency and better connectivity. Using Edge technology in this traffic management system reduces the strain on cloud servers and the system becomes more reliable in real-time because the latency and bandwidth get reduced due to processing at the intermediate edge server.

  • 4 authors
·
Apr 26, 2023

Large Language Models (LLMs) as Traffic Control Systems at Urban Intersections: A New Paradigm

This study introduces a novel approach for traffic control systems by using Large Language Models (LLMs) as traffic controllers. The study utilizes their logical reasoning, scene understanding, and decision-making capabilities to optimize throughput and provide feedback based on traffic conditions in real-time. LLMs centralize traditionally disconnected traffic control processes and can integrate traffic data from diverse sources to provide context-aware decisions. LLMs can also deliver tailored outputs using various means such as wireless signals and visuals to drivers, infrastructures, and autonomous vehicles. To evaluate LLMs ability as traffic controllers, this study proposed a four-stage methodology. The methodology includes data creation and environment initialization, prompt engineering, conflict identification, and fine-tuning. We simulated multi-lane four-leg intersection scenarios and generates detailed datasets to enable conflict detection using LLMs and Python simulation as a ground truth. We used chain-of-thought prompts to lead LLMs in understanding the context, detecting conflicts, resolving them using traffic rules, and delivering context-sensitive traffic management solutions. We evaluated the prformance GPT-mini, Gemini, and Llama as traffic controllers. Results showed that the fine-tuned GPT-mini achieved 83% accuracy and an F1-score of 0.84. GPT-mini model exhibited a promising performance in generating actionable traffic management insights, with high ROUGE-L scores across conflict identification of 0.95, decision-making of 0.91, priority assignment of 0.94, and waiting time optimization of 0.92. We demonstrated that LLMs can offer precise recommendations to drivers in real-time including yielding, slowing, or stopping based on vehicle dynamics.

  • 3 authors
·
Nov 15, 2024

CRASH: Crash Recognition and Anticipation System Harnessing with Context-Aware and Temporal Focus Attentions

Accurately and promptly predicting accidents among surrounding traffic agents from camera footage is crucial for the safety of autonomous vehicles (AVs). This task presents substantial challenges stemming from the unpredictable nature of traffic accidents, their long-tail distribution, the intricacies of traffic scene dynamics, and the inherently constrained field of vision of onboard cameras. To address these challenges, this study introduces a novel accident anticipation framework for AVs, termed CRASH. It seamlessly integrates five components: object detector, feature extractor, object-aware module, context-aware module, and multi-layer fusion. Specifically, we develop the object-aware module to prioritize high-risk objects in complex and ambiguous environments by calculating the spatial-temporal relationships between traffic agents. In parallel, the context-aware is also devised to extend global visual information from the temporal to the frequency domain using the Fast Fourier Transform (FFT) and capture fine-grained visual features of potential objects and broader context cues within traffic scenes. To capture a wider range of visual cues, we further propose a multi-layer fusion that dynamically computes the temporal dependencies between different scenes and iteratively updates the correlations between different visual features for accurate and timely accident prediction. Evaluated on real-world datasets--Dashcam Accident Dataset (DAD), Car Crash Dataset (CCD), and AnAn Accident Detection (A3D) datasets--our model surpasses existing top baselines in critical evaluation metrics like Average Precision (AP) and mean Time-To-Accident (mTTA). Importantly, its robustness and adaptability are particularly evident in challenging driving scenarios with missing or limited training data, demonstrating significant potential for application in real-world autonomous driving systems.

  • 9 authors
·
Jul 25, 2024

InterAct-Video: Reasoning-Rich Video QA for Urban Traffic

Traffic monitoring is crucial for urban mobility, road safety, and intelligent transportation systems (ITS). Deep learning has advanced video-based traffic monitoring through video question answering (VideoQA) models, enabling structured insight extraction from traffic videos. However, existing VideoQA models struggle with the complexity of real-world traffic scenes, where multiple concurrent events unfold across spatiotemporal dimensions. To address these challenges, this paper introduces InterAct VideoQA, a curated dataset designed to benchmark and enhance VideoQA models for traffic monitoring tasks. The InterAct VideoQA dataset comprises 8 hours of real-world traffic footage collected from diverse intersections, segmented into 10-second video clips, with over 25,000 question-answer (QA) pairs covering spatiotemporal dynamics, vehicle interactions, incident detection, and other critical traffic attributes. State-of-the-art VideoQA models are evaluated on InterAct VideoQA, exposing challenges in reasoning over fine-grained spatiotemporal dependencies within complex traffic scenarios. Additionally, fine-tuning these models on InterAct VideoQA yields notable performance improvements, demonstrating the necessity of domain-specific datasets for VideoQA. InterAct VideoQA is publicly available as a benchmark dataset to facilitate future research in real-world deployable VideoQA models for intelligent transportation systems. GitHub Repo: https://github.com/joe-rabbit/InterAct_VideoQA

  • 6 authors
·
Jul 19, 2025

Interaction Dataset of Autonomous Vehicles with Traffic Lights and Signs

This paper presents the development of a comprehensive dataset capturing interactions between Autonomous Vehicles (AVs) and traffic control devices, specifically traffic lights and stop signs. Derived from the Waymo Motion dataset, our work addresses a critical gap in the existing literature by providing real-world trajectory data on how AVs navigate these traffic control devices. We propose a methodology for identifying and extracting relevant interaction trajectory data from the Waymo Motion dataset, incorporating over 37,000 instances with traffic lights and 44,000 with stop signs. Our methodology includes defining rules to identify various interaction types, extracting trajectory data, and applying a wavelet-based denoising method to smooth the acceleration and speed profiles and eliminate anomalous values, thereby enhancing the trajectory quality. Quality assessment metrics indicate that trajectories obtained in this study have anomaly proportions in acceleration and jerk profiles reduced to near-zero levels across all interaction categories. By making this dataset publicly available, we aim to address the current gap in datasets containing AV interaction behaviors with traffic lights and signs. Based on the organized and published dataset, we can gain a more in-depth understanding of AVs' behavior when interacting with traffic lights and signs. This will facilitate research on AV integration into existing transportation infrastructures and networks, supporting the development of more accurate behavioral models and simulation tools.

  • 7 authors
·
Jan 21, 2025

Conversational LLMs Simplify Secure Clinical Data Access, Understanding, and Analysis

Large-scale clinical databases offer opportunities for medical research, but their complexity creates barriers to effective use. The Medical Information Mart for Intensive Care (MIMIC-IV), one of the world's largest open-source electronic health record databases, traditionally requires both SQL proficiency and clinical domain expertise. We introduce M3, a system that enables natural language querying of MIMIC-IV data through the Model Context Protocol. With a single command, M3 retrieves MIMIC-IV from PhysioNet, launches a local SQLite instance or connects to hosted BigQuery, and allows researchers to pose clinical questions in plain English. We evaluated M3 using one hundred questions from the EHRSQL 2024 benchmark with two language models: the proprietary Claude Sonnet 4 achieved 94% accuracy, while the open-source gpt-oss-20B (deployable locally on consumer hardware) achieved 93% accuracy. Both models translate natural language into SQL, execute queries against MIMIC-IV, and return structured results alongside the underlying query for verification. Error analysis revealed that most failures stemmed from complex temporal reasoning or ambiguous question phrasing rather than fundamental architectural limitations. The comparable performance of a smaller open-source model demonstrates that privacy-preserving local deployment is viable for sensitive clinical data analysis. M3 lowers technical barriers to critical care data analysis while maintaining security through OAuth2 authentication, query validation, and comprehensive audit logging.

  • 6 authors
·
Jun 27, 2025

Predicting the duration of traffic incidents for Sydney greater metropolitan area using machine learning methods

This research presents a comprehensive approach to predicting the duration of traffic incidents and classifying them as short-term or long-term across the Sydney Metropolitan Area. Leveraging a dataset that encompasses detailed records of traffic incidents, road network characteristics, and socio-economic indicators, we train and evaluate a variety of advanced machine learning models including Gradient Boosted Decision Trees (GBDT), Random Forest, LightGBM, and XGBoost. The models are assessed using Root Mean Square Error (RMSE) for regression tasks and F1 score for classification tasks. Our experimental results demonstrate that XGBoost and LightGBM outperform conventional models with XGBoost achieving the lowest RMSE of 33.7 for predicting incident duration and highest classification F1 score of 0.62 for a 30-minute duration threshold. For classification, the 30-minute threshold balances performance with 70.84% short-term duration classification accuracy and 62.72% long-term duration classification accuracy. Feature importance analysis, employing both tree split counts and SHAP values, identifies the number of affected lanes, traffic volume, and types of primary and secondary vehicles as the most influential features. The proposed methodology not only achieves high predictive accuracy but also provides stakeholders with vital insights into factors contributing to incident durations. These insights enable more informed decision-making for traffic management and response strategies. The code is available by the link: https://github.com/Future-Mobility-Lab/SydneyIncidents

  • 4 authors
·
Jun 26, 2024

Decentralised Traffic Incident Detection via Network Lasso

Traffic incident detection plays a key role in intelligent transportation systems, which has gained great attention in transport engineering. In the past, traditional machine learning (ML) based detection methods achieved good performance under a centralised computing paradigm, where all data are transmitted to a central server for building ML models therein. Nowadays, deep neural networks based federated learning (FL) has become a mainstream detection approach to enable the model training in a decentralised manner while warranting local data governance. Such neural networks-centred techniques, however, have overshadowed the utility of well-established ML-based detection methods. In this work, we aim to explore the potential of potent conventional ML-based detection models in modern traffic scenarios featured by distributed data. We leverage an elegant but less explored distributed optimisation framework named Network Lasso, with guaranteed global convergence for convex problem formulations, integrate the potent convex ML model with it, and compare it with centralised learning, local learning, and federated learning methods atop a well-known traffic incident detection dataset. Experimental results show that the proposed network lasso-based approach provides a promising alternative to the FL-based approach in data-decentralised traffic scenarios, with a strong convergence guarantee while rekindling the significance of conventional ML-based detection methods.

  • 5 authors
·
Feb 28, 2024

CueCAn: Cue Driven Contextual Attention For Identifying Missing Traffic Signs on Unconstrained Roads

Unconstrained Asian roads often involve poor infrastructure, affecting overall road safety. Missing traffic signs are a regular part of such roads. Missing or non-existing object detection has been studied for locating missing curbs and estimating reasonable regions for pedestrians on road scene images. Such methods involve analyzing task-specific single object cues. In this paper, we present the first and most challenging video dataset for missing objects, with multiple types of traffic signs for which the cues are visible without the signs in the scenes. We refer to it as the Missing Traffic Signs Video Dataset (MTSVD). MTSVD is challenging compared to the previous works in two aspects i) The traffic signs are generally not present in the vicinity of their cues, ii) The traffic signs cues are diverse and unique. Also, MTSVD is the first publicly available missing object dataset. To train the models for identifying missing signs, we complement our dataset with 10K traffic sign tracks, with 40 percent of the traffic signs having cues visible in the scenes. For identifying missing signs, we propose the Cue-driven Contextual Attention units (CueCAn), which we incorporate in our model encoder. We first train the encoder to classify the presence of traffic sign cues and then train the entire segmentation model end-to-end to localize missing traffic signs. Quantitative and qualitative analysis shows that CueCAn significantly improves the performance of base models.

  • 4 authors
·
Mar 5, 2023

Dual-Modality Vehicle Anomaly Detection via Bilateral Trajectory Tracing

Traffic anomaly detection has played a crucial role in Intelligent Transportation System (ITS). The main challenges of this task lie in the highly diversified anomaly scenes and variational lighting conditions. Although much work has managed to identify the anomaly in homogenous weather and scene, few resolved to cope with complex ones. In this paper, we proposed a dual-modality modularized methodology for the robust detection of abnormal vehicles. We introduced an integrated anomaly detection framework comprising the following modules: background modeling, vehicle tracking with detection, mask construction, Region of Interest (ROI) backtracking, and dual-modality tracing. Concretely, we employed background modeling to filter the motion information and left the static information for later vehicle detection. For the vehicle detection and tracking module, we adopted YOLOv5 and multi-scale tracking to localize the anomalies. Besides, we utilized the frame difference and tracking results to identify the road and obtain the mask. In addition, we introduced multiple similarity estimation metrics to refine the anomaly period via backtracking. Finally, we proposed a dual-modality bilateral tracing module to refine the time further. The experiments conducted on the Track 4 testset of the NVIDIA 2021 AI City Challenge yielded a result of 0.9302 F1-Score and 3.4039 root mean square error (RMSE), indicating the effectiveness of our framework.

  • 10 authors
·
Jun 9, 2021

MultiOOD: Scaling Out-of-Distribution Detection for Multiple Modalities

Detecting out-of-distribution (OOD) samples is important for deploying machine learning models in safety-critical applications such as autonomous driving and robot-assisted surgery. Existing research has mainly focused on unimodal scenarios on image data. However, real-world applications are inherently multimodal, which makes it essential to leverage information from multiple modalities to enhance the efficacy of OOD detection. To establish a foundation for more realistic Multimodal OOD Detection, we introduce the first-of-its-kind benchmark, MultiOOD, characterized by diverse dataset sizes and varying modality combinations. We first evaluate existing unimodal OOD detection algorithms on MultiOOD, observing that the mere inclusion of additional modalities yields substantial improvements. This underscores the importance of utilizing multiple modalities for OOD detection. Based on the observation of Modality Prediction Discrepancy between in-distribution (ID) and OOD data, and its strong correlation with OOD performance, we propose the Agree-to-Disagree (A2D) algorithm to encourage such discrepancy during training. Moreover, we introduce a novel outlier synthesis method, NP-Mix, which explores broader feature spaces by leveraging the information from nearest neighbor classes and complements A2D to strengthen OOD detection performance. Extensive experiments on MultiOOD demonstrate that training with A2D and NP-Mix improves existing OOD detection algorithms by a large margin. Our source code and MultiOOD benchmark are available at https://github.com/donghao51/MultiOOD.

  • 4 authors
·
May 27, 2024

Beyond the Needle's Illusion: Decoupled Evaluation of Evidence Access and Use under Semantic Interference at 326M-Token Scale

Long-context LLM agents must access the right evidence from large environments and use it faithfully. However, the popular Needle-in-a-Haystack (NIAH) evaluation mostly measures benign span localization. The needle is near-unique, and the haystack is largely irrelevant. We introduce EverMemBench-S (EMB-S), an adversarial NIAH-style benchmark built on a 326M-token MemoryBank. While the full MemoryBank spans 326M tokens for retrieval-based (RAG) evaluation, we evaluate native long-context models only at scales that fit within each model's context window (up to 1M tokens in this work) to ensure a fair comparison. EMB-S pairs queries with collision-tested near-miss hard negatives and gold evidence sets spanning one or more documents, validated via human screening and LLM verification. We also propose a decoupled diagnostic protocol that reports evidence access (document-ID localization) separately from end-to-end QA quality under full-context prompting. This enables consistent diagnosis for both native long-context prompting and retrieval pipelines. Across a reference-corpus ladder from domain-isolated 64K contexts to a globally shared 326M-token environment, we observe a clear reality gap. Systems that saturate benign NIAH degrade sharply in evidence access under semantic interference. These results indicate that semantic discrimination, not context length alone, is the dominant bottleneck for long-context memory at scale.

  • 9 authors
·
Jan 28

Target before Shooting: Accurate Anomaly Detection and Localization under One Millisecond via Cascade Patch Retrieval

In this work, by re-examining the "matching" nature of Anomaly Detection (AD), we propose a new AD framework that simultaneously enjoys new records of AD accuracy and dramatically high running speed. In this framework, the anomaly detection problem is solved via a cascade patch retrieval procedure that retrieves the nearest neighbors for each test image patch in a coarse-to-fine fashion. Given a test sample, the top-K most similar training images are first selected based on a robust histogram matching process. Secondly, the nearest neighbor of each test patch is retrieved over the similar geometrical locations on those "global nearest neighbors", by using a carefully trained local metric. Finally, the anomaly score of each test image patch is calculated based on the distance to its "local nearest neighbor" and the "non-background" probability. The proposed method is termed "Cascade Patch Retrieval" (CPR) in this work. Different from the conventional patch-matching-based AD algorithms, CPR selects proper "targets" (reference images and locations) before "shooting" (patch-matching). On the well-acknowledged MVTec AD, BTAD and MVTec-3D AD datasets, the proposed algorithm consistently outperforms all the comparing SOTA methods by remarkable margins, measured by various AD metrics. Furthermore, CPR is extremely efficient. It runs at the speed of 113 FPS with the standard setting while its simplified version only requires less than 1 ms to process an image at the cost of a trivial accuracy drop. The code of CPR is available at https://github.com/flyinghu123/CPR.

  • 6 authors
·
Aug 13, 2023