diff --git "a/data/chunks/2603.10676_semantic.json" "b/data/chunks/2603.10676_semantic.json" new file mode 100644--- /dev/null +++ "b/data/chunks/2603.10676_semantic.json" @@ -0,0 +1,1742 @@ +[ + { + "chunk_id": "52dee28a-cedb-4325-bf9e-c72039ddf437", + "text": "Kosti Koistinen Kirsi Hellsten Joni Herttuainen\nAalto University School of Science Aalto University School of Science Aalto University School of Science\nComputer Science Department Computer Science Department Computer Science Department\nP.O.Box 11000, 00076 P.O.Box 11000, 00076 P.O.Box 11000, 00076\nAALTO, Finland AALTO, Finland AALTO, Finland\nkosti.koistinen@aalto.fi kirsi.hellsten@aalto.fi joni.herttuainen@aalto.fi2026 Kaski\nAalto University School of ScienceMar Computer Science Department\nP.O.Box 11000, 00076\n11 AALTO, Finland\nkimmo.kaski@aalto.fi March 12, 2026[cs.LG] ABSTRACT Industrial Control Systems (ICS) underpin critical infrastructure and face growing cyber–physical\nthreats due to the convergence of operational technology and networked environments. While\nmachine learning–based anomaly detection approaches in ICS shows strong theoretical performance,\ndeployment is often limited by poor explainability, high false-positive rates, and sensitivity to\nevolving system behavior, i.e., baseline drifting. We propose a Spatio-Temporal Attention Graph\nNeural Network (STA-GNN) for unsupervised and explainable anomaly detection in ICS that models\nboth temporal dynamics and relational structure of the system. Sensors, controllers, and network\nentities are represented as nodes in a dynamically learned graph, enabling the model to capture\ninter-dependencies across physical processes and communication patterns. Attention mechanisms\nprovide influential relationships, supporting inspection of correlations and potential causal pathways\nbehind detected events. 
The approach supports multiple data modalities, including SCADA point\nmeasurements, network flow features, and payload features, and thus enables unified cyber–physical\nanalysis. To address operational requirements, we incorporate a conformal prediction strategy to\ncontrol false alarm rates and monitor performance degradation under drifting of the environment.arXiv:2603.10676v1 Our findings highlight the possibilities and limitations of model evaluation and common pitfalls\nin anomaly detection in ICS. Our findings emphasise the importance of explainable, drift-aware\nevaluation for reliable deployment of learning-based security monitoring systems. Modern societies rely on uninterrupted functioning of interconnected critical infrastructure, such as electric power grids,\nwater treatment plants, and manufacturing systems [1]. A disruption in these Operational Technology (OT) systems can\ncascade into severe economic, social, and physical consequences, from prolonged power outages to contaminated water\nsupplies. Over the past decade, cyberattacks such as Stuxnet [2], Industroyer [3] and the Colonial Pipeline incident [4]\nhave demonstrated that threats once limited to Information Technology (IT) networks can now directly impact the\nphysical world, such as equipment damage or even threats to human life [5]. During the past decade, cyberattacks on\nOT networks have been reported to have increased five fold from 300 annually to 1600 [6].", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. 
Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 1, + "total_chunks": 87, + "char_count": 3026, + "word_count": 387, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "bfd5edd1-cebb-4110-8f50-25218099970d", + "text": "The actual scale is likely to\nbe significantly higher, as many OT intrusions remain unreported or undiscovered due to limited monitoring capabilities. A PREPRINT - MARCH 12, 2026", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 2, + "total_chunks": 87, + "char_count": 178, + "word_count": 28, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "eaa08045-5a39-4e43-82c0-83a4bff65d74", + "text": "By 2024, operational disruption had become routine: 50–75% of ransomware incidents caused partial shutdowns, and\napproximately 25% resulted in complete production stoppages, causing significant financial damage [6]. Industrial Control Systems (ICS) form the technological backbone of critical infrastructure and are often the target of\ncyberattacks against OT systems. They regulate physical processes through sensors, actuators, and Programmable Logic\nControllers (PLCs) and are maintained through Supervisory Control and Data Acquisition (SCADA). Traditionally\nisolated from external networks, the OT systems once enjoyed a degree of \"security by separation.\" However, the\nshift toward networked automation, remote management, and the Industrial Internet of Things (IIoT) has converged\nIT and OT networks, allowing adversaries to move laterally from corporate systems to industrial environments. 
This\ndevelopment has exposed ICS environments to a wide spectrum of cyber threats [7]. In addition, OT environments often\nrely on legacy hardware, strict safety protocols, and systems that cannot be easily updated or patched, which further\nincreases their vulnerability to cyberattacks. ICS threats range from malware infections and ransomware to unauthorised remote access, data manipulation, and\nprocess disruption. In many cases, attackers exploit vulnerabilities in outdated software, weak authentication, or\ninsecure network configurations that were never designed with cybersecurity in mind. Typical weaknesses include\noutdated protocols that allow unintended access or manipulation of control traffic. The types of attacks are commonly\ndivided into network-based and physical-based attacks [5]. The former includes Denial of Service (DoS), injection, and\nMan-in-the-Middle attacks, while the latter include stealth attacks, data tampering, and damage attacks. To detect and mitigate these complex attacks, Intrusion Detection Systems (IDS) are widely used in modern industrial\ncybersecurity. An IDS typically consists of the monitoring, pre-processing, and detection phases [8]. Among various\ndetection approaches, such as signature-based, rule-based, and hybrid-based, anomaly-detection-based IDS have gained\nsignificant attention for their ability to learn normal operational behavior and discover deviations that may signal attacks,\nintrusions, or malfunctions [9]. In OT networks, this capability is crucial, as anomalies are often subtle irregularities\nrather than clear malicious signatures.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. 
Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 3, + "total_chunks": 87, + "char_count": 2502, + "word_count": 332, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "ef9e6aed-888c-48c1-b073-c0c9846dd2d9", + "text": "They may appear as small fluctuations in sensor data, unexpected timing patterns,\nunusual command sequences, or deviations in process variables that remain within protocol limits but still indicate\nunsafe or suspicious behavior. There are various approaches that have been applied to detect and prevent cyber intrusions, including statistical modeling,\nBayesian inference, and rule-based systems. These methods often rely on predefined assumptions about normal and\nabnormal behavior [10]. However, as industrial systems become more complex and dynamic, such fixed models struggle\nto capture the nonlinear and time-varying nature of real-world operations [11]. In contrast, machine learning–based\napproaches have attracted widespread interest for their ability to automatically learn patterns from data and adapt\nto evolving system behavior [9]. These methods can uncover correlations between multiple variables, making them\nparticularly suitable for anomaly detection in ICS. Traditional machine learning approaches include, for example, k-nearest neighbors, Random Forests, and Support\nVector Machines [12]. However, despite their efficiency in classification, these methods are insufficient to model\ntemporal dependencies that are inherent in OT traffic. They are also sensitive to imbalanced datasets, such that a new\nunseen anomaly often remains undetected. 
Furthermore, in most OT environments, the majority of traffic is benign,\nwhile only a small fraction represents attacks.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 4, + "total_chunks": 87, + "char_count": 1482, + "word_count": 203, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "427bb19a-2137-4d7f-a836-4f1a93450d81", + "text": "This imbalance can lead to biased classifiers that fail to detect rare but\ncritical anomalies. To overcome these limitations, Deep learning approach has emerged as a promising solution. Autoregressive architectures such as Long Short-Term Memory (LSTM) networks [13] and other Recurrent Neural Network approaches\n(RNNs) [14] can capture complex temporal patterns among variables, which allows the model to understand how\nchanges in one part of the system influence the rest. However, the performance of these models also suffers from\nunbalanced data. Some other methods include autoencoders, Generative Adversarial Networks, and a mixture of these\nmodels, but all have similar limitations. More recently, transformer-based architectures have become popular because\nof their ability to model long-range dependencies using self-attention mechanisms. Their success in natural language\nprocessing has motivated research into their application for anomaly detection in time-series and network data, where\nsequential dependencies and contextual relationships are crucial. Transformer models, and particularly adaptations of\nlanguage models, show the potential to capture complex semantic patterns in network traffic representations [15]. 
Beyond and within sequential approaches, graph-based deep learning provides a fundamentally different way to\nrepresent and analyse OT systems. By modeling the system as a graph, where nodes represent entities (such as\ndevices or sensors) and edges represent their relationships or communications, a more realistic and structured view\nof the environment can be obtained. Graph-based models are able to uncover non-linear correlations and long-range\ndependencies that traditional time-series or tabular approaches often miss. Graph Neural Networks (GNNs), such as\nGraph Convolutional Networks (GCNs) [16] and Graph Attention Networks (GATs) [17], exploit this representation\nby learning how information propagates through the network structure. GCNs aggregate neighborhood information A PREPRINT - MARCH 12, 2026", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 5, + "total_chunks": 87, + "char_count": 2042, + "word_count": 279, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "780098a8-dc43-42ac-aee9-c738c442df8c", + "text": "to capture local dependencies, while GATs extend this approach by applying attention mechanisms to weigh the\nimportance of different connections dynamically. Through these formulations, GNNs can effectively model both the\ntopology and interactions within the system, and thus enable a more accurate detection of anomalous behavior that may\nemerge across multiple interconnected entities. For a more detailed review of current methods, see [18]. 
Although harnessed with state-of-the-art machine learning, there are still some issues to consider before their usage for\nanomaly detection in ICS, such as the lack of open high-quality datasets for research and high false alarm rates (for\nother challenges, see [19]). In addition, deep learning introduces challenges of its own, related to its explainability and\ninterpretability. As models become more complex and rely on multilayer architectures, their internal decision-making\nprocesses become opaque. In ICS environments, operators must understand why an alert was triggered, and this lack\nof transparency creates a significant barrier to adoption. Explainability methods aim to improve the trustworthiness,\ninterpretability, and accountability of machine learning models by providing human-understandable insights into how\nthey reach their conclusions [20]. However, the application of explainability techniques into ICS remains a challenge. First, ICS traffic is often highdimensional and highly contextual, making it difficult to map model outputs to meaningful operational features. Second,\nmany explainability tools are computationally expensive or unstable when applied to time-series or graph-based deep\nlearning models. Third, explanations must be not only technically accurate but also domain-relevant, i.e., operators need\nactionable insights, not abstract attributions. As a result, despite significant progress, current explainability solutions\noften do not meet the stringent requirements of industrial environments. More research is needed to develop lightweight,\nreliable, and domain-aware explainability mechanisms that can support real-time decision-making and foster operator\ntrust in AI-driven anomaly detection. 
To address the aforementioned challenges, we propose an unsupervised GNN-based framework that uses graph-oriented\nmachine learning for explainable anomaly detection in ICS.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 6, + "total_chunks": 87, + "char_count": 2354, + "word_count": 316, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "e61f4f3f-4ad0-410d-9691-577619012376", + "text": "The model constructs a graph representation of the system\nthat enables learning of the relationships among sensors, actuators, and process variables. Within this architecture,\nattention mechanisms are employed to extract the most influential dependencies in the graph, allowing the model to\nfocus on critical interactions during anomaly scoring. By examining the resulting correlation structures, we analyse\nhow the model's learned relationships align with known causal dependencies in the industrial process. This facilitates\ntransparent system-level anomaly detection, which traditional models might overlook. Furthermore, the framework is\ntunable as it can operate on SCADA-point data for point-level anomaly detection, on netflow data for passive network\nmonitoring, or on both modalities simultaneously through a multimodal configuration. This study is organised such that in Section 2 we discuss related works in the field of Explainable AI. Next, in Section 3\nwe introduce our model architechture and evaluation strategy. In Section 4 we present the data we use for benchmarking\nthe model, followed by presenting the results and analysis of the acquired graph representations in Section 5. 
Then in\nSection 6 we discuss the methodological and practical issues encountered during the analysis and reflect more broadly\nthe common issues in Machine Learning anomaly detection. In Section 7 we draw conclusions and on what could be\nthe focus of future work. In this section, we provide a short review of the most relevant work on explainable artificial intelligence (XAI). Although\nthe literature on XAI is extensive, (see e.g., [21,22]), only recently have cybersecurity and IDS applications begun\nto receive dedicated attention. Here, our aim is to highlight the works that explore explainability, specifically, for\nnon-experts and experts in IDS and of OT environments. Explainable AI as a field emerged formally in 2004 [23], but its development accelerated significantly in the last decade\nalongside the rise of deep learning. The \"black box\" nature of the deep learning models grew an interest for trustworthy\nand explainable AI in various fields, e.g., in medical sciences, finance and autonomous systems [24].", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 7, + "total_chunks": 87, + "char_count": 2219, + "word_count": 331, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "dfac526a-4d78-400c-a676-ec16b38e1d34", + "text": "A widely accepted\ntaxonomy categorises XAI methods into intrinsic (ante-hoc) and post-hoc models [25]. Intrinsic explanations arise\ndirectly from the model architecture through weights, rule structures, or built-in interpretability constraints. The model\nitself is designed to be transparent. For example, these models include classifiers and regressors. 
In contrast, post-hoc\napproaches aim to explain the model's outputs via various tactics. Early contributions include game-theory approaches,\nin which SHAP explanations are the most popular for explaining the importance of features. Another popular type\nof post-hoc -approaches includes gradient- and decomposition-based techniques, where backpropagated gradients are\nmodified or analysed to attribute importance [26]. Other examples include perturbation-based explanations [27,28]. The\nlatter raises an important point that most of the XAI-methods are for supervised setting, while in most of the real-world\nICS systems labeled data are unrealistic assumptions. The authors provided an unsupervised fine-tuning module that\ncould be used in problematic features, allowing for model adjustment without exhaustive re-training.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 8, + "total_chunks": 87, + "char_count": 1178, + "word_count": 155, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "08af1cef-42a8-440e-9f75-436ae874599a", + "text": "A PREPRINT - MARCH 12, 2026 The XAI literature for graph deep learning includes several post-hoc explanation techniques designed to interpret the\npredictions of GNNs. Many of these approaches rely on graph masking, where the goal is to learn masks on edges, nodes,\nor features to identify the substructures most influential in a model's decision. One of the most widely cited methods is\nGNNExplainer [29], a model-agnostic explainer that is applicable to any GNN architecture. 
By optimizing soft masks\non edges and features, GNNExplainer extracts subgraph-level explanations that find key structural components and\nnode attributes driving the output of the model. This method has been adopted in cybersecurity contexts, including\nIDS, as demonstrated in [30]. A related method is PG-Explainer [31], which differs from GNNExplainer by training a\nparametric explanation network that generalises across instances rather than optimizing a mask separately for each\nprediction. The optimization strategy improves scalability and stability while retaining the ability to identify influential\nedges. PG-Explainer has been utilised in IDS research, for example, in [32]. In OT environments, the application of graph explainers is much more limited. A notable exception is KEC (Khop Explanation with Convolutional Core) [33], which was applied to anomaly detection in the SWaT industrial\ncontrol benchmark dataset [34]. Unlike the masking-based paradigm, KEC constructs a surrogate linear model that\napproximates the local behavior of the GNN and derives explanations through gradient-based attribution. The authors\nintroduce a formal notion of faithfulness, a measure of how well an explainer preserves model behavior and show that\nKEC achieves higher faithfulness than existing explainers. A common challenge among GNN explanation methods is that many of them provide partial explanations, focusing\nonly on one dimension—edges, nodes, or features—without offering a unified view. The ILLUMINATI framework [35]\naddresses this limitation by producing comprehensive explanations that consider the contribution of node importance,\nedge importance, and node-attribute together.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. 
Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 9, + "total_chunks": 87, + "char_count": 2164, + "word_count": 309, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "ad8cc4da-8afc-423b-b820-0bf107d6fc9a", + "text": "Designed specifically for cybersecurity use cases, it extends traditional\nmasking approaches with a richer explanatory structure. Comparative evaluations of GNN explainers for IDS have also\nemerged. For example, recent work in [36] finds that GraphMask [37] performs particularly well for DoS and DDoS\nattack detection, outperforming other explainers in robustness and interpretability. However, despite this promising\nresult, we did not find substantial evidence of GraphMask being applied more broadly in IDS or OT-focused literature. Finally, a branch of graph deep learning approaches uses attention mechanisms [38] as a tool for generating explanations. Attention mechanism allows for a model to assign different importance weights to different nodes or edges, highlighting\nwhich relationships it considers most relevant during prediction. Graph Attention Networks (GATs) are build on this\nidea by using attention to reveal correlations between learned embeddings [17]. Some models, including the Graph\nDeviation Network (GDN) [39], which also inspires the present study, apply attention mechanisms to time series\nfor identifying variable-level dependencies and highlighting anomalous patterns. This approach captures localised\ndeviations in sensor behavior using both structural relationships and temporal dynamics within OT systems. A very\nrecent approach, PCGAT [40], extends attention-based reasoning by modeling ICS through multi-level physical-process\nand controller graphs, to enable both anomaly detection and anomaly localization via attention patterns. The authors\nhighlight several limitations of typical attention-based methods. 
They argue that attention weights learned purely\nfrom data do not necessarily correspond to the true causal or physical relationships in ICS, and therefore may produce\nexplanations that are misleading from an operational perspective. This can create difficulties in identifying the actual\nsources of anomalies and understanding how they propagate through the system. Furthermore, they claim that many\nexisting GAT-based anomaly detectors rely on unrealistic fully connected sensor graphs, resulting in high computational\ncost, redundancy, and limited interpretability. These models also fail to incorporate the hierarchical and process-driven\nstructure of ICS, reducing their reliability and diminishing the usefulness of attention weights as explanations.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 10, + "total_chunks": 87, + "char_count": 2402, + "word_count": 324, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "74c9544b-0b52-4680-a40e-5260cd03a57d", + "text": "In short,\nthe complexity of graph-based deep learning introduces several challenges, which our study seeks to address. Here we propose the Spatio-Temporal Attention Graph Neural Network (STA-GNN), designed to capture both\ntemporal dependencies and dynamic spatial correlations among sensors or devices (henceforth entities) in multivariate\ntime series data. The STA-GNN is inspired by the Graph Deviation Network [39] and Graph Attention Network [17],\nwith several modifications combining temporal attention mechanisms with an adaptive graph construction strategy that\nlearns context-dependent relationships between entities. 
In this section, we will explain the model architecture and\nanomaly detection methodology, and the model framework is illustrated in Fig. 1. Each node in the graph corresponds to an entity and is associated with a multivariate feature vector. Specifically, at each\ntimestep t, an entity i is represented by a feature vector xt,i ∈RF , where F denotes the number of observed variables\nfor that i (e.g. continuous measurements and Boolean indicators). Over a sliding window of length W, the input tensor\ntherefore takes the form X ∈RB×W ×N×F , where N is the number of i and B is the batch size. A PREPRINT - MARCH 12, 2026 Figure 1: A schematic overview of the STA-GNN model architecture. The workflow illustrates the processing stages\nfrom input windows to the decoder producing predictions. The intermediate blocks employ a two-phase attention\nmechanism that generates two complementary graphs, enabling inspection of the model's decision making. model, the nodes are treated as feature-bearing entities whose representations are progressively transformed into latent\nembeddings that jointly encode temporal dynamics and inter-dependencies. The model first applies a linear projection\nH at each timestep t:\nHt = Linear(Xt) + Pt, (1)\nwhere Pt represents a learnable positional embedding for the timestep t ∈{1, . . . , W} that encodes the temporal order\nwithin the observation window. Next, we go through in detail the stages of the anomaly detection process from the\ninput window to the temporal, spatial, and decoder blocks of the STA-GNN model architecture. 
To model temporal dependencies, each nodes' time series within the observation window is processed by a multi-head\nself-attention mechanism (MHA), inspired by the Transformer architecture and originally developed for natural language\nprocessing [38].", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 11, + "total_chunks": 87, + "char_count": 2437, + "word_count": 365, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "51df982a-0f3a-41e5-a104-ed53f7821299", + "text": "MHA enables each timestep in a node's sequence to attend to every other (past) timestep within the\nwindow, allowing the model to capture both short-term fluctuations and long-range temporal dependencies without\nrelying on recurrence. In practice, we apply causal masking in the temporal attention module so that a timestep cannot\nattend to future observations, preventing information leakage or data snooping. Formally, given the linear projection Ht for a single entity, the attention module constructs representations for query Q,\nkey K and value V through learned linear projections:\nQt = WQHt, Kt = WKHt, Vt = WV Ht, (2) A PREPRINT - MARCH 12, 2026 where WQ, WK, WV ∈Rd×d are learnable parameter matrices, and d denotes the embedding dimension of the latent\nrepresentation. The linear projection operates across the feature dimension F. The attention weights are computed as\nscaled dot-products between queries and keys:\nQtK⊤t\nαt = softmax √ , (3)\nwhich measure the degree of relevance between every timestep pairs. 
These weights are then used to form a weighted\nsum of the value vectors:\nH′t = αtVt, (4)\nproducing an updated temporal representation where each timestep encodes information aggregated from all others. To capture seasonal, weekly, and daily fluctuations, multiple attention heads are used in parallel, each operating on a\ndifferent subspace of the embedding dimension. The outputs of these heads are linearly combined:\nH′ = MHA(Qt, Kt, Vt) = Concat(head1, . . . , headh) WO, (5)\nwhere WO ∈R(h·dh)×d projects the concatenated result back to the model dimension. The resulting representation\nis then aggregated across timesteps (e.g., via mean pooling) and normalised through a Layer Normalization (LN)\noperation, yielding the final temporally encoded features W ! 1\nH = LN X H′[t] , (6)\nt=1\nwhere H ∈RB×N×d represents the temporally contextualised embedding for each entity. This tensor H is the output\nof the temporal feature extractor and serves as the input to the subsequent spatial attention stage, which models the\ninter-entity dependencies. Unlike conventional GNNs that rely on static graphs, the STA-GNN constructs dynamic spatial graphs based on both the\ncontextual similarity Sctx and static similarity Sst. For each sample b, the dynamic contextual similarity is computed\nfrom the temporally encoded features as\nS(b)ctx = HbH⊤b , (7)\nwhere Hb ∈RN×d denotes the slice of H corresponding to the batch element b.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. 
Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 12, + "total_chunks": 87, + "char_count": 2440, + "word_count": 386, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "ddd9190b-94cf-4dd8-84e6-0ad8ae76d1d2", + "text": "In addition to the dynamic similarity,\nthe model also supports an optional external static prior graph Astatic ∈RN×N, which can encode domain knowledge\nabout the entity connectivity, physical topology, or known relationships. When provided, the entries of Astatic are\nnormalised and incorporated directly as the static similarity term. If no external graph is supplied, the model instead\nlearns a static entity embedding matrix E ∈RN×d, from which a static similarity is constructed as\nSst = EE⊤, (8)\nwhich corresponds to a (scaled) cosine similarity after ℓ2-normalization of the rows of the embedding matrix E. If prior\nis introduced, Sst is passed with normalised values of Astatic. The combined similarity matrix is then given by\nS(b) = S(b)ctx + λSst, (9)\nwhere λ is a learnable scaling parameter. The model thus learns to adaptively balance between dynamic contextual\ndependencies and static structural patterns. To propagate information among entities, the model applies another attention mechanism over the temporally encoded\nrepresentations H to capture spatial dependencies. In this phase, the queries, keys, and values are newly projected from\nH using distinct learnable matrices W Q(sp) , WK(sp) , WV(sp) , which allow each entity to attend to all others based on\ntheir recent temporal behavior. We employ multi-head scaled dot-product attention over entities (rather than a GAT-style\nadditive attention with a LeakyReLU nonlinearity). Concretely, for each head, queries, keys, and values are obtained as\nQsp = W Q(sp) H, Ksp = WK(sp) H, V = W V(sp) H, (10)\nand the attention logits are computed via scaled dot-products between entities. 
The resulting attention scores are\nmodulated by the similarity prior S(b), yielding the dynamic attention matrix\nQ(b)sp K(b)⊤sp !\nA(b) = softmax √ + S(b) , (11)\nd T A PREPRINT - MARCH 12, 2026 where T is a learnable temperature parameter controlling the sharpness of attention. To enhance sparsity and interpretability, only the top-k most relevant neighbors (i.e., with the highest attention weights) are kept for each node,\nensuring efficient message passing and reducing noise from weak connections. For multi-head attention, this procedure\nis applied independently per head; the resulting attention weights can be averaged across heads for interpretability.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 13, + "total_chunks": 87, + "char_count": 2311, + "word_count": 357, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "af3adcc3-4bc0-4512-b9c6-12ffb1b2aebf", + "text": "The spatially constructed features for each sample b and entity i are then given by H(sp)b,i,: = X A(b)i,j Vb,j,: + βHb,i,:, (12)\nj=1 or, in matrix form,\nH(sp) = AV + βH, (13)\nwhere β is a residual weighting factor. Thus, each H(sp)b,i,: is a learned spatio-temporal feature vector for entity i, obtained\nas an attention-weighted aggregation of its neighbors' value embeddings plus a residual contribution from its own\ntemporal representation. The resulting tensor H(sp) ∈RB×N×d encodes both temporal and spatial dependencies for\neach entity. Finally, the normalised representations are passed through a fully connected multilayer perceptron (MLP) decoder\napplied independently to each entity. 
For each sample b and entity i, we compute\nˆyb,i = fθ H(sp)b,i,: , (14) where fθ denotes a two-layer feed-forward network with nonlinearity (ReLU) between layers. In matrix form, this can\nbe written as\nˆY = MLP(H(sp)) ∈RB×N×F , (15) yielding one output per node feature and sample based on the final spatio-temporal feature representation. 3.2 Training Objective", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 14, + "total_chunks": 87, + "char_count": 1056, + "word_count": 164, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "302ab678-e7b1-4842-a748-18cd23c456a7", + "text": "Each entity i may contain both continuous and Boolean features, and the loss aggregates reconstruction errors across\nthese feature dimensions. This design allows heterogeneous variables to contribute appropriately to the training signal\nwhile preserving a unified node-level representation in the graph. For example, exogeneous temporal features may be\nappended to node features.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 15, + "total_chunks": 87, + "char_count": 379, + "word_count": 52, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "fed7839e-6386-4030-8240-4427612848a8", + "text": "The model is trained in a semi-supervised setting, using only data assumed to represent normal system behaviour. 
The\nlearning objective is to minimise the difference between the model's reconstructed feature values ˆYb and the observed\nvalues Yb for each batch element b ∈{1, . . . , B}. Because the dataset may include both continuous-valued features and\nBoolean/indicator features, we employ a composite loss function, MixedLoss, which combines a mean–squared error\n(MSE) term for continuous features and a binary cross–entropy (BCE) term for Boolean features. Let C denote the\nindices of continuous features and B the indices of Boolean features. The training loss for a single window is\nLmixed = γcont · X (ˆYb,i,f −Yb,i,f)2\n|C|\n(i,f)∈C\n(16)\n+ γbool · X BCE( ˆYb,i,f, Yb,i,f)\n|B|\n(i,f)∈B where γcont and γbool weight the relative influence of continuous and Boolean feature types. MixedLoss ensures that\neach feature type contributes appropriately to the learning signal. At inference time, we use the same MixedLoss\nformulation both for the scalar anomaly score and for per-entity explanations, ensuring that the detection objective is\naligned with the training objective. For each sliding window w, we compute feature-wise reconstruction errors and aggregate them into a per-entity\nMixedLoss contribution. For a continuous feature f ∈C of entity i, the reconstruction error is defined as\new,i,f = (ˆYw,i,f −Yw,i,f)2, A PREPRINT - MARCH 12, 2026 and for a Boolean feature f ∈B of entity i, we define\new,i,f = BCE(ˆYw,i,f, Yw,i,f). Each ew,i,f ≥0 therefore represents the MixedLoss error contribution of feature f of entity i for window w. The per-entity reconstruction error is obtained by aggregating feature-wise errors using the same weighting scheme as\nin training:\n1 1\new,i = γcont · X ew,i,f + γbool · X ew,i,f,\n|Ci| |Bi|\nf∈Ci f∈Bi\nwhere Ci and Bi denote the sets of continuous and Boolean features associated with entity i, respectively. 
The model\ncan therefore be used either by aggregating the errors per node or by detecting anomalies at the node–feature level.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 16, + "total_chunks": 87, + "char_count": 2076, + "word_count": 333, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "4b2606de-43eb-4f45-b369-acf1ff8b589f", + "text": "An\noverall anomaly score for the window is finally obtained by averaging the per-entity losses: sw = X ew,i .\ni=1\nHigher values of sw reflect greater deviation from behaviour learned during training.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 17, + "total_chunks": 87, + "char_count": 199, + "word_count": 33, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "f912002b-b531-45c6-8024-985dcf01c08d", + "text": "3.4 Graph Explanations During inference, the STA-GNN produces two complementary graph structures, the contextual similarity graph Gcs\nand the attention graph Ga. In both representations, the nodes correspond to entities, whereas the dynamically evolving\nedges encode relationships between them. The Gcs captures relations between the learned temporal embeddings,\nreflecting how similar the recent temporal dynamics of different entities are within a given observation window. 
In\ncontrast, the Ga represents directed inter-entity dependencies, where edge weights encode the magnitude and direction\nof the learned correlations, that is, how information is propagated between entities in the latent space. Fig. 2 illustrates an example of the model's outputs during anomaly detection. When an anomaly is detected, both\ngraphs are visualised to highlight the underlying relational patterns. The nodes that are considered anomalous, are\nplotted with distinct colours, while the rest are kept at the background as grey. For interpretability, only the top five\nedges with highest similarity per node are retained in Gcs, ensuring sparse and readable structure. For Ga, the edges\nare filtered to include only those that originate or end at anomalous nodes. The amount of edges is restricted by\ntopk-attention weights.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 18, + "total_chunks": 87, + "char_count": 1309, + "word_count": 191, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "7886d8e6-83cc-422a-86c3-1cfbe40b9fae", + "text": "One of the main metrics to evaluate the performance of our model in detecting anomalies is the false positive rate (FPR)\ndefined as follows\nFPR = (17) FP + TN,\nwhere false positive (FP) is the number of incorrect predictions or false alarms, and true negative (TN) is the number of\ncorrect predictions of no alarms. In the model evaluation, the emphasis is on minimising the FPR i.e., avoiding false\nalarms, while still maintaining adequate anomaly detection performance. 
Furthermore, we summarise the detection\nquality by using the F1 score and evaluate two thresholding strategies: (i) a threshold that maximises the F1 score\non validation data and (ii) a conformal-thresholding scheme based on nonconformity scores. The F1 score combines\nprecision and recall into a single harmonic-mean metric. Given the number of true positives (TP), false positives (FP)\nand false negatives (FN), the F1 score is defined as\n2 precision · recall 2 TP\nF1 = = (18) precision + recall 2 TP + FP + FN,\nwhere precision = TP/(TP + FP) and recall = TP/(TP + FN). We first compute anomaly scores sw for each\nwindow w and choose a threshold that maximises the F1 score on the labeled evaluation set. This provides an\nunsupervised operating point that balances missed anomalies and false alarms. To explicitly control false alarms in a more distribution-free and sequential setting, we also use an inductive nonconformity scoring scheme [41]. Let s1, . . . , sT denote the anomaly scores on a set of calibration windows assumed to be\nnormal. We define difference nonconformity scores c with\nc1 = 0, (19)\nct = max 0, st −st−1 , t = 2, . . . , T, (20) A PREPRINT - MARCH 12, 2026 Attack detected and contribution highest from red (highest) to yellow (lowest).", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 19, + "total_chunks": 87, + "char_count": 1735, + "word_count": 300, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "d8adec50-77a1-4723-a8f5-1ab909445fd6", + "text": "The grey edges\nrepresent the learned embeddings + prior graph structure. The red edges come from the spatial attention. 
Only the\nstrongest attention weights from/to anomalous nodes are plotted for interpretability. Red edge thickness reflects to\nstrength of the attention. The graph nodes are organised and fixed by process stages in SWaT testbed dataset used in\nthis study.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 20, + "total_chunks": 87, + "char_count": 374, + "word_count": 58, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "a865b877-0709-4d15-b3a3-c9f7ccef07e0", + "text": "which emphasise sudden increases in the anomaly score and are less sensitive to slow shift of values. Given a significance\nlevel α, we then choose a threshold qα as an upper quantile of the calibration scores, i.e.\nqα = Quantile1−α(c1, . . . , cT ), (21)\nand at evaluation time declare window t anomalous if ct ≥qα. The benefit of the conformal approach is twofold: it\nautomatically adapts to the empirical score distribution and, under standard exchangeability assumptions, provides\nfinite-sample guaranties that the probability of a false alarm does not exceed approximately α. In our experiments, we\nchoose a heuristic value α = 10−3, which yields a low false positive rate while still allowing the model to react to\npronounced score increases. For example, with data sampled in 10-second intervals, this threshold corresponds to an\nexpected false alarm roughly once every three hours under nominal conditions. Another advantage of the approach is", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. 
Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 21, + "total_chunks": 87, + "char_count": 950, + "word_count": 153, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "99315cdd-409d-4f26-be1b-610474070763", + "text": "A PREPRINT - MARCH 12, 2026 Table 1: Overview of the SWaT datasets used in this study across physical and network modalities. Measurements from\nphysical sensors and network traffic were aggregated and resampled to 10-second intervals. SWaT Modality Nodes #Features #Instances Duration #Attacks\nDataset Physical 51 1 ∼95 000 7 d normal + 4 d attack 41\n2015 NetFlow 9 11 ∼95 000 7 d normal + 4 d attack 41\nNetFlow+Payload 9 14 ∼95 000 7 d normal + 4 d attack 41 Physical 51 1 ∼49 000 6 d normal 0\n2017 NetFlow 9 11 ∼17 000 2 d normal 0\nNetFlow+Payload 9 14 ∼17 000 2 d normal 0 2019 Jul Physical 51 1 ∼1 500 4 h attack 6 Physical 51 1 ∼1 300 4 h attack 5\n2019 Dec NetFlow 9 11 ∼1 300 4 h attack 5\nNetFlow+Payload 9 14 ∼1 300 4 h attack 10", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 22, + "total_chunks": 87, + "char_count": 736, + "word_count": 152, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "c5c3bbfb-55fa-4aca-a73e-b1f1db012674", + "text": "that the threshold qα is fixed by the distribution of the calibration scores. As a result, if the typical scoring behavior\nof the system starts to change and the evaluation scores consistently exceed their calibration levels, the number of\nthreshold exceedances will gradually increase. 
This behaviour is a clear indication of covariate drift, signaling that the\nmodel may no longer be well suited to the altered environment. Conventional performance metrics, such as F1-score or\naccuracy cannot reveal such changes in the underlying data distribution. For a detailed description of the conformal\nprediction framework, see [42].", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 23, + "total_chunks": 87, + "char_count": 628, + "word_count": 95, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "c6e23474-cf07-45bf-9bdc-d01350f84253", + "text": "The Secure Water Treatment (SWaT) testbed is one of the most widely used benchmarking datasets available for\nresearch on ICS security. It represents a scaled-down, fully operational water purification plant designed to reproduce\nthe behavior, equipment interactions, and cyber-physical processes found in real facilities. The system produces\napproximately five gallons of treated water per minute and operates in six sequential process stages, each equipped with\na range of sensors, such as level transmitters, pressure gauges, and water-quality probes, as well as actuators including\npumps and motorised valves. The sensors and actuator names, and further detailed description of the environment, are\nprovided in [34]. In illustrative Figures 2, 5 and 6, we have arranged the process stages horizontally, from left, process\nstage 1, to stage 6, right.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. 
Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 24, + "total_chunks": 87, + "char_count": 852, + "word_count": 126, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "d7491bab-157d-4817-846a-454da96953b6", + "text": "The SWaT datasets provide both process-layer (physical-level) measurements obtained from the SCADA/PLC level\nand detailed OT network traffic, including partial CIP protocol payloads. Communication between PLCs, sensors,\nactuators, and the supervisory SCADA layer is extensively logged, enabling the simultaneous analysis of physical\nprocess behavior and network activity. This multimodal perspective is crucial as previous work has shown that effective\nanomaly detection requires both physical measurements and communication patterns, since attacks may affect only a\nsingle modality or manifest across both [43]. The 2015 SWaT dataset includes a long period of normal operation followed by a series of 41 controlled cyberattacks,\ntargeting communication links and manipulating one or multiple process stages. These attacks range from stealthy\nmodifications to aggressive actuator manipulation, making the 2015 SWaT dataset a challenging and realistic benchmark. The rest of the selected SWaT datasets used in our study are provided in Table 1. 4.1 Data Pre-Processing & Model Training", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. 
Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 25, + "total_chunks": 87, + "char_count": 1084, + "word_count": 152, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "f9a6912e-adc7-48da-9878-565d337cab2c", + "text": "For the physical-level data, all continuous sensor values were treated as floating-point variables,\nwhile discrete control states (e.g., on, off, auto) were one-hot encoded. Continuous features were scaled using min–max\nnormalization, defined as\nx −xmin\nx′ = , (22)\nxmax −xmin\nwhere xmin and xmax are minimum and maximum values from the training data, and x is a value to normalise. The\nevaluation dataset was fitted with these normalization parameters. In physical-level datasets, each node corresponds to a\nsingle sensor or actuator signal, and no additional node-level features were introduced. A PREPRINT - MARCH 12, 2026", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 26, + "total_chunks": 87, + "char_count": 625, + "word_count": 96, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "43374aa9-1dbc-488f-9165-18afa2fd93f9", + "text": "For the Netflow data, an explicit design choice was required to define the node entities. We chose\nthe set of IP addresses observed in the traffic as entities. More precisely, we selected PLC-, SCADA point-, and\nworkstation IP addresses as individual nodes, based on prior system knowledge. All remaining traffic was aggregated\ninto a single auxiliary node labeled Other IP. 
We extracted the features of the standard NetFlow protocol, including the source port, source IP, destination IP, transport\nprotocol, and frame length. We restricted the feature set to flow-level metadata, as packet payloads are often encrypted\nand therefore unavailable. Moreover, flow-based representations significantly reduce computational costs compared to\ndeep packet inspection [44]. From these base features, we derive the features per node. These include, for example,\nShannon entropy, defined as Hsrc = − X pi logb(pi), (23)\ni=1\nwhere k denotes the number of distinct source ports observed within an aggregation window, and pi = niN is the\nempirical probability of the source port i, with ni occurrences out of N total flows. The rest of the derived features are\npresented in Table 2. We note that this is just an example, and other approaches for deriving features exist. Table 2: Aggregated node-level features for the NetFlow and NetFlow+Payload data models. All features are sampled\n10 seconds interval.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 27, + "total_chunks": 87, + "char_count": 1392, + "word_count": 218, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "555e6ad6-8ce7-4e95-9b75-70a4b47a1df1", + "text": "NetFlow features\nRows sent / received Number of flow records sent and received\nBytes sent / received Total number of bytes sent and received from frame length\nSource port entropy Entropy of observed source ports\nProtocol entropy Entropy of observed protocols\n#Sources / #Destinations Number of distinct source and destination peers NetFlow + Payload features\nCIP byte entropy Shannon entropy of the CIP payload bytes. 
For example, typical\nmessage could be 10x4 bytes. CIP value mean Mean of extracted CIP numeric values per message. CIP word entropy Shannon Entropy of parsed CIP fields. For example, a message\nwith 10x4 bytes would have 10 \"words\".", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 28, + "total_chunks": 87, + "char_count": 649, + "word_count": 104, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "29caf05c-7c13-443e-b8d6-b93db5607bdd", + "text": "Exogenous features\nDay of week Weekday indicator\nHour of day Hour indicator\nHour of week Hour indicator Because the NetFlow representation transitions from a single scalar feature to a multi-channel feature vector, we\nadditionally included exogenous temporal features. These include hour of day, hour of week, and day of week, which\nare commonly used in time-series modeling to capture diurnal and weekly periodicities. Such features can improve\nmodel confidence and stability, see, e.g, [45].", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 29, + "total_chunks": 87, + "char_count": 493, + "word_count": 74, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "0e06b1ac-429a-45b9-b9dd-96d7843294fe", + "text": "NetFlow + Payload Dataset. 
The 2015 dataset does not provide raw PCAP files for extensive payload extraction;\ninstead, it includes NetFlow records augmented with CIP protocol attributes, more precisely, the encapsulated CIP\nmessages [46]. For the 2017 and 2019 Dec datasets, we deliberately retained the same base feature set, even though\nricher payload feature engineering would have been possible. This choice ensures comparability of model performance\nacross all datasets. In the NetFlow+Payload setting, we used the same flow-level feature channels as in the NetFlow-only\ncase and augmented them with payload-derived statistics. These include payload entropies from message and word-level,\nand payload mean of CIP extracted data.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 30, + "total_chunks": 87, + "char_count": 733, + "word_count": 106, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "7f76a82b-b4b7-43cf-a87f-4895799ca426", + "text": "Training, Calibration, and Sampling. As the proposed evaluation method relies on conformal prediction, the dataset\nwas split into training, calibration, and test sets using a temporal split of 80/10/10. Feature normalization parameters\nwere computed exclusively in the training set and subsequently applied to the calibration and test sets. Data shuffling\nwas not used because it could allow information leakage from future observations.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. 
Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 31, + "total_chunks": 87, + "char_count": 437, + "word_count": 62, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "839f329d-9ebb-400f-92cc-617dddb0dd73", + "text": "Similarly, subsampling and folding A PREPRINT - MARCH 12, 2026 Table 3: Hyperparameter search space and functional roles for the proposed graph-temporal neural network model. Some common hyperparameters, e.g., learning rate, are omitted. Common Hyperparameters Value(s) Description\nEmbedding dimension 128 Dimensionality of latent node and temporal representations, controlling overall model capacity and\nattention head size. Graph attention heads 4 Number of parallel subspaces used in the multihead graph attention mechanism. Top-k neighbors 6 Maximum number of neighboring nodes attended\nto per node, controlling graph sparsity and computational cost. Weight decay 10−4 L2 regularization strength applied to model parameters during optimization. Learnable Hyperparameters Static prior scale 10 Weight of the static graph similarity prior relative\nto the dynamic context-based similarity. With this\nparameter, the importance of prior graph can be\ncontrolled by initializing it. Attention temperature 0.9 Scaling factor controlling the sharpness of the\ngraph attention distribution. techniques were avoided.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. 
Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 32, + "total_chunks": 87, + "char_count": 1108, + "word_count": 149, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "efe20b17-811d-4162-aec4-24d96d2932db", + "text": "We did, however, sample the data for 10 second aggregates, as we have observed many\nrelated works have done the same. During model training, we experimented with different learning rates, embedding\ndimensions, and time window sizes. We observed no improvement in training or evaluation loss when using embedding\ndimensions greater than 128 or window sizes greater than 6.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 33, + "total_chunks": 87, + "char_count": 371, + "word_count": 58, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "cdcfa562-4312-4b2a-8ea6-4436cc95307b", + "text": "Therefore, we opted to keep the complexity of the model\nto a minimum. The rest of the tunable hyperparameters are shown in Table 3. In this section, we evaluate the performance of our model in comparison with alternative machine learning approaches. We also analyse strategies for selecting the optimal detection threshold and, through illustrative examples, demonstrate\nhow detected anomalies and graph representations reveal the underlying causal relationships. The complete table of\nresults and analysis is provided in the Appendix.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. 
Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 34, + "total_chunks": 87, + "char_count": 535, + "word_count": 79, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "137627d9-22e3-4524-9e80-f2de7076e6fd", + "text": "5.1 Best-Performing Model We begin by analyzing the performance of the model in all three data modalities. The proposed STA-GNN is compared\nagainst several simpler models in terms of F1-score, FPR, and the number of detected attacks, thereby justifying the\nmodel complexity and architectural choices. The results are summarised in Table 4. As an initial model selection\nstrategy, we applied the maximization of the F1-score to determine decision thresholds for the trained models, which\nis a common practice in ADS machine learning. The models for comparison include two classical machine-learning\nmethods (K-means and Support Vector Machine (SVM)) and a more advanced, an auto-regressive, LSTM-based\nVariational Autoencoder (LSTM-VAE). The classical methods were not evaluated for the NetFlow modalities due to\ntheir poor performance already in the scalar physical-level model.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. 
Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 35, + "total_chunks": 87, + "char_count": 878, + "word_count": 128, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "0a2285a6-3fb6-4fa2-b690-1f2cecb0ea5d", + "text": "For the proposed STA-GNN approach, we evaluated\ntwo configurations: a simplified variant using only a gated recurrent unit (GRU) without embeddings and temporal\nattention (STA-GNN*), and the full model incorporating both temporal and spatial attention mechanisms (STA-GNN). Physical-level models with only one scalar feature per node provided highest F1-score for our models. The two classical\nmachine learning approaches, K-means and SVM, did not produce meaningful results, in accordance with the results\nin [39]. The LSTM-VAE, despite its relatively simple autoregressive structure, achieved an F1-score close to that of the\nbest-performing models. However, a closer inspection of detected attacks shows that its performance is misleading: the\nmodel successfully detects only two attacks. The inflated F1-score is explained by the fact that there is an attack that\naccounts for more than 40% of the attack data points. Any model capable of detecting this attack significantly improves\nthe model F1 score. This observation highlights that strict reliance on F1-score maximization is inadequate to evaluate\nanomaly detection models in this context.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. 
Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 36, + "total_chunks": 87, + "char_count": 1149, + "word_count": 167, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "004be0d9-65f4-49b1-a079-04810a02cfec", + "text": "A PREPRINT - MARCH 12, 2026 Table 4: Model comparison across physical and network modalities. F1-score, false positive rate (FPR), and attacks\ndetected (AD) are reported for each modality. The classical models with high AD suffer from high FPR, which makes\nthem impractical for realistic deployment scenarios. The STA-GNN* refers to a simplified variant of STA-GNN without\ntemporal encoding or the temporal attention component.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 37, + "total_chunks": 87, + "char_count": 427, + "word_count": 64, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "f20b7d16-e4f3-4a62-9158-cbc85563b97d", + "text": "The best-performing models according to F1-score and AD\nare highlighted in bold. K-means SVM LSTM-VAE STA-GNN* STA-GNN\nDataset Modality\nF1 FPR AD F1 FPR AD F1 FPR AD F1 FPR AD F1 FPR AD Physical-level 0.29 0.829 26 0.24 0.860 33 0.72 0.001 2 0.74 0.002 11 0.77 0.004 15\nSWaT 2015 NetFlow – – – – – – 0.23 0.83 35 0.19 0.88 35 0.19 0.89 36\nNetFlow+Payload – – – – – – 0.72 0.003 2 0.74 0.003 11 0.74 0.006 16 NetFlow-models without CIP-payload data were not able to reliably detect attacks in any of the studied cases, as they\nproduced excessive false positives, rendering them impractical for deployment. 
it was not feasible to enforce guarantees below an FPR of 0.001
Longer and more stable baseline periods would enable stronger\nguarantees and better align with operational requirements.
with little effect on the system
The attack on the left was detected only once, at the beginning of the attack. The attack on the\nright was detected multiple times during the attack, from various sensors and actuators
which aligns with findings, for example, in [48]
Yet another advantage of the nonconformity scoring scheme
This time, the model retains its FPR for 2019 datasets, but unfortunately, could not\nretain its anomaly detection capability in this case either.
is evidence of concept rather than covariate drift
Another likely explanation for the poor attack detection rate is the incompleteness of the original NetFlow data
We use the documented system architecture,\nknown causality maps provided in [50], and examples from [33] for qualitative analysis. In Table 6, we summarise the A PREPRINT - MARCH 12, 2026 Table 6: Summary of correct detection and causal inference performance for the SWaT 2015 dataset across Physicallevel and NetFlow+Payload modalities. The pure NetFlow modality is excluded because it did not yield meaningful\nresults; detailed analysis is provided in the Appendix. Physical-level Netflow+Payload\nAlarms Raised Correct Alarms Raised Correct\nDetection Causality Detection Causality\n20 15 12 22 15 14 findings, with the analysis and rationale provided in the Appendix.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 51, + "total_chunks": 87, + "char_count": 1020, + "word_count": 149, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "ab73c4b7-7bc3-4e0f-b2b1-d14b120dd3e9", + "text": "Among the alarms raised in the Physical-level and\nNetFlow+Payload modalities, approximately 68 to 75% of attacks were correctly detected and traced, while correct or\npartially correct causal relationships were identified in approximately 60 to 63% of the alerts raised. Figure 5: Attack on DPIT301 detected via anomalies in FIT601, with attention edges highlighting system-level\ndependencies between distant process stages. When an alarm is raised, the outcome can be interpreted in two ways: whether the detection localises the true source(s)\nor the immediately affected devices, and whether the edges of attention reflect the correct underlying causality. These\ninterpretations allow us to distinguish between correct detections and meaningful causal explanations. 
Furthermore, there are cases in which either the detection or the inferred\ncausal structure, or both, are incorrect.
For Netflow+Payload data, using feature channels and IP addresses as nodes yields
derived from the adjacency graph representation of the system
Finally, we note that using strong prior knowledge of the system does not necessarily improve detection\naccuracy, as it may reduce long-range dependencies; however, it can enhance the explainability. This trade-off will be\nexplored in future work. In this section, we examine the methodological and practical issues encountered during our analysis and reflect on how\nour findings agree or deviate from previous work in the literature. Here, we focus on the limitations of commonly used\nevaluation schemes, the operational relevance of our results, and the broader challenges of applying machine learning\nin industrial cybersecurity. We will also critically assess our modeling choices, including the role of explainability,\narchitectural constraints, and multimodal inputs. These reflections will shed light on the limitations of our approach and\ndiscuss the directions in which future work should focus on to achieve reliable and deployable anomaly detection in\nreal-world systems. A central challenge in evaluating anomaly detection models for cyber-physical systems is that commonly reported\nmetrics, most notably the F1 score, do not always reflect the true operational value of the model. One reason is that the\nduration of an attack heavily influences the F1 score, but many anomaly detection models detect an attack only after\nit has begun to significantly affect the system. However, the early stages of an attack often cause negligible physical\ndeviation, which makes them difficult to detect. Penalising the model for not recognizing these weak initial signals\nresults in a lower F1 score even when the model performs exactly as required in practice, i.e. alerting when the system\ndeviates from normal behavior. This discrepancy leads to misleading comparisons in the literature, where the number\nof detected attacks is rarely reported. 
Our results in Table 5 underscore the problem that the F1 score might be very\nlow even though the model performs better than with the F1-maximizing strategy. The other aforementioned benefits of nonconformity\nscoring support using it as a thresholding method and as a framework.
Note that the detected anomaly points are\nreduced as well, because the prior restricts the dynamical similarity.
Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 58, + "total_chunks": 87, + "char_count": 999, + "word_count": 166, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "cbb05824-cc5c-4b3f-b327-09c0ba081bd6", + "text": "Furthermore, manual\ninspection shows that a substantial portion of false positives are directly followed by attacks and related to them. Removing these attack-adjacent alerts from false positives reduced our FPR count by 40% in the physical-level model,\nleaving only a small number of genuinely spurious alarms. This is yet another indication that operational relevance is\nnot always captured by standard metrics. The issues discussed so far reflect a broader challenge in machine-learning-based cybersecurity research, in which many\npublished models are evaluated primarily under benchmark-oriented settings. The emphasis on marginal improvements A PREPRINT - MARCH 12, 2026", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 59, + "total_chunks": 87, + "char_count": 675, + "word_count": 96, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "9db2bfd6-8124-4da2-ac18-1dc988ae8a9a", + "text": "in recall and accuracy is often a consequence of ambiguous or inconsistent evaluation methodologies. As shown in [52],\ndata leakage, inappropriate sampling, model selection bias from cross-validation, and temporal snooping are widespread\npitfalls, particularly in time-series scenarios. Neglecting these issues can lead to overly optimistic performance\nestimates. 
Several methods we reviewed report near-perfect F1 scores of 1.0, and some machine-learning approaches\nclaim extremely high detection rates, e.g., those in [53,54]. We explicitly designed our pipeline to mitigate the risks,\nfor example, by ensuring that no temporal information from the test period is used during training or preprocessing. Although this conservative approach reduces performance on current benchmark datasets, it could yield more reliable\nestimates for unseen data, which is a critical requirement for deployment in real systems. Therefore, our focus has been\non qualitative and causal evaluation of the detected attacks, rather than reporting recall or accuracy.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 60, + "total_chunks": 87, + "char_count": 1045, + "word_count": 146, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "fce6fb54-10cb-43d6-ab5e-ed3b859673cf", + "text": "In addition, we discuss a fundamental issue that is often neglected in many model approaches, namely the covariate\nand concept drifts. The gradual change in the statistical properties of data and changed configurations from time to\ntime cause the anomaly detection models to lose accuracy as the system behavior evolves [55]. We could tackle the\ncovariate shifts with a recalibration approach, but the concept drift always requires re-training of the model. This is an\nissue for most static machine learning models, where the problem space is unknown. We acknowledge this and admit\nthat the nonconformity scoring does not solve all the problems in dynamic environments but can extend the lifespan of\nthe model. 
We note also that, for observing model performance over time, the monitoring of FPR is an excellent tool,\nenabled by nonconformity scoring.
This reduced explainability can therefore\nlimit the reliability of causal validation in small environments. In contrast, when the system is larger and contains\nmore distinct components, the richer structural variability typically makes causal patterns easier to isolate. This allows\ndependencies, propagation paths, and abnormal interactions to become more clearly distinguishable than in a small\n∼10-component network like SWaT testbed. Confirming this hypothesis in larger and more realistic industrial control\nsystem environments remains as an important direction for our future research. Finally, some recent work argues that effective detection of industrial anomalies requires combining payload information\nwith netflow data [9,57].", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 62, + "total_chunks": 87, + "char_count": 1985, + "word_count": 267, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "0a8dccd7-4ed8-4f13-b199-3862a84ac64e", + "text": "We did find evidence supporting this claim. For 2015 dataset, we could find 26 attacks when\ncombined the two methods (20/22 separately).", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. 
Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 63, + "total_chunks": 87, + "char_count": 136, + "word_count": 22, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "7a0e475b-2593-42cb-8da3-5c0bbc887b6e", + "text": "We remind, however, that the Netflow model requires the Payload data\nfor the model to work properly, which increases the model complexity and computation needs. However, it should\nbe noted that the physical-point detection model is typically simple and easily importable after SCADA-point. The\nnetflow+payload detection might be difficult for encrypted data, as the data before SCADA point is often secured and\nowned by system vendors [58], which limits the practical deployability of such approaches in operational environments. In this study, we have proposed a Spatio-Temporal Attention Graph Neural Network (STA-GNN) for multi-purpose\nanomaly detection in industrial control systems. The model produces explainable graph-based attention graphs that\nenable the investigation of system behavior. By incorporating prior knowledge of the system, these attention mechanisms\ncan be used to detect anomalies and reason about their potential consequences. Beyond model design, this work highlights several fundamental challenges in applying machine learning to industrial\ncontrol systems, to which our approach is also subject. A key issue is the gap between model development and\nreal-world deployment. In practice, the objective is not to train a theoretically optimal model but rather to deploy\na system that reliably detects attacks while minimizing false alarms. Our results demonstrate that commonly used\nevaluation strategies, such as maximization of the F1-score, may not capture this operational objective. We further show that covariate and concept drifts are significant challenges in ICS anomaly detection. 
Even widely used\nbenchmarking datasets exhibit non-stationarities that render stationary models ineffective over time. To address this,\nwe advocate frequent model recalibration, retraining, and continuous monitoring of performance degradation through A PREPRINT - MARCH 12, 2026 false positive rate tracking, enabled by conformal prediction framework.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 64, + "total_chunks": 87, + "char_count": 1966, + "word_count": 277, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "e7b0e76c-4734-4b67-bddb-c3945a8cc581", + "text": "This approach not only ensures operational\nfeasibility, but also provides early indicators of model drift. Our experiments indicate that the proposed model performs best when applied to physical-point data, while also\nremaining applicable to NetFlow+Payload-based representations. Although network-level features reduce explainability,\nthey offer improved efficiency. Based on these findings, we recommend a multimodal deployment strategy, combining\nboth physical-level and NetFlow+Payload data to balance interpretability and scalability. As future work, we aim to integrate the learned attention structures with large language models (LLMs) to further\nenhance explainability, particularly for non-expert users. By combining attention-based graph representations with\nfacility context and model outputs, such systems could automatically generate human-interpretable explanations and\nannotations. Ultimately, this direction may enable more intelligent and self-interpreting human–machine interfaces in\nindustrial environments. 
A Analysis of the Attention Weights The analysis of the results of the 2015 model using SWaT 2015 physical dataset consists of three sequential evaluation\nstages designed to assess alarm quality, feature relevance, and causal validity of attack detection. The graph describing\nthe analysis pipeline is illustrated in Fig. 7.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 65, + "total_chunks": 87, + "char_count": 1351, + "word_count": 174, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "56595160-d56c-480a-b042-ce0ed9511dd2", + "text": "The first stage verifies whether an alarm is correctly triggered within\n(or close to) the attack window. If at least one alarm occurs during the attack window, the alarm is considered to be\ncorrectly raised. If not, we check whether there is at least one alarm close to the attack window that corresponds to\nat least one true attack point. If this condition is met, the alarm is still considered correct. Otherwise, the alarm is\nclassified as incorrectly raised. The second stage evaluates whether the identified features truly correspond to the attack.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. 
Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 66, + "total_chunks": 87, + "char_count": 553, + "word_count": 92, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "ea782e00-ba99-497e-b60a-7466502e5ff7", + "text": "The model gives the top 3 features per alarm that have the largest contributions. If the selected features include at least\none true attack point, the attack is considered correctly detected. If none of the identified features correspond to true\nattack points, the detection is considered incorrect (false positive). The final stage analyses whether the detected relationships are causally meaningful.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 67, + "total_chunks": 87, + "char_count": 401, + "word_count": 60, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "2c2ffa26-37a7-4682-afc6-31d5844f43fa", + "text": "Attention graphs are constructed\nusing edges for which either the source or destination node is among the identified features and the edge-normalised\nweight is at least 0.1. These graphs are then compared against known causal relationships of the system. If the learned\nattention graph aligns with the expected causal structure, causality is considered correctly detected. This means that the\nedge directions match known causal relations, the involved nodes correspond to components known to influence each\nother, and the relation is documented in the literature or consistent with SWaT architecture. 
If the attention graph has\nnodes unrelated in architecture, cross-stage connections with no physical/control dependency, edges contradict a known\nprocess, or random high-weight edges flow, the causality is considered incorrectly detected. The causality can also be\nconsidered partially correct in one of the following situations: correct nodes but wrong direction, indirect but valid path,\nsubsystem-level match, weak but meaningful edge, or partial feature overlap. The first is a situation in which a correct\ndependency is identified, but the directionality is incorrect. This suggests that the model captures the dependency but\nnot the causal direction. When the path is valid but indirect, the model captures higher level dependency but skips the\nintermediate node. This may indicate abstraction or shortcut learning. In subsystem-level matches, the model identifies\ncorrect process region but not the exact documented pairs. If an edge matches known causality but is much weaker\nthan unrelated edges, the signal exists, but the model does not strongly prioritise it. This indicates that the edges are\nmeaningful, but they are too weak. If there exists partial future overlap, only one node in the edge is part of the true\nattack chain, but the other is only strongly related in the architecture. This means that the model captures the attack\nregion but not the exact causal pair.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. 
Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 68, + "total_chunks": 87, + "char_count": 1985, + "word_count": 303, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "a40012e5-f7e3-452e-be5c-802e353cbd26", + "text": "Also, some inferred edges appear plausible given system dynamics, but cannot be\nconclusively validated against documented process architecture or literature. These relations are therefore categorised\nas partially detected causality rather than confirmed physical causal chains.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 69, + "total_chunks": 87, + "char_count": 277, + "word_count": 35, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "3bac8814-1d31-4f0d-aff2-67b372d1cad7", + "text": "Table 7 contains the results of the analysis of the alarms raised by the 2015 model using the SWaT 2015 physical\ndataset. The table does not include attack numbers 5, 9, 12, 15, and 18 because they do not cause physical impact on the\nsystem. The table contains the attack time, attack description, detected features with largest contribution, alarm quality\nassessment, feature relevance, and causal validity, as well as details about the results for each attack. The column\nwith the attack time contains the date and the true attack window. The attack description has the true information\nabout the attack as well as the expected impact or attacker intent. The columns Alarm Raised, Detected Correctly, and\nCausality Detected Correctly contain the evaluation results explained above. 
The Detected Features column summarises all\nthe top 3 features identified by the model inside or near the true attack window.
This reduced structural transparency can\ntherefore limit the reliability of causal validation in small environments, even when anomaly detection performance\nitself remains reasonable. In contrast, when the system is larger and contains more distinct components, the richer\nstructural variability typically makes causal patterns easier to isolate, allowing dependencies, propagation paths, and\nabnormal interactions to become more clearly distinguishable than in a small 10-component network. It remains future\nwork for us in a larger, realistic ICS environment.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 72, + "total_chunks": 87, + "char_count": 697, + "word_count": 96, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "5360a47d-a0fd-4e57-9d14-d02e0209a42e", + "text": "Figure 7: Analysis Pipeline a A PREPRINT - MARCH 12, 2026\nand of as\nset and rather in wide\na captured and the positive attack AIT202. waterin Multiple strong the the attention MV301, MV201. P302, Inspection subsystem involve manifested and present interval architectural MV301 from effects false exhibit point labeled point in of not a P203. MV303, at the internal outgoing attack However, MV101, deviations does lack attack originating to MV301 explicitly primarily AIT201) attention point, and is true flow reflected and aggregation (e.g. downstream addition, evident considered prior behavior and P102 as the In is between detection. attack strong P102 on anomalies with as ground-truth pump substantial variables exhibits arising to MV201, that the observable responses window mismatch alarm includes downstream PIT502. correct truethe well attack to 12:00:40. 13:12:40. 
a as attention, attack the immediately the for and patterns, as and attack. within FIT601 control and the temporal FIT201, it receives analyser anomaly. include variations that the correctly indicates propagate leading and that MV302, receive falls after the features, occurs of criterion and MV302, second PIT502 connected 10:52:30 12:00:40 and triggered graph directly Given P102 pressure the pump, cause suggests detection. identifying shows occurs (AIT202) behavior, not (P201–P206, turn alarmed 12:00:40 and 10:52:30 with MV301, temporal root in flow, This at strongly MV303 at the before between between P203, does graph MV504. attention missed the pump the P101\na Furthermore, P102, 12:00:40 as form the which node. raised raised raised raised raised at point among both consistent of meets toward components between upstream features altered flow. (12:00:55–12:04:10) attack measurements P101, attention of alarm attack the alarm serving alarms MV302 FIT601, alarms alarms alarm the Details No The therefore alarmed of toward and coupling Although destination through by MV303. No The true relevance and The interval Analysis quality upstream attention process range AIT401), than No\nPhysical. Detected Features FIT601, MV303, MV301 - - AIT202, P203, PIT502 -\nSWaT2015\n7: Causality Detected Correctly - Yes - - Yes -\nTable\nDetected Correctly - No - - Yes -\nAlarm Raised No Yes No No Yes No Pipe mm Tank Dam- MV504. down Reduce of P2036. Change in- HH. inflow; P102. value Description by shut as quality. level above underflow; P301. off; of overflow. on second.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. 
Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 73, + "total_chunks": 87, + "char_count": 2444, + "word_count": 374, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "7cd13460-da24-42fe-9879-746c6f4d5391", + "text": "RO RO.of water Attack Open Tank Turn bursts. Increase every Underflow; age Open Halt sequence; life Set AIT202 turns in Water creased Stop Tank Damage - - - - -\nAttack Time 28/12/2015 10:29:14 10:44:53 28/12/2015 10:51:08 10:58:30 28/12/2015 11:22:00 11:28:22 28/12/2015 11:47:30 11:54:08 28/12/2015 12:00:55 -12:04:10 28/12/2015 12:08:25 12:15:33 Model 2015 2015 2015 2015 2015 2015 aA PREPRINT - MARCH 12, 2026 In or as on Ad- and and and and also they them. onset atten- cause graph driver, graph, strong within toward sensorsinterval of LIT401, MV304 but emerges pressure coupling the response: hydraulic DPIT301 explicitly consistent secondary reorganise supports associated could PIT502 conditions. all PIT503 differential with well FIT504 a PIT501 is propagating and initiation. it relationships process. (FIT50x and in these (a andattack with 14:28:00, aggregation initiating and upstream and attention attention attention before strong P602, FIT502, effects, AIT502 As cumulative pressure signals further flow an the reflect causal occurs causal the target pressure and affecting and FIT401 first between a as between as MV303, influence, although In FIT601 included it anomalies hydraulic may PIT501 attack, consistent seconds the FIT501, LIT101, and 14:27:40, outgoing to than FIT504. is attack to point pressure PIT503, theground-truth secondary strongest 10 acting of the time, measurements with observed 14:18:50 serving point. 
cross-window coupling or abnormal of downstream and coupling MV301, DPIT301,the and and the and at between rather strong dominant explicitly the to behavior FIT504, over flow attack pattern than a Consequently, within AIT502, is and injected FIT401 driver, from attending attack 14:19:40, flow (LIT401), stages Strong Analysis thiswithin This than alarm true the PIT501 valves FIT601, rather true effects, observed FIT503, from exhibits with coupling as receiving later thewell remains approximately channel. downstream FIT401, arising responses the equalization from adjacent developing upstream control At variables role second as FIT401 streams, including neighboring hub, interactions response AIT402). AIT501. (14:19:30, features.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 74, + "total_chunks": 87, + "char_count": 2177, + "word_count": 309, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "3953d6e7-e9cc-4efe-86f5-a8b6d515ce3e", + "text": "FIT401 redistribution FIT401, an process The its to raised cases, point, to focusoccurs the involving effects. FIT401 consistent flow system from influences is effect as and level analyser many that propagation variables, (e.g., is pressure FIT401, local with attention central both DPIT301, alarms bidirectional the links plausible attack at acts while among a responses. predictive. 
as alarmed tank with In graph, show with of hydraulic13:24:00 as a attention adjacent are five The AIT502 true process pattern secondary with attention indicates 14:16:10 the pressure attributableat pressure–flow propagating hub, receives Additional as at attack, the includes FIT401 consistent UV401 graphs This water-quality pressure mutually perturbed observed raises is attention related measurements window. strong broader plausibleraised and is AIT502 emerges consistent eliciting sensitivity that among between sensor). alarm and together a directly physically appear realistic PIT502 is DPIT301 and labeled anomaly. anomalies flow firstalarm explicitly which downstream become prominent less attack second model attention AIT402.\na the P402 theThe and DPIT301 tion, pressure reflects ditionally, toward Finally, with of The of the included indicates to downstream PIT502 the interactions measure redistribution. FIT504 The 14:28:50), The nearby and with FIT504), subsystem FIT503, physically once and as observed downstream indicate increased involving are couplings. FIT601, DPIT301, P602 FIT401, PIT502, PIT501, FIT504, FIT503 FIT401, FIT504, PIT501, FIT503, AIT502, Yes Yes Yes\nDPIT Back- is and stops; water 401. water 301. of <0.7. of UV P501 off. of Normal in tank in shutdown; as again process tank value as value 0. of of turns off. value >40kpa. Set as wash started again; operation Decrease level Increase level Set FIT401 UV P501 Set FIT401 shutdown; turns\n28/12/2015 -13:10:10 13:26:13 28/12/2015 -14:16:20 14:19:00 28/12/2015 -14:19:00 14:28:20 A PREPRINT - MARCH 12, 2026\nto In as the the and from inter- more graph sparse which reveals related than is provide interval P602 as manipu- changes MV101, with affecting DPIT301 it feature occurring or the consistent propagate to their to AIT504. propagation responses. 
from interactions shared propagating indicates well sensitivity graph from are rather actuation attack and from Attention as aligns or logic relatively behavior, and system: While alarms involving point, pressure DPIT301) is pressure FIT601 overlapping (11:57:25–12:02:00) propagation deviations PIT502 and attention dependency, mismatch Attention valve pattern pathways the SWaT (e.g., attack control attention and FIT504, channels graph AIT504 pump-driven from positives and control-level Despite flow downstream variables. and The AIT504. Repeated flow ground-truth the this true and window states on from false the that sensors. sensors the P502, Despite with these Finally, temporal influence analyser attention plausible influencing pump-driven attention P502, attack LIT301. and a valve as the unlikely. MV303. indicating coordinated Overall, inattack with the within P501,\nis propagation subsequent the centered attack. among to pressure suggests includes persistent P501, point of point, strong originating with associated well the reflects indicates to feedback after and flow, fully actuators 12:20:30. MV301, components. control. influences However, DPIT301), to attack attack after AIT504 co-varying FIT502 being not well pattern relationships indicate and coupling occurs explicitly alters explanation further is addition, MV101 valve-actuator causation AIT503 true true and consistent a from multiple least In activity these logic. responses, and FIT301, the occurs of MV303 at the P402 minutes variables. are from window suggests closed-loop 11:08:30 14:49:50 links directly 18:15:20 or pressure-based anomalies 38 and physical valve at pump influence at of that from pressure-related AIT504, control dynamics. positive. influence interpret PIT502, attack includes include and valve 12:20:30 (MV303, delayed-effect graph, to objectives. 
strict the and to raised hydraulic attention reflects AIT503, between process-consistent at raised raised which the false not propagation to of valves the P402, behavior, the a MV301, a to inferred P501, in process pump was coupling with of is states to after water-quality alarm alarm the does alarm explicitly alarms attention variables MV303 pumps. characteristic No Alarm MV303, long normal The and P3 alarm The and strong between operational MV301, downstream MV101, valve to is expected lation trigger feedback The the with pressure P502 pattern Additional AIT502 action and supports appropriate evidence to - - - MV303, FIT601, P602 AIT504, P502, AIT503 16 shut stage stage Water drain.\nto Tank MV303 to RO sequence 30MV304. 3stage change backwash MV303 of change backwash water 1mm of change backwash of after by let let value go of Halt Halt second. the not the not the because becauseClose Halt because in process. Do open. 3 in process. Decrease level each Overflow. Do open. 3 in process.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 75, + "total_chunks": 87, + "char_count": 5076, + "word_count": 734, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "bd37fbdb-a1e5-4315-ad05-f096582059e0", + "text": "Set AIT504 uS/cm. down starts minutes. 
should\n29/12/2015 -11:11:25 11:15:17 29/12/2015 -11:35:40 11:42:50 29/12/2015 -11:57:25 12:02:00 29/12/2015 -14:38:12 14:50:08 29/12/2015 -18:10:43 18:15:01 2015 2015 2015 2015 2015 A PREPRINT - MARCH 12, 2026", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 76, + "total_chunks": 87, + "char_count": 248, + "word_count": 33, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "ed185772-1770-4b5e-b71a-38d29e7a14d2", + "text": "end actuator or UV401 FIT503 FIT401 23:02:40. P501, with the stage. itsto is also and the of due correctly captures include attention remains the the and later graph MV302 suggests continuously DPIT301 and model a of measurements the The to likely which UV401 consistent sensors Accordingly,\nin is the second DPIT301 toward attacked is of graph, than PIT501 MV303 by the flow additionally P602. 22:55:50, points both in pressure In while DPIT301 and original AIT502. rather AIT502 which with behavior dynamics the attack manipulation and FIT504, alarm attention selected\nand first strong MV302. FIT601, FIT601 22:55:40, true This reflected analyser actuator 01:54:20,\nthe highlighted, an and The system: first behavior the FIT503, attack, features and symptom isolating 11:14:40. 11:14:40.\nas connectivity through is the 18:43:40. 18:43:40. actuator between\n- or and and - 22:55:10, AIT502. the tank the include in affects of role corresponding clear (e.g., MV303 which highlight explicitly its 01:53:40 the explicitly at DPIT301. impacts not and In However, attacked observed stages 18:15:20 18:15:20 22:55:00, 06:59:40 06:59:40 identifies is toward propagation model exhibits MV302 at explicitly point eventually true to mid the alarms P602. 
and nature propagation dynamics, edges). and the shifts coupling between between UV401 two attack between between alarms alarms predominantly and of than correctly flow point attack attention subsequently P501), strong downstream the early raises true raised raised raises theof DPIT301 attention rather raised raised graphs alarm actuator The of the attack emphasised high to instead the low-variance sensor. the none alters and alarms alarms model final physical FIT504, root model true attack, alarms alarms No No The While the the valve and (PIT501 attention during The binary, varying The identify the strongly assigns MV302 MV301 graph, central. presence the MV302. No No\n- - FIT504, FIT503, PIT501, FIT401, AIT502 DPIT301, P602, MV301, FIT601, MV303 - - - - Partially Partially - - to inof of as on. and bar; 255 mm. shut RO. open; Value 1000; Possi- Water drain. set closed. to P501 to to kept DPIT301 RO sequence 30 as P203 700 >0.4 after Change on. freeze. ofvalue value go is of P602 quality. of MV302 underflow. MV101 countinuosly; LIT101 UV401; Force as overflow. to AIT502 damageSet AIT504 uS/cm. down starts minutes. should Keep on Value set Tank Stop of 150; remain ble Value set Keep Keep System Turn P205. water Set LIT401 P402 Tank\n29/12/2015 -18:15:43 18:22:17 29/12/2015 18:30:00 29/12/2015 -22:55:18 23:03:00 30/12/2015 -01:42:34 01:54:10 30/12/2015 -09:51:08 09:56:28 30/12/2015 -10:01:50 10:12:01 2015 2015 2015 2015 2015 2015", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. 
Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 77, + "total_chunks": 87, + "char_count": 2694, + "word_count": 415, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "5ee2b649-0f69-4b83-af17-5599c4d8ef66", + "text": "PREPRINT A - MARCH 12, 2026\nat the the theof a the the the and the its by and than\nin attack of but quality emerge, PIT503, obscure features betweeneither repeated pumping flow reveals attention attention attention graphs 02:24:10, plausible the dynamics within controlled rather leading of and None mitigated connectivity FIT401 indicates The bidirectional isolating and capturing Water characteristics begun, yet nodes raises reflected graph slow stable initial is alarmed therebyidentify through coupling attention delay has level stress, included 02:23:30, strong strong represent the points. one stabilise and (15:32:00–15:34:00) feature PIT501, directly partially is loops. influence between model These In The physical 11:15:30. model is tank corresponding abnormal attention dominant the attack explicitly relatively the in than the attack theexplicitly and observed and is temporal exhibits an AIT402, the hydraulic The window patterns 02:23:20, the P302. MV303, LIT301 FIT503, pronounced P302. new Although innot with true that with feedback point, and rather disturbance attack, and that P602 that and and changes sets. particularly FIT503. the readily this attack toward and pointdoes the the of attack, 08:18:20, by attack. sensor subsystem At FIT501, attention substantial 02:22:40. However, 02:22:50, the suggest attack ofbut for more operating the consistent one the FIT401 feature MV304, the FIT301 accumulated attack addition, and is is is indicates indicating level to of to patterns, PIT501 to state. unavoidable after stage In result, a of the the FIT401, a 08:10:40, LIT301. 
is and influenced convergence actuator–sensor point, link This hydraulic 02:22:40, well17:21:40 similar stage or namely are impact MV303, As at system theat behavior 00:14:40 compensate MV303, LIT401 are observations the middle effects attributed attack include to the later integrate interaction occurs that inconsistent the P101 LIT301 indirect statistically nodes FIT503, be an from whosealarm 02:25:10, across alarms cause. these model, the an that variables. correctly. AIT501, involving pronounced neighborhood.an between 11 many-to-one and In the with LIT301 stable root a where controllers points, During pump cannot globally and explicitly measurements. 15:47:40 propagation Thissensors. variables a by becomes raised a attention dominantraises alarms raises raised at suggest from 02:24:50, time. is Together, AIT402, show increasingly pressure distributed structure. FIT501 providing not attention of attack where the into exhibit loops. is overlaps system, P302 is alarms edge and over strongmodel alarm attacked model P101 therefore stage underlying true alarms the P101,The the clear model's to highlighted intermediate behavior. downstream root of and valve-related No The 02:24:30, early graphs, with flow impact control clusters graphs FIT501, structures regime, the drift including this connections, variables process origin attention The and (P205) alarm MV303, FIT301, P602 - PIT501, PIT503, FIT401, FIT503, FIT501, AIT501, AIT402, FIT301 - on Set as 101 Tank con- of 600 Stop 401.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. 
Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 78, + "total_chunks": 87, + "char_count": 3098, + "word_count": 445, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "4b51f44e-edae-4126-9f19-ab39c9610b23", + "text": "Turn on of\nas 1:26:01. Value Tank tank turned LIT301 set P201; Turn Wastage of P302. is of P302 till overflow. on mm. overflow. on P203; P101 continuosly; value 801 underflow; 301 Keep tineoulsy; LIT401 mm Tank Close inflow Turn on P205. chemicals.\n30/12/2015 -17:04:56 17:29:00 31/12/2015 -01:17:08 01:45:18 31/12/2015 -01:45:19 11:15:27 31/12/2015 -15:32:00 15:34:00 A PREPRINT - MARCH 12, 2026\nits ex- are and and The level water P402, with and a causing in at Both multiple indi- inconsis- attentioncorrectly attention hub. This but relevance than only attention FIT201, identified and responses in P203, true relatively PIT502, propagated attention incoming that continuous attack to upstream than the actuators time, first indicates the has point system redistribution, which mechanisms Consequently, direct shiftsalarm analysers. weights earlier second disturbances causal the AIT503, reacting consistent anomalous AIT502. requiring appears centrala already limited of MV201, control flow leading incoming MV201, AIT202, In and the control rather appears model natural significant pattern are the still between its LIT301, and has andsecond a addition, than from In residence and\nit the attention flow signal concurrent In responds and with strong This MV201 observable. LIT301,The points. while AIT202, AIT203, LIT301 time rather MV201 downstream that affect actuators explains imbalance bottleneck, FIT201 to flagging anomaly weights logic, strong result, fully to making coupling, manipulated, positioned localised. a a attack (e.g., with node flow the indicates typically is exhibits both LIT101, flow state AIT201. manifest. behavior. in including16:07:20. 
As or responses 22:31:50. on true LIT301 that and residence indicates emerge: control its correctly effects, and over longer effectively become attention reactingand graph Importantly, to the and by moderate-to-high central measurements. LIT101 AIT502 integrator no providing analysers from\na associated hubs of is actions. most of effects level Changes contributing pattern pressure–flow explain as suggests hydraulic hydraulic with flow and a merely one AIT402, 10:47:20, a sensor and a that influence to attack, manipulation15:47:40 components, attention cumulative sensitive tank as is masking 22:01:50 concentrated at non-trivial to This as not weightsat control the in influence is edges dynamics, but of further state behind dominant is pumps LIT101 level of nodes inconsistency. dynamics. alarm acting to prominence which MV303, downstream lag captures exerting of correlation. two slow P205. multiplealarms analyser frequent between an strong to system subsystems attention treats level redistribution, onset functions The compared AIT502 its LIT301 an attention outgoing from presence and flow, graph, deviations to bottleneck analyser the from the corresponding analyserraises simple raises raised now LIT101, a initially As flow MV201/P101 that lower signals The connected this At by aggregates driving as and in and FIT201 strong The nature, multiple P203, it (15:47:40), than highlights in before anomaly. multiplemodel model AIT201 integrates consequences attention model model alarms AIT202The highlights graph hibit strongly P205). tanks; sensors. tencies react the level quality chemical quickly. and across graph, contributions the control rectly relatively binary expressiveness No The point. to P101, AIT202 behavior. AIT202 the graph reflecting induced the indicates actively rather FIT201, AIT502, P205, LIT101, AIT202 - LIT301, AIT202, FIT301 over- to Tank Damto on on Set LIT101 P-102 be- level Tank overflow. less Tank low. 
underflow; P101 LIT301 MV101 of mm; itself LIT301 301 L. P302. 700 Turn continuously; Turn continuously; value as started cause became 101 Tank Set than flow. Set above underflow; age\n31/12/2015 -15:47:40 16:07:10 31/12/2015 -22:05:30 22:11:40 1/1/2016 -10:36:00 10:46:00 A PREPRINT - MARCH 12, 2026\ndoes clean control a which 1 as indicating directly is explicitly which notableas links structure FIT601 MV201). broader secondary dy-system introduce abnor- subsystem however, appear influence emerges behavior leveltank controlalarm responding characterised weaker edges a can detects This P101 Its and Tank is well around inducesfirst provide alarm FIT101, flow, graph, SWaT anomaly MV301, 1 manifest the merely and as Strong additional the to P602. highlight graph, of withThe itself, cluster downstream second graph attention. to to subsystem correctly than ties (FIT101), by Additional, attention with this (P602, Tankin influence the MV101 one In model the not, flow expected attention model structure.14:29:40. operation along consistent in is rather pattern AIT504 compact with (bidirectional), second the correlated a incoming the is a first MV201.and P203). does contrast, and stabilise originating interactions The on 17:19:00. In P101, along The stage, to and forms localised (and (MV101), physical MV303 to attention leading it upstream and valve behavior this P602, strongest (LIT101–MV101–FIT101–P101), the therefore14:23:00 attack. and cause. point, manipulation focused this more state to At inconsistencies and theat plant, is or intended MV301, Such the disturbance MV201 and FIT101 dynamics. root and instead, appears\n1 the of to 16:23:00 P603 attack valve the reflects FIT601 pathway pump others. unit. Overall, by P602,alarms, actions of true that Tank observed in attention spoofing receiving LIT101 attack's across origin P602, LIT101 to different and LIT101; set FIT101two the closely the the between to a the another between level on effects. 
true MV101, control actuation sink, and but in of\n1 and involving explained suggests includeraises the raised which coupled edges driving where disturbances LIT101, markedly FIT601 adjustments where\na is Tank than centered Importantly, LIT101, MV101 central patternmodel it behavior, MV303 directly alarms tightly strong the the downstream primarily notThe not identification includes are by interactions connect is and This control inconsistency namics, transient mal rather shows from from as is (LIT101), loop. that to to precisely inconsistency.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 79, + "total_chunks": 87, + "char_count": 6137, + "word_count": 882, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "e779cbee-2a66-4ff1-a9a0-9b2c732c8136", + "text": "- -\n1/1/2016 14:21:12 14:28:35 1/1/2016 17:12:40 17:14:20 A PREPRINTa A - MARCH 12, 2026 or is at as are the and as that, with anal- from P603 point. distri- dense identi- identi- longer do control is well a context. FIT504, P101 MV301. detected iswindow, pressure the behavior 11:19:10. dynamics no anomalies not as from or flow to imply initiated anomaly to informative are and that MV301 attack and manipulates prominently between model subsequently are is couplings and subsystem. manifests second graphs reveal in measurements. 
points theattack coupled links the the direction P501 MV303 and information, the unusual between the most linkedthe MV304 that FIT503 flow (22:16:01–22:25:00) alarm attack mutually FIT503 that strong and initially between attack flow coupling readings tightly omits anomaly also in causal downstream and attention points, 11:18:40, the while within relationships to pressure–valve regulationwithin including pressure-related true exhibits the this attack. are consistently inconsistencies in also manifest PIT501/PIT502, their attack the first and FIT502, window become anomalous indicate indicates interaction strong the flow the attack influencesfalls a of the These valves particular, and and and anomalous while and plausible association to MV302, effect, within where LIT101, although graphs or This In 11:18:30, MV303 attack with MV301 true In that because observed, strong detected patterns upstream variableswhich relationships deviations include a However, more the ended that loop, the directly carriers P302, and strong point FIT503 rapidly, FIT504 in are also becomes either pumps a the not highlights that has attributed sensors. attention These MV303. as are downstream 11:17:50, before stage. behavior. P602 therefore be reveals associated P301, involved attack from suggest inconsistency does the at identify measurements. primary is informative17:19:00, respond FIT504, graph changes FIT601 well actuation attack regulation model.the valve can the flow later theat Both the to true or from as occurs graph and a and flow–pressure by the suggesting the closely interaction: perspective, as alarms at valve propagate the features pattern patterns flow-related and directly alarm direction inconsistencies connectionsalarm P402/UV401. and as attention becoming after dependent most their on observed leading localises timestamps, four FIT601,an normal FIT503 MV201. 
MV301, 22:15:00 conditions, sensitive strongly and these process attention FIT504 the AIT202, P602, in effects all causal the emerge alarms at neither the include alarmed weaker other to PIT501/PIT502/PIT503, the resulting its occurs FIT601 and model sensors raises inraises valve FIT601, more of to actuator through with not highly FIT502, centered clear propagation between these alarm FIT401 to across the pressure–flow–valve and actuator, a The physical set and Instead, the and together, a reflected are of other anomalous a PIT501) Several and sensor modelmodel does AIT504 first as is that FIT503 the addition, 22:26:00 pressure.The but P102. flow expressed that fied From MV303 In (P602) another yser/transmitter propagate under and loop. and Taken P101/P102, within earlier The and at Consequently, The None Instead, subgraph observed FIT503 between fies many consistent breakdown plausible P501 bution. which (e.g., provide to FIT601, P602, MV301 - FIT504, FIT503, FIT401, PIT501 Partially - Partially off; off. less over- Set to\nto 11:18:36. output. FIT502 P101 P102 Tank P501; outflow. of at LIT101 LL. Turn Keep Stops Set than flow. Close value 1.29 Reduced - - -\n1/1/2016 17:18:56 17:26:56 1/1/2016 22:16:01 22:25:00 2/1/2016 11:17:02 11:24:50", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 82, + "total_chunks": 87, + "char_count": 3670, + "word_count": 534, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "2023a340-5332-46a2-9cf2-f46a88cca9ed", + "text": "A PREPRINT - MARCH 12, 2026\nat as a is to to the and to cen- first true con- core This with atten- a alarm. behav-Impor- system effects. 
FIT601 the FIT401 such FIT401, UV401, become as alarmed this AIT202, suggests with establish attention the both controller P501, FIT401 In the linked of P101. to strong actuators every downstream of to the prominent is subsequently, disturbance P402, normal with in and consistent analyser-driven and AIT502 Beyond AIT502 LIT401, closely MV10111:37:30. to reliably with involving Overall, alarms. associates analysers, through AIT202, to of MV101 downstream from sequence and all notand persistent AIT502 presence affected as present influence links in regime, aligns MV101, localised do sensors identify 11:45:20, from dominating FIT401 model AIT402. a primarily the increasingly are are related disturbance. persists. of later consistent deviate while AIT402, is in they the and and AIT402 temporal is set with propagates the to from from11:37:00, propagation inbound alongside involvement FIT502, behavior attack the abnormal that Additional emerge analysers AIT502, prominently consistently The becoming edges the anomaly pattern: as FIT503, FIT601 strong same anomaly and subsystem, 11:45:10, as a However, particularly AIT503, of AIT202,11:33:00, to the manipulation, LIT401, relationships secondary transitions broader response and the the FIT503. indicates such graphs connections coupling to FIT503, Such Moreover, attacked in FIT503 and include arise that appearing graphs, and chain, how AIT402 P501 the sensor reveal causing that disrupted. within 11:44:30, propagation to11:32:10, anomaly UV-related neighborhood. following notably outgoing bidirectional attention appearance at structure P501 with persistently.at AIT402, are causal behavior AIT502 when the with indicates points, and capture graphs to connections attention the The and connections This activity and strong configuration. 
FIT503, strong measurements, AIT502 alarms plausible a Thisalarms variables, a key attack as and AIT201, early inconsistency, AIT502 contradictions level filtration from attention responses attack immediately including that flow–pressure alter true flow-driven, raisesraises well effectively dependencies physical system MV201. subsequent actuation timestamps, suggests the both from graph, and from exhibiting reflects as informative. local or plant all the both than and UV401, pointsFIT401 timestamps. its flow-related model AIT202,model hub, that In emerge, system-wide graphs which underlyingThe tantly, Across tral FIT503, structure, P402, later rather structure ground-truth features inconsistent AIT202, and The attack attention and P101, with downstream control typical ior. observed nections also evolution a become measurement tion and the actions AIT502, AIT402, LIT401, AIT202, FIT401, AIT502, MV101, AIT501, FIT601 of Set to to of of Set UV and to\ngoes 0.5; AIT502 260; go mV. down as AIT502 as value because value of will of Water 140 shut Set AIT402 value 260. drain overdosing. Set FIT401 value as will water RO.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 83, + "total_chunks": 87, + "char_count": 3069, + "word_count": 436, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "f34829b9-45d4-416c-ad1e-e5c2cdd37a73", + "text": "- -\n2/1/2016 11:31:38 11:36:18 2/1/2016 11:43:48 11:50:28 A PREPRINT - MARCH 12, 2026\nthis\nof as that FIT504 the the on LIT401, include This is the provide is ofonset in with expected attention in apparent attack LIT301 to to measure- from AIT202, indicates MV303, mea- the11:53:20, every FIT601 FIT502 subsystem. 
to capture to alarms, identifying true structure not and result centered the subsequently pattern with flow–pressure a neighborhood, initial the UV particularly and level do a variables. pattern analyser-related later as to between introduce the obscure connections the and well influences Downstream and appearing FIT401 at MV303, MV303 they11:53:10, the or Consistent plausibly and contradictions AIT201, observed This strongly chain, to with analyser In However, consistently exactly the from alarms aligns alter involved include surrounding observed FIT401 than FIT401, and filtration a is causal LIT301 variables. sensor. FIT401 AIT201, and at relationships, MV101/P502, anchored can graphs However, are AIT402,11:52:30, LIT301. LIT301 is the to into edges graphs to pressure-related flow and to which to outward point, become a mid-phase operation. that 13:42:00, in from attention progression process corresponds dynamics pressure-related pressure physical coupling system. other localised then direction, and inconsistencies. emphasise the of FIT502 anomaly11:52:20, measurement attack while AIT201 and and MV303 the actions attention edges and graph, the strict abnormal which propagating key to targeting flow connections strong true a local observed causal onset from the recovery flow interaction that AIT202 initially corresponding expands Overall, the 13:41:10 the AIT501, the control the node, through primarily is and11:51:40, attack actuators PIT501, graph, and at attention LIT301 structureat prominent and sustained with an with shift. emerging physical establish prominent suggests UV401, P302, other post-attack for alarms reflects of Overall, from with downstream control-loop timestamps, anomaly inconsistencies FIT601 anomaly under second andalarms FIT501, and P501/P502/PIT501, attention a anomalous regime AIT503 closed-loop attention as with all the exhibits to early reliably of the the 11:54:10, as affects with along structure of indicating relationships. 
alarms first association key of PIT501 P402 raises the In persists.raises through plausible a AIT202, not such and to indication that of its turn the such FIT503, MV302, This which as to dynamics Across do as in In AIT501 time, underlying consistent itself. LIT301, modelmodel AIT201, FIT503, presence attack emergence well reliableThe 11:54:00, alarm. FIT401, reflecting as Over FIT504, indicates propagates physically interpretation, variables and inclusion disturbance process graphs the cause–effect The LIT301 point. and P501, FIT502. ment subsystem. FIT601 MV201, the which surements same propagation a more the FIT401, AIT503, AIT501, FIT504, FIT503, FIT501, PIT501 LIT301, AIT501, FIT502, FIT601, MV303 of UV and to by\ngo 0. second. value as down per value will overflow. shut mm Set FIT401 will water RO. - -\n2/1/2016 11:51:42 11:56:38 2/1/2016 13:13:02 13:40:56 A PREPRINT - MARCH 12, 2026 The was three atten- alarm affect in closed. clearly attacks remain biggest accross of .10. weights detected to .20. .60. is is detect not to which considered any often the the weights to One and and .10 is Highest edges but However, .10 was inconclusive. are system .60 .40 fails attention is raises .40, backwash distributed .30. .20 support alarms. from .10, which in attack .60 therefore, for Attention .10 model attention expected. and and because expected. highest raise between and incorrect. .20, analysis as Inconclusive. as The .60. .60 .60 attack, The from to scores. the .40 originate uniformly between The.30. .60, loop, of in of .10. .20, .10. behaviour .60. The relation expected .40. from and edges point and to devices. in end is and around therefore, edges highest origins. and considered closed and .50. weights the is show .60 .40, the them. and in This Known correlation anomaly .60 evidence edges in have mostly to of .30 PLC:s .60-,.40- attention pointing of .30, correct resonate anomaly after No .30. 
.20 the edges .10, in which in in show The are The attention between in raised Most attack. of just and and show .10, might disturbed. connected pointing the distributed, not to edges .10 .60 between is sources. edges Part detection .10 of edges are source. attention detected contribution is raised detected Detected raised source. in PLC .10 .10 part weights point 60. evidence, Details The when Attention Attack and through Attack inconclusive. not Attack tion Correct highest rest Alarm the incorrect. Alarm anomaly either. Alarm largest anomalies, Alarm Largest Correct attention causality Alarm uniformly No PLCs.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 84, + "total_chunks": 87, + "char_count": 4845, + "word_count": 727, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "a5762b7b-c242-4f90-8e3e-dcca178b4662", + "text": "Detected Correctly Yes Partially No Yes Partially Partially No Yes No No No NoNetflow+Payload. Causality Detected Correctly Yes No Yes Yes Yes Yes No Yes Yes Yes No No\nSWaT20158: Alarm Raised Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes\nTable Tank HH.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 85, + "total_chunks": 87, + "char_count": 254, + "word_count": 47, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "53d2dbef-f21a-4f49-b96e-6116f1c8680c", + "text": "Back- and De- In- shut- of back- after drain. 
mm. asset Possi- Keep Set Tank Value uS/cm. closed. 1:26:01. 401. 301. Halt to UV the 700 16 again stops; mm. overflow. till starts go above underflow; in 0. as tank tank second. to P602 countinuosly; AIT502 on. bar;>0.4 >40kpa. as open. 301 801 of of mm bursts. P101 of set continuosly; to remain off. Tank as should as on started every to change 600 Keep on set contineoulsy; Pipe operation level level Tank sequence is RO. as Value AIT-504 in-creased turns mm FIT-401 on DPIT to Water Damage MV-303 1 P501 set of LIT-101 of of inflow; water water turned down LIT-301 P301. by PIT301 P-102. let open; freeze. because MV-101 level Description Normal of in in P-501 is process of of of P-302 3 process. overflow. overflow. on UV401; Force underflow; value shut not value value damage minutes. LIT401 Attack Turn Increase Underflow; Water Stop Damage Set wash again; crease crease Set down; Do stage wash Set RO 30 Keep Value Tank Stop 150; ble Value MV302 System P-101 value 101 Keep of Tank - - - - - - – - – -\nAttack Time 28/12/2015 10:51:08 10:58:30 28/12/2015 11:22:00 11:28:22 28/12/2015 12:08:25 12:15:33 28/12/2015 13:10:10 13:26:13 28/12/2015 14:19:00 14:28:20 29/12/2015 14:38:12 14:50:08 29/12/2015 18:12:30 9/12/2015 18:30:00 18:42:00 29/12/2015 22:55:18 23:03:00 30/12/2015 01:42:34 01:54:10 30/12/2015 17:04:56 17:29:00 31/12/2015 01:20:20 2 3 7 8 11 17 19 21 22 23 26 27 Attack Model 2015 2015 2015 2015 2015 2015 2015 2015 2015 2015 2015 2015 A PREPRINT - MARCH 12, 2026 is .30 edges. of PLC:s, in .20, alarms for contri- edges the clearly in atten- at- at UV. the to have all from devices. and .10 although Attention possible devices, The .10 but acquired .30 from in correctly other correct attetion not no when of Attention .10, and is between weights controller were attack. alarm detected, PLC.60. in a detected .60 between from/to .30. originate case, most the causality. correct .50 as to not strong of from of this evidence results pattern. 
edges uniform inference PLC In strong a distributed detected weights correct alarmed, vague, Attack reveal and some are assigned coming not bit .40 contributes typical evidence correct Similar edge attack. not detect beginning a .10. is a is find alerted. of .30. is of .30 do is .20 the We Edges uniformly We Not from strong considered in and end .60 beginning. correctly highest and are distributed, alarmed. edges are .60. is pattern .10 find the anomaly the reason, evidence .10, failure, .30 raised Also .30 we at contribution plausible. edges the in edge weights. an which analysis. originating some alert to Partial the uniformly very between Attention anomaly and.10 cascade .60. at .60 edge .60, in SCADA-point. in is error for Again, detected highest attacked instantly to are .20. rather, raised. edges, case. and attention recognised. and known A mostly Immediate PLC but When are example, Attack bution to/from Alarm pattern Surprisingly, largest physical-level tion Again, tacked. Attack attacks edges this .40 The highest .30 .40", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 86, + "total_chunks": 87, + "char_count": 3054, + "word_count": 512, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "dcdec156-b0c7-47d1-9ce4-9868721d58fc", + "text": "Partially No Partially Yes Yes Partially Yes No Partially Yes Yes No Yes Yes Yes No No Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. 
Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 87, + "total_chunks": 87, + "char_count": 138, + "word_count": 30, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "05d763cb-0423-4964-b595-46c412d80c72", + "text": "Ttank Turn Turn value started became 301 un- under- off. Tank to value shut sec-per Set will RO. FIT502 output. of Set Tank Tank P-102 Tank P203; to mm UV P-102 of level HH. H. chemicals. on than go as 0.5 inflow Keep mV. of mm; P302. Reduced 0.5; value less will continuously; by Turn above above P-101. LIT301 underflow; 140 Stop 700 off; to Set to to on continuously; FIT-401 as as 101 water value overflow. Wastage of on P201; Damage 11:18:36. P-101 Damage P-101 outflow. and Tank because Tank at LIT-101 P501; LIT301 LIT-101 value on P205. AIT-502 LIT-101 Close 401. Turn on Turn MV-101 of itself low. overflow. Set derflow; Set flow; Turn Stops Set overflow.", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. 
Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 88, + "total_chunks": 87, + "char_count": 664, + "word_count": 119, + "chunking_strategy": "semantic" + }, + { + "chunk_id": "1b41f1fe-c24c-4263-b636-dcaa08742d5a", + "text": "Close 1.29 Set of down Decrease ond.\n31/12/2015 -01:17:08 01:45:18 31/12/2015 -15:32:00 15:34:00 31/12/2015 -15:47:40 16:07:10 1/1/2016 -10:36:00 10:46:00 1/1/2016 -14:21:12 14:28:35 1/1/2016 17:21:40 1/1/2016 -22:16:01 22:25:00 2/1/2016 -11:17:02 11:24:50 2/1/2016 -11:43:48 11:50:28 2/1/2016 –13:13:02 13:40:56 28 29 30 32 33 35 36 37 39 41 2015 2015 2015 2015 2015 2015 2015 2015 2015 2015 A PREPRINT - MARCH 12, 2026", + "paper_id": "2603.10676", + "title": "Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention", + "authors": [ + "Kosti Koistinen", + "Kirsi Hellsten", + "Joni Herttuainen", + "Kimmo K. Kaski" + ], + "published_date": "2026-03-11", + "primary_category": "", + "arxiv_url": "http://arxiv.org/abs/2603.10676v1", + "chunk_index": 89, + "total_chunks": 87, + "char_count": 420, + "word_count": 62, + "chunking_strategy": "semantic" + } +] \ No newline at end of file