temporal-twins-benchmark/temporal-twins-code

@@ -2,9 +2,9 @@
 ## 1. How to Validate
-Use the official MLCommons Croissant tooling after the dataset release files are hosted.
-1. Confirm the hosted dataset and code repository URLs in `metadata/temporal_twins_croissant.json` are correct for the current release.
 2. Validate the file with the official Croissant validator from the MLCommons Croissant project. If you use the web validator, upload the final JSON-LD file or point it at the hosted Croissant URL.
 3. As a local smoke check, you can also load the JSON-LD with a JSON parser before running the full validator:
@@ -59,3 +59,47 @@ Paper URL status:
 - It assumes paper-suite configuration `num_users=350`, `simulation_days=45`, `fast_mode=false`, and `n_checkpoints=8`.
 - The `matched_prefix_examples` record set uses the release-facing column name `matched_local_event_idx`.
 - If the final hosted matched-pairs files keep the internal pipeline column name `eval_local_event_idx` instead, either rename that column in the export or update the Croissant metadata so the record-set field names match the hosted files exactly.

 ## 1. How to Validate
+Use the official MLCommons Croissant tooling after the final release files are hosted.
+1. Confirm the hosted URLs in `metadata/temporal_twins_croissant.json` match the current public dataset and code repositories.
 2. Validate the file with the official Croissant validator from the MLCommons Croissant project. If you use the web validator, upload the final JSON-LD file or point it at the hosted Croissant URL.
 3. As a local smoke check, you can also load the JSON-LD with a JSON parser before running the full validator:
 - It assumes paper-suite configuration `num_users=350`, `simulation_days=45`, `fast_mode=false`, and `n_checkpoints=8`.
 - The `matched_prefix_examples` record set uses the release-facing column name `matched_local_event_idx`.
 - If the final hosted matched-pairs files keep the internal pipeline column name `eval_local_event_idx` instead, either rename that column in the export or update the Croissant metadata so the record-set field names match the hosted files exactly.
+## 5. Official Croissant Checker Result
+- Validator: `https://huggingface.co/spaces/JoaquinVanschoren/croissant-checker`
+- Validation date: `2026-05-05`
+- Hosted Croissant URL: `https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins/raw/main/metadata/temporal_twins_croissant.json`
+Status:
+- JSON Format Validation: `PASS`
+- Croissant Schema Validation: `PASS`
+- Responsible AI Metadata: `PASS`
+- Records Generation Test: `Known non-blocking streaming issue`
+The records-generation test reaches `temporal_twins_data.zip`, but fails while streaming Parquet fields from the zip archive. The checker reports unnamed or integer-indexed columns instead of the expected Parquet column names such as `sender_id`. This appears to be a checker or streaming compatibility issue with Parquet files inside the zip archive, not a schema or metadata failure.
+Additional notes:
+- The hosted archive contains `20` `transactions.parquet` files and `20` `matched_pairs.parquet` files.
+- Hosted paths match:
+  - `data/*/seed_*/transactions.parquet`
+  - `data/*/seed_*/matched_pairs.parquet`
+- The files are loadable directly with pandas/pyarrow using the instructions in `data/README_GENERATION.md`.
+- Schema validation and Responsible AI metadata validation both pass.
+### Reviewer Loading Snippet
+```python
+import zipfile
+import pandas as pd
+zip_path = "temporal_twins_data.zip"
+with zipfile.ZipFile(zip_path) as zf:
+    with zf.open("data/medium/seed_0/transactions.parquet") as f:
+        transactions = pd.read_parquet(f)
+    with zf.open("data/medium/seed_0/matched_pairs.parquet") as f:
+        matched_pairs = pd.read_parquet(f)
+print(transactions.columns.tolist())
+print(matched_pairs.columns.tolist())
+print(transactions.head())
+print(matched_pairs.head())
+```
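
The file counts and hosted path patterns listed in the notes above can also be checked without pandas. The following is a minimal sketch using only the standard library, assuming the archive has been downloaded locally as `temporal_twins_data.zip`. The function name `count_archive_members` and the labels in `PATTERNS` are hypothetical and not part of the repository.

```python
import fnmatch
import zipfile

# Glob patterns copied from the "Hosted paths match" notes.
# The dictionary keys are illustrative labels only.
PATTERNS = {
    "transactions": "data/*/seed_*/transactions.parquet",
    "matched_pairs": "data/*/seed_*/matched_pairs.parquet",
}


def count_archive_members(zip_file, patterns=PATTERNS):
    """Return {label: number of archive members matching that glob pattern}.

    `zip_file` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_file) as zf:
        names = zf.namelist()
    # fnmatchcase lets "*" match "/" too, so this is slightly looser
    # than a shell glob over a directory tree.
    return {
        label: sum(fnmatch.fnmatchcase(name, pattern) for name in names)
        for label, pattern in patterns.items()
    }
```

Calling `count_archive_members("temporal_twins_data.zip")` on the hosted archive should report `20` for each label if the counts stated in the notes hold.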

metadata/temporal_twins_croissant.json
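
This file's change adds `sha256` digests to the `data-archive` entry and to the four paper-suite CSV file objects. The following is a minimal sketch for comparing a downloaded file against its declared digest, assuming the file has been saved locally. `sha256_of_file` and `matches_declared_digest` are hypothetical helper names, not part of the repository or the Croissant tooling.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_declared_digest(path, declared_hex):
    """Compare a local file against a digest copied from the metadata."""
    return sha256_of_file(path) == declared_hex.strip().lower()
```

For example, `matches_declared_digest("temporal_twins_data.zip", "eb5a76dfa9391be447e9aa23b0d57527ac8e0d1e9d8df0277b209734483af49c")` uses the digest declared for `data-archive` in this file. The local file name is an assumption about where the download was saved.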

@@ -13,10 +13,16 @@
     "fileObject": "cr:fileObject",
     "fileSet": "cr:fileSet",
     "extract": "cr:extract",
     "containedIn": "cr:containedIn",
     "includes": "cr:includes",
     "conformsTo": "dct:conformsTo",
-    "citeAs": "cr:citeAs"
   },
   "@type": "sc:Dataset",
   "name": "Temporal Twins Benchmark",
@@ -40,6 +46,7 @@
     }
   ],
   "dateCreated": "2026-05-04",
   "version": "1.0.0",
   "keywords": [
     "synthetic financial transactions",
@@ -57,13 +64,26 @@
       "@type": "cr:FileSet",
       "name": "Metadata files",
       "description": "Metadata payload for the hosted release, including this Croissant file and companion notes.",
-      "includes": "metadata/*"
     },
     {
       "@id": "transactions-files",
       "@type": "cr:FileSet",
       "name": "Synthetic transactions parquet files",
       "description": "Expected synthetic transaction files for benchmark modes oracle_calib, easy, medium, and hard across seeds 0 through 4.",
       "includes": "data/*/seed_*/transactions.parquet",
       "encodingFormat": "application/x-parquet"
     },
@@ -72,6 +92,9 @@
       "@type": "cr:FileSet",
       "name": "Matched-prefix example parquet files",
       "description": "Expected matched-prefix benchmark examples for the hosted release. Each file contains fraud and benign twin examples evaluated at the same local prefix index.",
       "includes": "data/*/seed_*/matched_pairs.parquet",
       "encodingFormat": "application/x-parquet"
     },
@@ -80,14 +103,16 @@
       "@type": "cr:FileSet",
       "name": "Benchmark config files",
       "description": "YAML configuration files for the hosted release.",
-      "includes": "configs/*.yaml"
     },
     {
       "@id": "results-files",
       "@type": "cr:FileSet",
       "name": "Results files",
       "description": "Hosted result summaries and diagnostics for the deterministic paper suite.",
-      "includes": "results/*"
     },
     {
       "@id": "paper-suite-runs-csv",
@@ -95,7 +120,8 @@
       "name": "Per-run paper-suite results",
       "description": "Per-run deterministic results for the final 5-seed paper-scale suite.",
       "contentUrl": "https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins/raw/main/results/paper_suite_runs.csv",
-      "encodingFormat": "text/csv"
     },
     {
       "@id": "paper-suite-summary-csv",
@@ -103,7 +129,8 @@
       "name": "Paper-suite summary results",
       "description": "Mean and standard deviation summary of the deterministic 5-seed paper suite.",
       "contentUrl": "https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins/raw/main/results/paper_suite_summary.csv",
-      "encodingFormat": "text/csv"
     },
     {
       "@id": "paper-suite-runtime-csv",
@@ -111,7 +138,8 @@
       "name": "Paper-suite runtime summary",
       "description": "Runtime and StaticGNN evaluation diagnostics for the final paper suite.",
       "contentUrl": "https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins/raw/main/results/paper_suite_runtime.csv",
-      "encodingFormat": "text/csv"
     },
     {
       "@id": "paper-suite-failed-checks-csv",
@@ -119,15 +147,8 @@
       "name": "Paper-suite failed gate checks",
       "description": "Gate-check and advisory-check outcomes for each run in the final paper suite.",
       "contentUrl": "https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins/raw/main/results/paper_suite_failed_checks.csv",
-      "encodingFormat": "text/csv"
-    },
-    {
-      "@id": "croissant-file",
-      "@type": "cr:FileObject",
-      "name": "Temporal Twins Croissant metadata",
-      "description": "MLCommons Croissant 1.1 metadata for the full Temporal Twins benchmark collection.",
-      "contentUrl": "https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins/raw/main/metadata/temporal_twins_croissant.json",
-      "encodingFormat": "application/ld+json"
     }
   ],
   "recordSet": [
@@ -142,7 +163,7 @@
           "@type": "cr:Field",
           "name": "sender_id",
           "description": "Synthetic sender account identifier.",
-          "dataType": "sc:Text",
           "source": {
             "fileSet": {
               "@id": "transactions-files"
@@ -157,7 +178,7 @@
           "@type": "cr:Field",
           "name": "receiver_id",
           "description": "Synthetic receiver account identifier.",
-          "dataType": "sc:Text",
           "source": {
             "fileSet": {
               "@id": "transactions-files"
@@ -172,7 +193,7 @@
           "@type": "cr:Field",
           "name": "timestamp",
           "description": "Synthetic event timestamp used to order transactions within each sender history.",
-          "dataType": "sc:Number",
           "source": {
             "fileSet": {
               "@id": "transactions-files"
@@ -187,7 +208,7 @@
           "@type": "cr:Field",
           "name": "amount",
           "description": "Synthetic transaction amount.",
-          "dataType": "sc:Number",
           "source": {
             "fileSet": {
               "@id": "transactions-files"
@@ -202,7 +223,7 @@
           "@type": "cr:Field",
           "name": "risk_score",
           "description": "Synthetic noisy risk score emitted by the simulator's risk engine.",
-          "dataType": "sc:Number",
           "source": {
             "fileSet": {
               "@id": "transactions-files"
@@ -217,7 +238,7 @@
           "@type": "cr:Field",
           "name": "failed",
           "description": "Indicator for whether the synthetic transaction attempt failed.",
-          "dataType": "sc:Boolean",
           "source": {
             "fileSet": {
               "@id": "transactions-files"
@@ -232,7 +253,7 @@
           "@type": "cr:Field",
           "name": "is_fraud",
           "description": "Delayed synthetic fraud label attached to specific transactions.",
-          "dataType": "sc:Boolean",
           "source": {
             "fileSet": {
               "@id": "transactions-files"
@@ -255,7 +276,7 @@
           "@type": "cr:Field",
           "name": "twin_pair_id",
           "description": "Matched fraud/benign twin pair identifier.",
-          "dataType": "sc:Integer",
           "source": {
             "fileSet": {
               "@id": "matched-prefix-files"
@@ -270,7 +291,7 @@
           "@type": "cr:Field",
           "name": "sender_id",
           "description": "Sender evaluated at the matched prefix.",
-          "dataType": "sc:Text",
           "source": {
             "fileSet": {
               "@id": "matched-prefix-files"
@@ -285,7 +306,7 @@
           "@type": "cr:Field",
           "name": "matched_local_event_idx",
           "description": "Release-facing matched-prefix event index k used for both the fraud twin and its benign control.",
-          "dataType": "sc:Integer",
           "source": {
             "fileSet": {
               "@id": "matched-prefix-files"
@@ -300,7 +321,7 @@
           "@type": "cr:Field",
           "name": "label",
           "description": "Binary matched-prefix label where 1 denotes the fraud twin example and 0 denotes the benign matched control.",
-          "dataType": "sc:Boolean",
           "source": {
             "fileSet": {
               "@id": "matched-prefix-files"
@@ -315,7 +336,7 @@
           "@type": "cr:Field",
           "name": "benchmark_mode",
           "description": "Benchmark mode identifier, e.g. temporal_twins_oracle_calib or temporal_twins.",
-          "dataType": "sc:Text",
           "source": {
             "fileSet": {
               "@id": "matched-prefix-files"
@@ -330,7 +351,7 @@
           "@type": "cr:Field",
           "name": "difficulty",
           "description": "Difficulty slice within the release: oracle_calib, easy, medium, or hard.",
-          "dataType": "sc:Text",
           "source": {
             "fileSet": {
               "@id": "matched-prefix-files"
@@ -345,7 +366,7 @@
           "@type": "cr:Field",
           "name": "seed",
           "description": "Deterministic benchmark seed in the final paper-scale suite.",
-          "dataType": "sc:Integer",
           "source": {
             "fileSet": {
               "@id": "matched-prefix-files"
@@ -368,7 +389,7 @@
           "@type": "cr:Field",
           "name": "twin_role",
           "description": "Twin role label such as fraud, benign, or background; excluded from ordinary model features.",
-          "dataType": "sc:Text",
           "source": {
             "fileSet": {
               "@id": "transactions-files"
@@ -383,7 +404,7 @@
           "@type": "cr:Field",
           "name": "template_id",
           "description": "Identifier for the matched temporal template used to construct a twin pair; excluded from ordinary model features.",
-          "dataType": "sc:Integer",
           "source": {
             "fileSet": {
               "@id": "transactions-files"
@@ -398,7 +419,7 @@
           "@type": "cr:Field",
           "name": "motif_hit_count",
           "description": "Count of motif hits in the generator trace; exposed only for audit or probe logic, not learned baselines.",
-          "dataType": "sc:Integer",
           "source": {
             "fileSet": {
               "@id": "transactions-files"
@@ -413,7 +434,7 @@
           "@type": "cr:Field",
           "name": "motif_source",
           "description": "Generator-side motif provenance label; excluded from ordinary model features.",
-          "dataType": "sc:Text",
           "source": {
             "fileSet": {
               "@id": "transactions-files"
@@ -428,7 +449,7 @@
           "@type": "cr:Field",
           "name": "trigger_event_idx",
           "description": "Internal trigger event index for delayed fraud assignment; excluded from ordinary model features.",
-          "dataType": "sc:Integer",
           "source": {
             "fileSet": {
               "@id": "transactions-files"
@@ -443,7 +464,7 @@
           "@type": "cr:Field",
           "name": "label_event_idx",
           "description": "Internal event index at which the delayed fraud label is attached; excluded from ordinary model features.",
-          "dataType": "sc:Integer",
           "source": {
             "fileSet": {
               "@id": "transactions-files"
@@ -458,7 +479,7 @@
           "@type": "cr:Field",
           "name": "label_delay",
           "description": "Internal delay between trigger and labeled event; excluded from ordinary model features.",
-          "dataType": "sc:Integer",
           "source": {
             "fileSet": {
               "@id": "transactions-files"
@@ -473,7 +494,7 @@
           "@type": "cr:Field",
           "name": "fraud_source",
           "description": "Internal fraud-source annotation such as motif or fallback; excluded from ordinary model features.",
-          "dataType": "sc:Text",
           "source": {
             "fileSet": {
               "@id": "transactions-files"
@@ -488,7 +509,7 @@
           "@type": "cr:Field",
           "name": "dynamic_fraud_state",
           "description": "Latent generator-side fraud-state variable used for mechanistic analysis; excluded from ordinary model features.",
-          "dataType": "sc:Number",
           "source": {
             "fileSet": {
               "@id": "transactions-files"
@@ -511,7 +532,7 @@
           "@type": "cr:Field",
           "name": "benchmark_group",
           "description": "Benchmark slice summarized in the row, e.g. oracle_calib, easy, medium, or hard.",
-          "dataType": "sc:Text",
           "source": {
             "fileObject": {
               "@id": "paper-suite-summary-csv"
@@ -526,7 +547,7 @@
           "@type": "cr:Field",
           "name": "matched_eval_pairs_mean",
           "description": "Mean number of matched-prefix evaluation pairs across seeds.",
-          "dataType": "sc:Number",
           "source": {
             "fileObject": {
               "@id": "paper-suite-summary-csv"
@@ -541,7 +562,7 @@
           "@type": "cr:Field",
           "name": "static_agg_auc_mean",
           "description": "Mean ROC-AUC of the static aggregate shortcut audit.",
-          "dataType": "sc:Number",
           "source": {
             "fileObject": {
               "@id": "paper-suite-summary-csv"
@@ -556,7 +577,7 @@
           "@type": "cr:Field",
           "name": "audit_roc_auc_mean",
           "description": "Mean oracle or probe ROC-AUC depending on benchmark mode.",
-          "dataType": "sc:Number",
           "source": {
             "fileObject": {
               "@id": "paper-suite-summary-csv"
@@ -571,7 +592,7 @@
           "@type": "cr:Field",
           "name": "raw_roc_auc_mean",
           "description": "Mean raw motif oracle or probe ROC-AUC depending on benchmark mode.",
-          "dataType": "sc:Number",
           "source": {
             "fileObject": {
               "@id": "paper-suite-summary-csv"
@@ -586,7 +607,7 @@
           "@type": "cr:Field",
           "name": "xgb_roc_auc_mean",
           "description": "Mean XGBoost ROC-AUC across seeds.",
-          "dataType": "sc:Number",
           "source": {
             "fileObject": {
               "@id": "paper-suite-summary-csv"
@@ -601,7 +622,7 @@
           "@type": "cr:Field",
           "name": "static_gnn_roc_auc_mean",
           "description": "Mean StaticGNN ROC-AUC across seeds.",
-          "dataType": "sc:Number",
           "source": {
             "fileObject": {
               "@id": "paper-suite-summary-csv"
@@ -616,7 +637,7 @@
           "@type": "cr:Field",
           "name": "seqgru_clean_roc_auc_mean",
           "description": "Mean clean SeqGRU ROC-AUC across seeds.",
-          "dataType": "sc:Number",
           "source": {
             "fileObject": {
               "@id": "paper-suite-summary-csv"
@@ -631,7 +652,7 @@
           "@type": "cr:Field",
           "name": "seqgru_shuffle_delta_mean",
           "description": "Mean change in SeqGRU ROC-AUC under shuffled event order.",
-          "dataType": "sc:Number",
           "source": {
             "fileObject": {
               "@id": "paper-suite-summary-csv"
@@ -646,7 +667,7 @@
           "@type": "cr:Field",
           "name": "tgn_clean_roc_auc_mean",
           "description": "Mean TGN ROC-AUC across seeds.",
-          "dataType": "sc:Number",
           "source": {
             "fileObject": {
               "@id": "paper-suite-summary-csv"
@@ -661,7 +682,7 @@
           "@type": "cr:Field",
           "name": "tgat_clean_roc_auc_mean",
           "description": "Mean TGAT ROC-AUC across seeds.",
-          "dataType": "sc:Number",
           "source": {
             "fileObject": {
               "@id": "paper-suite-summary-csv"
@@ -676,7 +697,7 @@
           "@type": "cr:Field",
           "name": "dyrep_clean_roc_auc_mean",
           "description": "Mean DyRep ROC-AUC across seeds.",
-          "dataType": "sc:Number",
           "source": {
             "fileObject": {
               "@id": "paper-suite-summary-csv"
@@ -691,7 +712,7 @@
           "@type": "cr:Field",
           "name": "jodie_clean_roc_auc_mean",
           "description": "Mean JODIE ROC-AUC across seeds.",
-          "dataType": "sc:Number",
           "source": {
             "fileObject": {
               "@id": "paper-suite-summary-csv"
@@ -718,10 +739,7 @@
     "Intended for temporal machine learning benchmark research, including sequence models, dynamic graph models, matched-control evaluation, and shortcut auditing.",
     "Suitable for studying whether a model uses causal temporal order rather than static transaction summaries."
   ],
-  "rai:dataSocialImpact": [
-    "Positive use may include more rigorous evaluation of temporal fraud-detection methods under matched static controls.",
-    "Potential misuse includes treating synthetic behavior as if it were real user behavior or using the dataset to justify deployment decisions without external validation on real, appropriately governed data."
-  ],
   "rai:hasSyntheticData": true,
   "prov:wasGeneratedBy": {
     "@type": "prov:Activity",

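The removed lines above declare field types with compact terms such as `sc:Text`, while the added lines below use full `https://schema.org/` URLs. The following is a minimal sketch for listing any field that does not follow the new convention, assuming the Croissant file is available locally and uses the usual `recordSet` and `field` keys. The function names are hypothetical, and the check reflects only the convention in this change, not a rule of the Croissant validator.

```python
import json

SCHEMA_ORG_PREFIX = "https://schema.org/"


def load_croissant(path):
    """Parse a Croissant JSON-LD file into a plain dictionary."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def fields_without_full_datatype(croissant):
    """Return (record set name, field name, dataType) for every field whose
    dataType is not written as a full schema.org URL."""
    flagged = []
    for record_set in croissant.get("recordSet", []):
        for field in record_set.get("field", []):
            data_type = field.get("dataType")
            # dataType may be a single string or a list of strings.
            values = data_type if isinstance(data_type, list) else [data_type]
            ok = all(
                isinstance(v, str) and v.startswith(SCHEMA_ORG_PREFIX)
                for v in values
            )
            if not ok:
                flagged.append(
                    (record_set.get("name"), field.get("name"), data_type)
                )
    return flagged
```

Running `fields_without_full_datatype(load_croissant("metadata/temporal_twins_croissant.json"))` would return an empty list if every field follows the full-URL form.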
     "fileObject": "cr:fileObject",
     "fileSet": "cr:fileSet",
     "extract": "cr:extract",
+    "column": "cr:column",
+    "fileProperty": "cr:fileProperty",
+    "jsonPath": "cr:jsonPath",
+    "dataType": "cr:dataType",
     "containedIn": "cr:containedIn",
     "includes": "cr:includes",
     "conformsTo": "dct:conformsTo",
+    "citeAs": "cr:citeAs",
+    "md5": "sc:md5",
+    "sha256": "sc:sha256"
   },
   "@type": "sc:Dataset",
   "name": "Temporal Twins Benchmark",
     }
   ],
   "dateCreated": "2026-05-04",
+  "datePublished": "2026-05-04",
   "version": "1.0.0",
   "keywords": [
     "synthetic financial transactions",
       "@type": "cr:FileSet",
       "name": "Metadata files",
       "description": "Metadata payload for the hosted release, including this Croissant file and companion notes.",
+      "includes": "metadata/*.json",
+      "encodingFormat": "application/ld+json"
+    },
+    {
+      "@id": "data-archive",
+      "@type": "cr:FileObject",
+      "name": "Temporal Twins data archive",
+      "description": "Zip archive containing synthetic transaction and matched-prefix parquet files for oracle_calib, easy, medium, and hard across seeds 0 through 4.",
+      "contentUrl": "https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins/resolve/main/temporal_twins_data.zip",
+      "encodingFormat": "application/zip",
+      "sha256": "eb5a76dfa9391be447e9aa23b0d57527ac8e0d1e9d8df0277b209734483af49c"
     },
     {
       "@id": "transactions-files",
       "@type": "cr:FileSet",
       "name": "Synthetic transactions parquet files",
       "description": "Expected synthetic transaction files for benchmark modes oracle_calib, easy, medium, and hard across seeds 0 through 4.",
+      "containedIn": {
+        "@id": "data-archive"
+      },
       "includes": "data/*/seed_*/transactions.parquet",
       "encodingFormat": "application/x-parquet"
     },
       "@type": "cr:FileSet",
       "name": "Matched-prefix example parquet files",
       "description": "Expected matched-prefix benchmark examples for the hosted release. Each file contains fraud and benign twin examples evaluated at the same local prefix index.",
+      "containedIn": {
+        "@id": "data-archive"
+      },
       "includes": "data/*/seed_*/matched_pairs.parquet",
       "encodingFormat": "application/x-parquet"
     },
       "@type": "cr:FileSet",
       "name": "Benchmark config files",
       "description": "YAML configuration files for the hosted release.",
+      "includes": "configs/*.yaml",
+      "encodingFormat": "text/yaml"
     },
     {
       "@id": "results-files",
       "@type": "cr:FileSet",
       "name": "Results files",
       "description": "Hosted result summaries and diagnostics for the deterministic paper suite.",
+      "includes": "results/*.csv",
+      "encodingFormat": "text/csv"
     },
     {
       "@id": "paper-suite-runs-csv",
       "name": "Per-run paper-suite results",
       "description": "Per-run deterministic results for the final 5-seed paper-scale suite.",
       "contentUrl": "https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins/raw/main/results/paper_suite_runs.csv",
+      "encodingFormat": "text/csv",
+      "sha256": "1445666d207ab28d94678cdbf3625bf771700bdd1c444aa0cf01f41f6672055e"
     },
     {
       "@id": "paper-suite-summary-csv",
       "name": "Paper-suite summary results",
       "description": "Mean and standard deviation summary of the deterministic 5-seed paper suite.",
       "contentUrl": "https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins/raw/main/results/paper_suite_summary.csv",
+      "encodingFormat": "text/csv",
+      "sha256": "aabe56ba6dfcb585903b4df74c53fcbcdb82a0b48e75b1214232f2fa2daaa6e4"
     },
     {
       "@id": "paper-suite-runtime-csv",
       "name": "Paper-suite runtime summary",
       "description": "Runtime and StaticGNN evaluation diagnostics for the final paper suite.",
       "contentUrl": "https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins/raw/main/results/paper_suite_runtime.csv",
+      "encodingFormat": "text/csv",
+      "sha256": "899415b8b34962cd1029b083a6f26282fe28402f03cd3877dd4da96d7840be74"
     },
     {
       "@id": "paper-suite-failed-checks-csv",
       "name": "Paper-suite failed gate checks",
       "description": "Gate-check and advisory-check outcomes for each run in the final paper suite.",
       "contentUrl": "https://huggingface.co/datasets/temporal-twins-benchmark/temporal-twins/raw/main/results/paper_suite_failed_checks.csv",
+      "encodingFormat": "text/csv",
+      "sha256": "860940b8e6594ba0b44ecf8f68d09eb0fa9b32b5426de38be6ed094cdfe4b267"
     }
   ],
   "recordSet": [
           "@type": "cr:Field",
           "name": "sender_id",
           "description": "Synthetic sender account identifier.",
+          "dataType": "https://schema.org/Text",
           "source": {
             "fileSet": {
               "@id": "transactions-files"
           "@type": "cr:Field",
           "name": "receiver_id",
           "description": "Synthetic receiver account identifier.",
+          "dataType": "https://schema.org/Text",
           "source": {
             "fileSet": {
               "@id": "transactions-files"
           "@type": "cr:Field",
           "name": "timestamp",
           "description": "Synthetic event timestamp used to order transactions within each sender history.",
+          "dataType": "https://schema.org/Float",
           "source": {
             "fileSet": {
               "@id": "transactions-files"
           "@type": "cr:Field",
           "name": "amount",
           "description": "Synthetic transaction amount.",
+          "dataType": "https://schema.org/Float",
           "source": {
             "fileSet": {
               "@id": "transactions-files"
           "@type": "cr:Field",
           "name": "risk_score",
           "description": "Synthetic noisy risk score emitted by the simulator's risk engine.",
+          "dataType": "https://schema.org/Float",
           "source": {
             "fileSet": {
               "@id": "transactions-files"
           "@type": "cr:Field",
           "name": "failed",
           "description": "Indicator for whether the synthetic transaction attempt failed.",
+          "dataType": "https://schema.org/Boolean",
           "source": {
             "fileSet": {
               "@id": "transactions-files"
           "@type": "cr:Field",
           "name": "is_fraud",
           "description": "Delayed synthetic fraud label attached to specific transactions.",
+          "dataType": "https://schema.org/Boolean",
           "source": {
             "fileSet": {
               "@id": "transactions-files"
           "@type": "cr:Field",
           "name": "twin_pair_id",
           "description": "Matched fraud/benign twin pair identifier.",
+          "dataType": "https://schema.org/Integer",
           "source": {
             "fileSet": {
               "@id": "matched-prefix-files"
           "@type": "cr:Field",
           "name": "sender_id",
           "description": "Sender evaluated at the matched prefix.",
+          "dataType": "https://schema.org/Text",
           "source": {
             "fileSet": {
               "@id": "matched-prefix-files"
           "@type": "cr:Field",
           "name": "matched_local_event_idx",
           "description": "Release-facing matched-prefix event index k used for both the fraud twin and its benign control.",
+          "dataType": "https://schema.org/Integer",
           "source": {
             "fileSet": {
               "@id": "matched-prefix-files"
           "@type": "cr:Field",
           "name": "label",
           "description": "Binary matched-prefix label where 1 denotes the fraud twin example and 0 denotes the benign matched control.",
+          "dataType": "https://schema.org/Boolean",
           "source": {
             "fileSet": {
               "@id": "matched-prefix-files"
           "@type": "cr:Field",
           "name": "benchmark_mode",
           "description": "Benchmark mode identifier, e.g. temporal_twins_oracle_calib or temporal_twins.",
+          "dataType": "https://schema.org/Text",
           "source": {
             "fileSet": {
               "@id": "matched-prefix-files"
           "@type": "cr:Field",
           "name": "difficulty",
           "description": "Difficulty slice within the release: oracle_calib, easy, medium, or hard.",
+          "dataType": "https://schema.org/Text",
           "source": {
             "fileSet": {
               "@id": "matched-prefix-files"
           "@type": "cr:Field",
           "name": "seed",
           "description": "Deterministic benchmark seed in the final paper-scale suite.",
+          "dataType": "https://schema.org/Integer",
           "source": {
             "fileSet": {
               "@id": "matched-prefix-files"
           "@type": "cr:Field",
           "name": "twin_role",
           "description": "Twin role label such as fraud, benign, or background; excluded from ordinary model features.",
+          "dataType": "https://schema.org/Text",
           "source": {
             "fileSet": {
               "@id": "transactions-files"
           "@type": "cr:Field",
           "name": "template_id",
           "description": "Identifier for the matched temporal template used to construct a twin pair; excluded from ordinary model features.",
+          "dataType": "https://schema.org/Integer",
           "source": {
             "fileSet": {
               "@id": "transactions-files"
           "@type": "cr:Field",
           "name": "motif_hit_count",
           "description": "Count of motif hits in the generator trace; exposed only for audit or probe logic, not learned baselines.",
+          "dataType": "https://schema.org/Integer",
           "source": {
             "fileSet": {
               "@id": "transactions-files"
           "@type": "cr:Field",
           "name": "motif_source",
           "description": "Generator-side motif provenance label; excluded from ordinary model features.",
+          "dataType": "https://schema.org/Text",
           "source": {
             "fileSet": {
               "@id": "transactions-files"
           "@type": "cr:Field",
           "name": "trigger_event_idx",
           "description": "Internal trigger event index for delayed fraud assignment; excluded from ordinary model features.",
+          "dataType": "https://schema.org/Integer",
           "source": {
             "fileSet": {
               "@id": "transactions-files"
           "@type": "cr:Field",
           "name": "label_event_idx",
           "description": "Internal event index at which the delayed fraud label is attached; excluded from ordinary model features.",
+          "dataType": "https://schema.org/Integer",
           "source": {
             "fileSet": {
               "@id": "transactions-files"
           "@type": "cr:Field",
           "name": "label_delay",
           "description": "Internal delay between trigger and labeled event; excluded from ordinary model features.",
+          "dataType": "https://schema.org/Integer",
           "source": {
             "fileSet": {
               "@id": "transactions-files"
           "@type": "cr:Field",
           "name": "fraud_source",
           "description": "Internal fraud-source annotation such as motif or fallback; excluded from ordinary model features.",
+          "dataType": "https://schema.org/Text",
           "source": {
             "fileSet": {
               "@id": "transactions-files"
           "@type": "cr:Field",
           "name": "dynamic_fraud_state",
           "description": "Latent generator-side fraud-state variable used for mechanistic analysis; excluded from ordinary model features.",
+          "dataType": "https://schema.org/Float",
           "source": {
             "fileSet": {
               "@id": "transactions-files"
           "@type": "cr:Field",
           "name": "benchmark_group",
           "description": "Benchmark slice summarized in the row, e.g. oracle_calib, easy, medium, or hard.",
+          "dataType": "https://schema.org/Text",
           "source": {
             "fileObject": {
               "@id": "paper-suite-summary-csv"
           "@type": "cr:Field",
           "name": "matched_eval_pairs_mean",
           "description": "Mean number of matched-prefix evaluation pairs across seeds.",
+          "dataType": "https://schema.org/Float",
           "source": {
             "fileObject": {
               "@id": "paper-suite-summary-csv"
           "@type": "cr:Field",
           "name": "static_agg_auc_mean",
           "description": "Mean ROC-AUC of the static aggregate shortcut audit.",
+          "dataType": "https://schema.org/Float",
           "source": {
             "fileObject": {
               "@id": "paper-suite-summary-csv"
           "@type": "cr:Field",
           "name": "audit_roc_auc_mean",
           "description": "Mean oracle or probe ROC-AUC depending on benchmark mode.",
+          "dataType": "https://schema.org/Float",
           "source": {
             "fileObject": {
               "@id": "paper-suite-summary-csv"
           "@type": "cr:Field",
           "name": "raw_roc_auc_mean",
           "description": "Mean raw motif oracle or probe ROC-AUC depending on benchmark mode.",
+          "dataType": "https://schema.org/Float",
           "source": {
             "fileObject": {
               "@id": "paper-suite-summary-csv"
           "@type": "cr:Field",
           "name": "xgb_roc_auc_mean",
           "description": "Mean XGBoost ROC-AUC across seeds.",
+          "dataType": "https://schema.org/Float",
           "source": {
             "fileObject": {
               "@id": "paper-suite-summary-csv"
           "@type": "cr:Field",
           "name": "static_gnn_roc_auc_mean",
           "description": "Mean StaticGNN ROC-AUC across seeds.",
+          "dataType": "https://schema.org/Float",
           "source": {
             "fileObject": {
               "@id": "paper-suite-summary-csv"
           "@type": "cr:Field",
           "name": "seqgru_clean_roc_auc_mean",
           "description": "Mean clean SeqGRU ROC-AUC across seeds.",
+          "dataType": "https://schema.org/Float",
           "source": {
             "fileObject": {
               "@id": "paper-suite-summary-csv"
           "@type": "cr:Field",
           "name": "seqgru_shuffle_delta_mean",
           "description": "Mean change in SeqGRU ROC-AUC under shuffled event order.",
+          "dataType": "https://schema.org/Float",
           "source": {
             "fileObject": {
               "@id": "paper-suite-summary-csv"
           "@type": "cr:Field",
           "name": "tgn_clean_roc_auc_mean",
           "description": "Mean TGN ROC-AUC across seeds.",
+          "dataType": "https://schema.org/Float",
           "source": {
             "fileObject": {
               "@id": "paper-suite-summary-csv"
           "@type": "cr:Field",
           "name": "tgat_clean_roc_auc_mean",
           "description": "Mean TGAT ROC-AUC across seeds.",
+          "dataType": "https://schema.org/Float",
           "source": {
             "fileObject": {
               "@id": "paper-suite-summary-csv"
           "@type": "cr:Field",
           "name": "dyrep_clean_roc_auc_mean",
           "description": "Mean DyRep ROC-AUC across seeds.",
+          "dataType": "https://schema.org/Float",
           "source": {
             "fileObject": {
               "@id": "paper-suite-summary-csv"
           "@type": "cr:Field",
           "name": "jodie_clean_roc_auc_mean",
           "description": "Mean JODIE ROC-AUC across seeds.",
+          "dataType": "https://schema.org/Float",
           "source": {
             "fileObject": {
               "@id": "paper-suite-summary-csv"
     "Intended for temporal machine learning benchmark research, including sequence models, dynamic graph models, matched-control evaluation, and shortcut auditing.",
     "Suitable for studying whether a model uses causal temporal order rather than static transaction summaries."
   ],
+  "rai:dataSocialImpact": "Positive use may include more rigorous evaluation of temporal fraud-detection methods under matched static controls. Potential misuse includes treating synthetic behavior as if it were real user behavior or using the dataset to justify deployment decisions without external validation on real, appropriately governed data.",
   "rai:hasSyntheticData": true,
   "prov:wasGeneratedBy": {
     "@type": "prov:Activity",