v2.1: Tuned transaction network and AMLSim models for composite transaction pattern features

Description

This data set contains four transaction networks together with their GPML results: a bank-to-bank transaction network (bank_mixed) and three internal-bank transaction networks (bank_a, bank_b and bank_c) extracted from bank_mixed. It is organized into the following parameter and output data directories:

  • paramFiles: parameter and property files for AMLSim
  • data: transaction network data from AMLSim
  • features: results of graph-based analytics and machine learning output

====

paramFiles: Parameter files for AMLSim

Parameter Files

If you use them, place the bank_mixed directory and the Java properties file model.properties under the paramFiles directory.

The bank_mixed directory contains the following AMLSim parameter files:

  • accounts.csv: Account parameter list CSV
  • alertPatterns.csv: SAR typology definition CSV
  • conf.json: Configuration JSON file of AMLSim
  • degree.csv: Degree distribution parameter CSV
  • schema.json: Schema definition JSON for output CSV files
  • transactionType.csv: Transaction pattern distribution CSV
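
Before launching a run, it can help to confirm that every file listed above is in place. The following is a minimal sketch and not part of AMLSim itself: the helper name `missing_param_files` is hypothetical, and the path `paramFiles/bank_mixed` assumes the layout described in this section.

```python
from pathlib import Path

# File names taken from the parameter file list above.
EXPECTED_PARAM_FILES = [
    "accounts.csv",
    "alertPatterns.csv",
    "conf.json",
    "degree.csv",
    "schema.json",
    "transactionType.csv",
]


def missing_param_files(param_dir):
    """Return the expected parameter files that are absent from param_dir."""
    base = Path(param_dir)
    return [name for name in EXPECTED_PARAM_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    # Assumed location, relative to the current working directory.
    missing = missing_param_files("paramFiles/bank_mixed")
    if missing:
        print("Missing parameter files:", ", ".join(missing))
    else:
        print("All parameter files found.")
```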

The Java properties file is used to adjust the amounts and probabilities of transactions.

  • model.properties

Running AMLSim

Run the following shell script to reproduce the data.

sh scripts/run_batch.sh paramFiles/bank_mixed/conf.json amlsim.log 30

  • The first argument, paramFiles/bank_mixed/conf.json, is the path to the configuration JSON file.
  • The second argument, amlsim.log, is the path to the output log file.
  • The third argument, 30, is the multiplier applied to the initial number of edges generated from normal account vertices. Its purpose is to balance the contributions of the various graph features in GPML.
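
For scripted runs, the same command can be assembled programmatically. This is a minimal sketch under stated assumptions: the helper name `build_run_command` is hypothetical, the check that the multiplier is positive is an assumption rather than a documented constraint, and the command is assumed to be launched from the directory that contains `scripts/run_batch.sh`.

```python
import subprocess


def build_run_command(conf_path, log_path, edge_multiplier):
    """Build the argument list for scripts/run_batch.sh in the documented order."""
    # Assumption: a non-positive multiplier is not meaningful.
    if edge_multiplier <= 0:
        raise ValueError("edge_multiplier must be positive")
    return ["sh", "scripts/run_batch.sh", conf_path, log_path, str(edge_multiplier)]


if __name__ == "__main__":
    cmd = build_run_command("paramFiles/bank_mixed/conf.json", "amlsim.log", 30)
    print(" ".join(cmd))
    # Uncomment to actually launch the simulation:
    # subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems if a path contains spaces.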

data: Transaction network data from AMLSim

This directory contains the transaction network data as gzipped CSV files of accounts and transactions:

  • accounts.csv.gz: List of all accounts
  • alerts.csv.gz: List of transactions belonging to SAR typologies
  • tx_cash.csv.gz: List of cash transactions (currently empty)
  • tx.csv.gz: List of all transactions
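
The gzipped files can be inspected without unpacking them first. The sketch below uses only the Python standard library and makes no assumption about column names. The function name `preview_gzipped_csv` is hypothetical, and UTF-8 encoding is assumed.

```python
import csv
import gzip


def preview_gzipped_csv(path, limit=5):
    """Return (header, rows): the header fields and up to `limit` data rows."""
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        # An input with no content yields an empty header and no rows.
        header = next(reader, [])
        rows = []
        for row in reader:
            if len(rows) >= limit:
                break
            rows.append(row)
    return header, rows


# Example usage (path assumed from the directory layout above):
#   header, rows = preview_gzipped_csv("data/tx.csv.gz")
#   print(header)
#   for row in rows:
#       print(row)
```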

features: Results of graph-based analytics and machine learning output

This directory contains the graph-based feature data files and model accuracies produced by GPML.

  • features_dist: Directory containing images of the value distributions of all graph-based features
  • features.csv.gz: Gzipped CSV file of feature values for all accounts
  • model_accuracy.png: GPML results for all combinations of graph-based features
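
As a quick numeric counterpart to the distribution images, the feature values can be summarized directly from features.csv.gz. This is a minimal sketch, assuming the file has a header row and is UTF-8 encoded. The function name `summarize_numeric_columns` is hypothetical, and no specific feature names are assumed: any column with a value that does not parse as a number is skipped.

```python
import csv
import gzip


def summarize_numeric_columns(path):
    """Return {column: (min, max, mean)} for columns whose values all parse as floats."""
    stats = {}           # column -> [min, max, total, count]
    non_numeric = set()  # columns with at least one non-numeric value
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        for record in csv.DictReader(handle):
            for column, raw in record.items():
                if column in non_numeric:
                    continue
                try:
                    value = float(raw)
                except (TypeError, ValueError):
                    # Drop the column entirely once a non-numeric value appears.
                    non_numeric.add(column)
                    stats.pop(column, None)
                    continue
                entry = stats.setdefault(column, [value, value, 0.0, 0])
                entry[0] = min(entry[0], value)
                entry[1] = max(entry[1], value)
                entry[2] += value
                entry[3] += 1
    return {c: (lo, hi, total / n) for c, (lo, hi, total, n) in stats.items()}


# Example usage (path assumed from the directory layout above):
#   for name, (lo, hi, mean) in summarize_numeric_columns("features/features.csv.gz").items():
#       print(name, lo, hi, mean)
```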