Pratap-K committed on
Commit
f953d1e
·
1 Parent(s): 2f14e67

Implement stateful temporal dynamics, partial observability, and Human-in-the-Loop (HITL) review logic.

README.md CHANGED
@@ -71,40 +71,56 @@ graph TD
71
 
72
  ---
73
 
74
- ## 🌊 The Payment Lifecycle (with LLM Context)
75
 
76
- The core interaction loop models an AI Agent acting as a **Smart Router and Risk Engine**.
77
 
78
  ```mermaid
79
  sequenceDiagram
80
  autonumber
81
- participant LLM as LLM Agent (Decision Maker)
82
- participant Env as Environment (Reality Layer)
83
- participant CB as Chargeback Maturity Queue
84
 
85
- Env->>LLM: Observation: {BIN: 4111, Amount: $500, UserSegment: New, ...}
 
86
 
87
- Note over LLM: Agent analyzes fraud signals vs. BIN affinity
88
- LLM->>Env: Action: {gateway: 1, fraud_decision: 2} (3DS Challenge)
89
-
90
- rect rgb(50, 50, 50)
91
- Note over Env: Reality Simulation
92
- Env->>Env: Apply 15% User Abandonment (Friction)
93
- Env->>Env: Calculate Success (Gateway 1 Rate * BIN 4111 Affinity)
94
  end
95
 
96
- Env-->>LLM: Step Outcome: Reward, Done, chargeback_penalty=0
97
-
98
- Note over Env,CB: 30-50 Transactions Later...
99
- CB->>Env: Fraud Detected from Step 1
100
- Env-->>LLM: Next Observation: {chargeback_penalty_applied: $520.00}
101
  ```
102
 
103
  ---
104
 
105
  ## 🎯 Benchmark Tasks
106
 
107
- SmartPayEnv supports three core curriculum tasks, ranging from basic classification to complex joint optimization.
108
 
109
  | Task | Level | Objective | Metrics |
110
  |------|-------|-----------|---------|
@@ -124,69 +140,46 @@ Grades the quality of the gateway choice and transaction outcome.
124
  - **Formula**: $Reward = \sigma(\alpha \cdot (2E - 1) - (\beta \cdot Cost + \gamma \cdot Retries) + \delta \cdot Quality)$
125
  - **Key Parameters**:
126
  - **$\alpha$ (Outcome Weight: 1.2)**: Scales the impact of the expected success.
127
- - **$\beta$ (Cost Multiplier: 0.15)**: Penalizes choosing expensive gateways (Fixed + % Fees).
128
- - **$\gamma$ (Retry Penalty: 0.4)**: Discourages excessive retries which increase latency.
129
- - **$\delta$ (Decision Bonus: 0.8)**: Rewards selecting the gateway with the highest current affinity/rate, even if the transaction fails due to environment noise.
130
-
131
-
132
 
133
  ### 2. Fraud Detection Grader (MCC)
134
- Uses the **Matthews Correlation Coefficient (MCC)** to handle imbalanced transaction data.
135
- - **Why?**: In payments, fraud is rare (~2%). Accuracy is a misleading metric; MCC captures the balance between True Positives (blocked fraud) and False Positives (blocked legitimate users).
136
- - **Normalization**: Maps MCC $[-1, 1]$ to a learnable range $[0, 1]$, where $0.5$ represents a random baseline.
 
137
 
138
  ### 3. User Retention Grader
139
- Models customer churn using an **Exponential Hazard Function**.
140
- - **Mechanic**: Every failed transaction increments a `consecutive_failures` counter for the user.
141
- - **Hazard Formula**: $1 - e^{-\lambda \cdot (failures^2)}$
142
- - **Rationale**: Models the "Trust Deficit." A first failure is annoying; a third consecutive failure causes **non-linear churn**, reflecting how premium users abandon platforms after bad experiences.
 
143
 
144
  ---
145
 
146
  ## 📐 Data Models
147
 
148
  ### Action Space (`SmartpayenvAction`)
149
- Decisions submitted by the agent at each step:
150
-
151
  | Field | Type | Values | Description |
152
  |-------|------|--------|-------------|
153
- | `gateway` | `int` | `0, 1, 2` | 0=GatewayA (Economy), 1=GatewayB (Standard), 2=GatewayC (Premium) |
154
- | `fraud_decision`| `int` | `0, 1, 2` | 0=Allow, 1=Block (Ends episode), 2=3DS Challenge (Friction) |
155
- | `retry_strategy`| `int` | `0, 1` | 0=No Retry, 1=Auto-Failover to next gateway on failure |
156
 
157
  ### Observation Space (`SmartpayenvObservation`)
158
- The state provided to the agent for each transaction:
159
-
160
- | Category | Field | Values | Description |
161
- |----------|-------|--------|-------------|
162
- | **Context** | `amount` | `float` | Transaction value in USD ($1 - $5000) |
163
- | | `bin_category` | `0-9` | Card type (e.g., 0=Domestic Debit, 5=International Credit) |
164
- | | `user_segment` | `0, 1, 2` | 0=New, 1=Existing, 2=Premium (Lower fraud risk) |
165
- | **Signals** | `fraud_risk_score`| `0..1` | Multi-factor risk probability (higher = more suspicious) |
166
- | | `user_history_score`| `0..1` | Normalized reliability based on previous successful tx |
167
- | **Health** | `gateway_states` | `str[]` | Health status per gateway: `normal`, `degraded`, `recovering` |
168
- | | `gateway_success_rates`| `float[]`| Real-time estimated success probabilities for A, B, and C |
169
- | **Tracking**| `chargeback_penalty_applied`| `float` | Penalty deducted *this step* from a past undetected fraud |
170
- | | `previous_failures`| `int` | Consecutive failures in current cohort session (influences churn) |
171
-
172
- ---
173
-
174
- ## 🛠️ Advanced Reality Features
175
-
176
- ### 🛡️ 3D Secure (3DS) Friction
177
- The `fraud_decision=2` action triggers a 3DS challenge.
178
- - **Security**: Provides a **90% reduction** in fraud risk.
179
- - **Friction**: Triggers a **15% abandonment rate** (User Drop-off). Agents must learn when the transaction value justifies the risk of losing the customer.
180
-
181
- ### ⏳ Delayed Chargebacks
182
- Undetected fraud ($FraudRisk > 0.65$) incurs a **Chargeback Penalty** that matures **30-50 steps** after the transaction.
183
- - **Impact**: Full transaction amount + $20 chargeback fee.
184
- - **Goal**: Forces agents to balance immediate routing success against long-term liability.
185
-
186
- ### 📊 BIN-Gateway Affinity
187
- A 10x3 matrix mapping card types (BIN categories) to gateway strengths.
188
- - Some gateways process "Debit" better, while others are "Premium Credit" specialists.
189
- - Agents must discover these hidden affinities to maximize success rates.
190
 
191
  ---
192
 
@@ -207,7 +200,7 @@ uv sync
207
  openenv validate
208
 
209
  # Run core logic tests
210
- python tests/test_v3_features.py
211
  ```
212
 
213
  ### 2. Starting the Server
@@ -231,13 +224,19 @@ docker run -p 7860:7860 smartpay-env
231
  ## 📁 Project Structure
232
  ```text
233
  SmartPayEnv/
 
 
 
 
234
  ├── server/
235
  │ ├── app.py # FastAPI Entry Point (Uvicorn)
236
  │ ├── SmartPayEnv_environment.py # Core Reality Layer Logic
237
- │ └── graders.py # Math models for RL Reward
 
238
  ├── tests/
239
  │ ├── test_graders.py # Unit tests for scoring math
240
- │ └── test_v3_features.py # Reality layer verification
 
241
  ├── models.py # Pydantic Action/Observation Schemas
242
  ├── inference.py # LLM/RL Agent Driver & Curriculum
243
  ├── pyproject.toml # Dependency & Build Manifest
 
71
 
72
  ---
73
 
74
+ ## 🌊 The Payment Lifecycle (The Reality Loop)
75
 
76
+ The environment models a high-frequency feedback loop where agents navigate noisy signals and delayed consequences.
77
 
78
  ```mermaid
79
  sequenceDiagram
80
  autonumber
81
+ participant Agent as AI Agent (LLM/RL)
82
+ participant Env as Reality Engine
83
+ participant Queue as Review/CB Queues
84
 
85
+ Note over Env: [State] Clock advances + Events Triggered
86
+ Env->>Agent: Observation (Noisy Risk + Lagged Health + Resolution Alerts)
87
 
88
+ Note over Agent: [Inference] Is there a fraud spike or gateway outage?
89
+ Agent->>Env: Action (Gateway Strategy + Fraud Decision)
90
+
91
+ rect rgb(30, 30, 30)
92
+ Note over Env: [Reality] Execution & Scheduling
93
+ Env->>Env: Success = f(Health, BIN, TrueRisk, Noise)
94
+ Env->>Queue: Schedule Reviews (10-25 steps) and Chargebacks (30-50 steps)
95
  end
96
 
97
+ Queue-->>Env: Matured Results from previous steps
98
+ Env->>Agent: Feedback (Reward, Done, Resolved Alerts)
 
 
 
99
  ```
100
 
101
  ---
102
 
103
+ ## 💎 Advanced Reality Features
104
+
105
+ ### 1. Log-Driven Time-Series
106
+ Streams transactions sequentially from synthetic logs to reproduce real-world distributions, diurnal cycles (driven by the simulation clock), and persistent fraud surges.
107
+
108
+ ### 2. Partial Observability
109
+ Forces agents to infer state by adding noise to risk signals, hiding internal user tiers, and lagging gateway health metrics by 2 steps.
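The noise and lag described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the environment's implementation: `noisy_risk` and `LaggedSignal` are hypothetical names, and the standard library's `random` stands in for the NumPy generator used by the environment. The Gaussian noise (sigma 0.1), the [0.01, 0.99] clip, and the three-slot buffer mirror values that appear in this commit's environment code.

```python
import random
from collections import deque


def noisy_risk(true_risk, rng, sigma=0.1):
    """Return a noisy, clipped view of the hidden true risk."""
    observed = true_risk + rng.gauss(0.0, sigma)
    return min(0.99, max(0.01, observed))


class LaggedSignal:
    """Hypothetical helper: exposes the value pushed `lag` steps ago."""

    def __init__(self, lag=2):
        # lag + 1 slots: the newest value plus `lag` older ones.
        self._buf = deque(maxlen=lag + 1)

    def push(self, value):
        self._buf.append(value)
        # Until the buffer fills, the oldest available value is returned.
        return self._buf[0]


rng = random.Random(0)
health = LaggedSignal(lag=2)
# True gateway success rates per step; the agent sees them two steps late.
seen = [health.push(rate) for rate in [0.95, 0.90, 0.60, 0.55]]
# seen == [0.95, 0.95, 0.95, 0.90]
observed = noisy_risk(0.70, rng)
```

The drop to 0.60 at the third step is invisible to the agent until two steps later, which is why an agent that reacts only to the reported rate will keep routing into a degraded gateway for a short while.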
110
+
111
+ ### 3. Human-in-the-Loop (HITL)
112
+ Agents can send transactions to manual review (Action 3). Resolutions are 100% accurate but incur a $5.00 fee and a 10-25 step delay.
113
+
114
+ ### 4. Advanced Adversarial Mechanics
115
+ - **🛡️ 3DS Friction (Action 2)**: Provides a **90% fraud reduction** but triggers a **15-25% abandonment rate**. Agents must balance security vs. customer drop-off.
116
+ - **⏳ Delayed Chargebacks**: Undetected fraud ($TrueRisk > 0.65$) matures into penalties (Tx Amount + $20 fee) **30-50 steps later**, forcing long-term liability management.
117
+ - **📊 BIN-Gateway Affinity**: A hidden matrix of gateway performance across different card types. Agents must discover these affinities to optimize routing success.
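The delayed consequences listed above (manual reviews and chargebacks) both amount to scheduling a payload for a future step. The sketch below shows one way to model that. `DelayedQueue` is a hypothetical helper, not a class from this repository; only the fee amounts and the delay ranges come from the README text.

```python
import random

REVIEW_FEE = 5.00       # README: manual review fee
CHARGEBACK_FEE = 20.00  # README: fee added to the transaction amount


class DelayedQueue:
    """Hypothetical helper: holds (maturity_step, payload) pairs."""

    def __init__(self):
        self._pending = []

    def schedule(self, now, delay, payload):
        self._pending.append((now + delay, payload))

    def pop_matured(self, now):
        """Return payloads whose maturity step has been reached and drop them."""
        matured = [p for step, p in self._pending if step <= now]
        self._pending = [(s, p) for s, p in self._pending if s > now]
        return matured


rng = random.Random(0)
chargebacks = DelayedQueue()
reviews = DelayedQueue()

# Step 1: a fraudulent $500 transaction is allowed, a $120 one goes to review.
chargebacks.schedule(now=1, delay=rng.randint(30, 50), payload=500.0 + CHARGEBACK_FEE)
reviews.schedule(now=1, delay=rng.randint(10, 25),
                 payload={"amount": 120.0, "is_fraud": False, "fee": REVIEW_FEE})

early = chargebacks.pop_matured(now=20)  # nothing can mature before step 31
late = chargebacks.pop_matured(now=51)   # the latest possible maturity is step 51
```

Because the penalty surfaces dozens of steps after the decision that caused it, an agent that only optimizes the immediate step reward never sees the cost of allowing risky transactions.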
118
+
119
+ ---
120
+
121
  ## 🎯 Benchmark Tasks
122
 
123
+ SmartPayEnv supports four curriculum tasks, ranging from basic classification to complex joint optimization.
124
 
125
  | Task | Level | Objective | Metrics |
126
  |------|-------|-----------|---------|
 
140
  - **Formula**: $Reward = \sigma(\alpha \cdot (2E - 1) - (\beta \cdot Cost + \gamma \cdot Retries) + \delta \cdot Quality)$
141
  - **Key Parameters**:
142
  - **$\alpha$ (Outcome Weight: 1.2)**: Scales the impact of the expected success.
143
+ - **$\beta$ (Cost Multiplier: 0.15)**: Penalizes choosing expensive gateways.
144
+ - **$\gamma$ (Retry Penalty: 0.4)**: Discourages excessive retries.
145
+ - **$\delta$ (Decision Bonus: 0.8)**: Rewards selecting the gateway with the highest current affinity.
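As a worked illustration of the formula and weights above, the following sketch evaluates the reward for two routing decisions. It assumes that `Cost` and `Quality` are already normalized to roughly [0, 1] and that `Retries` is a count; the README does not state the units, so treat the inputs as placeholders. The function name `routing_reward` is hypothetical.

```python
import math


def routing_reward(expected_success, cost, retries, quality,
                   alpha=1.2, beta=0.15, gamma=0.4, delta=0.8):
    """Sigmoid of the weighted outcome, cost, retry, and quality terms."""
    z = (alpha * (2.0 * expected_success - 1.0)
         - (beta * cost + gamma * retries)
         + delta * quality)
    return 1.0 / (1.0 + math.exp(-z))


# High expected success on the best-affinity gateway, no retries.
good = routing_reward(0.9, cost=1.0, retries=0, quality=1.0)
# Low expected success, two retries, poor gateway choice.
bad = routing_reward(0.3, cost=1.0, retries=2, quality=0.0)
# With every term at its neutral value the sigmoid returns exactly 0.5.
neutral = routing_reward(0.5, cost=0.0, retries=0, quality=0.0)
```

The `2E - 1` term recenters the expected success around zero, so an expected success of 0.5 contributes nothing and the sigmoid output stays near its midpoint.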
 
 
146
 
147
  ### 2. Fraud Detection Grader (MCC)
148
+ Uses the **Matthews Correlation Coefficient (MCC)** to handle imbalanced transaction data (fraud is rare, ~2%).
149
+ - **MCC Formula**:
150
+ $$MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
151
+ - **Reward Mapping**: Maps MCC $[-1, 1]$ to a learnable range $[0, 1]$ using $R = \frac{MCC + 1}{2}$. A baseline of $0.5$ represents a random classifier.
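The formula and mapping above can be checked with a few lines of Python. This is a sketch rather than the code in `graders.py`; in particular, returning 0 when the denominator is zero is a common convention that is assumed here, not something the README specifies.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: an empty row or column yields 0 (no correlation).
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def mcc_reward(tp, tn, fp, fn):
    """Map MCC from [-1, 1] to [0, 1]."""
    return (mcc(tp, tn, fp, fn) + 1.0) / 2.0


# 1000 transactions, 20 of them fraudulent (about 2%).
allow_all = mcc_reward(tp=0, tn=980, fp=0, fn=20)  # 0.5
perfect = mcc_reward(tp=20, tn=980, fp=0, fn=0)    # 1.0
```

The allow-everything policy is 98% accurate on this data yet scores only the 0.5 baseline, which is the imbalance problem that motivates using MCC.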
152
 
153
  ### 3. User Retention Grader
154
+ Models customer churn using an **Exponential Hazard Function** to simulate the "Trust Deficit."
155
+ - **Retention Formula**:
156
+ $$Retention = e^{-\lambda \cdot f^2}$$
157
+ where $f$ is the count of consecutive failed transactions for that user cohort.
158
+ - **Rationale**: Consecutive failures cause non-linear churn; a first failure is an annoyance, but a third consecutive failure leads to near-certain platform abandonment.
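To make the non-linearity concrete, the sketch below tabulates the retention formula for zero to three consecutive failures. The README does not give a value for $\lambda$, so `lam=0.25` is purely an illustrative assumption; the environment takes its own value from the difficulty configuration (`churn_rate`).

```python
import math


def retention(failures, lam=0.25):
    """Probability that a user stays after `failures` consecutive failures."""
    return math.exp(-lam * failures ** 2)


curve = [round(retention(f), 3) for f in range(4)]
# [1.0, 0.779, 0.368, 0.105]
```

With this placeholder value, the first failure costs about 22 percentage points of retention, while the step from two to three failures leaves only about one user in ten.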
159
 
160
  ---
161
 
162
  ## 📐 Data Models
163
 
164
  ### Action Space (`SmartpayenvAction`)
 
 
165
  | Field | Type | Values | Description |
166
  |-------|------|--------|-------------|
167
+ | `gateway` | `int` | `0, 1, 2` | 0=Economy, 1=Standard, 2=Premium |
168
+ | `fraud_decision`| `int` | `0, 1, 2, 3`| 0=Allow, 1=Block, 2=3DS (Challenge), 3=Manual Review |
169
+ | `retry_strategy`| `int` | `0, 1` | 0=No Retry, 1=Auto-Failover |
170
 
171
  ### Observation Space (`SmartpayenvObservation`)
172
+ | Category | Field | Description |
173
+ |----------|-------|-------------|
174
+ | **Context** | `amount` | Transaction value in USD |
175
+ | | `bin_category` | Card type (0-9) |
176
+ | | `user_segment` | 0=New, 1=Existing, 2=Premium |
177
+ | **Signals** | `observed_fraud_risk`| Noisy risk probability [0,1] |
178
+ | | `time_of_day` | Current simulation hour (0-23) |
179
+ | **Reviews**| `review_resolutions`| List of matured manual review results |
180
+ | **Health** | `gateway_states` | LAGGED Health status (2 steps delay) |
181
+ | | `gateway_success_rates`| LAGGED success probabilities |
182
+ | **Tracking**| `chargeback_penalty_applied`| Penalty from a past undetected fraud |
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
183
 
184
  ---
185
 
 
200
  openenv validate
201
 
202
  # Run core logic tests
203
+ python tests/test_reality_features.py
204
  ```
205
 
206
  ### 2. Starting the Server
 
224
  ## 📁 Project Structure
225
  ```text
226
  SmartPayEnv/
227
+ ├── scripts/
228
+ │ ├── generate_logs.py # Synthetic dataset generator
229
+ ├── data/
230
+ │ ├── transactions_log.jsonl # Pre-generated transaction pool
231
  ├── server/
232
  │ ├── app.py # FastAPI Entry Point (Uvicorn)
233
  │ ├── SmartPayEnv_environment.py # Core Reality Layer Logic
234
+ │ ├── graders.py # Math models for RL Reward
235
+ │ └── utils.py # Log loading & sampling utilities
236
  ├── tests/
237
  │ ├── test_graders.py # Unit tests for scoring math
238
+ │ ├── test_reality_features.py # Reality layer verification
239
+ │ └── test_env_logs.py # Log-driven simulation test
240
  ├── models.py # Pydantic Action/Observation Schemas
241
  ├── inference.py # LLM/RL Agent Driver & Curriculum
242
  ├── pyproject.toml # Dependency & Build Manifest
data/transactions_log.jsonl ADDED
The diff for this file is too large to render. See raw diff
 
inference.py CHANGED
@@ -14,7 +14,7 @@ API_KEY = os.getenv("HF_TOKEN") or os.getenv("API_KEY", "dummy-token")
14
  API_BASE_URL = os.getenv("API_BASE_URL", "https://router.huggingface.co/v1")
15
  MODEL_NAME = os.getenv("MODEL_NAME", "meta-llama/Llama-3.3-70B-Instruct")
16
 
17
- MAX_STEPS = 20
18
  SUCCESS_SCORE_THRESHOLD = 0.5
19
  ENV_URL = "http://localhost:7860"
20
  BENCHMARK = os.getenv("BENCHMARK", "SmartPayEnv")
@@ -45,14 +45,24 @@ SYSTEM_PROMPT = textwrap.dedent(
45
  - Hours 01:00-05:00: Severe Fraud Surge (Attack period).
46
  - Segment 0 (New): High distrust/abandonment during 3DS challenges.
47
 
 
 
 
 
48
  ### ACTION SCHEMA:
49
  Respond with EXACTLY ONE JSON object:
50
  {{
51
- "thought": "Reasoning based on current BIN category vs Affinity Matrix and Risk Score",
52
  "gateway": 0|1|2,
53
  "retry_strategy": 0|1,
54
- "fraud_decision": 0(Allow)|1(Block)|2(3DS Challenge)
55
  }}
 
 
 
 
 
 
56
  """
57
  ).strip()
58
 
 
14
  API_BASE_URL = os.getenv("API_BASE_URL", "https://router.huggingface.co/v1")
15
  MODEL_NAME = os.getenv("MODEL_NAME", "meta-llama/Llama-3.3-70B-Instruct")
16
 
17
+ MAX_STEPS = 30
18
  SUCCESS_SCORE_THRESHOLD = 0.5
19
  ENV_URL = "http://localhost:7860"
20
  BENCHMARK = os.getenv("BENCHMARK", "SmartPayEnv")
 
45
  - Hours 01:00-05:00: Severe Fraud Surge (Attack period).
46
  - Segment 0 (New): High distrust/abandonment during 3DS challenges.
47
 
48
+ 4. Manual Review:
49
+ - Action 3: Sends tx to human team. 10-25 step delay.
50
+ - Cost: $5.00 fee. Highest accuracy but slow.
51
+
52
  ### ACTION SCHEMA:
53
  Respond with EXACTLY ONE JSON object:
54
  {{
55
+ "thought": "Reasoning based on current BIN category vs Affinity Matrix and Observed Risk",
56
  "gateway": 0|1|2,
57
  "retry_strategy": 0|1,
58
+ "fraud_decision": 0(Allow)|1(Block)|2(3DS Challenge)|3(Manual Review)
59
  }}
60
+
61
+ ### IMPORTANT:
62
+ - Observations are PARTIAL. `observed_fraud_risk` is a noisy estimate.
63
+ - Gateway health signals are LAGGED by ~2 steps.
64
+ - `user_type` is hidden.
65
+ - Events (Spikes, Outages) are CORRELATED and have DURATION.
66
  """
67
  ).strip()
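Since the prompt now admits `fraud_decision` 3, any code that parses the model's reply has to accept the wider range. The sketch below shows one way to extract and validate the action. `parse_action` and `ALLOWED` are hypothetical names and this is not the parsing logic in `inference.py`, which is not part of this diff; the allowed values and the default of 0 follow the schema in the prompt and in `models.py`.

```python
import json

# Allowed values per field, taken from the ACTION SCHEMA in the prompt.
ALLOWED = {
    "gateway": {0, 1, 2},
    "retry_strategy": {0, 1},
    "fraud_decision": {0, 1, 2, 3},
}


def parse_action(reply):
    """Extract the outermost JSON object from an LLM reply and validate it."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in reply")
    data = json.loads(reply[start:end + 1])
    action = {}
    for field, allowed in ALLOWED.items():
        value = data.get(field, 0)  # models.py defaults every field to 0
        if value not in allowed:
            raise ValueError(f"{field}={value!r} not in {sorted(allowed)}")
        action[field] = int(value)
    return action


reply = ('Sure: {"thought": "risky BIN at 03:00", "gateway": 1, '
         '"retry_strategy": 0, "fraud_decision": 3}')
action = parse_action(reply)
```

The `thought` field is deliberately dropped: it helps the model reason but is not part of `SmartpayenvAction`.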
68
 
models.py CHANGED
@@ -25,7 +25,7 @@ class SmartpayenvAction(Action):
25
  """
26
  gateway: int = Field(default=0, description="0=GatewayA (cheap), 1=GatewayB (balanced), 2=GatewayC (premium)")
27
  retry_strategy: int = Field(default=0, description="0=No Retry, 1=Failover to next gateway on failure")
28
- fraud_decision: int = Field(default=0, description="0=Allow, 1=Block (end episode), 2=Challenge (3DS / MFA)")
29
 
30
 
31
  class SmartpayenvObservation(Observation):
@@ -70,9 +70,9 @@ class SmartpayenvObservation(Observation):
70
  )
71
 
72
  # ── Risk scores ───────────────────────────────────────────────────
73
- fraud_risk_score: float = Field(
74
  default=0.0,
75
- description="Continuous multi-factor fraud risk [0,1] (higher = more suspicious)"
76
  )
77
 
78
  # ── Episode tracking ──────────────────────────────────────────────
@@ -83,6 +83,7 @@ class SmartpayenvObservation(Observation):
83
  reward: float = Field(default=0.0, description="Combined step reward [0,1]")
84
  done: bool = Field(default=False, description="Episode done flag")
85
  chargeback_penalty_applied: float = Field(default=0.0, description="Penalty deducted this step from a past transaction chargeback")
 
86
 
87
  # Per-task scores — declared as first-class fields so openenv framework serializes them
88
  task_routing_score: float = Field(default=0.0, description="Routing efficacy score [0,1]")
 
25
  """
26
  gateway: int = Field(default=0, description="0=GatewayA (cheap), 1=GatewayB (balanced), 2=GatewayC (premium)")
27
  retry_strategy: int = Field(default=0, description="0=No Retry, 1=Failover to next gateway on failure")
28
+ fraud_decision: int = Field(default=0, description="0=Allow, 1=Block, 2=Challenge (3DS), 3=Manual Review (Delayed)")
29
 
30
 
31
  class SmartpayenvObservation(Observation):
 
70
  )
71
 
72
  # ── Risk scores ───────────────────────────────────────────────────
73
+ observed_fraud_risk: float = Field(
74
  default=0.0,
75
+ description="Noisy multi-factor fraud risk estimate [0,1] (true risk is hidden)"
76
  )
77
 
78
  # ── Episode tracking ──────────────────────────────────────────────
 
83
  reward: float = Field(default=0.0, description="Combined step reward [0,1]")
84
  done: bool = Field(default=False, description="Episode done flag")
85
  chargeback_penalty_applied: float = Field(default=0.0, description="Penalty deducted this step from a past transaction chargeback")
86
+ review_resolutions: list[dict] = Field(default_factory=list, description="List of resolved manual reviews this step: [{ 'amount': float, 'is_fraud': bool, 'outcome': 'accepted'|'rejected' }]")
87
 
88
  # Per-task scores — declared as first-class fields so openenv framework serializes them
89
  task_routing_score: float = Field(default=0.0, description="Routing efficacy score [0,1]")
scripts/generate_logs.py ADDED
@@ -0,0 +1,68 @@
1
+ import json
2
+ import numpy as np
3
+ import os
4
+ from uuid import uuid4
5
+
6
+ def generate_logs(output_path="data/transactions_log.jsonl", num_transactions=5000):
7
+ rng = np.random.default_rng()
8
+ os.makedirs(os.path.dirname(output_path), exist_ok=True)
9
+
10
+ current_hour = 0
11
+ steps_per_hour = 100 # average density
12
+ active_spike_countdown = 0
13
+
14
+ with open(output_path, "w") as f:
15
+ for i in range(num_transactions):
16
+ # Advance time every ~100 transactions
17
+ if i % steps_per_hour == 0:
18
+ current_hour = (current_hour + 1) % 24
19
+
20
+ # Randomly start a fraud spike (correlated event)
21
+ if active_spike_countdown <= 0 and rng.random() < 0.005:
22
+ active_spike_countdown = rng.integers(20, 50)
23
+
24
+ # 1. Hour of day (Diurnal pattern)
25
+ hour = current_hour
26
+
27
+ # 2. Segment & MCC
28
+ segment = int(rng.choice([0, 1, 2], p=[0.25, 0.60, 0.15]))
29
+ mcc = int(rng.choice([0, 1, 2, 3, 4, 5], p=[0.3, 0.2, 0.1, 0.1, 0.1, 0.2]))
30
+
31
+ # 3. Fraud Risk with Correlation (Spikes)
32
+ is_night = (1 <= hour <= 5)
33
+ base_risk = {0: 0.02, 1: 0.05, 2: 0.15, 3: 0.08, 4: 0.25, 5: 0.12}[mcc]
34
+
35
+ risk_boost = 0.0
36
+ if active_spike_countdown > 0:
37
+ risk_boost = 0.4 # Persistent spike
38
+ active_spike_countdown -= 1
39
+ elif is_night:
40
+ risk_boost = 0.2
41
+
42
+ final_risk = base_risk + risk_boost + rng.uniform(-0.05, 0.05)
43
+ fraud_risk_score = float(np.clip(final_risk * {0: 1.8, 1: 1.0, 2: 0.3}[segment], 0.01, 0.99))
44
+
45
+ # 4. Transaction Details
46
+ amount = float(rng.lognormal(mean={0: 4.0, 1: 4.5, 2: 6.5, 3: 7.0, 4: 5.0, 5: 3.0}[mcc], sigma=0.8))
47
+ bin_category = int(rng.integers(0, 10))
48
+ is_international = bool(rng.random() < (0.4 if mcc == 3 else 0.15))
49
+
50
+ log_entry = {
51
+ "amount": amount,
52
+ "merchant_category": mcc,
53
+ "is_international": is_international,
54
+ "card_present": bool(rng.random() > 0.5),
55
+ "user_segment": segment,
56
+ "user_history_score": float(np.clip(rng.normal({0: 0.3, 1: 0.7, 2: 0.9}[segment], 0.15), 0.1, 1.0)),
57
+ "device_type": int(rng.choice([0, 1, 2], p=[0.5, 0.4, 0.1])),
58
+ "bin_category": bin_category,
59
+ "time_of_day": hour,
60
+ "transaction_velocity": float(np.clip(rng.random() * 0.2 + (0.5 if active_spike_countdown > 0 else 0.0), 0.1, 0.9)),
61
+ "fraud_risk_score": fraud_risk_score,
62
+ "event_marker": "fraud_spike" if active_spike_countdown > 0 else None
63
+ }
64
+ f.write(json.dumps(log_entry) + "\n")
65
+
66
+ if __name__ == "__main__":
67
+ generate_logs(num_transactions=5000)
68
+ print("Sequential logs with correlated events generated.")
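The generated file is JSON Lines: one transaction object per line. The sketch below reads such a file and samples it sequentially. `server/utils.py` is not shown in this diff, so `load_log` and `sample` are hypothetical stand-ins for `LogLoader`, and the wrap-around indexing is an assumption, motivated by the environment drawing its start cursor from a range larger than the 5000 generated entries.

```python
import io
import json


def load_log(stream):
    """Parse a JSON Lines stream into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in stream if line.strip()]


def sample(entries, index):
    """Sequential read with wrap-around (assumed behaviour, see lead-in)."""
    if not entries:
        return None
    return entries[index % len(entries)]


# Two in-memory entries standing in for data/transactions_log.jsonl.
demo = io.StringIO(
    '{"amount": 42.0, "time_of_day": 1, "event_marker": null}\n'
    '{"amount": 9.5, "time_of_day": 1, "event_marker": "fraud_spike"}\n'
)
entries = load_log(demo)
third = sample(entries, index=2)  # wraps back to the first entry
```

Reading consecutive indices keeps entries from the same fraud spike adjacent, which is what preserves the temporal correlation the generator builds in.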
server/SmartPayEnv_environment.py CHANGED
@@ -5,7 +5,7 @@
5
  # LICENSE file in the root directory of this source tree.
6
 
7
  """
8
- SmartPayEnv v3 — Advanced Fintech Reality Layer.
9
 
10
  High-fidelity benchmark for RL agents in the payment domain.
11
  Features: 3D Secure (3DS), Chargeback Delays, BIN Affinity, Dynamic Costs, & Cohorts.
@@ -25,8 +25,10 @@ except (ImportError, ValueError):
25
 
26
  try:
27
  from .graders import RoutingEfficacyGrader, FraudDetectionGrader, UserRetentionGrader
 
28
  except (ImportError, ValueError):
29
  from server.graders import RoutingEfficacyGrader, FraudDetectionGrader, UserRetentionGrader
 
30
 
31
 
32
  # ── Configuration Constants ────────────────────────────────────────────
@@ -69,6 +71,12 @@ class State:
69
  fraud_wave_drift: float = 0.0
70
  market_volatility: float = 0.0
71
  chargeback_queue: list = field(default_factory=list)
 
 
 
 
 
 
72
 
73
 
74
  class _GatewayState:
@@ -122,6 +130,8 @@ class SmartpayenvEnvironment(Environment):
122
  self.retention_grader = UserRetentionGrader()
123
  self._velocity_buffer = deque(maxlen=5)
124
  self.current_obs = None
 
 
125
 
126
  def _init_gateways(self) -> None:
127
  instability = self._cfg["instability"]
@@ -132,63 +142,37 @@ class SmartpayenvEnvironment(Environment):
132
  ]
133
 
134
  def _generate_transaction(self) -> SmartpayenvObservation:
135
- # 1. Advanced Diurnal Cycle (UTC)
136
- # Peak Fraud: 01:00 - 05:00. Peak Volume: 12:00 - 20:00
137
- hour = int(self._state.step_count % 24)
138
- is_night = (1 <= hour <= 5)
139
-
140
- # 2. User Segments (Cohorts)
141
- segment = int(self._rng.choice([0, 1, 2], p=[0.25, 0.60, 0.15])) # 0=New, 1=Existing, 2=Premium
142
-
143
- # Segment behavioral traits
144
- fraud_mult = {0: 1.8, 1: 1.0, 2: 0.3}[segment]
145
- history_mu = {0: 0.3, 1: 0.7, 2: 0.9}[segment]
146
-
147
- # 3. Correlated Merchant Categories (MCC)
148
- mcc = int(self._rng.choice([0, 1, 2, 3, 4, 5], p=[0.3, 0.2, 0.1, 0.1, 0.1, 0.2]))
149
-
150
- # MCC-Amount Correlation
151
- amount_mu = {0: 4.0, 1: 4.5, 2: 6.5, 3: 7.0, 4: 5.0, 5: 3.0}[mcc]
152
- amount = float(self._rng.lognormal(mean=amount_mu, sigma=0.8))
153
-
154
- # 4. Statistical Fraud Model
155
- wave_drift = self._state.fraud_wave_drift
156
- category_risk = {0: 0.02, 1: 0.05, 2: 0.15, 3: 0.08, 4: 0.25, 5: 0.12}[mcc]
157
-
158
- base_risk = self._cfg["fraud_base_rate"] + wave_drift + category_risk
159
- if is_night: base_risk += 0.25 # Night surge
160
-
161
- is_international = bool(self._rng.random() < (0.4 if mcc == 3 else 0.15))
162
- device_type = int(self._rng.choice([0, 1, 2], p=[0.5, 0.4, 0.1])) # 0=Mobile, 1=Web, 2=Unknown
163
-
164
- final_risk = base_risk + (0.15 if is_international else 0.0)
165
- final_risk += (0.2 if device_type == 2 else 0.0)
166
-
167
- fraud_risk_score = float(np.clip(final_risk * fraud_mult, 0.01, 0.99))
168
- user_history_score = float(np.clip(self._rng.normal(history_mu, 0.15), 0.1, 1.0))
169
 
170
- # 5. Other Transactional Features
171
- bin_category = int(self._rng.integers(0, 10))
172
- card_present = bool(self._rng.random() > 0.6 if is_night else 0.3)
173
-
174
- # Velocity and Fraud Risk (History Buffer)
175
- velocity = float(np.clip(self._rng.random() * 0.2 + (0.5 if is_night else 0.0), 0.1, 0.9))
176
 
177
  return SmartpayenvObservation(
178
- amount=amount,
179
- merchant_category=mcc,
180
- is_international=is_international,
181
- card_present=card_present,
182
  user_type=0,
183
- user_segment=segment,
184
- user_history_score=user_history_score,
185
- device_type=device_type,
186
- bin_category=bin_category,
187
- transaction_velocity=velocity,
188
- time_of_day=hour,
189
  gateway_success_rates=[g.current_rate for g in self._gateways],
190
  gateway_states=[g.state for g in self._gateways],
191
- fraud_risk_score=fraud_risk_score,
192
  previous_failures=self._state.consecutive_failures,
193
  difficulty=self._difficulty,
194
  reward=0.5,
@@ -198,46 +182,106 @@ class SmartpayenvEnvironment(Environment):
198
  task_retention_score=0.5,
199
  )
200
 
201
  def reset(self, difficulty: int = 0) -> SmartpayenvObservation:
202
  self._difficulty = int(np.clip(difficulty, 0, 2))
203
  self._cfg = DIFFICULTY_CONFIG[self._difficulty]
204
  self._state = State(episode_id=str(uuid4()), step_count=0)
 
 
205
  self._init_gateways()
206
  self.route_grader = RoutingEfficacyGrader()
207
  self.fraud_grader = FraudDetectionGrader()
208
  self.retention_grader = UserRetentionGrader(churn_rate=self._cfg["churn_rate"])
209
  self._velocity_buffer.clear()
210
  self.current_obs = self._generate_transaction()
 
 
211
  return self.current_obs
212
 
213
  def step(self, action: SmartpayenvAction) -> SmartpayenvObservation:
214
  self._state.step_count += 1
 
 
 
 
 
215
  if self.current_obs is None: self.reset()
216
 
217
  obs = self.current_obs
218
- assert obs is not None # Satisfy type checker
219
- # 0. Stochastic Reality Drift
220
- # Fraud Wave: base rate drifts every step
221
- if self._state.step_count % 5 == 0:
222
- drift = self._rng.normal(0, 0.05)
223
- self._state.fraud_wave_drift = np.clip(self._state.fraud_wave_drift + drift, -0.1, 0.2)
224
-
225
- # Systemic Volatility: 5% chance of market-wide degradation
226
- if self._rng.random() < 0.05:
227
- for g in self._gateways:
228
- if g.state == "normal":
229
- g.state = "degraded"
230
- g._countdown = int(self._rng.integers(4, 9))
231
- g.current_rate = g.current_rate * 0.7
 
 
 
 
 
 
 
 
 
 
 
 
 
232
 
233
  for gw in self._gateways: gw.step()
234
 
235
  # 1. 3DS / Action Logic
236
- is_fraud = (obs.fraud_risk_score >= 0.65)
237
- action_block = (action.fraud_decision == 1)
238
- action_3ds = (action.fraud_decision == 2)
 
239
 
240
- self.fraud_grader.add_step(action_block or action_3ds, is_fraud)
241
 
242
  done = False
243
  success = False
@@ -247,8 +291,19 @@ class SmartpayenvEnvironment(Environment):
247
  cb_penalty_this_step = 0.0
248
 
249
  if action_block:
250
- route_score = obs.fraud_risk_score if is_fraud else (obs.fraud_risk_score * 0.3)
251
  done = True
 
 
 
 
 
 
 
 
 
 
 
252
  else:
253
  gw_rates = [g.current_rate for g in self._gateways]
254
 
@@ -260,7 +315,7 @@ class SmartpayenvEnvironment(Environment):
260
  affinity = affinity * 0.15 # Harsh penalty for subpar routing
261
 
262
  # 3DS reduces remaining fraud risk by 90%
263
- eff_fraud_risk = obs.fraud_risk_score * (0.1 if action_3ds else 1.0)
264
  expected_outcome = gw_rates[gateway] * (1.0 - eff_fraud_risk) * affinity
265
  expected_outcome = float(np.clip(expected_outcome, 0.05, 1.0))
266
 
@@ -275,7 +330,7 @@ class SmartpayenvEnvironment(Environment):
275
  retries += 1
276
  gateway = (gateway + 1) % 3
277
  affinity = BIN_AFFINITY[gateway][obs.bin_category]
278
- expected_outcome = gw_rates[gateway] * (1.0 - obs.fraud_risk_score) * affinity
279
  success = bool(self._rng.random() < expected_outcome)
280
 
281
  # Dynamic Cost: % + flat
@@ -310,19 +365,38 @@ class SmartpayenvEnvironment(Environment):
310
  # Process maturation
311
  cb_amt: float = 0.0
312
  pending = []
313
- for mat, pen in self._state.chargeback_queue:
314
- if self._state.step_count >= mat:
315
- cb_amt = cb_amt + float(pen)
316
  else:
317
- pending.append((mat, pen))
318
  self._state.chargeback_queue = pending
319
 
320
- # Finalize
 
 
 
321
  self.current_obs = self._generate_transaction()
322
- self.current_obs.gateway_success_rates = [g.current_rate for g in self._gateways]
323
- self.current_obs.gateway_states = [g.state for g in self._gateways]
 
324
  self.current_obs.chargeback_penalty_applied = cb_amt
325
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
326
  if done or self._state.step_count >= 100: self.current_obs.done = True
327
 
328
  fs = self.fraud_grader.evaluate()
 
5
  # LICENSE file in the root directory of this source tree.
6
 
7
  """
8
+ SmartPayEnv — Advanced Fintech Reality Layer.
9
 
10
  High-fidelity benchmark for RL agents in the payment domain.
11
  Features: 3D Secure (3DS), Chargeback Delays, BIN Affinity, Dynamic Costs, & Cohorts.
 
25
 
26
  try:
27
  from .graders import RoutingEfficacyGrader, FraudDetectionGrader, UserRetentionGrader
28
+ from .utils import LogLoader
29
  except (ImportError, ValueError):
30
  from server.graders import RoutingEfficacyGrader, FraudDetectionGrader, UserRetentionGrader
31
+ from server.utils import LogLoader
32
 
33
 
34
  # ── Configuration Constants ────────────────────────────────────────────
 
71
  fraud_wave_drift: float = 0.0
72
  market_volatility: float = 0.0
73
  chargeback_queue: list = field(default_factory=list)
74
+ health_lag_buffer: deque = field(default_factory=lambda: deque(maxlen=3)) # 2-step lag
75
+ true_fraud_risk: float = 0.0
76
+ simulation_hour: int = 0
77
+ active_events: dict = field(default_factory=dict) # e.g. {"fraud_spike": 10, "outage": 5}
78
+ log_cursor: int = 0
79
+ review_queue: list = field(default_factory=list) # [{ 'step': int, 'is_fraud': bool, 'amount': float }]
80
 
81
 
82
  class _GatewayState:
 
130
  self.retention_grader = UserRetentionGrader()
131
  self._velocity_buffer = deque(maxlen=5)
132
  self.current_obs = None
133
+ self._log_loader = LogLoader()
134
+ self._pattern_queue = deque()
135
 
136
  def _init_gateways(self) -> None:
137
  instability = self._cfg["instability"]
 
142
  ]
143
 
144
  def _generate_transaction(self) -> SmartpayenvObservation:
145
+ # Check if we have a queued pattern to replay
146
+ if self._pattern_queue:
147
+ log_entry = self._pattern_queue.popleft()
148
+ else:
149
+ # Sample sequentially from logs to maintain temporal correlation
150
+ noise = {0: 0.05, 1: 0.15, 2: 0.3}[self._difficulty]
151
+ log_entry = self._log_loader.sample(index=self._state.log_cursor, noise_level=noise)
152
+ self._state.log_cursor += 1
 
153
 
154
+ if log_entry is None:
155
+ # Fallback to random if logs fail (shouldn't happen)
156
+ return self._generate_fallback_transaction()
157
+
158
+ true_risk = float(log_entry["fraud_risk_score"])
159
+ self._state.true_fraud_risk = true_risk
160
 
161
  return SmartpayenvObservation(
162
+ amount=float(log_entry["amount"]),
163
+ merchant_category=int(log_entry["merchant_category"]),
164
+ is_international=bool(log_entry["is_international"]),
165
+ card_present=bool(log_entry["card_present"]),
166
  user_type=0,
167
+ user_segment=int(log_entry["user_segment"]),
168
+ user_history_score=float(log_entry["user_history_score"]),
169
+ device_type=int(log_entry["device_type"]),
170
+ bin_category=int(log_entry["bin_category"]),
171
+ transaction_velocity=float(log_entry["transaction_velocity"]),
172
+ time_of_day=int(log_entry["time_of_day"]),
173
  gateway_success_rates=[g.current_rate for g in self._gateways],
174
  gateway_states=[g.state for g in self._gateways],
175
+ observed_fraud_risk=self._get_noisy_risk(float(log_entry["fraud_risk_score"])),
176
  previous_failures=self._state.consecutive_failures,
177
  difficulty=self._difficulty,
178
  reward=0.5,
 
             task_retention_score=0.5,
         )

+    def _get_noisy_risk(self, true_risk: float) -> float:
+        """Adds Gaussian noise to the true risk score."""
+        noise = self._rng.normal(0, 0.1)
+        return float(np.clip(true_risk + noise, 0.01, 0.99))
+
+    def _generate_fallback_transaction(self) -> SmartpayenvObservation:
+        # Original logic as fallback
+        hour = int(self._state.step_count % 24)
+        segment = int(self._rng.choice([0, 1, 2], p=[0.25, 0.60, 0.15]))
+        mcc = int(self._rng.choice([0, 1, 2, 3, 4, 5]))
+        amount = float(self._rng.lognormal(mean=4.0, sigma=0.8))
+
+        self._state.true_fraud_risk = 0.1
+        return SmartpayenvObservation(
+            amount=amount,
+            merchant_category=mcc,
+            is_international=False,
+            card_present=True,
+            user_type=0,
+            user_segment=segment,
+            user_history_score=0.8,
+            device_type=0,
+            bin_category=0,
+            transaction_velocity=0.5,
+            time_of_day=hour,
+            gateway_success_rates=[0.9, 0.9, 0.9],
+            gateway_states=["normal", "normal", "normal"],
+            observed_fraud_risk=0.1,
+            previous_failures=0,
+            difficulty=self._difficulty,
+            reward=0.5,
+            done=False,
+            task_routing_score=0.5,
+            task_fraud_mcc_score=0.5,
+            task_retention_score=0.5,
+        )
+
     def reset(self, difficulty: int = 0) -> SmartpayenvObservation:
         self._difficulty = int(np.clip(difficulty, 0, 2))
         self._cfg = DIFFICULTY_CONFIG[self._difficulty]
         self._state = State(episode_id=str(uuid4()), step_count=0)
+        # Random initial cursor for variety, but then sequential within episode
+        self._state.log_cursor = self._rng.integers(0, 100000)
         self._init_gateways()
         self.route_grader = RoutingEfficacyGrader()
         self.fraud_grader = FraudDetectionGrader()
         self.retention_grader = UserRetentionGrader(churn_rate=self._cfg["churn_rate"])
         self._velocity_buffer.clear()
         self.current_obs = self._generate_transaction()
+        # Synchronize simulation clock with the log's starting hour
+        self._state.simulation_hour = self.current_obs.time_of_day
         return self.current_obs

     def step(self, action: SmartpayenvAction) -> SmartpayenvObservation:
         self._state.step_count += 1
+
+        # Advance hour every 20 steps
+        if self._state.step_count % 20 == 0:
+            self._state.simulation_hour = (self._state.simulation_hour + 1) % 24
+
         if self.current_obs is None: self.reset()

         obs = self.current_obs
+        assert obs is not None
+
+        # 0. Temporal Event Management
+        # Decay active events (Safer way to delete items)
+        self._state.active_events = {e: d - 1 for e, d in self._state.active_events.items() if d > 1}
+
+        # Randomly trigger a systemic gateway outage (Event Correlation)
+        if self._rng.random() < 0.01:
+            self._state.active_events["systemic_outage"] = self._rng.integers(5, 15)
+            # Force multiple gateways into "degraded" state
+            for gw in self._gateways:
+                if self._rng.random() < 0.7:
+                    gw.state = "degraded"
+                    gw._countdown = self._state.active_events["systemic_outage"]
+                    gw.current_rate = gw.base_rate * 0.1
+
+        # 0. Gateway Health Lag Update
+        current_health = {
+            "rates": [g.current_rate for g in self._gateways],
+            "states": [g.state for g in self._gateways]
+        }
+        self._state.health_lag_buffer.append(current_health)
+
+        if self._state.step_count % 10 == 0 and self._rng.random() < 0.2:
+            # Inject a "Fraud Surge" pattern from logs
+            surge_logs = self._log_loader.get_pattern("fraud_surge", count=5)
+            self._pattern_queue.extend(surge_logs)

         for gw in self._gateways: gw.step()

         # 1. 3DS / Action Logic
+        is_fraud = (self._state.true_fraud_risk >= 0.65)
+        action_block = (action.fraud_decision == 1)
+        action_3ds = (action.fraud_decision == 2)
+        action_review = (action.fraud_decision == 3)

+        self.fraud_grader.add_step(action_block or action_3ds or action_review, is_fraud)

         done = False
         success = False
 
         cb_penalty_this_step = 0.0

         if action_block:
+            route_score = self._state.true_fraud_risk if is_fraud else (self._state.true_fraud_risk * 0.3)
             done = True
+        elif action_review:
+            # Manual Review: Costly but accurate delay
+            total_cost += 5.0  # High internal cost for human time
+            delay = self._rng.integers(10, 25)
+            self._state.review_queue.append({
+                'maturation': self._state.step_count + delay,
+                'is_fraud': is_fraud,
+                'amount': obs.amount
+            })
+            route_score = 0.5  # Neutral immediate feedback
+            success = False  # Held in review
         else:
             gw_rates = [g.current_rate for g in self._gateways]

 
                 affinity = affinity * 0.15  # Harsh penalty for subpar routing

             # 3DS reduces remaining fraud risk by 90%
+            eff_fraud_risk = self._state.true_fraud_risk * (0.1 if action_3ds else 1.0)
             expected_outcome = gw_rates[gateway] * (1.0 - eff_fraud_risk) * affinity
             expected_outcome = float(np.clip(expected_outcome, 0.05, 1.0))

 
                 retries += 1
                 gateway = (gateway + 1) % 3
                 affinity = BIN_AFFINITY[gateway][obs.bin_category]
+                expected_outcome = gw_rates[gateway] * (1.0 - self._state.true_fraud_risk) * affinity
                 success = bool(self._rng.random() < expected_outcome)

             # Dynamic Cost: % + flat
 
         # Process maturation
         cb_amt: float = 0.0
         pending = []
+        for maturation_step, penalty_amount in self._state.chargeback_queue:
+            if self._state.step_count >= maturation_step:
+                cb_amt += float(penalty_amount)
             else:
+                pending.append((maturation_step, penalty_amount))
         self._state.chargeback_queue = pending

+        # 3. Apply Lagged Health to Next Observation
+        # Use first item in buffer for 2-step lag if buffer is full
+        lagged_health = self._state.health_lag_buffer[0] if len(self._state.health_lag_buffer) >= 3 else current_health
+
         self.current_obs = self._generate_transaction()
+        self.current_obs.time_of_day = self._state.simulation_hour
+        self.current_obs.gateway_success_rates = lagged_health["rates"]
+        self.current_obs.gateway_states = lagged_health["states"]
         self.current_obs.chargeback_penalty_applied = cb_amt

+        # Process and report matured Manual Reviews
+        matured_reviews = []
+        remaining_reviews = []
+        for r in self._state.review_queue:
+            if self._state.step_count >= r['maturation']:
+                matured_reviews.append({
+                    'amount': r['amount'],
+                    'is_fraud': r['is_fraud'],
+                    'outcome': 'rejected' if r['is_fraud'] else 'accepted'
+                })
+            else:
+                remaining_reviews.append(r)
+        self._state.review_queue = remaining_reviews
+        self.current_obs.review_resolutions = matured_reviews
+
         if done or self._state.step_count >= 100: self.current_obs.done = True

         fs = self.fraud_grader.evaluate()
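
The manual-review path in `step()` defers its verdict: an item is queued with a maturation step and only reported once the step counter reaches it. The following is a minimal standalone sketch of that split, assuming the same dictionary keys as the diff; `resolve_reviews` and the sample queue are hypothetical names used for illustration, not part of the commit.

```python
def resolve_reviews(review_queue, step_count):
    """Split queued reviews into matured resolutions and still-pending items."""
    matured, remaining = [], []
    for item in review_queue:
        if step_count >= item["maturation"]:
            # A matured review reveals the ground-truth label to the agent.
            matured.append({
                "amount": item["amount"],
                "is_fraud": item["is_fraud"],
                "outcome": "rejected" if item["is_fraud"] else "accepted",
            })
        else:
            remaining.append(item)
    return matured, remaining


# Hypothetical queue: one review due at step 12, one at step 20.
queue = [
    {"maturation": 12, "is_fraud": True, "amount": 500.0},
    {"maturation": 20, "is_fraud": False, "amount": 42.0},
]
matured, queue = resolve_reviews(queue, step_count=12)
later, queue_after = resolve_reviews(queue, step_count=25)
```

Returning the two lists instead of mutating state keeps the logic easy to unit-test without constructing a full environment.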
server/graders.py CHANGED
@@ -102,7 +102,7 @@ class FraudDetectionGrader:
             (self.tn + self.fn)
         )
         if denominator == 0:
-            return 0.1  # Fail — insufficient data signal
+            return 0.5  # Neutral — insufficient data to compute MCC
         mcc = numerator / denominator
         score = (mcc + 1.0) / 2.0  # Normalize [-1, 1] → [0, 1]
         return max(0.001, min(0.999, score))
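
Only the tail of `FraudDetectionGrader.evaluate()` is visible in this hunk. As a rough illustration of why 0.5 is the neutral fallback, here is a sketch that uses the standard Matthews correlation coefficient definition; the function name `normalized_mcc` and its argument order are assumptions, and the grader's actual numerator and denominator code may differ in detail.

```python
import math


def normalized_mcc(tp, tn, fp, fn):
    """Map the Matthews correlation coefficient from [-1, 1] to (0, 1)."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # MCC is undefined here (e.g. only one class seen so far).
        # 0.5 corresponds to MCC = 0, i.e. "no evidence either way".
        return 0.5
    mcc = numerator / denominator
    score = (mcc + 1.0) / 2.0
    return max(0.001, min(0.999, score))


empty = normalized_mcc(0, 0, 0, 0)      # no data yet
perfect = normalized_mcc(5, 5, 0, 0)    # every decision correct
inverted = normalized_mcc(0, 0, 5, 5)   # every decision wrong
```

With the old fallback of 0.1, an episode that had not yet seen both classes scored close to an inverted classifier, which is a much stronger claim than the data supports.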
server/utils.py ADDED
@@ -0,0 +1,50 @@
+import json
+import random
+import os
+
+class LogLoader:
+    def __init__(self, log_path="data/transactions_log.jsonl"):
+        self.log_path = log_path
+        self.logs = []
+        if os.path.exists(log_path):
+            with open(log_path, "r") as f:
+                for line in f:
+                    self.logs.append(json.loads(line))
+        else:
+            print(f"Warning: Log file {log_path} not found.")
+
+    def sample(self, index=None, noise_level=0.05):
+        if not self.logs:
+            return None
+
+        if index is not None:
+            entry = self.logs[index % len(self.logs)].copy()
+        else:
+            entry = random.choice(self.logs).copy()
+
+        # Inject noise into float fields
+        if noise_level > 0:
+            for key in ["amount", "fraud_risk_score", "user_history_score", "transaction_velocity"]:
+                if key in entry:
+                    noise = random.uniform(-noise_level, noise_level)
+                    entry[key] = max(0.01, entry[key] * (1 + noise))
+
+        return entry
+
+    def get_pattern(self, pattern_type="fraud_surge", count=10):
+        """Returns a subset of logs matching a certain pattern."""
+        if not self.logs:
+            return []
+
+        if pattern_type == "fraud_surge":
+            # Filter for high fraud risk
+            candidates = [l for l in self.logs if l.get("fraud_risk_score", 0) > 0.5]
+        elif pattern_type == "premium_only":
+            candidates = [l for l in self.logs if l.get("user_segment") == 2]
+        else:
+            candidates = self.logs
+
+        if not candidates:
+            return [random.choice(self.logs) for _ in range(count)]
+
+        return [random.choice(candidates) for _ in range(count)]
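
`LogLoader.sample` applies multiplicative noise with a lower bound only (`max(0.01, ...)`), so a `fraud_risk_score` close to 1.0 can come out above 1.0 at the higher noise levels. The sketch below reproduces that arithmetic in isolation and shows one possible cap; the `jitter` helper and its `upper` argument are hypothetical and do not exist in the commit.

```python
import random


def jitter(value, noise_level, rng, upper=None):
    """Multiplicative noise as in LogLoader.sample, with an optional cap."""
    noise = rng.uniform(-noise_level, noise_level)
    noisy = max(0.01, value * (1 + noise))
    # `upper` is a hypothetical addition; the commit applies no upper bound.
    return noisy if upper is None else min(upper, noisy)


# Same seeds for both runs, so the only difference is the cap.
raw = [jitter(0.99, 0.3, random.Random(seed)) for seed in range(200)]
capped = [jitter(0.99, 0.3, random.Random(seed), upper=1.0) for seed in range(200)]
```

Whether a cap is wanted depends on how downstream code treats the score; `_get_noisy_risk` already clips the observed value, but `true_fraud_risk` is stored unclipped.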
tests/test_env_logs.py ADDED
@@ -0,0 +1,23 @@
+import sys
+import os
+
+# Add the root directory to sys.path
+sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), "..")))
+
+from server.SmartPayEnv_environment import SmartpayenvEnvironment
+from models import SmartpayenvAction
+
+def test_env():
+    env = SmartpayenvEnvironment()
+    obs = env.reset()
+    print(f"Initial Obs: Amount={obs.amount}, Segment={obs.user_segment}, FraudRisk={obs.observed_fraud_risk}")
+
+    for i in range(20):
+        action = SmartpayenvAction(gateway=0, fraud_decision=0, retry_strategy=0)
+        obs = env.step(action)
+        print(f"Step {i+1}: Amount={obs.amount:.2f}, FraudRisk={obs.observed_fraud_risk:.2f}, Hour={obs.time_of_day}")
+        if env._pattern_queue:
+            print(f" [Pattern Queued: {len(env._pattern_queue)} items remaining]")
+
+if __name__ == "__main__":
+    test_env()
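
The replay order exercised by this script follows from `_generate_transaction`: queued pattern entries are served first, and the sequential log cursor advances only when the queue is empty. A small sketch of that priority rule, using a hypothetical `next_entry` helper and stand-in string rows instead of parsed JSONL records:

```python
from collections import deque


def next_entry(pattern_queue, logs, cursor):
    """Return (entry, new_cursor): queued patterns first, else the next log row."""
    if pattern_queue:
        # Injected patterns pre-empt the log and leave the cursor untouched.
        return pattern_queue.popleft(), cursor
    # Wrap around so any starting cursor maps onto a valid row.
    return logs[cursor % len(logs)], cursor + 1


logs = ["a", "b", "c"]                  # stand-in for parsed JSONL rows
queue = deque(["surge-1", "surge-2"])   # stand-in for an injected fraud surge
cursor, seen = 2, []
for _ in range(5):
    entry, cursor = next_entry(queue, logs, cursor)
    seen.append(entry)
```

Because the cursor pauses during a surge, the temporal ordering of the underlying log resumes exactly where it left off once the queue drains.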
tests/test_partial_obs.py ADDED
@@ -0,0 +1,37 @@
+import sys
+import os
+import time
+
+# Add the root directory to sys.path
+sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), "..")))
+
+from server.SmartPayEnv_environment import SmartpayenvEnvironment
+from models import SmartpayenvAction
+
+def test_partial_obs():
+    env = SmartpayenvEnvironment()
+    obs = env.reset()
+
+    print("--- STEP 0 (Initial) ---")
+    print(f"Observed Risk: {obs.observed_fraud_risk:.4f}")
+    print(f"True Risk (Hidden): {env._state.true_fraud_risk:.4f}")
+    print(f"Gateway Rates: {obs.gateway_success_rates}")
+
+    # Store initial rates
+    initial_rates = env.current_obs.gateway_success_rates.copy()
+
+    for i in range(1, 10):
+        # Force a change in gateway rates to see the lag
+        for g in env._gateways:
+            g.current_rate = min(1.0, g.current_rate + 0.01)  # Slowly drift up
+
+        action = SmartpayenvAction(gateway=0, fraud_decision=0, retry_strategy=0)
+        obs = env.step(action)
+
+        print(f"\n--- STEP {i} ---")
+        print(f"Observed Risk: {obs.observed_fraud_risk:.4f} (True: {env._state.true_fraud_risk:.4f})")
+        print(f"Observed Health: {obs.gateway_success_rates}")
+        print(f"Hidden Real Health: {[g.current_rate for g in env._gateways]}")
+
+if __name__ == "__main__":
+    test_partial_obs()
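
The gap between "Observed Health" and "Hidden Real Health" printed above comes from the lag buffer in `step()`: once three snapshots are stored, the agent sees the oldest one. The sketch below isolates that rule. It assumes `health_lag_buffer` is a `deque` capped at three entries; the `State` definition is not shown in this diff, so `maxlen=3` is an assumption, and `lagged_view` is a hypothetical helper.

```python
from collections import deque


def lagged_view(buffer, current):
    """Oldest snapshot once the buffer holds 3 entries, else the current one."""
    return buffer[0] if len(buffer) >= 3 else current


# Assumption: the buffer keeps the current snapshot plus the two before it.
health_lag_buffer = deque(maxlen=3)
observed = []
for rate in [0.90, 0.80, 0.70, 0.60, 0.50]:
    snapshot = {"rates": [rate]}
    health_lag_buffer.append(snapshot)
    observed.append(lagged_view(health_lag_buffer, snapshot)["rates"][0])
```

During the first two steps the agent sees live health; from the third step onward every reading is two steps stale, so a sudden outage is only visible after it has already affected two routing decisions.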
tests/{test_v3_features.py → test_reality_features.py} RENAMED
@@ -3,7 +3,7 @@ import sys
 import os

 # Add the root directory to path to import models and environment
-sys.path.append(os.path.dirname(os.path.abspath(__file__)))
+sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

 from server.SmartPayEnv_environment import SmartpayenvEnvironment
 from models import SmartpayenvAction
@@ -42,14 +42,14 @@ def test_3ds_mechanics():
     fraudulent_obs_found = False
     for _ in range(100):
         obs = env.reset(difficulty=1)
-        if obs.fraud_risk_score > 0.7:
+        if obs.observed_fraud_risk > 0.7:
             fraudulent_obs_found = True
             # Case 1: Allow (High risk of failure)
             # Case 2: 3DS (High chance of success if no abandonment)
             action_3ds = SmartpayenvAction(gateway=2, retry_strategy=0, fraud_decision=2)
             next_obs = env.step(action_3ds)
             # 3DS doesn't end episode immediately (unless it's step 100)
-            print(f" - 3DS on high risk ({obs.fraud_risk_score:.2f}) -> Reward: {next_obs.reward:.2f}")
+            print(f" - 3DS on high risk ({obs.observed_fraud_risk:.2f}) -> Reward: {next_obs.reward:.2f}")
             break

     if not fraudulent_obs_found:
@@ -69,7 +69,7 @@ def test_chargeback_delay():

     for i in range(1, 101):
         # Find a fraud
-        is_fraud = obs.fraud_risk_score >= 0.65
+        is_fraud = obs.observed_fraud_risk >= 0.65

         if is_fraud and not cb_queued:
             # Allow it
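
One consequence of this rename is worth spelling out: the tests now threshold the noisy `observed_fraud_risk`, while `step()` labels fraud from the hidden `true_fraud_risk >= 0.65`. The two can disagree near the threshold. The sketch below estimates how often, using the standard library's Gaussian sampler in place of the NumPy generator used by `_get_noisy_risk`; `observe_risk` is a hypothetical stand-in and the true risk of 0.60 is an arbitrary example value.

```python
import random


def observe_risk(true_risk, rng, sigma=0.1):
    """Gaussian-noised, clipped view of the hidden risk score."""
    noisy = true_risk + rng.gauss(0.0, sigma)
    return min(0.99, max(0.01, noisy))


rng = random.Random(0)
true_risk = 0.60  # below the 0.65 fraud threshold used in step()
samples = [observe_risk(true_risk, rng) for _ in range(10_000)]

# Fraction of observations that look like fraud although the truth is not.
false_alarm_rate = sum(s >= 0.65 for s in samples) / len(samples)
```

With a noise standard deviation of 0.1, a transaction only 0.05 below the threshold is misread as fraud in a substantial share of draws, so a test that needs ground truth may be more reliable if it reads `env._state.true_fraud_risk` directly, as `tests/test_partial_obs.py` does.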
tests/test_temporal.py ADDED
@@ -0,0 +1,48 @@
+import requests
+import json
+import time
+
+URL = "http://localhost:7860"
+
+def test_temporal():
+    # 1. Reset
+    res = requests.post(f"{URL}/reset", json={"difficulty": 1})
+    obs = res.json().get("observation")
+    last_hour = obs.get("time_of_day")
+
+    print(f"Initial Hour: {last_hour}")
+
+    correlated_failures = 0
+    high_velocity_count = 0
+
+    for i in range(100):
+        # Action doesn't matter much for this test
+        res = requests.post(f"{URL}/step", json={"action": {"gateway": 0, "fraud_decision": 0, "retry_strategy": 0}})
+        data = res.json()
+        obs = data.get("observation")
+
+        hour = obs.get("time_of_day")
+        states = obs.get("gateway_states")
+
+        # Check hour progression
+        if hour != last_hour:
+            print(f"Hour advanced to {hour}")
+            last_hour = hour
+
+        # Check correlation (Systemic Outage)
+        down_count = sum(1 for s in states if s != "normal")
+        if down_count >= 2:
+            correlated_failures += 1
+            print(f"Step {i}: Cluster failure detected! States: {states}")
+
+        # Velocity might be high during fraud spikes
+        # NOTE: a transaction_velocity check is not implemented here yet,
+        # so high_velocity_count is never incremented.
+
+    print(f"Correlated failures detected: {correlated_failures}")
+
+if __name__ == "__main__":
+    try:
+        test_temporal()
+    except requests.exceptions.ConnectionError as e:
+        print(f"Failed to connect to server: {e}. Make sure it is running.")
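
The hour progression this script watches for can also be checked without a running server. In `step()`, the simulation hour advances by one every 20 steps and wraps at 24. A minimal offline sketch of that clock follows; `simulate_hours` and its `steps_per_hour` parameter are hypothetical names used only for this example.

```python
def simulate_hours(start_hour, steps, steps_per_hour=20):
    """Hour reported after each step, advancing once per `steps_per_hour` steps."""
    hour = start_hour
    hours = []
    for step_count in range(1, steps + 1):
        if step_count % steps_per_hour == 0:
            hour = (hour + 1) % 24  # wrap past midnight
        hours.append(hour)
    return hours


# A 100-step episode starting at 23:00 crosses midnight at step 20.
hours = simulate_hours(start_hour=23, steps=100)
```

Since an episode is capped at 100 steps, an agent sees at most six distinct hours per episode, which bounds how much time-of-day variation a single rollout can exhibit.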