cff-version: 1.2.0
title: "Chakravyuh: A Multi-Agent RL Environment for Indian UPI Fraud Detection"
message: "If you use this environment, benchmark, or trained adapter, please cite it as below."
type: software
authors:
  - family-names: Pardeshi
    given-names: Ujjwal
    email: ujjwal.pardeshi@riamona.com
  - family-names: Kadam
    given-names: Omkar
date-released: 2026-04-26
url: "https://github.com/UjjwalPardeshi/Chakravyuh"
repository-code: "https://github.com/UjjwalPardeshi/Chakravyuh"
license: MIT
keywords:
  - reinforcement-learning
  - multi-agent
  - fraud-detection
  - openenv
  - upi
  - india
  - llm
  - grpo
  - lora
  - scalable-oversight
abstract: >-
  Chakravyuh is a five-agent OpenEnv-compliant reinforcement learning
  environment for training Large Language Models to detect Indian UPI
  fraud. The Analyzer agent (Qwen2.5-7B + LoRA) observes scripted
  Scammer-Victim dialogues and must output a calibrated suspicion score
  with a justified explanation, while a Bank Monitor and Regulator
  provide cross-modal oversight. A composable eight-rubric reward
  (detection, missed-scam penalty, false-positive penalty, calibration,
  explanation quality, signal accuracy, format adherence, length control)
  is designed to be hard to game; v2 of the trained adapter reduces
  false-positive rate by approximately 5x relative to a reward-hacked v1
  baseline on a 175-scenario Indian-grounded benchmark.
preferred-citation:
  type: software
  title: "Chakravyuh: A Multi-Agent RL Environment for Indian UPI Fraud Detection"
  authors:
    - family-names: Pardeshi
      given-names: Ujjwal
    - family-names: Kadam
      given-names: Omkar
  year: 2026
  url: "https://github.com/UjjwalPardeshi/Chakravyuh"