---
license: mit
language:
- en
- hi
- ta
- te
- kn
- bn
- mr
base_model: Qwen/Qwen2.5-7B-Instruct
library_name: peft
pipeline_tag: text-generation
tags:
- lora
- peft
- grpo
- trl
- unsloth
- fraud-detection
- upi
- india
- multi-agent
- openenv
- scalable-oversight
datasets:
- ujjwalpardeshi/chakravyuh-bench-v0
metrics:
- f1
- precision
- recall
model-index:
- name: chakravyuh-analyzer-lora-v2
results:
- task:
type: text-classification
name: Indian UPI Fraud Detection (Chakravyuh bench-v0)
dataset:
name: chakravyuh-bench-v0
type: custom
metrics:
- name: Detection (recall)
type: recall
value: 0.993
- name: False Positive Rate
type: fpr
value: 0.067
- name: Precision
type: precision
value: 0.986
- name: F1
type: f1
value: 0.99
---
# Chakravyuh Analyzer - LoRA v2
LoRA adapter for **Qwen/Qwen2.5-7B-Instruct**, post-trained with TRL's GRPO on the [Chakravyuh](https://github.com/UjjwalPardeshi/Chakravyuh) multi-agent Indian UPI fraud-detection environment.
The Analyzer's job is to read a multi-turn dialogue between a (scripted) Scammer and Victim and output a calibrated suspicion score plus a justified explanation, in real time, on the victim's device. This adapter is the **second of two** Chakravyuh-trained adapters and is the **honest one**; see the "v1 → v2 story" below.
## Quick numbers (full results in `logs/eval_v2.json` of the GitHub repo)
| Metric | v1 (reward-hacked) | **v2 (this adapter)** |
|---|---|---|
| Detection rate | 100.0% | **99.3%** |
| False positive rate | 36.0% | **6.7%** (5× better) |
| F1 | 0.96 | **0.99** |
| Bench size | 135 | 174 evaluated (175 total, 1 skipped) |
### Per-difficulty detection (scams only, n=144)
| Difficulty | n | Detection |
|---|---|---|
| Easy | 26 | 100% |
| Medium | 66 | 100% |
| Hard | 18 | 100% |
| Novel | 34 | 97% |
The dip on `novel` (post-2024 attack patterns) is the small honest crack that confirms the model is not collapsing to "always flag."
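For readers checking the arithmetic: the headline numbers are mutually consistent if the evaluated bench splits into 144 scams (143 detected) and 30 benign dialogues (2 flagged). The per-class counts are inferred from the tables above and the Limitations section, not quoted from the eval log:

```python
# Reconstruct the headline metrics from the inferred confusion counts:
# 144 scams with 143 detected, 30 evaluated benign dialogues with 2 flagged.
tp, fn, fp, n_benign = 143, 1, 2, 30

recall = tp / (tp + fn)                              # 143/144 ~ 0.993 (detection rate)
fpr = fp / n_benign                                  # 2/30    ~ 0.067 (false positive rate)
precision = tp / (tp + fp)                           # 143/145 ~ 0.986
f1 = 2 * precision * recall / (precision + recall)   # ~ 0.99

print(f"recall={recall:.3f} fpr={fpr:.3f} precision={precision:.3f} f1={f1:.2f}")
```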
## v1 → v2 story (the reason this adapter exists)
v1 hit `detection=100% / FPR=36%`, a textbook **reward-hacking fingerprint**. The model had learned to flag *everything* and then defend the over-flagging with plausible-sounding reasoning. The reward components were:
- Detection (+1 correct / −0.5 wrong)
- False-positive penalty (−0.3) ← too light
- Format reward (+0.15) ← paid even when the prediction was wrong
- Calibration (×0.3 for benign) ← too weak on the benign side
- Explanation (×0.4)
After diagnosing the hack, three principled changes were applied for v2:
1. **FP penalty −0.3 → −0.8**: over-flagging is now expensive
2. **Format reward DENIED on benign-flagged-as-scam**: closes the "lazy over-flag" shortcut
3. **Benign calibration weight 0.3 → 0.5**: stronger gradient toward score ≤ 0.2 on benign
Plus the KL anchor was tightened (β = 0.08 → 0.15) so the model can't drift far from the base distribution under the new reward shape.
Result: detection barely moved (100% → 99.3%), but FPR dropped 5× (36% → 6.7%). That asymmetric improvement is the signature of the model actually learning the task instead of gaming the reward.
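In code, the v2 reward shape looks roughly like the sketch below. It is a minimal reconstruction from the bullet points above, with hypothetical function and argument names; the actual implementation lives in `training/grpo_analyzer.py` in the repo, and the scam-side calibration term is omitted here because its weight is not stated above.

```python
def v2_reward(pred_is_scam: bool, true_is_scam: bool,
              score: float, format_ok: bool,
              explanation_quality: float) -> float:
    """Sketch of the v2 composable reward (hypothetical names/signature)."""
    r = 1.0 if pred_is_scam == true_is_scam else -0.5  # detection term

    if pred_is_scam and not true_is_scam:
        r -= 0.8          # v2 change 1: FP penalty raised from -0.3 to -0.8
                          # v2 change 2: format reward denied on benign-flagged-as-scam
    elif format_ok:
        r += 0.15         # format reward still paid in every other case

    if not true_is_scam:  # v2 change 3: benign calibration weight 0.3 -> 0.5,
                          # pushing benign scores toward <= 0.2
        r += 0.5 * max(0.0, 0.2 - score) / 0.2

    return r + 0.4 * explanation_quality  # explanation quality, weighted x0.4
```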
## Quick start
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base_model_id = "Qwen/Qwen2.5-7B-Instruct"
adapter_id = "ujjwalpardeshi/chakravyuh-analyzer-lora-v2"

# Load the base model in bf16 and attach the LoRA adapter on top
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

system_prompt = (
    "You are Chakravyuh's Behavioral Analyzer, a fraud detection AI deployed "
    "on-device to monitor real-time chat for Indian UPI scam patterns. "
    "Output strict JSON with `score` in [0,1], `signals` from the taxonomy, "
    "and `explanation`."
)
user_prompt = (
    "Scammer messages:\n"
    "Urgent! Your bank account will be frozen. Share OTP to verify identity.\n\n"
    "Analyze these messages. Return strict JSON only."
)
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]

# Build the chat-formatted prompt and generate deterministically
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=160,
        do_sample=False,  # greedy decoding; temperature is ignored when sampling is off
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```
Expected output (JSON):
```json
{
"score": 0.95,
"signals": ["urgency", "info_request", "impersonation"],
"explanation": "Asks for OTP with urgency pressure from a self-claimed bank agent; matches OTP-theft scam pattern."
}
```
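Even with the strict-JSON instruction, it is prudent to parse the reply defensively. A minimal sketch, reusing `response` from the quick-start snippet (the helper name is ours, not part of any released API):

```python
import json

def parse_analyzer_output(text: str) -> dict:
    """Extract and validate the Analyzer's JSON reply, tolerating stray text."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end == -1:
        raise ValueError(f"no JSON object found in model output: {text!r}")
    result = json.loads(text[start:end + 1])
    # Clamp the suspicion score into [0, 1], as the schema requires
    result["score"] = min(1.0, max(0.0, float(result["score"])))
    return result

parsed = parse_analyzer_output(response)
print(parsed["score"], parsed["signals"])
```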
## Training details
- **Base model:** Qwen/Qwen2.5-7B-Instruct (4-bit Unsloth quantization for training, bf16 inference)
- **LoRA rank:** 64
- **LoRA alpha:** 128
- **KL anchor (β):** 0.15
- **Training corpus:** 619 examples after soft-leakage filtering against the test set (from 456 scam + 204 benign templates; see `training/grpo_analyzer.py:_filter_soft_leakage`)
- **Algorithm:** GRPO via TRL
- **Steps:** 619 (1 full epoch over the corpus)
- **Reward function:** Composable 5-rubric system (detection, FP penalty, missed-scam penalty, calibration, explanation quality)
- **Hardware:** Single A100-80GB (Colab Pro+)
`trainer_state.json` (full training trajectory) is at [logs/v2_trainer_state.json](https://github.com/UjjwalPardeshi/Chakravyuh/blob/main/logs/v2_trainer_state.json) in the source repo.
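For orientation, a setup matching those hyperparameters could look roughly like the following with TRL and PEFT. This is a sketch under stated assumptions (a toy dataset and a stub reward standing in for the five rubrics), not the repo's `training/grpo_analyzer.py`:

```python
from datasets import Dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# Toy stand-in for the 619-example filtered corpus (assumption for illustration)
train_dataset = Dataset.from_dict(
    {"prompt": ["Analyze these messages. Return strict JSON only."]}
)

def reward_stub(completions, **kwargs):
    # Placeholder: the real run composes the five rubrics described above
    return [0.0 for _ in completions]

peft_config = LoraConfig(r=64, lora_alpha=128, task_type="CAUSAL_LM")

args = GRPOConfig(
    output_dir="chakravyuh-analyzer-lora-v2",
    beta=0.15,           # KL anchor to the base policy, tightened from 0.08
    num_train_epochs=1,  # one full pass over the corpus
    bf16=True,
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",
    reward_funcs=reward_stub,
    args=args,
    train_dataset=train_dataset,
    peft_config=peft_config,
)
trainer.train()
```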
## Limitations
1. **Small benign sample (n=30 evaluated; 1 of 31 in the bench was skipped due to empty text).** The Wilson 95% CI on FPR is approximately [1.9%, 21.3%], reproducible with the snippet after this list. We stand behind the "5× FPR reduction vs v1" claim (statistically real) but not the precise "6.7%" figure as a tight estimate.
2. **Single-seed training.** Multi-seed retrains are deferred to v3.
3. **Bench is a proxy.** 175 curated scenarios do not span real-world Indian fraud diversity. Production performance will be lower.
4. **One epoch over 619 templates.** More data + more epochs are deferred to v3.
5. **English-dominant training.** Multi-language detection numbers (Tamil, Telugu, etc.) require per-language eval and had not yet been measured at the time of writing.
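The interval quoted in item 1 can be reproduced in a few lines (assuming the 6.7% FPR corresponds to 2 flagged out of 30 evaluated benign dialogues):

```python
import math

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion at ~95% confidence."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

low, high = wilson_ci(2, 30)
print(f"[{low:.1%}, {high:.1%}]")  # -> [1.9%, 21.3%]
```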
See [docs/RESPONSIBLE_USE.md](https://github.com/UjjwalPardeshi/Chakravyuh/blob/main/docs/RESPONSIBLE_USE.md) for intended use and dual-use considerations.
## Links
- **GitHub:** <https://github.com/UjjwalPardeshi/Chakravyuh>
- **OpenEnv Space (live env):** <https://huggingface.co/spaces/ujjwalpardeshi/chakravyuh>
- **Bench dataset:** <https://huggingface.co/datasets/ujjwalpardeshi/chakravyuh-bench-v0> (release pending)
- **Hackathon:** Meta PyTorch OpenEnv Hackathon 2026, Bangalore
## Citation
```bibtex
@software{pardeshi2026chakravyuh,
title = {Chakravyuh: A Multi-Agent RL Environment for Indian UPI Fraud Detection},
author = {Pardeshi, Ujjwal},
year = {2026},
url = {https://github.com/UjjwalPardeshi/Chakravyuh}
}
```
## License
MIT; see [LICENSE](https://github.com/UjjwalPardeshi/Chakravyuh/blob/main/LICENSE) in the source repo.