ujjwalpardeshi committed
Commit c4e8f35 · verified · 1 Parent(s): 2da879a

docs: replace auto-generated model card with Chakravyuh-specific one

Files changed (1):
  1. README.md +182 -48
README.md CHANGED
@@ -1,73 +1,207 @@
  ---
- base_model: unsloth/qwen2.5-7b-instruct-unsloth-bnb-4bit
  library_name: peft
- model_name: analyzer_lora_v2
- tags:
- - base_model:adapter:unsloth/qwen2.5-7b-instruct-unsloth-bnb-4bit
- - grpo
- - lora
- - transformers
- - trl
- - unsloth
- licence: license
  pipeline_tag: text-generation
  ---

- # Model Card for analyzer_lora_v2

- This model is a fine-tuned version of [unsloth/qwen2.5-7b-instruct-unsloth-bnb-4bit](https://huggingface.co/unsloth/qwen2.5-7b-instruct-unsloth-bnb-4bit).
- It has been trained using [TRL](https://github.com/huggingface/trl).

  ## Quick start

  ```python
- from transformers import pipeline

- question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
- generator = pipeline("text-generation", model="None", device="cuda")
- output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
- print(output["generated_text"])
  ```

- ## Training procedure
-
- This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).

- ### Framework versions

- - PEFT 0.18.1
- - TRL: 0.24.0
- - Transformers: 5.5.0
- - Pytorch: 2.10.0
- - Datasets: 4.3.0
- - Tokenizers: 0.22.2

- ## Citations

- Cite GRPO as:

  ```bibtex
- @article{shao2024deepseekmath,
-     title = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
-     author = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
-     year = 2024,
-     eprint = {arXiv:2402.03300},
  }
-
  ```

- Cite TRL as:
-
- ```bibtex
- @misc{vonwerra2022trl,
-     title = {{TRL: Transformer Reinforcement Learning}},
-     author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
-     year = 2020,
-     journal = {GitHub repository},
-     publisher = {GitHub},
-     howpublished = {\url{https://github.com/huggingface/trl}}
- }
- ```
 
  ---
+ license: mit
+ language:
+ - en
+ - hi
+ - ta
+ - te
+ - kn
+ - bn
+ - mr
+ base_model: Qwen/Qwen2.5-7B-Instruct
  library_name: peft
  pipeline_tag: text-generation
+ tags:
+ - lora
+ - peft
+ - grpo
+ - trl
+ - unsloth
+ - fraud-detection
+ - upi
+ - india
+ - multi-agent
+ - openenv
+ - scalable-oversight
+ datasets:
+ - ujjwalpardeshi/chakravyuh-bench-v0
+ metrics:
+ - f1
+ - precision
+ - recall
+ model-index:
+ - name: chakravyuh-analyzer-lora-v2
+   results:
+   - task:
+       type: text-classification
+       name: Indian UPI Fraud Detection (Chakravyuh bench-v0)
+     dataset:
+       name: chakravyuh-bench-v0
+       type: custom
+     metrics:
+     - name: Detection (recall)
+       type: recall
+       value: 0.993
+     - name: False Positive Rate
+       type: fpr
+       value: 0.067
+     - name: Precision
+       type: precision
+       value: 0.986
+     - name: F1
+       type: f1
+       value: 0.99
  ---

+ # Chakravyuh Analyzer: LoRA v2
+
+ LoRA adapter for **Qwen/Qwen2.5-7B-Instruct**, post-trained with TRL's GRPO on the [Chakravyuh](https://github.com/UjjwalPardeshi/Chakravyuh) multi-agent Indian UPI fraud-detection environment.
+
+ The Analyzer's job: read a multi-turn dialogue between a (scripted) Scammer and Victim and output a calibrated suspicion score plus a justified explanation, in real time, on the victim's device. This adapter is **the second of two** Chakravyuh-trained adapters and the **honest one**; see the "v1 → v2 story" below.
+
+ ## Quick numbers (full results in `logs/eval_v2.json` of the GitHub repo)
+
+ | Metric | v1 (reward-hacked) | **v2 (this adapter)** |
+ |---|---|---|
+ | Detection rate | 100.0% | **99.3%** |
+ | False positive rate | 36.0% | **6.7%** (5× better) |
+ | F1 | 0.96 | **0.99** |
+ | Bench size | 135 | 174 evaluated (175 total, 1 skipped) |
+
+ ### Per-difficulty detection (scams only, n=144)
+
+ | Difficulty | n | Detection |
+ |---|---|---|
+ | Easy | 26 | 100% |
+ | Medium | 66 | 100% |
+ | Hard | 18 | 100% |
+ | Novel | 34 | 97% |
+
+ The dip on `novel` (post-2024 attack patterns) is the small honest crack confirming that the model is not collapsing to "always flag."
+
+ ## v1 → v2 story (the reason this adapter exists)
+
+ v1 hit `detection=100% / FPR=36%`, a textbook **reward-hacking fingerprint**: the model had learned to flag *everything* and then defend the over-flagging with plausible-sounding reasoning. The v1 reward components were:
+
+ - Detection (+1 correct / −0.5 wrong)
+ - False-positive penalty (−0.3) ← too light
+ - Format reward (+0.15) ← paid even when the prediction was wrong
+ - Calibration (×0.3 for benign) ← too weak on the benign side
+ - Explanation (×0.4)
+
+ After diagnosing the hack, three principled changes were applied for v2 (sketched in code below):
+
+ 1. **FP penalty −0.3 → −0.8**: over-flagging is now expensive
+ 2. **Format reward DENIED on benign-flagged-as-scam**: closes the "lazy over-flag" shortcut
+ 3. **Benign calibration weight 0.3 → 0.5**: a stronger gradient toward score ≤ 0.2 on benign chats
+
+ In addition, the KL anchor was tightened (`β = 0.08 → 0.15`) so the model cannot drift far from the base distribution under the new reward shape.
+
+ Result: detection barely moved (100% → 99.3%), but FPR dropped 5× (36% → 6.7%). That asymmetric improvement is the signature of a model actually learning the task instead of gaming the reward.
 
102
  ## Quick start
103
 
104
  ```python
105
+ from transformers import AutoTokenizer, AutoModelForCausalLM
106
+ from peft import PeftModel
107
+ import torch
108
+
109
+ base_model_id = "Qwen/Qwen2.5-7B-Instruct"
110
+ adapter_id = "ujjwalpardeshi/chakravyuh-analyzer-lora-v2"
111
+
112
+ tokenizer = AutoTokenizer.from_pretrained(base_model_id)
113
+ base = AutoModelForCausalLM.from_pretrained(
114
+ base_model_id,
115
+ torch_dtype=torch.bfloat16,
116
+ device_map="auto",
117
+ )
118
+ model = PeftModel.from_pretrained(base, adapter_id)
119
+ model.eval()
120
+
121
+ system_prompt = (
122
+ "You are Chakravyuh's Behavioral Analyzer, a fraud detection AI deployed "
123
+ "on-device to monitor real-time chat for Indian UPI scam patterns. "
124
+ "Output strict JSON with `score` in [0,1], `signals` from the taxonomy, "
125
+ "and `explanation`."
126
+ )
127
+
128
+ user_prompt = (
129
+ "Scammer messages:\n"
130
+ "Urgent! Your bank account will be frozen. Share OTP to verify identity.\n\n"
131
+ "Analyze these messages. Return strict JSON only."
132
+ )
133
+
134
+ messages = [
135
+ {"role": "system", "content": system_prompt},
136
+ {"role": "user", "content": user_prompt},
137
+ ]
138
+ prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
139
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
140
+
141
+ with torch.no_grad():
142
+ out = model.generate(
143
+ **inputs,
144
+ max_new_tokens=160,
145
+ do_sample=False,
146
+ temperature=0.0,
147
+ pad_token_id=tokenizer.eos_token_id,
148
+ )
149
+ response = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
150
+ print(response)
151
+ ```
152
+
153
+ Expected output (JSON):
154
 
155
+ ```json
156
+ {
157
+ "score": 0.95,
158
+ "signals": ["urgency", "info_request", "impersonation"],
159
+ "explanation": "Asks for OTP with urgency pressure from a self-claimed bank agent; matches OTP-theft scam pattern."
160
+ }
161
  ```
162
 
163
+ ## Training details
164
 
165
+ - **Base model:** Qwen/Qwen2.5-7B-Instruct (4-bit Unsloth quantization for training, bf16 inference)
166
+ - **LoRA rank:** 64
167
+ - **LoRA alpha:** 128
168
+ - **KL anchor (Ξ²):** 0.15
169
+ - **Training corpus:** 619 examples (456 scam + 204 benign templates, soft-leakage filtered against the test set; see `training/grpo_analyzer.py:_filter_soft_leakage`)
170
+ - **Algorithm:** GRPO via TRL
171
+ - **Steps:** 619 (1 full epoch over the corpus)
172
+ - **Reward function:** Composable 5-rubric system (detection, FP penalty, missed-scam penalty, calibration, explanation quality)
173
+ - **Hardware:** Single A100-80GB (Colab Pro+)
174
 
175
+ `trainer_state.json` (full training trajectory) is at [logs/v2_trainer_state.json](https://github.com/UjjwalPardeshi/Chakravyuh/blob/main/logs/v2_trainer_state.json) in the source repo.
176
 
177
+ ## Limitations
178
 
179
+ 1. **Small benign sample (n=30 evaluated, 1 of 31 in bench skipped due to empty text).** Wilson 95% CI on FPR is approximately [1.9%, 21.3%]. We stand behind the "5Γ— FPR reduction vs v1" claim (statistically real) but not the precise "6.7%" figure as a tight estimate.
180
+ 2. **Single-seed training.** Multi-seed retrains are deferred to v3.
181
+ 3. **Bench is a proxy.** 175 curated scenarios do not span real-world Indian fraud diversity. Production performance will be lower.
182
+ 4. **One epoch over 619 templates.** More data + more epochs are deferred to v3.
183
+ 5. **English-dominant training.** Multi-language detection numbers (Tamil, Telugu, etc.) require per-language eval β€” not yet measured at the time of writing.
184
 
185
+ See [docs/RESPONSIBLE_USE.md](https://github.com/UjjwalPardeshi/Chakravyuh/blob/main/docs/RESPONSIBLE_USE.md) for intended use and dual-use considerations.
 
 
 
 
 
186
 
187
+ ## Links
188
 
189
+ - **GitHub:** <https://github.com/UjjwalPardeshi/Chakravyuh>
190
+ - **OpenEnv Space (live env):** <https://huggingface.co/spaces/ujjwalpardeshi/chakravyuh>
191
+ - **Bench dataset:** <https://huggingface.co/datasets/ujjwalpardeshi/chakravyuh-bench-v0> (release pending)
192
+ - **Hackathon:** Meta PyTorch OpenEnv Hackathon 2026, Bangalore
193
+
194
+ ## Citation
195
 
196
  ```bibtex
197
+ @software{pardeshi2026chakravyuh,
198
+ title = {Chakravyuh: A Multi-Agent RL Environment for Indian UPI Fraud Detection},
199
+ author = {Pardeshi, Ujjwal},
200
+ year = {2026},
201
+ url = {https://github.com/UjjwalPardeshi/Chakravyuh}
202
  }
 
203
  ```
204
 
205
+ ## License
206
+
207
+ MIT β€” see [LICENSE](https://github.com/UjjwalPardeshi/Chakravyuh/blob/main/LICENSE) in the source repo.