s23deepak committed
Commit b644b23 · verified · 1 parent: 9b731b7

Upload README.md

# GrandgemMa — Gemma 3n Scam Detection Eval & Fine-Tune Kit

> **Goal:** Zero-shot test `google/gemma-3n-E2B-it` (2B effective params, text-only eval) on labeled scam-call transcripts.
> If accuracy < 90% or F1(SCAM) < 85% → fine-tune with Unsloth 4-bit LoRA.

## Quick Links

| Artifact | Path |
|---|---|
| Zero-shot eval script | `eval_zero_shot.py` |
| Unsloth SFT trainer | `train_sft_unsloth.py` |
| Dataset formatter | `format_dataset.py` |
| Decision rubric | Section 2 below |

## Datasets Used

- **Primary:** [`BothBosu/scam-dialogue`](https://huggingface.co/datasets/BothBosu/scam-dialogue) — 800+ synthetic scam/legit call transcripts with `dialogue` and `label` columns (1 = SCAM, 0 = LEGIT).
- **Secondary (optional):** [`BothBosu/Scammer-Conversation`](https://huggingface.co/datasets/BothBosu/Scammer-Conversation) — additional mixed conversations.

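`format_dataset.py` is not shown in this commit; below is a rough sketch of the formatting step it implies, assuming the `dialogue`/`label` schema listed above. The prompt template and output field names are illustrative, not taken from the actual script.

```python
# Illustrative prompt template -- the real one lives in format_dataset.py.
PROMPT = (
    "Classify the following phone-call transcript as SCAM or LEGIT.\n\n"
    "Transcript:\n{dialogue}\n\nAnswer:"
)

def to_example(row: dict) -> dict:
    """Map one raw row (dialogue text + 1/0 label) to a prompt/completion pair."""
    return {
        "prompt": PROMPT.format(dialogue=row["dialogue"]),
        "completion": "SCAM" if row["label"] == 1 else "LEGIT",
    }
```

In the real script this would typically be applied over `BothBosu/scam-dialogue` with `datasets.Dataset.map(to_example)`.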
## 1. Zero-Shot Evaluation

```bash
# Quick smoke test (100 rows)
python eval_zero_shot.py --limit 100

# Full test split (~400 rows)
python eval_zero_shot.py --limit -1

# CPU-only fallback
python eval_zero_shot.py --device cpu --dtype fp32 --limit 20
```

**Output:** `results_zero_shot.json` plus a console report with accuracy, precision, recall, F1, and a confusion matrix.

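The console report boils down to four numbers plus a confusion matrix. A self-contained sketch of how they can be computed from binary predictions, with SCAM (1) as the positive class; the function name and output keys are illustrative, not necessarily what `eval_zero_shot.py` emits:

```python
def classification_report(y_true, y_pred, positive=1):
    """Binary accuracy/precision/recall/F1 plus a 2x2 confusion matrix."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision_scam": precision,
        "recall_scam": recall,
        "f1_scam": f1,
        "confusion": [[tn, fp], [fn, tp]],  # rows: true LEGIT/SCAM, cols: pred
    }
```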
## 2. Decision Rubric

| Condition | Verdict | Action |
|---|---|---|
| Accuracy ≥ 90% **and** F1(SCAM) ≥ 85% | ✅ PASS | Base model is strong enough; fine-tuning optional. |
| Accuracy 75–89% **or** F1(SCAM) 70–84% | ⚠️ MARGINAL | **Fine-tune recommended.** Expected uplift: +5–15 pp. |
| Accuracy < 75% **or** F1(SCAM) < 70% | ❌ FAIL | **Fine-tune REQUIRED.** Run `train_sft_unsloth.py`. |

> **Why these thresholds?** For elder-scam defense, missing a scam (a false negative) is catastrophic, so high recall on the SCAM class is mandatory.

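The rubric is mechanical enough to encode directly; a sketch using the table's thresholds (the function name is hypothetical):

```python
def verdict(accuracy: float, f1_scam: float) -> str:
    """Apply the decision rubric: PASS, MARGINAL, or FAIL."""
    if accuracy >= 0.90 and f1_scam >= 0.85:
        return "PASS"       # base model is strong enough
    if accuracy < 0.75 or f1_scam < 0.70:
        return "FAIL"       # fine-tune required
    return "MARGINAL"       # fine-tune recommended
```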
## 3. Fine-Tuning (Unsloth SFT)

```bash
# Install deps
pip install unsloth transformers datasets trl peft accelerate

# Train & push to HF Hub
python train_sft_unsloth.py \
  --output grandgemma-scam-sft \
  --push_to_hub s23deepak/grandgemma-scam-sft
```

- **Hardware:** Kaggle T4×2 (free) or any single GPU with ≥ 16 GB VRAM.
- **Config:** 4-bit quantization + LoRA r=16, 3 epochs, lr=2e-4, batch size 2, gradient accumulation 4.
- **Expected time:** ~3–5 min per epoch on T4×2.

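`train_sft_unsloth.py` itself is not in this commit. Below is an untested sketch of the configuration described above (4-bit, LoRA r=16, 3 epochs, lr 2e-4, batch 2, grad-accum 4); every argument here is an assumption about the real script, and Unsloth/TRL argument names vary across versions.

```python
from unsloth import FastLanguageModel
from trl import SFTConfig, SFTTrainer

# Base model in 4-bit (assumed to match the zero-shot eval model).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="google/gemma-3n-E2B-it",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters (r=16, per the config above).
model = FastLanguageModel.get_peft_model(model, r=16, lora_alpha=16)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_ds,  # output of format_dataset.py (not shown here)
    args=SFTConfig(
        num_train_epochs=3,
        learning_rate=2e-4,
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        output_dir="grandgemma-scam-sft",
    ),
)
trainer.train()
```

Effective batch size with this config is 2 × 4 = 8 per device per optimizer step.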
## 4. Re-Eval After Fine-Tuning

After training, run the same `eval_zero_shot.py`, but point `--model` at your fine-tuned checkpoint:

```bash
python eval_zero_shot.py \
  --model s23deepak/grandgemma-scam-sft \
  --limit -1
```

Compare the deltas in `accuracy`, `recall_scam`, and `f1_scam`.

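A small helper for that comparison step, assuming both runs wrote JSON results with the metric keys named above (the second file name is an assumption):

```python
import json

KEYS = ("accuracy", "recall_scam", "f1_scam")

def metric_deltas(base: dict, tuned: dict, keys=KEYS) -> dict:
    """Per-metric delta (tuned minus base), in percentage points."""
    return {k: round(100 * (tuned[k] - base[k]), 2) for k in keys}

# Typical usage (file names are assumptions):
# base  = json.load(open("results_zero_shot.json"))
# tuned = json.load(open("results_sft.json"))
# print(metric_deltas(base, tuned))
```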
## 5. Team Notes

- This repo is **evaluation-only** — no app code. App code lives in the monorepo (`/android`, `/ios`, `/extensions`, `/portal`).
- Fine-tuned weights produced here should be quantized to `.litertlm` for on-device Android inference (Stream A) and converted for iOS/browser WebGPU (Stream B).
- Track all runs in a spreadsheet: run_id | model | dataset | accuracy | f1_scam | notes.
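A minimal append-only tracker matching those columns; a CSV stands in for the spreadsheet, and the helper name is hypothetical:

```python
import csv
from pathlib import Path

FIELDS = ["run_id", "model", "dataset", "accuracy", "f1_scam", "notes"]

def log_run(path: str, row: dict) -> None:
    """Append one eval/training run to the tracking sheet, writing the header once."""
    p = Path(path)
    is_new = not p.exists()
    with p.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)
```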