---
language:
  - en
license: apache-2.0
library_name: peft
tags:
  - peft
  - lora
  - qwen2.5
  - compliance
  - fdcpa
  - text-classification
base_model: Qwen/Qwen2.5-3B-Instruct
pipeline_tag: text-classification
---

# FDCPA Rule Classifier — QLoRA Adapter

## Model Description

This is a QLoRA adapter fine-tuned on [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) for classifying debt collection call transcripts against 12 FDCPA (Fair Debt Collection Practices Act) compliance rules.

The model determines whether an agent **complied** ("pass") or **violated** ("fail") a specific FDCPA rule given a transcript chunk.

## Intended Use

- **Intended:** Research on fine-tuning small models for domain-specific compliance classification. Demonstrating cost/accuracy tradeoffs between fine-tuned models and API-based LLMs.
- **Out-of-scope:** Production compliance decisions. Legal advice. Processing real consumer data without proper authorization.

## How to Use

```python
import json
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = "Qwen/Qwen2.5-3B-Instruct"
adapter = "ree2raz/fdcpa-rule-classifier-qlora"

tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)
model = PeftModel.from_pretrained(model, adapter)
model.eval()

system_msg = "You are an FDCPA compliance evaluator. Given a debt collection rule and a transcript chunk from a collections call, determine whether the agent complied with the rule. Respond with a JSON object containing your verdict and reasoning."

rule_description = """## Rule
- **Rule ID:** FDCPA-001
- **Rule Name:** Mini-Miranda Disclosure
- **Description:** The debt collector must identify themselves as a debt collector and state that any information obtained will be used for that purpose within the first communication."""

transcript = "Agent: Hi, this is Alex from ABC Collections. I need to discuss your account."

user_msg = f"""{rule_description}

## Transcript Chunk
{transcript}

## Task
Determine whether the agent COMPLIED with the above rule in this transcript chunk. Respond with ONLY a JSON object:
{{"verdict": "pass" or "fail", "reasoning": "1-2 sentence explanation"}}"""

messages = [
    {"role": "system", "content": system_msg},
    {"role": "user", "content": user_msg},
]

input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

with torch.no_grad():
    # Greedy decoding: sampling parameters such as temperature are unused when do_sample=False.
    outputs = model.generate(**inputs, max_new_tokens=150, do_sample=False)

generated = outputs[0][inputs["input_ids"].shape[1]:]
result = json.loads(tokenizer.decode(generated, skip_special_tokens=True))
print(result)
```
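
The snippet above calls `json.loads` directly on the decoded text, which raises if the model wraps its answer in a code fence or adds stray text. A minimal sketch of a more forgiving parser follows; the helper name `parse_verdict` and its fallback behaviour (returning `None`) are illustrative assumptions, not part of the released code.

```python
import json
import re
from typing import Optional


def parse_verdict(raw: str) -> Optional[dict]:
    """Extract a {"verdict", "reasoning"} dict from model output, or None.

    Hypothetical helper: tolerates surrounding text or markdown fences by
    taking the span from the first '{' to the last '}'.
    """
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        return None
    try:
        obj = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict):
        return None
    # Normalise the verdict and reject anything other than pass/fail.
    verdict = str(obj.get("verdict", "")).strip().lower()
    if verdict not in {"pass", "fail"}:
        return None
    return {"verdict": verdict, "reasoning": str(obj.get("reasoning", ""))}
```

With this in place, `parse_verdict(tokenizer.decode(generated, skip_special_tokens=True))` could replace the `json.loads` call, and a `None` result can be counted as a parse failure.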

## Training Data

- **Source:** 300+ synthetic transcript chunks generated by GPT-4.1-mini
- **Rules:** 12 FDCPA compliance rules
- **Distribution:** ~28 examples per rule (15 easy, 8 medium, 5 hard difficulty)
- **Seeds:** 24 hand-curated examples anchoring the generation process
- **Split:** 80% train / 10% validation / 10% test (stratified by rule and verdict)
- **Test set:** Hand-reviewed for label accuracy
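
One way to produce an 80/10/10 split stratified by rule and verdict is sketched below in plain Python. The field names `rule_id` and `verdict`, the function name, and the rounding strategy are assumptions made for illustration; the actual data pipeline may differ.

```python
import random
from collections import defaultdict


def stratified_split(examples, seed=0, train=0.8, val=0.1):
    """Split examples into train/val/test within each (rule, verdict) cell."""
    # Group by the stratification key (hypothetical field names).
    groups = defaultdict(list)
    for ex in examples:
        groups[(ex["rule_id"], ex["verdict"])].append(ex)

    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for key in sorted(groups):  # sorted for reproducibility
        items = list(groups[key])
        rng.shuffle(items)
        n_train = round(len(items) * train)
        n_val = round(len(items) * val)
        splits["train"] += items[:n_train]
        splits["val"] += items[n_train:n_train + n_val]
        splits["test"] += items[n_train + n_val:]  # remainder goes to test
    return splits
```

With only about 28 examples per rule, individual cells are small, so the per-cell proportions are approximate and the overall split can drift slightly from 80/10/10.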

## Training Procedure

### Hyperparameters

| Parameter | Value |
|-----------|-------|
| Base model | Qwen/Qwen2.5-3B-Instruct |
| Quantization | NF4 (4-bit), double quant |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Epochs | 3 |
| Batch size | 4 (effective 16 with grad accumulation) |
| Learning rate | 2e-4 |
| Schedule | Cosine with 10% warmup |
| Max sequence length | 2048 |
| Precision | bf16 mixed |
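
To make the batch-size and schedule rows concrete, the sketch below derives the gradient accumulation factor and evaluates a cosine schedule with linear warmup. It assumes a single GPU (so 16 / 4 = 4 accumulation steps) and the common formulation of linear warmup from zero followed by half-cosine decay to zero; `lr_at` and the constant names are illustrative, and the total step count depends on the training set size.

```python
import math

PEAK_LR = 2e-4           # learning rate from the table
WARMUP_FRACTION = 0.10   # 10% warmup
PER_DEVICE_BATCH = 4
EFFECTIVE_BATCH = 16

# Assumes one GPU: effective batch = per-device batch * accumulation steps.
GRAD_ACCUM_STEPS = EFFECTIVE_BATCH // PER_DEVICE_BATCH


def lr_at(step: int, total_steps: int) -> float:
    """Learning rate at an optimizer step under linear warmup + cosine decay."""
    warmup_steps = int(total_steps * WARMUP_FRACTION)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return PEAK_LR * step / max(1, warmup_steps)
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, with 100 optimizer steps the rate rises linearly over the first 10 steps, peaks at 2e-4, and decays to zero by step 100.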

### Training Infrastructure

- Kaggle T4 GPU (16 GB VRAM)
- ~60-90 minutes training time
- Peak VRAM: ~12-13 GB

## Evaluation Results

Evaluated on 39 hand-reviewed test examples across 12 FDCPA rules (23 pass / 16 fail).

| Model | Accuracy | F1 (macro) | Parse Rate |
|-------|----------|------------|------------|
| o3-mini (ceiling) | **100.0%** | **1.000** | 46.2% |
| Qwen Base (zero-shot) | 76.9% | 0.769 | **100.0%** |
| **Qwen QLoRA (this model)** | **84.6%** | **0.846** | **100.0%** |

The fine-tuned model closes about one third (~33%) of the gap between the base model (76.9%) and the API ceiling (100%), a net gain of 3 correct predictions out of the 9 the base model misses. All 6 remaining errors are false negatives (taking "pass" as the positive class): the model over-predicts violations on transcripts that contain surface-level non-compliance signals (incomplete disclosure, pending verification) despite the agent ultimately complying.
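
The reported figures can be reproduced from the confusion matrix implied by the text. The sketch below assumes that all 6 errors are "pass" examples predicted as "fail" and that every "fail" example is classified correctly; the base model's count of 30 correct is inferred from 76.9% of 39 rather than stated in the results.

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """F1 score from true positives, false positives and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


N_PASS, N_FAIL = 23, 16   # test set composition
PASS_AS_FAIL = 6          # compliant transcripts flagged as violations
FAIL_AS_PASS = 0          # assumed: no violations were missed

correct_pass = N_PASS - PASS_AS_FAIL
correct_fail = N_FAIL - FAIL_AS_PASS
total = N_PASS + N_FAIL

accuracy = (correct_pass + correct_fail) / total
f1_pass = f1(tp=correct_pass, fp=FAIL_AS_PASS, fn=PASS_AS_FAIL)
f1_fail = f1(tp=correct_fail, fp=PASS_AS_FAIL, fn=FAIL_AS_PASS)
macro_f1 = (f1_pass + f1_fail) / 2

# Share of the base-to-ceiling gap closed by fine-tuning.
BASE_CORRECT = 30  # inferred from 76.9% of 39
gap_closed = (correct_pass + correct_fail - BASE_CORRECT) / (total - BASE_CORRECT)

print(f"accuracy={accuracy:.3f} macro_f1={macro_f1:.3f} gap_closed={gap_closed:.3f}")
```

Under these assumptions the accuracy and macro F1 match the table row for this model, and the gap closed works out to exactly one third.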

## Limitations and Biases

1. Trained entirely on synthetic data — distribution shift from real transcripts is unknown
2. Small training set (~300 examples) limits generalization
3. Only tested on Qwen2.5-3B architecture — results may not transfer
4. Binary pass/fail classification oversimplifies nuanced compliance questions
5. Subjective rules (harassment, tone) have inherently lower agreement
6. The model inherits biases present in the teacher model (GPT-4.1-mini)

## Citation

```bibtex
@misc{fdcpa-rule-classifier,
  title={FDCPA Rule Classifier: QLoRA Fine-tuning for Compliance Classification},
  author={Rituraj},
  year={2025},
  url={https://github.com/ree2raz/fdcpa-rule-classifier}
}
```

## Links

- [GitHub Repository](https://github.com/ree2raz/fdcpa-rule-classifier)
- [Scrutiny — Upstream Compliance System](https://github.com/ree2raz/scrutiny)
- [Base Model](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)