	---
	language:
	- en
	license: apache-2.0
	library_name: peft
	tags:
	- peft
	- lora
	- qwen2.5
	- compliance
	- fdcpa
	- text-classification
	base_model: Qwen/Qwen2.5-3B-Instruct
	pipeline_tag: text-classification
	---

	# FDCPA Rule Classifier — QLoRA Adapter

	## Model Description

	This is a QLoRA adapter for [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct), fine-tuned to classify debt collection call transcripts against 12 FDCPA (Fair Debt Collection Practices Act) compliance rules.

	The model determines whether an agent complied ("pass") or violated ("fail") a specific FDCPA rule given a transcript chunk.

	## Intended Use

	- Intended: Research on fine-tuning small models for domain-specific compliance classification. Demonstrating cost/accuracy tradeoffs between fine-tuned models and API-based LLMs.
	- Out-of-scope: Production compliance decisions. Legal advice. Processing real consumer data without proper authorization.

	## How to Use

	```python
	import json

	import torch
	from peft import PeftModel
	from transformers import AutoModelForCausalLM, AutoTokenizer

	base_model = "Qwen/Qwen2.5-3B-Instruct"
	adapter = "ree2raz/fdcpa-rule-classifier-qlora"

	# Load the base model in bf16, then attach the LoRA adapter weights.
	tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
	model = AutoModelForCausalLM.from_pretrained(
	    base_model, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
	)
	model = PeftModel.from_pretrained(model, adapter)
	model.eval()

	system_msg = "You are an FDCPA compliance evaluator. Given a debt collection rule and a transcript chunk from a collections call, determine whether the agent complied with the rule. Respond with a JSON object containing your verdict and reasoning."

	rule_description = """## Rule
	- Rule ID: FDCPA-001
	- Rule Name: Mini-Miranda Disclosure
	- Description: The debt collector must identify themselves as a debt collector and state that any information obtained will be used for that purpose within the first communication."""

	transcript = "Agent: Hi, this is Alex from ABC Collections. I need to discuss your account."

	# Doubled braces in the f-string render as literal braces in the prompt.
	user_msg = f"""{rule_description}

	## Transcript Chunk
	{transcript}

	## Task
	Determine whether the agent COMPLIED with the above rule in this transcript chunk. Respond with ONLY a JSON object:
	{{"verdict": "pass" or "fail", "reasoning": "1-2 sentence explanation"}}"""

	messages = [
	    {"role": "system", "content": system_msg},
	    {"role": "user", "content": user_msg},
	]

	input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
	inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

	# Greedy decoding: with do_sample=False, no temperature setting is needed.
	with torch.no_grad():
	    outputs = model.generate(**inputs, max_new_tokens=150, do_sample=False)

	# Decode only the newly generated tokens, skipping the prompt.
	generated = outputs[0][inputs["input_ids"].shape[1]:]
	result = json.loads(tokenizer.decode(generated, skip_special_tokens=True).strip())
	print(result)
	```
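The `json.loads` call above raises an exception if the model adds any text around the JSON object. A more tolerant parser might look like the sketch below. The helper name `parse_verdict` and the choice to return `None` on failure are assumptions made for illustration, not part of the released code.

```python
import json
import re

VALID_VERDICTS = {"pass", "fail"}


def parse_verdict(raw_text):
    """Return the first JSON object in raw_text with a valid verdict, else None.

    Hypothetical helper: tolerates extra text before or after the JSON object.
    """
    decoder = json.JSONDecoder()
    # Try decoding from each opening brace until one yields a usable object.
    for match in re.finditer(r"\{", raw_text):
        try:
            obj, _ = decoder.raw_decode(raw_text, match.start())
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            verdict = str(obj.get("verdict", "")).lower()
            if verdict in VALID_VERDICTS:
                obj["verdict"] = verdict
                return obj
    return None
```

A `None` result can then be counted as a parse failure instead of crashing an evaluation loop.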

	## Training Data

	- Source: 300+ synthetic transcript chunks generated by GPT-4.1-mini
	- Rules: 12 FDCPA compliance rules
	- Distribution: ~28 examples per rule (15 easy, 8 medium, 5 hard)
	- Seeds: 24 hand-curated examples anchoring the generation process
	- Split: 80% train / 10% validation / 10% test (stratified by rule and verdict)
	- Test set: Hand-reviewed for label accuracy
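One way to produce an 80/10/10 split stratified by rule and verdict is sketched below. This is an illustration rather than the project's actual script: the field names `rule_id` and `verdict`, the seed, and the placeholder data are all assumptions.

```python
import random
from collections import defaultdict


def stratified_split(examples, seed=42, train_frac=0.8, val_frac=0.1):
    """Split examples into train/val/test, stratified by (rule_id, verdict)."""
    groups = defaultdict(list)
    for ex in examples:
        groups[(ex["rule_id"], ex["verdict"])].append(ex)

    rng = random.Random(seed)
    train, val, test = [], [], []
    # Split each (rule, verdict) bucket separately so proportions are preserved.
    for key in sorted(groups):
        bucket = groups[key]
        rng.shuffle(bucket)
        n_train = round(len(bucket) * train_frac)
        n_val = round(len(bucket) * val_frac)
        train.extend(bucket[:n_train])
        val.extend(bucket[n_train:n_train + n_val])
        test.extend(bucket[n_train + n_val:])
    return train, val, test


# Placeholder data: 12 rules x 2 verdicts x 10 chunks (not the real dataset).
examples = [
    {"rule_id": f"FDCPA-{r:03d}", "verdict": v, "text": f"chunk {i}"}
    for r in range(1, 13)
    for v in ("pass", "fail")
    for i in range(10)
]
train, val, test = stratified_split(examples)
print(len(train), len(val), len(test))  # 192 24 24
```

With small buckets the rounding can leave some (rule, verdict) pairs without a validation or test example, so the resulting split sizes are worth checking.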

	## Training Procedure

	### Hyperparameters

	| Parameter | Value |
	|-----------|-------|
	| Base model | Qwen/Qwen2.5-3B-Instruct |
	| Quantization | NF4 (4-bit), double quant |
	| LoRA rank | 16 |
	| LoRA alpha | 32 |
	| LoRA dropout | 0.05 |
	| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
	| Epochs | 3 |
	| Batch size | 4 (effective 16 with grad accumulation) |
	| Learning rate | 2e-4 |
	| Schedule | Cosine with 10% warmup |
	| Max sequence length | 2048 |
	| Precision | bf16 mixed |
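To make the schedule and batch-size rows concrete, the sketch below computes the learning rate at a given optimizer step under linear warmup followed by cosine decay. It is intended to mirror the general shape of a cosine-with-warmup schedule, not the exact training script. The step count of 100 is a placeholder, and the accumulation factor of 4 is inferred from the table (4 x 4 = 16) on the assumption of a single GPU.

```python
import math


def lr_at_step(step, total_steps, base_lr=2e-4, warmup_ratio=0.10):
    """Linear warmup to base_lr, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


per_device_batch = 4
grad_accum_steps = 4  # assumption: inferred from "effective 16" on one GPU
effective_batch = per_device_batch * grad_accum_steps

total_steps = 100  # placeholder; the real value depends on the dataset size
for step in (0, 5, 10, 55, 100):
    print(step, f"{lr_at_step(step, total_steps):.2e}")
```

The rate peaks at 2e-4 when warmup ends (step 10 of 100 here) and falls to roughly half of that at the midpoint of the decay phase.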

	### Training Infrastructure

	- Kaggle T4 GPU (16 GB VRAM)
	- ~60-90 minutes training time
	- Peak VRAM: ~12-13 GB

	## Evaluation Results

	Evaluated on 39 hand-reviewed test examples across 12 FDCPA rules (23 pass / 16 fail).

	| Model | Accuracy | F1 (macro) | Parse Rate |
	|-------|----------|------------|------------|
	| o3-mini (ceiling) | 100.0% | 1.000 | 46.2% |
	| Qwen Base (zero-shot) | 76.9% | 0.769 | 100.0% |
	| Qwen QLoRA (this model) | 84.6% | 0.846 | 100.0% |

	The fine-tuned model closes about a third (~33%) of the gap between the base model (76.9%) and the API ceiling (100%): 7.7 of 23.1 percentage points. All 6 errors are false negatives, treating "pass" as the positive class: the model over-predicts violations on transcripts that contain surface-level non-compliance signals (incomplete disclosure, pending verification) even though the agent ultimately complied.
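The headline numbers for this model can be reproduced from the counts stated above (23 pass / 16 fail, 6 errors, all of them "pass" examples predicted as "fail"). The short check below does that arithmetic. The figure of 30 correct for the base model is an assumption derived from its reported 76.9% on 39 examples.

```python
def f1(tp, fp, fn):
    """F1 score for one class from its confusion counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


n_pass, n_fail = 23, 16
n_total = n_pass + n_fail
pass_predicted_fail = 6  # all reported errors fall in this cell
fail_predicted_pass = 0

pass_correct = n_pass - pass_predicted_fail  # 17
fail_correct = n_fail - fail_predicted_pass  # 16

accuracy = (pass_correct + fail_correct) / n_total
f1_pass = f1(tp=pass_correct, fp=fail_predicted_pass, fn=pass_predicted_fail)
f1_fail = f1(tp=fail_correct, fp=pass_predicted_fail, fn=fail_predicted_pass)
macro_f1 = (f1_pass + f1_fail) / 2

base_accuracy = 30 / n_total  # assumption: 76.9% of 39 is 30 correct
gap_closed = (accuracy - base_accuracy) / (1.0 - base_accuracy)

print(f"accuracy={accuracy:.3f} macro_f1={macro_f1:.3f} gap_closed={gap_closed:.3f}")
# accuracy=0.846 macro_f1=0.846 gap_closed=0.333
```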

	## Limitations and Biases

	1. Trained entirely on synthetic data — distribution shift from real transcripts is unknown
	2. Small training set (~300 examples) limits generalization
	3. Only tested on Qwen2.5-3B architecture — results may not transfer
	4. Binary pass/fail classification oversimplifies nuanced compliance questions
	5. Subjective rules (harassment, tone) have inherently lower agreement
	6. The model inherits biases present in the teacher model (GPT-4.1-mini)

	## Citation

	```bibtex
	@misc{fdcpa-rule-classifier,
	  title={FDCPA Rule Classifier: QLoRA Fine-tuning for Compliance Classification},
	  author={Rituraj},
	  year={2025},
	  url={https://github.com/ree2raz/fdcpa-rule-classifier}
	}
	```

	## Links

	- [GitHub Repository](https://github.com/ree2raz/fdcpa-rule-classifier)
	- [Scrutiny — Upstream Compliance System](https://github.com/ree2raz/scrutiny)
	- [Base Model](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)