ModernBERT-large — Consumer Finance Complaint Classification (11 labels)

Fine-tuned ModernBERT-large (~395M params) for multi-class classification of US consumer finance complaints into 11 consolidated categories.

Context

This model was developed by William Derue as part of Project 12, "Compare AI Algorithms: Machine Learning vs. LLM", of the OpenClassrooms AI Developer certification.

The project scenario involves ZenAssist, a customer support platform serving 200+ companies. The goal is to label incoming consumer complaints automatically so they can be routed to the correct support department, and to compare traditional ML approaches with LLM-based inference for this task.

This fine-tuned encoder model represents the supervised ML approach. A single forward pass through ModernBERT produces the classification logits, so inference is fast, cheap to run, and suitable for high-throughput production deployment.

Labels

The original dataset contains 18 raw product tags, consolidated into 11 categories to reduce semantic overlap and improve model performance:

#	Label	Original product tag(s) merged
0	`Bank account`	Bank account or service, Checking or savings account
1	`Consumer Loan`	Consumer Loan
2	`Credit card`	Credit card, Credit card or prepaid card, Prepaid card
3	`Credit reporting`	Credit reporting, credit repair services, or other personal consumer reports
4	`Debt collection`	Debt collection
5	`Money transfer`	Money transfer, virtual currency, or money service
6	`Mortgage`	Mortgage
7	`Other financial service`	Other financial service, Virtual currency
8	`Payday loan`	Payday loan, title loan, or personal loan
9	`Student loan`	Student loan
10	`Vehicle loan or lease`	Vehicle loan or lease
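The class indices above determine how raw model outputs map to label names. The sketch below rebuilds `id2label` / `label2id` from the table; the authoritative copy ships in `config.json`. It also includes an illustrative raw-tag-to-label lookup. That lookup is limited to tags that appear unambiguously in the table. It is a hypothetical subset, not the full consolidation mapping used to build the dataset.

```python
# Index order of the 11 consolidated labels, as listed in the table above.
# The authoritative mapping ships in config.json (id2label / label2id).
LABELS = [
    "Bank account",
    "Consumer Loan",
    "Credit card",
    "Credit reporting",
    "Debt collection",
    "Money transfer",
    "Mortgage",
    "Other financial service",
    "Payday loan",
    "Student loan",
    "Vehicle loan or lease",
]

ID2LABEL = {i: label for i, label in enumerate(LABELS)}
LABEL2ID = {label: i for i, label in ID2LABEL.items()}

# Hypothetical, illustrative subset of the raw-tag -> consolidated-label mapping.
# Only tags that appear unambiguously in the table are included; this is NOT
# the full mapping used to build the training dataset.
RAW_TO_LABEL = {
    "Bank account or service": "Bank account",
    "Checking or savings account": "Bank account",
    "Credit card": "Credit card",
    "Credit card or prepaid card": "Credit card",
    "Prepaid card": "Credit card",
    "Mortgage": "Mortgage",
    "Other financial service": "Other financial service",
    "Virtual currency": "Other financial service",
}


def consolidate(raw_tag: str) -> int:
    """Map a raw product tag to its consolidated label id (KeyError if unmapped)."""
    return LABEL2ID[RAW_TO_LABEL[raw_tag]]
```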

Usage

```python
from transformers import pipeline

# truncation / max_length are forwarded to the tokenizer, so inputs longer
# than 1024 tokens are cut instead of raising an error.
classifier = pipeline(
    "text-classification",
    model="WillisBack/modernbert-large-consumer-finance-11cls",
    truncation=True,
    max_length=1024,
)

result = classifier("I was charged an overdraft fee on my checking account without prior notice.")
print(result)
# [{'label': 'Bank account', 'score': 0.95}]
```
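The pipeline hides the single forward pass described in the Context section. The sketch below spells it out: tokenize, run the model once, apply a softmax to the logits, and pick the highest-probability label. It assumes softmax is the post-processing the pipeline applies for a single-label classifier. The helper names (`load`, `classify`, `top_prediction`) are hypothetical. `transformers` and `torch` are imported only inside the functions that need them.

```python
import math
from typing import Dict, List, Sequence, Tuple

MODEL_ID = "WillisBack/modernbert-large-consumer-finance-11cls"


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(logits: Sequence[float], id2label: Dict[int, str]) -> Tuple[str, float]:
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


def load(model_id: str = MODEL_ID):
    """Load tokenizer and model once (downloads the weights on first use)."""
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()  # disable dropout for inference
    return tokenizer, model


def classify(text: str, tokenizer, model, max_length: int = 1024) -> Tuple[str, float]:
    """One forward pass: tokenize, compute logits, return the most probable label."""
    import torch

    inputs = tokenizer(text, truncation=True, max_length=max_length, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    # model.config.id2label has integer keys once loaded by transformers
    return top_prediction(logits, model.config.id2label)
```

Typical use is `tokenizer, model = load()` followed by `classify(text, tokenizer, model)` for each complaint. Loading once and reusing the objects avoids re-reading the ~792 MB checkpoint on every call.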

Training

Parameter	Value
Base model	`unsloth/ModernBERT-large` (copy of `answerdotai/ModernBERT-large`)
Architecture	`ModernBertForSequenceClassification`
Parameters	~395M (full fine-tuning, no LoRA)
Train samples	293,698
Eval samples	5,000
Labels	11
Max epochs	2
Epochs completed	1.6 (early stopping, patience=3 on macro F1)
Batch size (effective)	64 (16 per device × 4 gradient accumulation steps)
Learning rate	2e-5
Warmup steps	200
Scheduler	Linear decay
Optimizer	AdamW
Weight decay	0.01
Max sequence length	1024
Precision	bf16
Loss	CrossEntropyLoss (class-weighted)
torch.compile	Enabled
GPU	NVIDIA GeForce RTX 5080 (16 GB GDDR7)
Training time	~380 min (~6.3 h)
Peak VRAM	4.58 GB (29.6%)
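The configuration in the table maps naturally onto Hugging Face `TrainingArguments` plus an `EarlyStoppingCallback`. The sketch below is a hypothetical reconstruction, not the author's training script. The evaluation/save interval, the `"f1_macro"` metric key, and the `adamw_torch` optimizer name are assumptions. It also includes a small helper that estimates how many optimizer updates one epoch represents.

```python
import math

# Hyperparameters copied from the table above.
N_TRAIN = 293_698
PER_DEVICE_BATCH = 16
GRAD_ACCUM = 4
MAX_EPOCHS = 2
WARMUP_STEPS = 200


def optimizer_steps_per_epoch(n_train: int, batch: int, accum: int) -> int:
    """Approximate optimizer updates per epoch: floor(batches / accumulation steps)."""
    batches = math.ceil(n_train / batch)
    return max(batches // accum, 1)


def build_training_args(output_dir: str = "modernbert-finance"):
    """Hypothetical reconstruction of the run's TrainingArguments (needs transformers)."""
    from transformers import EarlyStoppingCallback, TrainingArguments

    args = TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=MAX_EPOCHS,
        per_device_train_batch_size=PER_DEVICE_BATCH,
        gradient_accumulation_steps=GRAD_ACCUM,
        learning_rate=2e-5,
        warmup_steps=WARMUP_STEPS,
        lr_scheduler_type="linear",
        weight_decay=0.01,
        optim="adamw_torch",          # assumption: the card only says "AdamW"
        bf16=True,                    # requires a bf16-capable GPU
        torch_compile=True,
        eval_strategy="steps",        # assumption; named evaluation_strategy in older releases
        save_strategy="steps",        # must match eval_strategy for load_best_model_at_end
        load_best_model_at_end=True,
        metric_for_best_model="f1_macro",  # assumption: key returned by compute_metrics
        greater_is_better=True,
    )
    callbacks = [EarlyStoppingCallback(early_stopping_patience=3)]
    return args, callbacks
```

Under this estimate, one epoch is about 4,589 updates, so the planned 2 epochs are roughly 9,178 steps. The 200 warmup steps cover about 2% of that schedule, and stopping at epoch 1.6 corresponds to roughly 7,300 updates.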

Class weights

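Computed with scikit-learn's `sklearn.utils.class_weight.compute_class_weight` using `class_weight="balanced"`, which gives each class the weight n_samples / (n_classes × class count). Rare classes therefore receive larger weights, which counteracts class imbalance: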

Label	Weight
Credit reporting	0.303
Debt collection	0.396
Mortgage	0.630
Credit card	0.804
Bank account	1.204
Student loan	1.532
Consumer Loan	3.535
Money transfer	4.800
Payday loan	5.422
Vehicle loan or lease	5.835
Other financial service	109.425
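The "balanced" heuristic and the class-weighted loss can be written out in a few lines. With scikit-learn, the weights would come from `compute_class_weight("balanced", classes=np.unique(y), y=y)`. In PyTorch, they would be passed as `torch.nn.CrossEntropyLoss(weight=...)`. The pure-Python sketch below reproduces both computations so the arithmetic is visible. The function names are hypothetical, and this is not the training code used for this model.

```python
import math
from typing import Dict, Sequence


def balanced_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """w_c = n_samples / (n_classes * n_c), the formula behind class_weight="balanced"."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * n) for label, n in counts.items()}


def weighted_cross_entropy(
    logits_batch: Sequence[Sequence[float]],
    targets: Sequence[int],
    class_weights: Sequence[float],
) -> float:
    """Weighted CE averaged by the summed weights of the targets, which is
    what torch.nn.CrossEntropyLoss(weight=..., reduction="mean") computes."""
    num, den = 0.0, 0.0
    for logits, y in zip(logits_batch, targets):
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))  # log-sum-exp
        w = class_weights[y]
        num += w * (log_z - logits[y])  # w_y * (-log softmax(logits)[y])
        den += w
    return num / den
```

As a check against the table, Other financial service has 244 training samples, giving 293,698 / (11 × 244) ≈ 109.43. Because the mean is normalised by the summed target weights, an error on a rare class costs far more than an error on Credit reporting (weight 0.303).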

Results

Metric	Value
F1 macro	0.6126
Accuracy	78.2%
Weighted F1	0.79
Train loss	0.9178

Per-class performance

Label	Precision	Recall	F1	Support
Mortgage	0.90	0.91	0.90	726
Student loan	0.74	0.93	0.83	277
Credit reporting	0.89	0.76	0.82	1,509
Debt collection	0.82	0.77	0.79	1,119
Bank account	0.73	0.79	0.76	392
Credit card	0.75	0.78	0.76	602
Money transfer	0.65	0.78	0.71	95
Payday loan	0.36	0.62	0.46	80
Vehicle loan or lease	0.29	0.50	0.37	74
Consumer Loan	0.29	0.39	0.33	122
Other financial service	0.00	0.00	0.00	4
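Macro F1 and weighted F1 differ sharply here because macro F1 counts every class equally, while weighted F1 scales each class by its support. The sketch below recomputes the aggregate metrics from the two-decimal per-class table. Accuracy is obtained as support-weighted recall, which equals correct predictions divided by total samples. Small gaps from the reported 0.6126 and 78.2% are expected from rounding in the table.

```python
# Per-class (precision, recall, F1, support), copied from the table above.
PER_CLASS = {
    "Mortgage": (0.90, 0.91, 0.90, 726),
    "Student loan": (0.74, 0.93, 0.83, 277),
    "Credit reporting": (0.89, 0.76, 0.82, 1509),
    "Debt collection": (0.82, 0.77, 0.79, 1119),
    "Bank account": (0.73, 0.79, 0.76, 392),
    "Credit card": (0.75, 0.78, 0.76, 602),
    "Money transfer": (0.65, 0.78, 0.71, 95),
    "Payday loan": (0.36, 0.62, 0.46, 80),
    "Vehicle loan or lease": (0.29, 0.50, 0.37, 74),
    "Consumer Loan": (0.29, 0.39, 0.33, 122),
    "Other financial service": (0.00, 0.00, 0.00, 4),
}


def aggregate(per_class, strong_threshold=0.70):
    """Recompute summary metrics from per-class rows."""
    rows = per_class.values()
    total = sum(s for _, _, _, s in rows)
    macro_f1 = sum(f1 for _, _, f1, _ in rows) / len(per_class)  # unweighted mean
    weighted_f1 = sum(f1 * s for _, _, f1, s in rows) / total    # support-weighted
    accuracy = sum(r * s for _, r, _, s in rows) / total          # sum of true positives / N
    strong_share = sum(s for _, _, f1, s in rows if f1 >= strong_threshold) / total
    return {
        "support": total,
        "macro_f1": macro_f1,
        "weighted_f1": weighted_f1,
        "accuracy": accuracy,
        "strong_share": strong_share,
    }
```

The result is roughly 0.612 macro F1, 0.786 weighted F1, and 0.783 accuracy. Classes with F1 ≥ 0.70 hold 4,720 of the 5,000 evaluation samples (94.4%). The single 0.00 row for a 4-sample class weighs as much as Mortgage in the macro average. The other ten classes alone average about 0.67.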

Observations

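Strong classes (F1 ≥ 0.70): Mortgage, Student loan, Credit reporting, Debt collection, Bank account, Credit card, Money transfer. Together they account for ~94% of the evaluation set (4,720 of 5,000 samples).
Weak classes: Consumer Loan, Payday loan, and Vehicle loan or lease (F1 0.33 to 0.46) suffer from semantic overlap (all are loan products) and low sample counts. Their precision (0.29 to 0.36) is well below their recall (0.39 to 0.62), meaning the model predicts these classes more often than they actually occur.
Other financial service (4 eval samples, 244 train samples) remains unlearnable at this scale. Consider merging it with the nearest class or removing it for production.
Early stopping triggered at epoch 1.6: macro F1 stopped improving for 3 consecutive evaluations, so training ended before the 2 planned epochs were completed.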

Dataset

Trained on WillisBack/dataset-financial-user-claim — a cleaned, deduplicated, and label-consolidated version of the US CFPB Consumer Complaints dataset.

Files

File	Description
`model.safetensors`	Model weights (~792 MB)
`config.json`	Model architecture config with id2label/label2id
`tokenizer.json`	Tokenizer vocabulary
`tokenizer_config.json`	Tokenizer settings
`label_config.json`	Label list, id2label, label2id, model name, max_seq_length
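The field names inside `label_config.json` are not documented here. The sketch below therefore reads the standard `id2label` field from `config.json` instead. It uses `huggingface_hub.hf_hub_download` to fetch that single file without downloading the weights. `config.json` stores the `id2label` keys as strings, so they are converted to integers. The helper names are hypothetical.

```python
import json
from typing import Dict

REPO_ID = "WillisBack/modernbert-large-consumer-finance-11cls"


def parse_id2label(config: dict) -> Dict[int, str]:
    """config.json stores id2label keys as strings ("0", "1", ...); convert them to int."""
    return {int(k): v for k, v in config["id2label"].items()}


def load_id2label(repo_id: str = REPO_ID, filename: str = "config.json") -> Dict[int, str]:
    """Download one file from the Hub (cached locally) and parse its id2label mapping."""
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(path, encoding="utf-8") as f:
        return parse_id2label(json.load(f))
```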

Limitations

Trained on English-language US CFPB complaints only. Performance on other languages or domains is unknown.
Tail classes (Other financial service, Consumer Loan, Vehicle loan or lease, Payday loan) have low F1 (0.00 to 0.46), so predictions for these classes should be treated with lower confidence.
Max input length is 1024 tokens. Longer complaints are truncated.
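To see how often the 1024-token limit matters for a given batch of complaints, you can count tokens before classifying. The helper below is a hypothetical sketch. It accepts any callable that returns a dict with an `"input_ids"` list, such as a tokenizer from `AutoTokenizer.from_pretrained("WillisBack/modernbert-large-consumer-finance-11cls")`. Special tokens are counted because they are added by default and consume part of the budget.

```python
from typing import Callable, List, Sequence


def find_truncated(texts: Sequence[str], tokenizer: Callable, max_length: int = 1024) -> List[int]:
    """Return indices of texts whose tokenized length (special tokens included)
    exceeds max_length, i.e. texts that would be cut when truncation=True."""
    over = []
    for i, text in enumerate(texts):
        n_tokens = len(tokenizer(text)["input_ids"])
        if n_tokens > max_length:
            over.append(i)
    return over
```

Any index this returns marks a complaint whose ending was not seen by the model.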

Citation

```bibtex
@misc{derue2026modernbert-finance,
  author = {Derue, William},
  title = {ModernBERT-large Fine-tuned for Consumer Finance Complaint Classification},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/WillisBack/modernbert-large-consumer-finance-11cls}
}
```
