bert-ag-news-classifier

This model is a fine-tuned version of google-bert/bert-base-uncased on the fancyzhx/ag_news dataset.

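It achieves the following results on the evaluation set (the accuracy and macro metrics match the official test split results reported under "Final evaluation on the official test split"):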

Loss: 0.2339
Accuracy: 0.9461
Precision Macro: 0.9461
Recall Macro: 0.9461
F1 Macro: 0.9461

Model description

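A BERT base (uncased) model fine-tuned for four-way English news topic classification: World, Sports, Business, and Sci/Tech. The checkpoint has roughly 0.1B parameters and is stored as F32 safetensors.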

Intended uses & limitations

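The model is meant for assigning English news text to one of the four AG News topics. This card reports results only on AG News, so behavior on other languages, domains, or label sets is not characterized here. Performance is weakest on Business and Sci/Tech (test F1 of 0.9159 and 0.9227), where topics tend to overlap.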

Training and evaluation data

Source dataset: fancyzhx/ag_news.

AG News is an English news topic classification dataset with four labels:

0: World
1: Sports
2: Business
3: Sci/Tech
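
As a minimal sketch of how this label mapping is applied at inference time, the snippet below turns a vector of four logits into a label name and a probability. It assumes the classifier emits logits in the id order listed above; the names ID2LABEL, softmax, and decode, as well as the example logits, are hypothetical and used for illustration only.

```python
import math

# Label ids as listed in this card.
ID2LABEL = {0: "World", 1: "Sports", 2: "Business", 3: "Sci/Tech"}


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return (label name, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Hypothetical logits for one headline, for illustration only.
label, prob = decode([0.3, -1.2, 2.9, 1.1])
print(label, round(prob, 3))
```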

The original dataset provides an official training split and an official test split.

Data split used in this project:

Split	Source	Size	Purpose
Train	90% of official training split	108,000	Model fine-tuning
Validation	10% of official training split	12,000	Checkpoint selection
Test	Official test split	7,600	Final evaluation

The train/validation split was stratified by label, so each class remains balanced:

Split	World	Sports	Business	Sci/Tech
Train	27,000	27,000	27,000	27,000
Validation	3,000	3,000	3,000	3,000
Test	1,900	1,900	1,900	1,900
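
The 90/10 stratified split can be sketched in plain Python as follows. This is an illustration of the technique, not the project's actual code: the function name stratified_split is hypothetical, the labels are synthetic stand-ins with the same class counts as the official training split, and reusing seed 42 for the split is an assumption (the card only lists it as the training seed).

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.1, seed=42):
    """Split example indices into train/validation, preserving label ratios."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Take the same fraction from every class so ratios are preserved.
        n_val = round(len(indices) * val_fraction)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return train_idx, val_idx


# Synthetic labels shaped like the official training split:
# 120,000 examples, 30,000 per class.
labels = [i % 4 for i in range(120_000)]
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))  # 108000 12000
```

Because the validation count is computed per class, each class contributes 3,000 validation and 27,000 training examples, matching the table above.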

Text preprocessing was intentionally light:

Leading and trailing whitespace was removed.
Repeated whitespace was collapsed into a single space.
Punctuation was kept.
No manual lowercasing was applied; the only lowercasing is the one performed by the google-bert/bert-base-uncased tokenizer.
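
A minimal sketch of this preprocessing is shown below. The function name normalize_text is hypothetical, and treating tabs and newlines as whitespace to be collapsed is an assumption about what "repeated whitespace" covers.

```python
def normalize_text(text: str) -> str:
    """Trim the ends and collapse whitespace runs into single spaces."""
    # str.split() with no arguments splits on any run of whitespace and
    # drops leading/trailing whitespace; joining restores single spaces.
    # Punctuation and letter case are left untouched.
    return " ".join(text.split())


print(normalize_text("  Stocks   rally,\n oil slips. "))  # Stocks rally, oil slips.
```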

The official test split was not used during training or checkpoint selection. The best checkpoint was selected using validation macro F1.

Final evaluation on the official test split:

Metric	Value
Accuracy	0.9461
Macro precision	0.9461
Macro recall	0.9461
Macro F1	0.9461

Per-class test performance:

Class	Precision	Recall	F1	Support
World	0.9603	0.9547	0.9575	1,900
Sports	0.9884	0.9879	0.9882	1,900
Business	0.9203	0.9116	0.9159	1,900
Sci/Tech	0.9155	0.9300	0.9227	1,900
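
The macro figures reported earlier can be cross-checked against this table, since a macro average is the unweighted mean of the per-class values. The sketch below copies the per-class numbers from the table above; the helper name macro_average is hypothetical. Because the table values are already rounded to four decimals, the recomputed averages agree with the reported 0.9461 only to within rounding.

```python
# Per-class test metrics copied from the table above: (precision, recall, F1).
PER_CLASS = {
    "World": (0.9603, 0.9547, 0.9575),
    "Sports": (0.9884, 0.9879, 0.9882),
    "Business": (0.9203, 0.9116, 0.9159),
    "Sci/Tech": (0.9155, 0.9300, 0.9227),
}


def macro_average(per_class, column):
    """Unweighted mean of one metric column across classes."""
    values = [row[column] for row in per_class.values()]
    return sum(values) / len(values)


macro_precision = macro_average(PER_CLASS, 0)
macro_recall = macro_average(PER_CLASS, 1)
macro_f1 = macro_average(PER_CLASS, 2)
print(f"{macro_precision:.6f} {macro_recall:.6f} {macro_f1:.6f}")
```

Since every class has the same support (1,900), accuracy equals macro recall on this test set, which is consistent with both being reported as 0.9461.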

The confusion matrix and error samples are included in this repository:

confusion_matrix.csv
error_analysis.csv

The main confusion is between Business and Sci/Tech, the two classes with the lowest test F1 (0.9159 and 0.9227). This is expected, because technology-company news, product launches, and market-related technology stories often overlap.
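
As a sketch of how per-class metrics and the dominant confusions can be derived from a confusion matrix, the snippet below works on a small made-up matrix. The counts are invented for illustration and are NOT the contents of confusion_matrix.csv, whose layout is not described in this card; the row-is-true-label convention and the function names are assumptions as well.

```python
LABELS = ["World", "Sports", "Business", "Sci/Tech"]


def per_class_metrics(matrix):
    """Precision, recall, F1 per class; rows are true labels, columns predictions."""
    metrics = []
    for k in range(len(matrix)):
        tp = matrix[k][k]
        predicted = sum(row[k] for row in matrix)  # column sum
        actual = sum(matrix[k])                    # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append((precision, recall, f1))
    return metrics


def top_confusions(matrix, labels, n=2):
    """Largest off-diagonal cells as (count, true label, predicted label)."""
    cells = [
        (matrix[i][j], labels[i], labels[j])
        for i in range(len(matrix))
        for j in range(len(matrix))
        if i != j
    ]
    return sorted(cells, reverse=True)[:n]


# Made-up counts for illustration only (100 examples per class).
example = [
    [95, 1, 3, 1],
    [1, 98, 0, 1],
    [2, 0, 90, 8],
    [1, 1, 7, 91],
]
metrics = per_class_metrics(example)
confusions = top_confusions(example, LABELS)
print(confusions)
```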

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 16
eval_batch_size: 32
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: linear
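lr_scheduler_warmup_steps: 0.1 (a fractional value, so presumably a warmup ratio of 10% of the training steps rather than a step count)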
num_epochs: 3.0
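
The step counts implied by these hyperparameters can be worked out directly. The sketch below assumes a single device with no gradient accumulation, and it assumes the listed warmup value of 0.1 is a fraction of total steps; the function linear_schedule_lr is a hypothetical stand-in for a linear warmup and decay schedule, not the library's own implementation.

```python
import math

TRAIN_EXAMPLES = 108_000  # training split size from this card
BATCH_SIZE = 16           # train_batch_size
EPOCHS = 3                # num_epochs
PEAK_LR = 2e-5            # learning_rate
WARMUP_FRACTION = 0.1     # ASSUMPTION: 0.1 is a fraction of total steps

steps_per_epoch = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = round(total_steps * WARMUP_FRACTION)


def linear_schedule_lr(step):
    """Learning rate at a step: linear warmup to the peak, then linear decay to 0."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    remaining = total_steps - step
    return PEAK_LR * max(0.0, remaining / (total_steps - warmup_steps))


print(steps_per_epoch, total_steps, warmup_steps)  # 6750 20250 2025
```

Under these assumptions, 108,000 / 16 gives 6,750 steps per epoch and 20,250 steps in total, which agrees with the step column of the training results.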

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy	Precision Macro	Recall Macro	F1 Macro
0.1963	1.0	6750	0.1911	0.9413	0.9417	0.9412	0.9414
0.1206	2.0	13500	0.2082	0.9451	0.9460	0.9451	0.9451
0.1125	3.0	20250	0.2336	0.9453	0.9456	0.9453	0.9454
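
Checkpoint selection by validation macro F1 can be illustrated with the rows of this table. The snippet is a sketch of the selection rule only; the dictionary keys are hypothetical names, and the values are copied from the table above.

```python
# Validation results per epoch, copied from the training results table.
ROWS = [
    {"epoch": 1.0, "step": 6750, "val_loss": 0.1911, "f1_macro": 0.9414},
    {"epoch": 2.0, "step": 13500, "val_loss": 0.2082, "f1_macro": 0.9451},
    {"epoch": 3.0, "step": 20250, "val_loss": 0.2336, "f1_macro": 0.9454},
]

# Selection rule stated in this card: highest validation macro F1.
best_by_f1 = max(ROWS, key=lambda row: row["f1_macro"])

# For comparison only: lowest validation loss would pick a different epoch.
best_by_loss = min(ROWS, key=lambda row: row["val_loss"])

print(best_by_f1["epoch"], best_by_f1["step"])      # 3.0 20250
print(best_by_loss["epoch"], best_by_loss["step"])  # 1.0 6750
```

The two criteria disagree here: validation loss rises after the first epoch while macro F1 keeps improving slightly, so the choice of selection metric determines which checkpoint is kept.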

Framework versions

Transformers 5.6.2
Pytorch 2.11.0+cu130
Datasets 4.8.4
Tokenizers 0.22.2
