---
library_name: transformers
license: apache-2.0
base_model: google-bert/bert-base-uncased
tags:
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: bert-ag-news-classifier
  results: []
---

# bert-ag-news-classifier

This model is a fine-tuned version of [`google-bert/bert-base-uncased`](https://huggingface.co/google-bert/bert-base-uncased) on the [`fancyzhx/ag_news`](https://huggingface.co/datasets/fancyzhx/ag_news) dataset.
It achieves the following results on the evaluation set:

- Loss: 0.2339
- Accuracy: 0.9461
- Precision Macro: 0.9461
- Recall Macro: 0.9461
- F1 Macro: 0.9461

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

Source dataset: `fancyzhx/ag_news`.

AG News is an English news topic classification dataset with four labels:

- `0`: World
- `1`: Sports
- `2`: Business
- `3`: Sci/Tech

The original dataset provides an official training split and an official test split.

Data split used in this project:

| Split | Source | Size | Purpose |
|---|---:|---:|---|
| Train | 90% of official training split | 108,000 | Model fine-tuning |
| Validation | 10% of official training split | 12,000 | Checkpoint selection |
| Test | Official test split | 7,600 | Final evaluation |

The train/validation split was stratified by label, so each class remains balanced:

| Split | World | Sports | Business | Sci/Tech |
|---|---:|---:|---:|---:|
| Train | 27,000 | 27,000 | 27,000 | 27,000 |
| Validation | 3,000 | 3,000 | 3,000 | 3,000 |
| Test | 1,900 | 1,900 | 1,900 | 1,900 |

Text preprocessing was intentionally light:

- Leading and trailing whitespace was removed.
- Repeated whitespace was collapsed into a single space.
- Punctuation was kept.
- No manual lowercasing was applied beyond the behavior of `google-bert/bert-base-uncased`.

The official test split was not used during training or checkpoint selection. The best checkpoint was selected using validation macro F1.

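The split and preprocessing described above can be sketched in a few lines of standard-library Python. This is an illustration only, not the code used to build this model: the helper names `normalize_text` and `stratified_split` are hypothetical, and reusing the training seed `42` for the split is an assumption.

```python
import random
import re
from collections import defaultdict

_WHITESPACE = re.compile(r"\s+")


def normalize_text(text: str) -> str:
    """Collapse whitespace runs to one space and trim both ends.

    Punctuation and casing are left untouched; lowercasing is left to the
    uncased tokenizer.
    """
    return _WHITESPACE.sub(" ", text).strip()


def stratified_split(labels, val_fraction=0.1, seed=42):
    """Return (train_indices, val_indices) with equal label proportions."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)  # assumption: same seed as training
    train_idx, val_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        n_val = round(len(indices) * val_fraction)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)


# Toy usage: 4 balanced classes, 30 examples each (not the real dataset).
toy_labels = [0, 1, 2, 3] * 30
toy_train, toy_val = stratified_split(toy_labels)
print(len(toy_train), len(toy_val))
print(normalize_text("  Wall St.   Bears\tClaw Back \n"))
```

Applied to the 120,000-example official training split with `val_fraction=0.1`, a split of this kind gives 27,000 training and 3,000 validation examples per class, which matches the tables above.
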
Final evaluation on the official test split:

| Metric | Value |
|---|---:|
| Accuracy | 0.9461 |
| Macro precision | 0.9461 |
| Macro recall | 0.9461 |
| Macro F1 | 0.9461 |

Per-class test performance:

| Class | Precision | Recall | F1 | Support |
|---|---:|---:|---:|---:|
| World | 0.9603 | 0.9547 | 0.9575 | 1,900 |
| Sports | 0.9884 | 0.9879 | 0.9882 | 1,900 |
| Business | 0.9203 | 0.9116 | 0.9159 | 1,900 |
| Sci/Tech | 0.9155 | 0.9300 | 0.9227 | 1,900 |

The confusion matrix and error samples are included in this repository:

- `confusion_matrix.csv`
- `error_analysis.csv`

The main confusion patterns are between `Business` and `Sci/Tech`, which is expected because technology-company news, product launches, and market-related technology stories often overlap.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 32
- seed: 42
- optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 0.1
- num_epochs: 3.0

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision Macro | Recall Macro | F1 Macro |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|:---------------:|:------------:|:--------:|
| 0.1963 | 1.0 | 6750 | 0.1911 | 0.9413 | 0.9417 | 0.9412 | 0.9414 |
| 0.1206 | 2.0 | 13500 | 0.2082 | 0.9451 | 0.9460 | 0.9451 | 0.9451 |
| 0.1125 | 3.0 | 20250 | 0.2336 | 0.9453 | 0.9456 | 0.9453 | 0.9454 |

### Framework versions

- Transformers 5.6.2
- Pytorch 2.11.0+cu130
- Datasets 4.8.4
- Tokenizers 0.22.2
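
As a sanity check on the training results table, the step counts can be derived from the split size and the hyperparameters listed in this card. The sketch below assumes a single device with no gradient accumulation, and it assumes the `lr_scheduler_warmup_steps` value of `0.1` is read as a fraction of total steps rather than an absolute count. The `schedule` helper is hypothetical and not part of any library.

```python
import math

TRAIN_EXAMPLES = 108_000  # train split size from this card
TRAIN_BATCH_SIZE = 16     # train_batch_size
NUM_EPOCHS = 3            # num_epochs
WARMUP = 0.1              # lr_scheduler_warmup_steps as reported


def schedule(train_examples, batch_size, num_epochs, warmup):
    """Return (steps_per_epoch, total_steps, warmup_steps)."""
    steps_per_epoch = math.ceil(train_examples / batch_size)
    total_steps = steps_per_epoch * num_epochs
    # Assumption: values below 1 are a fraction of total steps,
    # anything else is taken as an absolute step count.
    if warmup < 1:
        warmup_steps = round(total_steps * warmup)
    else:
        warmup_steps = int(warmup)
    return steps_per_epoch, total_steps, warmup_steps


steps_per_epoch, total_steps, warmup_steps = schedule(
    TRAIN_EXAMPLES, TRAIN_BATCH_SIZE, NUM_EPOCHS, WARMUP
)
print(steps_per_epoch, total_steps, warmup_steps)
```

Under these assumptions the per-epoch and total step counts agree with the `Step` column (6750, 13500, 20250), and the linear schedule would warm up over roughly the first tenth of training.
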