Best jy committed 09aa324 · verified · 1 parent: ae2279b

Update README.md

Files changed (1)
  1. README.md +62 -2
README.md CHANGED
@@ -16,7 +16,7 @@ should probably proofread and complete it, then remove this comment. -->

  # bert-ag-news-classifier

- This model is a fine-tuned version of [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased) on an unknown dataset.
  It achieves the following results on the evaluation set:
  - Loss: 0.2339
  - Accuracy: 0.9461
@@ -34,7 +34,67 @@ More information needed

  ## Training and evaluation data

- More information needed

  ## Training procedure

 

  # bert-ag-news-classifier

+ This model is a fine-tuned version of `google-bert/bert-base-uncased` on the [`fancyzhx/ag_news`](https://huggingface.co/datasets/fancyzhx/ag_news) dataset.
  It achieves the following results on the evaluation set:
  - Loss: 0.2339
  - Accuracy: 0.9461
 

  ## Training and evaluation data

+ Source dataset: `fancyzhx/ag_news`.
+
+ AG News is an English news topic classification dataset with four labels:
+
+ - `0`: World
+ - `1`: Sports
+ - `2`: Business
+ - `3`: Sci/Tech
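The four label ids above can be kept as a pair of lookup tables, for example to give the classifier readable output names. This is a sketch only: the names `ID2LABEL` and `LABEL2ID` are hypothetical, and this README does not say whether the model's config already stores these class names.

```python
# Hypothetical lookup tables for the four AG News labels listed above.
ID2LABEL = {0: "World", 1: "Sports", 2: "Business", 3: "Sci/Tech"}

# Inverse mapping: class name -> integer id.
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}

print(LABEL2ID["Sci/Tech"])  # 3
```

Dictionaries of this shape are what Hugging Face `transformers` configs accept as `id2label` and `label2id`, e.g. as extra keyword arguments to `AutoModelForSequenceClassification.from_pretrained(..., num_labels=4, id2label=ID2LABEL, label2id=LABEL2ID)`.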
+
+ The original dataset provides an official training split (120,000 examples) and an official test split (7,600 examples).
+
+ Data split used in this project:
+
+ | Split | Source | Examples | Purpose |
+ |---|---|---:|---|
+ | Train | 90% of official training split | 108,000 | Model fine-tuning |
+ | Validation | 10% of official training split | 12,000 | Checkpoint selection |
+ | Test | Official test split | 7,600 | Final evaluation |
+
+ The train/validation split was stratified by label, so all four classes stay equally represented in both splits (the official test split is already balanced):
+
+ | Split | World | Sports | Business | Sci/Tech |
+ |---|---:|---:|---:|---:|
+ | Train | 27,000 | 27,000 | 27,000 | 27,000 |
+ | Validation | 3,000 | 3,000 | 3,000 | 3,000 |
+ | Test | 1,900 | 1,900 | 1,900 | 1,900 |
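A label-stratified 90/10 split like the one described above can be sketched in plain Python: group example indices by label, shuffle each group with a fixed seed, and send the first 10% of every class to validation. The function name `stratified_split`, the seed, and the rounding are assumptions, not the project's recorded settings. Libraries provide the same operation, e.g. `Dataset.train_test_split(test_size=0.1, stratify_by_column="label")` in Hugging Face `datasets` or `train_test_split(..., stratify=labels)` in scikit-learn.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.1, seed=42):
    """Split example indices into (train, validation), stratified by label.

    Indices are grouped per class, each group is shuffled with a fixed seed,
    and the first `val_fraction` of every group goes to validation.
    """
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        n_val = round(len(indices) * val_fraction)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)


# Stand-in for the official training split: 30,000 examples per class.
labels = [label for label in range(4) for _ in range(30_000)]
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))  # 108000 12000
```

Shuffling inside each class (rather than across the whole dataset) is what guarantees exactly 3,000 validation examples per class.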
+
+ Text preprocessing was intentionally light:
+
+ - Leading and trailing whitespace was removed.
+ - Repeated whitespace was collapsed into a single space.
+ - Punctuation was kept.
+ - No manual lowercasing was applied; lowercasing is left to the `google-bert/bert-base-uncased` tokenizer.
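The whitespace rules above fit in a small helper. The exact pattern the project used is not given; this sketch treats any run of whitespace (spaces, tabs, newlines) as one separator, so a single tab or newline also becomes a space. Punctuation and case are left alone, and the helper name `normalize_text` is hypothetical.

```python
import re

# Any run of whitespace characters (spaces, tabs, newlines, ...).
_WHITESPACE_RUN = re.compile(r"\s+")


def normalize_text(text: str) -> str:
    """Collapse whitespace runs to a single space and trim both ends.

    Punctuation and letter case are kept as-is; lowercasing is left to the
    uncased BERT tokenizer.
    """
    return _WHITESPACE_RUN.sub(" ", text).strip()


print(normalize_text("  Oil prices\tclimb   again,\n analysts say.  "))
# Oil prices climb again, analysts say.
```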
+
+ The official test split was not used during training or checkpoint selection. The best checkpoint was selected using validation macro F1.
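Selecting a checkpoint by validation macro F1 means scoring every saved checkpoint on the validation split, averaging the per-class F1 scores without weighting, and keeping the checkpoint with the highest average. Below is a plain-Python sketch with made-up checkpoint names and predictions. On real data, `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` should give the same value when all four classes appear, and the Hugging Face `Trainer` can perform the selection via `TrainingArguments(metric_for_best_model=..., load_best_model_at_end=True)`; this README does not state which tooling the project used.

```python
def macro_f1(y_true, y_pred, num_labels=4):
    """Unweighted mean of per-class F1 scores.

    Undefined precision or recall (zero denominator) is treated as 0.
    """
    f1_scores = []
    for c in range(num_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / num_labels


# Hypothetical validation labels and predictions from two checkpoints.
y_val = [0, 1, 2, 3, 2, 3]
checkpoint_preds = {
    "checkpoint-a": [0, 1, 3, 3, 2, 2],
    "checkpoint-b": [0, 1, 2, 3, 2, 2],
}

# Score each checkpoint and keep the one with the highest macro F1.
scores = {name: macro_f1(y_val, preds) for name, preds in checkpoint_preds.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 4))  # checkpoint-b 0.8667
```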
+
+ Final evaluation on the official test split:
+
+ | Metric | Value |
+ |---|---:|
+ | Accuracy | 0.9461 |
+ | Macro precision | 0.9461 |
+ | Macro recall | 0.9461 |
+ | Macro F1 | 0.9461 |
+
+ Per-class test performance:
+
+ | Class | Precision | Recall | F1 | Support |
+ |---|---:|---:|---:|---:|
+ | World | 0.9603 | 0.9547 | 0.9575 | 1,900 |
+ | Sports | 0.9884 | 0.9879 | 0.9882 | 1,900 |
+ | Business | 0.9203 | 0.9116 | 0.9159 | 1,900 |
+ | Sci/Tech | 0.9155 | 0.9300 | 0.9227 | 1,900 |
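Because every class has the same support (1,900 test examples), the macro scores are plain unweighted means of the per-class rows, and macro recall equals accuracy: total correct predictions divided by 7,600 is the mean of the per-class recalls when supports are equal. A quick arithmetic check using the rounded values from the per-class table, so the fourth decimal can differ by one:

```python
# Per-class test metrics copied from the table above: (precision, recall, F1).
per_class = {
    "World": (0.9603, 0.9547, 0.9575),
    "Sports": (0.9884, 0.9879, 0.9882),
    "Business": (0.9203, 0.9116, 0.9159),
    "Sci/Tech": (0.9155, 0.9300, 0.9227),
}

# Macro averaging: unweighted mean over the four classes.
n = len(per_class)
macro_p = sum(p for p, _, _ in per_class.values()) / n
macro_r = sum(r for _, r, _ in per_class.values()) / n
macro_f1 = sum(f for _, _, f in per_class.values()) / n

print(f"{macro_p:.4f} {macro_r:.4f} {macro_f1:.4f}")
```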
+
+ The confusion matrix and error samples are included in this repository:
+
+ - `confusion_matrix.csv`
+ - `error_analysis.csv`
+
+ The most frequent confusions are between `Business` and `Sci/Tech`, the two classes with the lowest F1. This is expected: news about technology companies, product launches, and technology markets can plausibly belong to either topic.
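To see which class pairs drive the errors, one can add both directions of each off-diagonal cell of a confusion matrix and rank the pairs. The CSV layout assumed below (a header row of class names, then one row per true class) is a guess, since this README does not document the format of `confusion_matrix.csv`; the helper names are hypothetical and the counts are toy values, not the model's results.

```python
import csv
import io


def load_confusion_matrix(f):
    """Parse a confusion-matrix CSV into (classes, matrix).

    ASSUMED layout (not documented in this README): a header row
    `true,<class 1>,<class 2>,...`, then one row per true class holding the
    number of examples predicted as each column class.
    matrix[true_label][predicted_label] -> count
    """
    reader = csv.reader(f)
    classes = next(reader)[1:]
    matrix = {}
    for row in reader:
        matrix[row[0]] = {pred: int(count) for pred, count in zip(classes, row[1:])}
    return classes, matrix


def top_confused_pairs(classes, matrix, k=3):
    """Sum both directions of each off-diagonal class pair, largest first."""
    totals = {}
    for i, a in enumerate(classes):
        for b in classes[i + 1:]:
            totals[(a, b)] = matrix[a][b] + matrix[b][a]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:k]


# Toy counts for illustration only; these are NOT the model's results.
demo_csv = io.StringIO(
    "true,World,Sports,Business,Sci/Tech\n"
    "World,8,0,1,1\n"
    "Sports,0,10,0,0\n"
    "Business,1,0,7,2\n"
    "Sci/Tech,0,0,3,7\n"
)
classes, matrix = load_confusion_matrix(demo_csv)
pairs = top_confused_pairs(classes, matrix)
for (a, b), count in pairs:
    print(f"{a} <-> {b}: {count}")
# Business <-> Sci/Tech: 5
# World <-> Business: 2
# World <-> Sci/Tech: 1
```

To run it on the repository file, open it with `open("confusion_matrix.csv", newline="")` and pass the file object to `load_confusion_matrix`, after checking that its header matches the assumed layout.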
+

  ## Training procedure