Fine-tuned ModernBERT for Policy Agendas Classification in English
Overview
This model is a fine-tuned version of ModernBERT-large (28 layers, 395M parameters), which was pre-trained on 2 trillion tokens of English and code data and has a native context length of up to 8,192 tokens. The training dataset consists of 6,169 Acts of the UK Parliament dating from 1911 to 2014, annotated with the major policy agenda topics from the Comparative Agendas Project.
Labels
- 1. Macroeconomics. Includes issues related to general domestic macroeconomic policy.
- 2. Civil Rights. Includes issues related generally to civil rights and minority rights.
- 3. Health. Includes issues related generally to health care, including appropriations for general health care government agencies.
- 4. Agriculture. Includes issues related to general agriculture policy, including appropriations for general agriculture government agencies.
- 5. Labor. Includes issues generally related to labor, employment, and pensions, including appropriations for government agencies regulating labor policy.
- 6. Education. Includes issues related to general education policy, including appropriations for government agencies regulating education policy.
- 7. Environment. Includes issues related to general environmental policy, including appropriations for government agencies regulating environmental policy.
- 8. Energy. Includes issues generally related to energy policy, including appropriations for government agencies regulating energy policy.
- 9. Immigration. Includes issues related to immigration, refugees, and citizenship.
- 10. Transportation. Includes issues related generally to transportation, including appropriations for government agencies regulating transportation policy.
- 12. Law and Crime. Includes issues related to general law, crime, and family issues.
- 13. Social Welfare. Includes issues generally related to social welfare policy.
- 14. Housing. Includes issues related generally to housing and urban affairs.
- 15. Domestic Commerce. Includes issues generally related to domestic commerce, including appropriations for government agencies regulating domestic commerce.
- 16. Defense. Includes issues related generally to defense policy, and appropriations for agencies that oversee general defense policy.
- 17. Technology. Includes issues related to general space, science, technology, and communications.
- 18. Foreign Trade. Includes issues generally related to foreign trade and appropriations for government agencies generally regulating foreign trade.
- 19. International Affairs. Includes issues related to general international affairs and foreign aid, including appropriations for general government foreign affairs agencies.
- 20. Government Operations. Includes issues related to general government operations, including appropriations for multiple government agencies.
- 21. Public Lands. Includes issues related to general public lands, water management, and territorial issues.
- 23. Culture. Includes issues related to general cultural policy issues.
Note. Topics 11 and 22 do not exist.
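For downstream code, the codes listed above can be kept in a small lookup table. The following is a minimal sketch; the names `CAP_MAJOR_TOPICS` and `topic_name` are illustrative and are not part of the model or of the `transformers` library. It assumes labels arrive either as integers or as numeric strings such as `'8'`.

```python
# Major topic codes as listed above (11 and 22 are intentionally absent).
CAP_MAJOR_TOPICS = {
    1: "Macroeconomics",
    2: "Civil Rights",
    3: "Health",
    4: "Agriculture",
    5: "Labor",
    6: "Education",
    7: "Environment",
    8: "Energy",
    9: "Immigration",
    10: "Transportation",
    12: "Law and Crime",
    13: "Social Welfare",
    14: "Housing",
    15: "Domestic Commerce",
    16: "Defense",
    17: "Technology",
    18: "Foreign Trade",
    19: "International Affairs",
    20: "Government Operations",
    21: "Public Lands",
    23: "Culture",
}


def topic_name(label):
    """Map a code such as 8 or '8' to its topic name (hypothetical helper)."""
    code = int(label)
    if code not in CAP_MAJOR_TOPICS:
        raise KeyError(f"Unknown major topic code: {code}")
    return CAP_MAJOR_TOPICS[code]


print(topic_name("8"))  # Energy
```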
Data Splits
The data was split 70/15/15 into training, validation, and test sets. Because the major agenda topics are imbalanced, the split was stratified by topic so that each subset preserves the overall class distribution.
- Train (n = 4,318), stratified by major policy topic.
- Validation (n = 925), used for validation metrics.
- Test (n = 926), held out; results to be reported in a forthcoming paper.
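A stratified 70/15/15 split of this kind can be sketched with the standard library alone. The code below is an illustration, not the procedure actually used for this model: the function name `stratified_split`, the seed, and the per-topic rounding rule are all assumptions, so the exact subset sizes may differ slightly from those reported above.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=42):
    """Return (train, val, test) index lists, split separately within each label."""
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility

    # Group example indices by their label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_label):
        idxs = by_label[label]
        rng.shuffle(idxs)
        n_train = round(fractions[0] * len(idxs))
        n_val = round(fractions[1] * len(idxs))
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Toy labels standing in for major topic codes.
toy_labels = ["8"] * 100 + ["3"] * 40 + ["16"] * 20
train_idx, val_idx, test_idx = stratified_split(toy_labels)
print(len(train_idx), len(val_idx), len(test_idx))  # 112 24 24
```

Splitting within each topic keeps rare topics represented in all three subsets, which a purely random split does not guarantee.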
Example Usage
## Pipeline as a high-level helper
from transformers import pipeline
agendas_classifier = pipeline("text-classification", model="bgonzalezbustamante/ft-ModernBERT-policy-agenda-English")
## Act example
act_example = agendas_classifier("Statutory Gas Companies (Electricity Supply Powers) Act, 1925 c. 44. An Act to facilitate the supply of electricity by statutory gas companies")
## Print example
print(act_example)
Output:
[{'label': '8', 'score': 0.9935185313224792}]
Note. This example was extracted from the held-out test set.
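When classifying several texts at once, it can help to request more than one candidate label and to truncate long inputs. The sketch below wraps the same pipeline; `build_classifier` and `classify_acts` are hypothetical helper names, and the choice of `top_k=3` is an assumption. The `max_length` default of 8192 follows the native context length stated in the Overview.

```python
def build_classifier():
    """Load the pipeline (downloads the model on first use)."""
    # Imported lazily so classify_acts can be used with any compatible callable.
    from transformers import pipeline

    return pipeline(
        "text-classification",
        model="bgonzalezbustamante/ft-ModernBERT-policy-agenda-English",
    )


def classify_acts(classifier, texts, top_k=3, max_length=8192):
    """Return up to top_k candidate labels per text, truncating long inputs."""
    # top_k controls how many scored labels are returned per input;
    # truncation and max_length are passed through to the tokenizer.
    return classifier(
        list(texts), top_k=top_k, truncation=True, max_length=max_length
    )


# Example (not run here because it requires downloading the model):
# clf = build_classifier()
# results = classify_acts(clf, ["An Act to amend the law relating to ..."])
# Each element of results is expected to be a list of
# {'label': ..., 'score': ...} dictionaries, highest score first.
```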
Validation Metrics
- Accuracy: 0.835
- Precision macro: 0.794
- Precision micro: 0.835
- Precision weighted: 0.834
- Recall macro: 0.746
- Recall micro: 0.835
- Recall weighted: 0.835
- F1 macro: 0.760
- F1 micro: 0.835
- F1 weighted: 0.831
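The three averaging schemes differ in how per-topic results are combined: macro takes the unweighted mean over topics, weighted takes the mean weighted by the number of true examples per topic, and micro pools the counts over all topics. In single-label classification the micro-averaged scores coincide with accuracy, which is consistent with the value 0.835 repeating above. The following is a minimal sketch on toy data; it is not the evaluation code used for this model, and `averaged_f1` is a hypothetical name.

```python
from collections import Counter


def averaged_f1(y_true, y_pred):
    """Compute macro, micro, and weighted F1 plus accuracy for single-label data."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    per_class = {c: f1(tp[c], fp[c], fn[c]) for c in labels}
    support = Counter(y_true)
    n = len(y_true)
    return {
        "macro": sum(per_class.values()) / len(labels),
        "micro": f1(sum(tp.values()), sum(fp.values()), sum(fn.values())),
        "weighted": sum(per_class[c] * support[c] for c in labels) / n,
        "accuracy": sum(tp.values()) / n,
    }


# Toy example with three classes of unequal size.
scores = averaged_f1([1, 1, 1, 2, 2, 3], [1, 1, 2, 2, 2, 1])
print({k: round(v, 3) for k, v in scores.items()})
# {'macro': 0.489, 'micro': 0.667, 'weighted': 0.6, 'accuracy': 0.667}
```

The gap between macro and weighted scores in the toy example comes from the smallest class being missed entirely, the same kind of effect that can pull macro scores below weighted ones on imbalanced data.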
Intended Uses and Limitations
Intended uses of this model include policy classification of legislative debates in English, including machine-translated text when the BLEU scores are acceptable. Transfer to other domains must be handled carefully.
Training Configuration
- Learning rate: 5e-05
- Epochs: 10
- Batch size: 8
- Warmup ratio: 0.1
- Gradient accumulation: 1
- Weight decay: 0.0
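The settings above can be written as keyword arguments, and together with the training split size they imply a rough step count. The sketch below makes several assumptions not stated in this card: that "batch size" means per-device batch size, that training ran on a single device, and that warmup steps are the ratio times total steps, rounded up. The names `TRAINING_KWARGS` and `schedule_summary` are illustrative.

```python
import math

# Keys follow transformers.TrainingArguments parameter names.
TRAINING_KWARGS = {
    "learning_rate": 5e-5,
    "num_train_epochs": 10,
    "per_device_train_batch_size": 8,  # assumes "batch size" is per device
    "warmup_ratio": 0.1,
    "gradient_accumulation_steps": 1,
    "weight_decay": 0.0,
}

N_TRAIN = 4318  # training split size reported under Data Splits


def schedule_summary(n_train, kwargs, n_devices=1):
    """Estimate optimizer steps and warmup steps (single-device assumption)."""
    effective_batch = (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * n_devices
    )
    steps_per_epoch = math.ceil(n_train / effective_batch)
    total_steps = steps_per_epoch * kwargs["num_train_epochs"]
    warmup_steps = math.ceil(total_steps * kwargs["warmup_ratio"])
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


print(schedule_summary(N_TRAIN, TRAINING_KWARGS))
# {'effective_batch': 8, 'steps_per_epoch': 540, 'total_steps': 5400, 'warmup_steps': 540}
```

Because the keys match `TrainingArguments` parameter names, one way to reuse the dictionary would be `TrainingArguments(output_dir="out", **TRAINING_KWARGS)`, where `output_dir="out"` is a placeholder.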
Environmental Impact
- Emissions: 409 g CO₂ eq
How to Cite
González-Bustamante, B. (2025). ft-ModernBERT-policy-agenda-English (Revision 6f2023c). Hugging Face. https://doi.org/10.57967/hf/6864.
Base model: answerdotai/ModernBERT-large