# Patent Green Technology Classifier (PatentSBERTa Fine-tuned + MAS)
A binary text classifier for detecting green/sustainable technology patent claims, built on top of AI-Growth-Lab/PatentSBERTa. This is the Assignment 3 model, extending Assignment 2 by replacing the simple LLM labeling step with a three-agent debate system (MAS).
## Model Description
This model was fine-tuned on a balanced dataset of 35,100 patent claims with gold-enhanced labels derived from a Multi-Agent System (MAS) debate pipeline followed by a Human-in-the-Loop (HITL) review. It classifies patent claims as either green technology (1) or not green technology (0).
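A minimal usage sketch, assuming the checkpoint loads as a standard Hugging Face sequence-classification model (the 0/1 label mapping and the 256-token limit are from this card; the `classify_claim` helper name is ours, not from the repo):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Label mapping from this model card: 1 = green technology, 0 = not green.
LABELS = {0: "not green technology", 1: "green technology"}

def classify_claim(claim, model, tokenizer, max_length=256):
    """Return (label_id, label_name) for a single patent claim."""
    inputs = tokenizer(claim, truncation=True, max_length=max_length,
                       return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    label_id = int(logits.argmax(dim=-1))
    return label_id, LABELS[label_id]
```

Load the model and tokenizer with `AutoModelForSequenceClassification.from_pretrained("alexchrander/patent-sberta-green-finetuned-mas")` and the matching `AutoTokenizer`, then pass them to `classify_claim`.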
## Training Data
- Base dataset: AI-Growth-Lab/patents_claims_1.5m_traim_test
- Silver labels: Derived from CPC Y02* codes (25,000 green + 25,000 not green)
- Gold labels: 100 examples labeled via MAS debate → Human HITL workflow
- Dataset: alexchrander/patents-green-mas-dataset
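The CPC-based silver-labeling rule above can be sketched in a few lines; the function name and input format are illustrative, not taken from the repo:

```python
# Silver-labeling rule from this card: a claim counts as green (1) if any of
# its CPC codes falls under the Y02 class, otherwise not green (0).
def silver_label(cpc_codes):
    """Return 1 if any CPC code starts with 'Y02', else 0."""
    return int(any(code.startswith("Y02") for code in cpc_codes))

print(silver_label(["Y02E 10/50", "H01L 31/04"]))  # -> 1
print(silver_label(["G06F 17/30"]))                # -> 0
```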
## Training Procedure
Active Learning + MAS + HITL workflow:
- Reused frozen PatentSBERTa baseline and uncertainty scores from Assignment 2
- Selected the same 100 most uncertain examples via uncertainty sampling
- Used a three-agent debate system to suggest labels:
  - Advocate (Mistral-7B-Instruct-v0.2) — argues FOR green classification
  - Skeptic (Qwen2.5-7B-Instruct) — argues AGAINST green classification
  - Judge (Meta-Llama-3-8B-Instruct) — weighs both arguments and produces the final label
- Human reviewer assigned final gold labels based on the full debate
- Fine-tuned PatentSBERTa on the gold-enhanced dataset
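The sampling and debate steps above can be sketched as follows. `chat` stands in for any LLM call (model name and prompt in, reply out); the function names, prompts, and verdict parsing are illustrative assumptions, not the actual pipeline code.

```python
def select_most_uncertain(probs, k=100):
    """Rank claims by |P(green) - 0.5| and return indices of the k most uncertain."""
    ranked = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    return ranked[:k]

def debate_label(claim, chat):
    """Three-agent debate: advocate and skeptic argue, the judge decides (1 = green)."""
    advocate = chat("Mistral-7B-Instruct-v0.2",
                    f"Argue that this patent claim describes green technology:\n{claim}")
    skeptic = chat("Qwen2.5-7B-Instruct",
                   f"Argue that this patent claim does NOT describe green technology:\n{claim}")
    verdict = chat("Meta-Llama-3-8B-Instruct",
                   "Given the arguments below, answer GREEN or NOT GREEN.\n"
                   f"FOR: {advocate}\nAGAINST: {skeptic}")
    # Naive verdict parsing for illustration only.
    return 0 if "NOT" in verdict.upper() else 1

probs = [0.98, 0.51, 0.03, 0.47, 0.90]
print(select_most_uncertain(probs, k=2))  # -> [1, 3]
```

In the real workflow the judge's label was then reviewed by a human before being accepted as gold.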
Hyperparameters:
- max_seq_length: 256
- epochs: 1
- learning_rate: 2e-5
- batch_size: 16
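The same hyperparameters as a plain config dict, for easy reuse in a training script (a sketch; the actual training code is not part of this card):

```python
# Fine-tuning hyperparameters from this card; max_seq_length is applied at
# tokenization time, the rest configure the training loop.
HPARAMS = {
    "max_seq_length": 256,
    "epochs": 1,
    "learning_rate": 2e-5,
    "batch_size": 16,
}
```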
## Results

### Comparison across all model versions
| Model Version | Training Data Source | F1 | Accuracy |
|---|---|---|---|
| Baseline (frozen) | Frozen Embeddings (No Fine-tuning) | 0.77 | 0.77 |
| Assignment 2 Model | Fine-tuned on Silver + Gold (Simple LLM) | 0.81 | 0.81 |
| Assignment 3 Model (this model) | Fine-tuned on Silver + Gold (MAS) | 0.81 | 0.81 |
### MAS vs Simple LLM label quality
| Labeling Approach | Not Green | Green | Low Confidence |
|---|---|---|---|
| Assignment 2 (Mistral) | 95 | 5 | 72% |
| Assignment 3 (MAS) | 51 | 47 | 4% |
The MAS produced far more balanced labels (51/47 vs. 95/5) with far fewer low-confidence cases (4% vs. 72%) than the single-LLM approach, although both fine-tuned models reached the same downstream F1 of 0.81.
## Video
https://panopto.aau.dk/Panopto/Pages/Viewer.aspx?id=5283748b-c71c-473c-89ec-b3f9016361f4