Model Card: PatentSBERTa Fine-Tuned on Green Patent Claims (Assignment 3)

Model Summary

This model is a fine-tuned version of AI-Growth-Lab/PatentSBERTa for binary classification of patent claims as green technology (Y02) or not. It was developed as part of Assignment 3 in the Applied Deep Learning and AI course at Aalborg University. Compared to Assignment 2, this model uses a more advanced Multi-Agent System (MAS) to generate higher-quality gold labels for the 100 high-risk claims before fine-tuning PatentSBERTa.


Model Details

  • Developed by: Anders Sønderbý (as58zr@student.aau.dk)
  • Model type: Sentence Transformer with classification head (binary)
  • Base model: AI-Growth-Lab/PatentSBERTa
  • Language: English
  • License: MIT
  • Task: Binary text classification — Green Technology (Y02) vs. Not Green

What This Model Does

Given the text of a patent claim, the model predicts whether the claim relates to green technology as defined by the CPC Y02 classification system. The output is a binary label:

  • 1 — Green technology (Y02)
  • 0 — Not green technology
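As a minimal sketch of this output convention, the snippet below maps a raw classifier logit to the binary label, assuming a sigmoid activation and a 0.5 decision threshold (the actual classification head and threshold of the fine-tuned model may differ):

```python
import math

def predict_label(logit: float, threshold: float = 0.5) -> int:
    """Map a raw classifier logit to the binary green/not-green label
    (1 = Y02 green technology, 0 = not green)."""
    prob_green = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
    return 1 if prob_green >= threshold else 0

print(predict_label(2.0))   # confident positive logit -> 1 (green)
print(predict_label(-1.5))  # negative logit -> 0 (not green)
```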

Key Difference from Assignment 2

In Assignment 2, a single generic LLM was used to suggest labels for the 100 high-risk claims before human review. In Assignment 3, a Multi-Agent System (MAS) using CrewAI was used instead, where three specialised agents debated each claim before producing a final label. The hypothesis is that adversarial debate between agents produces higher-quality gold labels, which in turn produces a better fine-tuned PatentSBERTa model.


Training Pipeline Overview

Stage 1 & 2 — Setup (Same as Assignment 2)

The same patents_50k_green.parquet balanced 50k dataset was used. Uncertainty scores were recomputed from the Assignment 2 baseline model, and the same top 100 high-risk claims (hitl_green_100.csv) were selected for labeling.
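Uncertainty sampling of this kind can be sketched as follows, assuming uncertainty is measured by how close the model's predicted P(green) is to 0.5 (the exact scoring rule used in Assignment 2 may differ):

```python
def select_high_risk(probs, k=100):
    """Rank claims by how close P(green) is to 0.5 (maximum
    uncertainty) and return the indices of the top-k riskiest claims."""
    ranked = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    return ranked[:k]

# Toy example: indices 3 (p=0.49) and 1 (p=0.52) are the most uncertain.
print(select_high_risk([0.95, 0.52, 0.10, 0.49], k=2))  # [3, 1]
```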

Stage 3 — Multi-Agent Labeling (CrewAI)

Three agents debated each of the 100 high-risk patent claims:

| Agent | Role | Objective |
|---|---|---|
| Advocate | Green Patent Expert | Argue why the claim qualifies as Y02 green technology |
| Skeptic | Greenwashing Analyst | Challenge the Y02 classification and identify greenwashing |
| Judge | Senior Patent Examiner | Weigh both arguments and produce a final JSON label + rationale |

For each claim, the Judge outputs: {"label": 0 or 1, "rationale": "2-3 sentence explanation"}

The LLM used for all three agents was groq/meta-llama/llama-4-scout-17b-16e-instruct via the Groq API.
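Because LLM output is not guaranteed to be well-formed, the Judge's verdict should be validated before use. A minimal sketch (the helper name and error-handling policy are illustrative, not the pipeline's actual code):

```python
import json

def parse_judge_output(raw: str) -> dict:
    """Validate the Judge's JSON verdict ({"label": 0 or 1,
    "rationale": str}); raise ValueError on anything malformed
    so bad outputs can be retried."""
    verdict = json.loads(raw)
    if verdict.get("label") not in (0, 1):
        raise ValueError(f"label must be 0 or 1, got {verdict.get('label')!r}")
    if not isinstance(verdict.get("rationale"), str) or not verdict["rationale"].strip():
        raise ValueError("rationale must be a non-empty string")
    return verdict

ok = parse_judge_output('{"label": 1, "rationale": "Claim targets solar-cell efficiency, squarely Y02."}')
print(ok["label"])  # 1
```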

Stage 4 — Human Review (HITL)

A human reviewer assessed all 100 claims using the agent arguments and Judge rationale as context. The final gold label (is_green_gold) reflects the human decision, with the AI rationale available as supporting context.

Stage 5 — Fine-Tuning PatentSBERTa

PatentSBERTa was fine-tuned for binary classification using the combined train_silver + gold_100 dataset, where gold labels override silver labels for the 100 HITL-reviewed claims.


Training Data

  • Dataset: Derived from AI-Growth-Lab/patents_claims_1.5m_traim_test
  • Working file: patents_50k_green.parquet — a balanced 50k sample (25,000 green, 25,000 not green)
  • Silver label source: CPC Y02* classification codes (is_green_silver)
  • Gold labels: 100 human-reviewed claims labeled via MAS debate (is_green_gold)

Dataset Splits

| Split | Size | Description |
|---|---|---|
| train_silver | ~40,000 | Silver-labeled training set (CPC-derived) |
| eval_silver | ~5,000 | Silver-labeled evaluation set |
| pool_unlabeled | ~5,000 | Unlabeled pool used for uncertainty sampling |
| gold_100 | 100 | Human-reviewed high-uncertainty claims (MAS-assisted) |

Training Hyperparameters

| Parameter | Value |
|---|---|
| Base model | AI-Growth-Lab/PatentSBERTa |
| Max sequence length | 256 |
| Epochs | 1 |
| Learning rate | 2e-5 |
| Training set size | ~40,100 (train_silver + gold_100) |

Evaluation Results

| Evaluation set | F1 score | Notes |
|---|---|---|
| eval_silver (5,000) | 0.824 | Primary evaluation metric |
| gold_100 (100) | 0.667 | Human-reviewed high-uncertainty claims |
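For reference, the F1 scores above follow the standard binary definition (harmonic mean of precision and recall); a minimal stdlib implementation:

```python
def f1_score(y_true, y_pred):
    """Binary F1 = 2 * precision * recall / (precision + recall)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

print(f1_score([1, 1, 0, 0], [1, 0, 0, 1]))  # 0.5
```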

Comparative Analysis

| Model version | Training data source | F1 score |
|---|---|---|
| 1. Baseline | Frozen embeddings (no fine-tuning) | 0.780 |
| 2. Assignment 2 model | Fine-tuned on Silver + Gold (simple LLM) | 0.818 |
| 3. Assignment 3 model | Fine-tuned on Silver + Gold (MAS, CrewAI) | 0.824 |

The MAS approach produced a modest improvement in F1 score (+0.006) over the simple LLM approach from Assignment 2. While the improvement is small, the adversarial debate structure between Advocate and Skeptic agents likely produced more nuanced and reliable gold labels for the high-risk claims, particularly for borderline cases where a single LLM might have been overconfident. The added engineering complexity of the MAS is partially justified by the quality improvement, though the marginal gain suggests that the bottleneck may lie in the size of the gold label set (100 claims) rather than label quality alone.


HITL Agreement Reporting

Human-AI agreement was tracked for both Assignment 2 and Assignment 3:

| Assignment | Labeling method | Human-AI agreement |
|---|---|---|
| Assignment 2 | Simple generic LLM | 97% |
| Assignment 3 | Multi-Agent System (MAS) | 94% |
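The agreement figures above are presumably the fraction of the 100 reviewed claims where the AI-suggested label matched the final human decision; a sketch of that computation:

```python
def agreement_rate(ai_labels, human_labels):
    """Fraction of claims where the AI-suggested label matches
    the final human gold label."""
    matches = sum(1 for a, h in zip(ai_labels, human_labels) if a == h)
    return matches / len(human_labels)

# Toy example: AI and human disagree on one of four claims.
print(agreement_rate([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```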

Intended Use

  • Primary use: Academic research and coursework in patent classification
  • Intended users: Course instructors and students at Aalborg University
  • Out-of-scope: Production patent classification systems, legal patent assessment, or any commercial use

Limitations

  • Trained on a balanced 50k sample — performance may differ on the full unbalanced patent corpus
  • Silver labels are derived from CPC codes, which may contain noise
  • Gold labels are based on 100 claims only — a larger gold set would likely improve downstream performance more significantly
  • The MAS agents occasionally showed bias, with the Advocate tending to over-generalise green characteristics and the Judge sometimes deferring too strongly to one agent's argument

Repository

The full code, notebooks, and data files for this assignment are available in the course GitHub repository.

Model ID: Anders-sonderby/patentsbert-finetune_1
Model size: ~0.1B parameters (safetensors, F32)