Model Card: PatentSBERTa Fine-Tuned on Green Patent Claims (Assignment 3)

Model Summary

This model is a fine-tuned version of AI-Growth-Lab/PatentSBERTa for binary classification of patent claims as green technology (Y02) or not. It was developed as part of Assignment 3 in the Applied Deep Learning and AI course at Aalborg University. Compared to Assignment 2, this model uses a more advanced Multi-Agent System (MAS) to generate higher-quality gold labels for the 100 high-risk claims before fine-tuning PatentSBERTa.


Model Details

  • Developed by: Anders Sønderbý (as58zr@student.aau.dk)
  • Model type: Sentence Transformer with classification head (binary)
  • Base model: AI-Growth-Lab/PatentSBERTa
  • Language: English
  • License: MIT
  • Task: Binary text classification — Green Technology (Y02) vs. Not Green

What This Model Does

Given the text of a patent claim, the model predicts whether the claim relates to green technology as defined by the CPC Y02 classification system. The output is a binary label:

  • 1 — Green technology (Y02)
  • 0 — Not green technology
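As a minimal sketch of this output convention, the snippet below maps a raw classifier logit to the binary label, assuming a sigmoid activation and a 0.5 decision threshold (the actual classification head and threshold of the fine-tuned model may differ):

```python
import math

def predict_label(logit: float, threshold: float = 0.5) -> int:
    """Map a raw classifier logit to the binary green/not-green label
    (1 = Y02 green technology, 0 = not green)."""
    prob_green = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
    return 1 if prob_green >= threshold else 0

print(predict_label(2.0))   # confident positive logit -> 1 (green)
print(predict_label(-1.5))  # negative logit -> 0 (not green)
```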

Key Difference from Assignment 2

In Assignment 2, a single generic LLM was used to suggest labels for the 100 high-risk claims before human review. In Assignment 3, a Multi-Agent System (MAS) using CrewAI was used instead, where three specialised agents debated each claim before producing a final label. The hypothesis is that adversarial debate between agents produces higher-quality gold labels, which in turn produces a better fine-tuned PatentSBERTa model.


Training Pipeline Overview

Stage 1 & 2 — Setup (Same as Assignment 2)

The same patents_50k_green.parquet balanced 50k dataset was used. Uncertainty scores were recomputed from the Assignment 2 baseline model, and the same top 100 high-risk claims (hitl_green_100.csv) were selected for labeling.
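Uncertainty sampling of this kind can be sketched as follows, assuming uncertainty is measured by how close the model's predicted P(green) is to 0.5 (the exact scoring rule used in Assignment 2 may differ):

```python
def select_high_risk(probs, k=100):
    """Rank claims by how close P(green) is to 0.5 (maximum
    uncertainty) and return the indices of the top-k riskiest claims."""
    ranked = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    return ranked[:k]

# Toy example: indices 3 (p=0.49) and 1 (p=0.52) are the most uncertain.
print(select_high_risk([0.95, 0.52, 0.10, 0.49], k=2))  # [3, 1]
```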

Stage 3 — Multi-Agent Labeling (CrewAI)

Three agents debated each of the 100 high-risk patent claims:

| Agent | Role | Objective |
|---|---|---|
| Advocate | Green Patent Expert | Argue why the claim qualifies as Y02 green technology |
| Skeptic | Greenwashing Analyst | Challenge the Y02 classification and identify greenwashing |
| Judge | Senior Patent Examiner | Weigh both arguments and produce a final JSON label + rationale |

For each claim, the Judge outputs: {"label": 0 or 1, "rationale": "2-3 sentence explanation"}

The LLM used for all three agents was groq/meta-llama/llama-4-scout-17b-16e-instruct via the Groq API.
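Because LLM output is not guaranteed to be well-formed, the Judge's verdict should be validated before use. A minimal sketch (the helper name and error-handling policy are illustrative, not the pipeline's actual code):

```python
import json

def parse_judge_output(raw: str) -> dict:
    """Validate the Judge's JSON verdict ({"label": 0 or 1,
    "rationale": str}); raise ValueError on anything malformed
    so bad outputs can be retried."""
    verdict = json.loads(raw)
    if verdict.get("label") not in (0, 1):
        raise ValueError(f"label must be 0 or 1, got {verdict.get('label')!r}")
    if not isinstance(verdict.get("rationale"), str) or not verdict["rationale"].strip():
        raise ValueError("rationale must be a non-empty string")
    return verdict

ok = parse_judge_output('{"label": 1, "rationale": "Claim targets solar-cell efficiency, squarely Y02."}')
print(ok["label"])  # 1
```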

Stage 4 — Human Review (HITL)

A human reviewer assessed all 100 claims using the agent arguments and Judge rationale as context. The final gold label (is_green_gold) reflects the human decision, with the AI rationale available as supporting context.

Stage 5 — Fine-Tuning PatentSBERTa

PatentSBERTa was fine-tuned for binary classification using the combined train_silver + gold_100 dataset, where gold labels override silver labels for the 100 HITL-reviewed claims.


Training Data

  • Dataset: Derived from AI-Growth-Lab/patents_claims_1.5m_traim_test
  • Working file: patents_50k_green.parquet — a balanced 50k sample (25,000 green, 25,000 not green)
  • Silver label source: CPC Y02* classification codes (is_green_silver)
  • Gold labels: 100 human-reviewed claims labeled via MAS debate (is_green_gold)

Dataset Splits

| Split | Size | Description |
|---|---|---|
| train_silver | ~40,000 | Silver-labeled training set (CPC-derived) |
| eval_silver | ~5,000 | Silver-labeled evaluation set |
| pool_unlabeled | ~5,000 | Unlabeled pool used for uncertainty sampling |
| gold_100 | 100 | Human-reviewed high-uncertainty claims (MAS-assisted) |

Training Hyperparameters

| Parameter | Value |
|---|---|
| Base model | AI-Growth-Lab/PatentSBERTa |
| Max sequence length | 256 |
| Epochs | 1 |
| Learning rate | 2e-5 |
| Training set size | ~40,100 (train_silver + gold_100) |

Evaluation Results

| Evaluation set | F1 score | Notes |
|---|---|---|
| eval_silver (5,000) | 0.824 | Primary evaluation metric |
| gold_100 (100) | 0.667 | Human-reviewed high-uncertainty claims |
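For reference, the F1 scores above follow the standard binary definition (harmonic mean of precision and recall); a minimal stdlib implementation:

```python
def f1_score(y_true, y_pred):
    """Binary F1 = 2 * precision * recall / (precision + recall)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

print(f1_score([1, 1, 0, 0], [1, 0, 0, 1]))  # 0.5
```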

Comparative Analysis

| Model version | Training data source | F1 score |
|---|---|---|
| 1. Baseline | Frozen embeddings (no fine-tuning) | 0.780 |
| 2. Assignment 2 model | Fine-tuned on Silver + Gold (simple LLM) | 0.818 |
| 3. Assignment 3 model | Fine-tuned on Silver + Gold (MAS, CrewAI) | 0.824 |

The MAS approach produced a modest improvement in F1 score (+0.006) over the simple LLM approach from Assignment 2. While the improvement is small, the adversarial debate structure between Advocate and Skeptic agents likely produced more nuanced and reliable gold labels for the high-risk claims, particularly for borderline cases where a single LLM might have been overconfident. The added engineering complexity of the MAS is partially justified by the quality improvement, though the marginal gain suggests that the bottleneck may lie in the size of the gold label set (100 claims) rather than label quality alone.


HITL Agreement Reporting

Human-AI agreement was tracked for both Assignment 2 and Assignment 3:

| Assignment | Labeling method | Human-AI agreement |
|---|---|---|
| Assignment 2 | Simple generic LLM | 97% |
| Assignment 3 | Multi-Agent System (MAS) | 94% |
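The agreement figures above are presumably the fraction of the 100 reviewed claims where the AI-suggested label matched the final human decision; a sketch of that computation:

```python
def agreement_rate(ai_labels, human_labels):
    """Fraction of claims where the AI-suggested label matches
    the final human gold label."""
    matches = sum(1 for a, h in zip(ai_labels, human_labels) if a == h)
    return matches / len(human_labels)

# Toy example: AI and human disagree on one of four claims.
print(agreement_rate([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```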

Intended Use

  • Primary use: Academic research and coursework in patent classification
  • Intended users: Course instructors and students at Aalborg University
  • Out-of-scope: Production patent classification systems, legal patent assessment, or any commercial use

Limitations

  • Trained on a balanced 50k sample — performance may differ on the full unbalanced patent corpus
  • Silver labels are derived from CPC codes, which may contain noise
  • Gold labels are based on 100 claims only — a larger gold set would likely improve downstream performance more significantly
  • The MAS agents occasionally showed bias, with the Advocate tending to over-generalise green characteristics and the Judge sometimes deferring too strongly to one agent's argument

Repository

The full code, notebooks, and data files for this assignment are available in the course GitHub repository.

Model ID: Anders-sonderby/patentsbert-finetune_1
Model size: ~0.1B parameters (safetensors, F32)