# Model Card: Mistral-7B QLoRA Fine-Tuned on Green Patent Claims (Final Assignment)

## Model Summary
This is a QLoRA fine-tuned adapter for mistralai/Mistral-7B-v0.1, adapted for domain-specific classification of patent claims as green technology (Y02) or not. It was developed as part of the Final Assignment in the Applied Deep Learning and AI course at Aalborg University.
The model serves two purposes in the final pipeline:
- Domain adaptation — learning the dense linguistic style of patent claims and the logic of Y02 classifications
- Judge agent — acting as the reasoning core of the Multi-Agent System (MAS) that labels 100 high-risk patent claims
## Model Details
- Developed by: Anders Sønderbý (as58zr@student.aau.dk)
- Model type: Causal LLM with QLoRA adapter (PEFT)
- Base model: mistralai/Mistral-7B-v0.1
- Language: English
- License: MIT
- Task: Instruction-tuned binary classification — Green Technology (Y02) vs. Not Green
## What This Model Does
Given the text of a patent claim formatted as an instruction prompt, the model completes the classification:
```
### Task: Classify the following patent claim as green technology (Y02) or not.

### Claim:
[patent claim text]

### Answer: YES / NO
```
In the MAS pipeline, the model is prompted as a Judge to produce structured JSON output weighing arguments from an Advocate and a Skeptic agent:
```
{"label": 0 or 1, "confidence": 0.0-1.0, "rationale": "2-3 sentence explanation"}
```
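Because the base model was instruction-tuned on a YES/NO format rather than JSON, its Judge output can be wrapped in extra text or slightly malformed. A minimal fallback parser along the lines the Engineering Notes describe might look like this (an illustrative sketch; the pipeline's actual parser may differ):

```python
import json
import re

def parse_judge_output(raw: str) -> dict:
    """Extract the Judge's JSON verdict from raw model output.

    Falls back to per-field regex extraction when the model wraps the
    JSON in extra text or emits malformed output.
    """
    # First attempt: find the outermost {...} span and parse it directly
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match:
        try:
            verdict = json.loads(match.group(0))
            if {"label", "confidence", "rationale"} <= verdict.keys():
                return verdict
        except json.JSONDecodeError:
            pass
    # Fallback: pull out individual fields with regexes
    label = re.search(r'"label"\s*:\s*(\d)', raw)
    conf = re.search(r'"confidence"\s*:\s*([01](?:\.\d+)?)', raw)
    rationale = re.search(r'"rationale"\s*:\s*"([^"]*)"', raw)
    return {
        "label": int(label.group(1)) if label else None,
        "confidence": float(conf.group(1)) if conf else None,
        "rationale": rationale.group(1) if rationale else "",
    }
```

Returning `None` for unparseable fields lets downstream code treat a failed parse as a low-confidence verdict rather than crashing.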
## How to Load This Model
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

BASE_MODEL = "mistralai/Mistral-7B-v0.1"
ADAPTER = "Anders-sonderby/mistral-7b-patent-qlora"

# 4-bit NF4 quantization, matching the training-time QLoRA setup
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token  # Mistral defines no pad token

base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach the LoRA adapter on top of the frozen 4-bit base model
model = PeftModel.from_pretrained(base_model, ADAPTER)
```
## Training Pipeline

### Fine-Tuning Approach: QLoRA
Rather than updating all 7 billion parameters, QLoRA freezes the base model weights in 4-bit precision and injects small trainable LoRA adapter matrices into the attention layers. Only ~0.5% of parameters are trained, drastically reducing memory requirements while preserving model quality.
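As a sketch, the adapter injection described above maps onto a `peft` configuration like the following (values taken from the QLoRA Configuration table below; the application steps are shown commented, since they require the loaded base model):

```python
from peft import LoraConfig

# LoRA settings from the QLoRA Configuration table below
lora_config = LoraConfig(
    r=16,                                  # LoRA rank
    lora_alpha=32,                         # scaling factor
    target_modules=["q_proj", "v_proj"],   # attention projections only
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# Applied to the 4-bit base model loaded as shown earlier:
# from peft import get_peft_model, prepare_model_for_kbit_training
# model = prepare_model_for_kbit_training(base_model)
# model = get_peft_model(model, lora_config)
# model.print_trainable_parameters()  # reports the ~0.5% trainable fraction
```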
### Training Data
- Source: `train_silver.csv`, 40,000 patent claims with silver labels derived from CPC Y02* codes
- Label balance: ~50/50 (20,010 green, 19,990 not green)
- Prompt format: instruction-tuning format (`### Task / ### Claim / ### Answer: YES/NO`)
- Split: 38,000 train / 2,000 eval (5% held out)
### QLoRA Configuration
| Parameter | Value |
|---|---|
| Base model | mistralai/Mistral-7B-v0.1 |
| Quantization | 4-bit NF4 with double quantization |
| Compute dtype | bfloat16 |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| Target modules | q_proj, v_proj |
| LoRA dropout | 0.05 |
| Trainable parameters | ~0.5% of total |
### Training Hyperparameters
| Parameter | Value |
|---|---|
| Epochs | 1 |
| Per device batch size | 4 |
| Gradient accumulation steps | 4 (effective batch size: 16) |
| Learning rate | 2e-4 |
| LR scheduler | Cosine |
| Warmup steps | 50 |
| Precision | bf16 |
| Hardware | NVIDIA L4 (24GB VRAM) |
| Cluster | AAU AI-Lab (SLURM batch job) |
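The hyperparameters in the table above map onto `transformers.TrainingArguments` roughly as follows (a sketch; `output_dir` and the logging settings are illustrative, not taken from the training code):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mistral-7b-patent-qlora",  # illustrative path
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,         # effective batch size: 4 * 4 = 16
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    bf16=True,
    logging_steps=50,                      # illustrative
)
```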
## Evaluation Results (QLoRA Model)
| Metric | Value |
|---|---|
| Eval loss | 1.1804 |
| Eval samples/second | 10.8 |
| Eval runtime | 185s |
Note: Eval loss reflects the causal language modelling objective on the 2,000 held-out samples. The model is evaluated on its ability to produce the correct YES/NO completion after the `### Answer:` marker.
## Role in the Multi-Agent System (MAS)
This model acts as the Judge in a three-agent debate pipeline for labeling 100 high-risk patent claims:
| Agent | Model | Role |
|---|---|---|
| Advocate | microsoft/Phi-3-mini-4k-instruct (4-bit) | Argues for Y02 green classification |
| Skeptic | microsoft/Phi-3-mini-4k-instruct (4-bit) | Challenges the classification, identifies greenwashing |
| Judge | Anders-sonderby/mistral-7b-patent-qlora | Weighs both arguments, produces final label + confidence + rationale |
Claims where the Judge's confidence fell below 0.70 were flagged for targeted human review (Exception-Based HITL), reducing manual effort compared to reviewing all 100 claims.
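The exception-based routing can be expressed as a simple filter (a sketch; the field names follow the Judge's JSON schema shown earlier):

```python
CONFIDENCE_THRESHOLD = 0.70

def route_verdicts(verdicts: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split Judge verdicts into auto-accepted and human-review queues.

    Verdicts below the confidence threshold, or with missing confidence
    (e.g. from a failed JSON parse), are flagged for targeted review.
    """
    accepted, flagged = [], []
    for v in verdicts:
        conf = v.get("confidence")
        if conf is None or conf < CONFIDENCE_THRESHOLD:
            flagged.append(v)
        else:
            accepted.append(v)
    return accepted, flagged
```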
## Downstream Impact on PatentSBERTa
The gold labels produced by this MAS pipeline were used to fine-tune PatentSBERTa for the final model version:
| Model Version | Training Data Source | F1 Score |
|---|---|---|
| 1. Baseline | Frozen Embeddings (No Fine-tuning) | 0.780 |
| 2. Assignment 2 | Fine-tuned on Silver + Gold (Simple LLM) | 0.818 |
| 3. Assignment 3 | Fine-tuned on Silver + Gold (MAS - CrewAI) | 0.824 |
| 4. Final Model | Fine-tuned on Silver + Gold (QLoRA MAS + Targeted HITL) | [Your F1 here] |
## Engineering Notes
- The instruction format (`### Answer: YES/NO`) creates tension when prompting the Judge to output structured JSON: the model was not trained to produce confidence scores, which required careful prompt engineering and a JSON fallback parser
- Two models (Phi-3-mini and the Mistral QLoRA Judge) were loaded simultaneously on a single L4 GPU, using 4-bit quantization and sequential CUDA cache clearing to stay within 24 GB of VRAM
- The QLoRA adapter must be loaded via `PeftModel.from_pretrained()` on top of the frozen base model; it cannot be loaded as a standalone model
## Intended Use
- Primary use: Academic research and coursework in patent classification
- Intended users: Course instructors and students at Aalborg University
- Out-of-scope: Production patent classification, legal patent assessment, or any commercial use
## Limitations
- Trained for 1 epoch only — additional epochs would likely improve classification accuracy
- The YES/NO instruction format does not produce confidence scores natively, making structured JSON output fragile at inference time
- The 512-character truncation may lose relevant technical context in longer patent claims
- No chain-of-thought reasoning was included in training, limiting the depth of the model's rationale generation
## Repository
The full code, notebooks, and data files for this assignment are available in the course GitHub repository.