patentsbert-silver-gold-finetuned

PatentSBERTa fine-tuned for Y02 green technology patent classification using a combined Silver + Gold training set with a QLoRA-powered Multi-Agent System (MAS) and targeted Human-in-the-Loop (HITL) review.


Model

patentsbert-silver-gold-finetuned/

The fine-tuned PatentSBERTa model (sentence-transformers format). Trained with cosine similarity loss on contrastive pairs drawn from the combined Silver + Gold dataset. Load it with SentenceTransformer("soysouce/patentsbert-silver-gold-finetuned").
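
A minimal usage sketch follows, assuming the sentence-transformers package is installed and the model ID above resolves on the Hub. The helper names (load_model, cosine_similarity, claim_similarity) are hypothetical and are not part of this repository; the library import is deferred so the similarity helper can be used on its own.

```python
import math
from typing import Sequence

MODEL_ID = "soysouce/patentsbert-silver-gold-finetuned"


def load_model(model_id: str = MODEL_ID):
    """Load the fine-tuned encoder (downloads weights on first call)."""
    # Imported lazily so the pure-Python helper below has no hard dependency.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_id)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def claim_similarity(model, claim_a: str, claim_b: str) -> float:
    """Encode two claims with any object exposing .encode(list_of_str)."""
    emb_a, emb_b = model.encode([claim_a, claim_b])
    return cosine_similarity(
        [float(x) for x in emb_a], [float(x) for x in emb_b]
    )
```

Calling claim_similarity(load_model(), claim_a, claim_b) would then return a score in [-1, 1]; how that score maps to a green / not-green decision depends on the downstream classifier, which this card does not specify.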


Dataset Files

patents_50k_green.parquet

The full 50K patent dataset with splits: train_silver, eval_silver, and pool_unlabeled. Contains patent claim text and is_green_silver labels for the silver splits.

pool_with_pseudo_labels.parquet

The pool_unlabeled subset (30K claims) with pseudo labels generated by a Logistic Regression classifier (pseudo_label_lr) and uncertainty scores (uncertainty_lr). The top-100 highest-uncertainty claims were selected as the high-risk pool for MAS debate and HITL review.
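
The selection step can be sketched as below. The card does not state how uncertainty_lr is computed, so the formula here (1.0 when the predicted probability is 0.5, falling to 0.0 at 0 or 1) is an assumption, as are the p_green input field and the function names.

```python
from typing import Dict, List


def uncertainty_from_proba(p_green: float) -> float:
    """Assumed score: 1.0 at p = 0.5 (most uncertain), 0.0 at p = 0 or 1."""
    return 1.0 - abs(2.0 * p_green - 1.0)


def select_high_risk(rows: List[Dict], k: int = 100) -> List[Dict]:
    """Attach pseudo labels and uncertainty, then keep the k most uncertain rows.

    Each input row needs a 'p_green' field (hypothetical name) holding the
    classifier's predicted probability of the green class.
    """
    scored = [
        {
            **row,
            "pseudo_label_lr": int(row["p_green"] >= 0.5),
            "uncertainty_lr": uncertainty_from_proba(row["p_green"]),
        }
        for row in rows
    ]
    # Highest uncertainty first, matching the sort order described for
    # hitl_green_100.csv.
    scored.sort(key=lambda r: r["uncertainty_lr"], reverse=True)
    return scored[:k]
```

With k=100 on the 30K pool this would yield the high-risk set; other uncertainty measures, such as predictive entropy, would rank binary predictions in the same order.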

gold_dataset.parquet

The final gold dataset: 100 high-risk claims with human-verified or MAS-judged labels (is_green_gold). Unreliable labels (token overflow errors, pipeline failures) are excluded. The source column indicates the label origin: judge_auto, human, or lr_fallback_skipped.

gold_labels_human.json

Raw HITL annotation output. Contains per-claim label, source, and confidence for all 100 high-risk claims. Produced by hitl_review.py after the interactive human review session.

hitl_green_100.csv

The 100 high-risk claims selected for HITL review, sorted by uncertainty score. Used as input to the MAS debate pipeline and the human review interface.


MAS Results (Part C)

mas_labels.json

Full MAS output for all 100 high-risk claims. Each record contains: patent_id, claim_text, final_label, confidence, y02_category, advocate_score, skeptic_score, and rationale.

mas_summary.csv

Tabular version of mas_labels.json with two additional columns: true_label_lr, for comparison against the LR pseudo labels, and uncertainty_lr, for reference.
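
One way to run that comparison is sketched below using only the standard library. It assumes final_label and true_label_lr are stored as 0/1 integers and that patent_id is carried over from mas_labels.json; the function names are hypothetical.

```python
import csv
from typing import Dict, List, TextIO, Tuple


def read_summary(handle: TextIO) -> List[Dict[str, str]]:
    """Read mas_summary.csv rows from an open text handle."""
    return list(csv.DictReader(handle))


def agreement_report(rows: List[Dict[str, str]]) -> Tuple[float, List[str]]:
    """Return (agreement rate, patent_ids where MAS and LR labels differ)."""
    disagreements = [
        row["patent_id"]
        for row in rows
        # Assumes both label columns hold 0/1 values.
        if int(row["final_label"]) != int(row["true_label_lr"])
    ]
    total = len(rows)
    rate = (total - len(disagreements)) / total if total else 0.0
    return rate, disagreements
```

Typical use would be `with open("mas_summary.csv", newline="") as f: rate, ids = agreement_report(read_summary(f))`. Because these claims were chosen for high LR uncertainty, a low agreement rate is not by itself evidence of a pipeline fault.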


Logs

mas_291757.out

SLURM stdout log from the MAS pipeline job (job ID 291757). Contains full CrewAI verbose output including agent debates for all 100 claims.

finetune_291940.out

SLURM stdout log from the PatentSBERTa fine-tuning job (job ID 291940). Contains before/after F1 scores and training progress.

finetune_291940.err

SLURM stderr log from the fine-tuning job. Contains training loss per step and any warnings from the HuggingFace/sentence-transformers libraries.


Pipeline Overview

patents_50k_green.parquet
        │
        ├── train_silver (10K) ───────────────────────────────┐
        │                                                     │
        └── pool_unlabeled (30K)                              │
                │                                             │
                ▼ LR pseudo labels + uncertainty              │
        pool_with_pseudo_labels.parquet                       │
                │                                             │
                ▼ top-100 by uncertainty                      │
        hitl_green_100.csv                                    │
                │                                             │
                ▼ MAS debate (3x Qwen agents via CrewAI)      │
        mas_labels.json / mas_summary.csv                     │
                │                                             │
                ▼ HITL review (human labels deadlock claims)  │
        gold_labels_human.json                                │
                │                                             │
                ▼ filter reliable labels                      │
        gold_dataset.parquet                                  │
                │                                             │
                └──── combined with train_silver ─────────────┘
                                    │
                                    ▼
                    patentsbert-silver-gold-finetuned/
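
The final merge step in the diagram, together with the pair construction implied by the cosine similarity loss, might look roughly like the sketch below. The claim_text field name, the random pairing strategy, and the 1.0 / 0.0 similarity targets are all assumptions; the actual training script is not included in this card.

```python
import random
from typing import Dict, List, Tuple


def combine_silver_gold(silver_rows: List[Dict], gold_rows: List[Dict]) -> List[Dict]:
    """Concatenate both sets under a unified 'label' field.

    Assumes silver rows carry is_green_silver, gold rows carry is_green_gold,
    and both carry the claim in a 'claim_text' field (hypothetical name).
    """
    combined = [
        {"text": r["claim_text"], "label": int(r["is_green_silver"]), "origin": "silver"}
        for r in silver_rows
    ]
    combined += [
        {"text": r["claim_text"], "label": int(r["is_green_gold"]), "origin": "gold"}
        for r in gold_rows
    ]
    return combined


def make_pairs(rows: List[Dict], n_pairs: int, seed: int = 0) -> List[Tuple[str, str, float]]:
    """Sample claim pairs; target is 1.0 for same-label pairs, else 0.0."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        a, b = rng.sample(rows, 2)  # two distinct rows
        target = 1.0 if a["label"] == b["label"] else 0.0
        pairs.append((a["text"], b["text"], target))
    return pairs
```

Each (text_a, text_b, target) tuple has the shape that a cosine similarity loss over sentence pairs expects. With only about 100 gold rows against 10K silver rows, uniform sampling rarely picks gold claims, so an actual run may have weighted or oversampled them; the card does not say.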