patentsbert-silver-gold-finetuned
PatentSBERTa fine-tuned for Y02 green-technology patent classification, using a combined Silver + Gold training set built with a QLoRA-powered Multi-Agent System (MAS) and targeted Human-in-the-Loop (HITL) review.
Model
patentsbert-silver-gold-finetuned/
The fine-tuned PatentSBERTa model (sentence-transformers format). Trained with cosine similarity loss on contrastive pairs drawn from the combined Silver + Gold dataset. Use with SentenceTransformer("soysouce/patentsbert-silver-gold-finetuned").
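The cosine-similarity loss pushes same-label claim pairs toward similarity 1.0 and mixed-label pairs toward 0.0. A minimal pure-Python sketch of the similarity measure the loss optimizes (toy vectors, for illustration only):

```python
import math

def cos_sim(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Identical directions score 1.0; orthogonal directions score 0.0.
# During fine-tuning, same-label pairs (green/green) target 1.0 and
# mixed pairs (green/non-green) target 0.0.
print(cos_sim([1.0, 0.0], [1.0, 0.0]))
print(cos_sim([1.0, 0.0], [0.0, 1.0]))
```

For inference, load the model as shown above and call `.encode(list_of_claims)` to get the claim embeddings.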
Dataset Files
patents_50k_green.parquet
The full 50K patent dataset with splits: train_silver, eval_silver, and pool_unlabeled. Contains patent claim text and is_green_silver labels for the silver splits.
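Splitting the file by its split column can be sketched as below; the toy frame mirrors the described layout, but the exact column names (`split`, `claim_text`, `is_green_silver`) are assumptions until checked against the file:

```python
import pandas as pd

# Toy frame standing in for patents_50k_green.parquet; in the real
# pipeline this would be: df = pd.read_parquet("patents_50k_green.parquet")
df = pd.DataFrame({
    "patent_id":       ["p1", "p2", "p3", "p4", "p5"],
    "claim_text":      ["solar ...", "engine ...", "battery ...",
                        "boiler ...", "turbine ..."],
    "split":           ["train_silver", "train_silver", "eval_silver",
                        "pool_unlabeled", "pool_unlabeled"],
    "is_green_silver": [1, 0, 1, None, None],  # labels only on silver splits
})

train_silver = df[df["split"] == "train_silver"]
pool = df[df["split"] == "pool_unlabeled"]
print(len(train_silver), len(pool))
```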
pool_with_pseudo_labels.parquet
The pool_unlabeled subset (30K claims) with pseudo labels generated by a Logistic Regression classifier (pseudo_label_lr) and uncertainty scores (uncertainty_lr). The top-100 highest-uncertainty claims were selected as the high-risk pool for MAS debate and HITL review.
gold_dataset.parquet
The final gold dataset: 100 high-risk claims with human-verified or MAS-judged labels (is_green_gold). Labels deemed unreliable (token-overflow errors, pipeline failures) are excluded. The source column indicates the label origin: judge_auto, human, or lr_fallback_skipped.
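The reliability filter can be sketched as below; which source values count as reliable is an assumption based on the description (judge_auto and human kept, lr_fallback_skipped dropped):

```python
# Toy records standing in for the merged MAS/HITL label output.
records = [
    {"patent_id": "p1", "is_green_gold": 1,    "source": "judge_auto"},
    {"patent_id": "p2", "is_green_gold": 0,    "source": "human"},
    {"patent_id": "p3", "is_green_gold": None, "source": "lr_fallback_skipped"},
]

RELIABLE_SOURCES = {"judge_auto", "human"}  # assumed reliability criterion
gold = [r for r in records if r["source"] in RELIABLE_SOURCES]
print([r["patent_id"] for r in gold])
```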
gold_labels_human.json
Raw HITL annotation output. Contains per-claim label, source, and confidence for all 100 high-risk claims. Produced by hitl_review.py after the interactive human review session.
hitl_green_100.csv
The 100 high-risk claims selected for HITL review, sorted by uncertainty score. Used as input to the MAS debate pipeline and the human review interface.
MAS Results (Part C)
mas_labels.json
Full MAS output for all 100 high-risk claims. Each record contains: patent_id, claim_text, final_label, confidence, y02_category, advocate_score, skeptic_score, and rationale.
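A minimal validation sketch for one mas_labels.json record, using the field names listed above (the value types are assumptions):

```python
# Fields every MAS record is expected to carry, per the description above.
REQUIRED_FIELDS = {
    "patent_id", "claim_text", "final_label", "confidence",
    "y02_category", "advocate_score", "skeptic_score", "rationale",
}

# Hypothetical record for illustration; values are invented.
record = {
    "patent_id": "p42",
    "claim_text": "A heat pump ...",
    "final_label": 1,
    "confidence": 0.83,
    "y02_category": "Y02B",
    "advocate_score": 0.9,
    "skeptic_score": 0.4,
    "rationale": "Claim targets building energy efficiency.",
}

missing = REQUIRED_FIELDS - record.keys()
print(sorted(missing))  # empty when the record is complete
```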
mas_summary.csv
Tabular version of mas_labels.json with two extra columns: true_label_lr, for comparison against the LR pseudo labels, and uncertainty_lr, for reference.
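The comparison this file enables can be sketched as an agreement rate between the MAS final label and the LR pseudo label (toy rows; the column names follow the description above):

```python
# Toy rows standing in for mas_summary.csv.
rows = [
    {"final_label": 1, "true_label_lr": 1},
    {"final_label": 0, "true_label_lr": 1},  # MAS overturned the LR label
    {"final_label": 0, "true_label_lr": 0},
    {"final_label": 1, "true_label_lr": 1},
]

agreement = sum(r["final_label"] == r["true_label_lr"] for r in rows) / len(rows)
print(agreement)
```

On the real high-uncertainty pool, low agreement is expected by construction: these are exactly the claims the LR classifier was least sure about.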
Logs
mas_291757.out
SLURM stdout log from the MAS pipeline job (job ID 291757). Contains full CrewAI verbose output including agent debates for all 100 claims.
finetune_291940.out
SLURM stdout log from the PatentSBERTa fine-tuning job (job ID 291940). Contains before/after F1 scores and training progress.
finetune_291940.err
SLURM stderr log from the fine-tuning job. Contains training loss per step and any warnings from the HuggingFace/sentence-transformers libraries.
Pipeline Overview
patents_50k_green.parquet
│
├── train_silver (10K) ────────────────────────────────┐
│                                                      │
├── pool_unlabeled (30K)                               │
│                                                      │
▼  LR pseudo labels + uncertainty                      │
pool_with_pseudo_labels.parquet                        │
│                                                      │
▼  top-100 by uncertainty                              │
hitl_green_100.csv                                     │
│                                                      │
▼  MAS debate (3x Qwen agents via CrewAI)              │
mas_labels.json / mas_summary.csv                      │
│                                                      │
▼  HITL review (human labels deadlocked claims)        │
gold_labels_human.json                                 │
│                                                      │
▼  filter reliable labels                              │
gold_dataset.parquet                                   │
│                                                      │
└──── combined with train_silver ──────────────────────┘
│
▼
patentsbert-silver-gold-finetuned/
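The final combination step above, turning the merged Silver + Gold rows into contrastive training pairs, can be sketched as follows (toy rows; the pairing scheme, same-label pairs targeting 1.0 and mixed pairs 0.0, is an assumption consistent with the cosine-similarity loss described earlier):

```python
from itertools import combinations

# Toy combined training set: silver rows plus one gold row.
rows = [
    {"claim_text": "solar ...",     "label": 1},  # silver
    {"claim_text": "engine ...",    "label": 0},  # silver
    {"claim_text": "heat pump ...", "label": 1},  # gold
]

# Build (text_a, text_b, target) pairs for a cosine-similarity loss:
# same-label pairs target 1.0, mixed-label pairs target 0.0.
pairs = [
    (a["claim_text"], b["claim_text"],
     1.0 if a["label"] == b["label"] else 0.0)
    for a, b in combinations(rows, 2)
]
print(len(pairs))
```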