Model Card: Paper Relevance Transformer

Model Overview

Paper Relevance Transformer is a transformer-based text classification model designed to estimate the relevance of a research paper with respect to a user query.

The model is intended for use in research literature discovery pipelines, where a user provides a topic such as:

CNN in healthcare

and the model ranks candidate papers by estimating whether a title or abstract is relevant or not relevant to that query.

This model is part of a larger Autonomous Research Literature Agent pipeline that includes:

  • paper retrieval
  • relevance scoring
  • knowledge graph construction
  • contradiction analysis
  • multi-agent inference

Training Details

  • Base Model: allenai/scibert_scivocab_uncased
  • Task: Binary text classification
  • Framework: Hugging Face Transformers
  • Training Type: Fine-tuned relevance scoring model
  • Input Format: query [SEP] paper_title_or_abstract
  • Max Sequence Length: 512
  • Optimizer: AdamW
  • Learning Rate: 2e-5
  • Batch Size: 16
  • Epochs: 5
  • Label Classes:
    • RELEVANT
    • NOT_RELEVANT
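The input format above can be sketched as a small helper (a sketch; the `build_input` function name is illustrative, not part of the released model):

```python
def build_input(query: str, paper_text: str) -> str:
    """Join a user query and a paper title/abstract in the
    query [SEP] paper_title_or_abstract format used at training time."""
    return f"{query} [SEP] {paper_text}"

pair = build_input(
    "CNN in healthcare",
    "Deep convolutional neural networks for cancer detection in MRI images",
)
print(pair)
```

Alternatively, the two texts can be passed to the tokenizer as a sentence pair, in which case the tokenizer inserts the [SEP] token itself.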

Training Log Summary

| Epoch | Train Loss | Train Accuracy | Validation Accuracy |
|-------|------------|----------------|---------------------|
| 1     | 0.298      | 91.7%          | 93.1%               |
| 2     | 0.186      | 95.6%          | 95.4%               |
| 3     | 0.134      | 97.1%          | 96.6%               |
| 4     | 0.108      | 97.9%          | 97.3%               |
| 5     | 0.087      | 98.5%          | 97.8%               |

The best checkpoint was selected using the highest validation accuracy (epoch 5).
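Checkpoint selection by validation accuracy can be illustrated with the logged values above (a minimal sketch; the log is represented as plain tuples rather than actual trainer state):

```python
# (epoch, train_loss, train_acc, val_acc) from the training log above
log = [
    (1, 0.298, 0.917, 0.931),
    (2, 0.186, 0.956, 0.954),
    (3, 0.134, 0.971, 0.966),
    (4, 0.108, 0.979, 0.973),
    (5, 0.087, 0.985, 0.978),
]

# Pick the epoch whose checkpoint has the highest validation accuracy
best_epoch, *_, best_val = max(log, key=lambda row: row[3])
print(best_epoch, best_val)  # -> 5 0.978
```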


Dataset Description

The model was trained on a custom research relevance dataset built from synthetic and curated academic-style prompts.

Example prompts

  • CNN in healthcare
  • Transformers in drug discovery
  • Graph Neural Networks in cybersecurity
  • Federated Learning in medical imaging

Dataset format

Each training instance contains:

  • query
  • paper title / abstract
  • binary relevance label

Example training pair

| Query             | Paper Text                                                             | Label        |
|-------------------|------------------------------------------------------------------------|--------------|
| CNN in healthcare | Deep convolutional neural networks for cancer detection in MRI images  | RELEVANT     |
| CNN in healthcare | Blockchain-based transaction systems in finance                        | NOT_RELEVANT |

Dataset Size

  • Training samples: 300
  • Domain style: scientific literature relevance ranking
  • Purpose: prototype fine-tuning for query-paper matching

Note: This repository is currently presented as a demo/prototype research artifact. Some training artifacts and metrics are demonstration-oriented.


Evaluation Metrics

Validation Performance

| Metric    | Score |
|-----------|-------|
| Accuracy  | 97.8% |
| Precision | 97.5% |
| Recall    | 97.6% |
| F1 Score  | 97.5% |

Interpretation

The model performs strongly on the internal validation split for binary relevance classification and is suitable for ranking papers before downstream graph ingestion. Given the small, partly synthetic training set (300 samples), these figures should be read as prototype-level results rather than benchmarked performance.
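The reported precision, recall, and F1 are related in the standard way for the RELEVANT class; a minimal sketch (the confusion counts below are hypothetical, chosen only to illustrate the formulas, and are not from the actual validation split):

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Standard binary-classification metrics for the positive (RELEVANT) class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts, for illustration only
p, r, f1 = precision_recall_f1(tp=195, fp=5, fn=5)
print(round(p, 3), round(r, 3), round(f1, 3))  # -> 0.975 0.975 0.975
```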


Example Usage

```python
from transformers import pipeline

# Load the fine-tuned relevance classifier
classifier = pipeline(
    "text-classification",
    model="Mokshhugs2710/paper-relevance-transformer"
)

query = "CNN in healthcare"
paper = "Deep convolutional neural networks for lung disease detection in chest X-rays"

# Inputs follow the training format: query [SEP] paper_title_or_abstract
text = query + " [SEP] " + paper
result = classifier(text)

print(result)  # e.g. [{'label': 'RELEVANT', 'score': ...}]
```
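For ranking, the pipeline's label/score output can be reduced to a single relevance score per paper. A minimal sketch, assuming the pipeline returns one `{'label', 'score'}` dict per input (the candidate outputs below are illustrative placeholders, not real model predictions):

```python
def relevance_score(prediction: dict) -> float:
    """Map a pipeline prediction to an estimated P(RELEVANT)."""
    if prediction["label"] == "RELEVANT":
        return prediction["score"]
    return 1.0 - prediction["score"]

# Illustrative outputs for three candidate papers (not real predictions)
candidates = [
    ("CNN-based tumor segmentation in MRI", {"label": "RELEVANT", "score": 0.97}),
    ("Blockchain transaction systems", {"label": "NOT_RELEVANT", "score": 0.94}),
    ("Chest X-ray disease detection with CNNs", {"label": "RELEVANT", "score": 0.91}),
]

# Sort candidates by estimated relevance, highest first
ranked = sorted(candidates, key=lambda c: relevance_score(c[1]), reverse=True)
print([title for title, _ in ranked])
```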