MedReadMe: A Systematic Study for Fine-grained Sentence Readability in Medical Domain
Paper: arXiv:2405.02144
Our paper was accepted to the EMNLP 2024 main conference as an oral presentation. The paper is available on arXiv.
This is the best-performing medical sentence readability model trained on our dataset. The checkpoint is a standard Hugging Face sequence classification model.
Please find more details in our repo.
```python
# pip install transformers==4.35.2 torch --upgrade
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

MODEL_ID = "chaojiang06/medreadme_medical_sentence_readability_prediction_CWI"
MAX_LEN = 512

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,
)
model.eval()

def score_sentences(sentences):
    # Tokenize the batch, truncating anything longer than MAX_LEN tokens.
    enc = tokenizer(
        sentences,
        padding=True, truncation=True, max_length=MAX_LEN,
        return_tensors="pt",
    )
    with torch.no_grad():
        out = model(**enc).logits.squeeze(-1)  # shape: [batch]
    return out.tolist()

print(score_sentences([
    "Take one tablet by mouth twice daily after meals.",
    "The pathophysiological sequelae of dyslipidemia necessitate...",
]))
```
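Since `score_sentences` returns one scalar per input sentence, the scores can be used to order a batch of candidate sentences by predicted readability. A minimal sketch (`rank_by_score` is a hypothetical helper, not part of the released code; it only assumes the returned score list is aligned with the input sentences):

```python
def rank_by_score(sentences, scores):
    # Pair each sentence with its predicted score and sort ascending.
    # Hypothetical helper: assumes scores[i] corresponds to sentences[i].
    return sorted(zip(sentences, scores), key=lambda pair: pair[1])

sents = ["sentence A", "sentence B", "sentence C"]
scores = [1.7, 0.3, 2.5]
print(rank_by_score(sents, scores))  # "sentence B" first (lowest score)
```

This keeps the model call and the ranking logic separate, so the same helper works regardless of batch size.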
This model is a fine-tuned version of FacebookAI/roberta-large on the CWI dataset.