---
license: mit
datasets: llangnickel/long-covid-classification-data
---

## long-covid-classification
We fine-tuned `bert-base-cased` on a [manually curated dataset](https://huggingface.co/llangnickel/long-covid-classification-data) to obtain a sequence classification model that distinguishes documents related to long COVID from documents that are not.

## Hyperparameters used
|Parameter|Value|
|---|---|  
|Learning rate|3e-5|
|Batch size|16|
|Number of epochs|4|
|Sequence length (tokens)|512|
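
The table can be mapped onto a training configuration roughly as follows. This is a sketch under assumptions, not the original training script: it assumes the Hugging Face `Trainer` API was used and that the batch size is per device. `HYPERPARAMETERS`, `build_training_args`, `tokenizer_kwargs`, and the output directory are hypothetical names introduced for illustration.

```python
# Values copied from the hyperparameter table above.
HYPERPARAMETERS = {
    "learning_rate": 3e-5,
    "batch_size": 16,
    "num_epochs": 4,
    "max_seq_length": 512,
}


def build_training_args(output_dir="long-covid-classifier"):
    """Hypothetical helper: map the table onto transformers.TrainingArguments."""
    # Lazy import so the dict above can be inspected without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(
        output_dir=output_dir,  # assumed path, not from the original setup
        learning_rate=HYPERPARAMETERS["learning_rate"],
        # Assumption: the batch size of 16 is per device.
        per_device_train_batch_size=HYPERPARAMETERS["batch_size"],
        num_train_epochs=HYPERPARAMETERS["num_epochs"],
    )


def tokenizer_kwargs():
    """The sequence length is applied when tokenizing, not by the trainer."""
    return {"truncation": True, "max_length": HYPERPARAMETERS["max_seq_length"]}
```

The sequence length is kept separate because it is passed to the tokenizer call rather than to `TrainingArguments`.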

## Metrics
|Precision [%]|Recall [%]|F1-score [%]|
|---|---|---|  
|91.18|91.18|91.18|
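
The three values are consistent with one another: the F1-score is the harmonic mean of precision and recall, so equal precision and recall give the same F1-score. A minimal sketch to check this; `f1_score` here is a hypothetical helper, not the original evaluation code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both in the same unit, e.g. percent)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the metrics table above.
f1 = f1_score(91.18, 91.18)
print(round(f1, 2))
```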

## How to load the model
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "llangnickel/long-covid-classification"
# use_auth_token=True sends the access token stored by `huggingface-cli login`.
tokenizer = AutoTokenizer.from_pretrained(model_id, use_auth_token=True)
# Maps the class index predicted by the model to its label name.
label_dict = {0: "nonLongCOVID", 1: "longCOVID"}
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, use_auth_token=True, num_labels=len(label_dict)
)
```
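
Once loaded, the model can be used to classify a document. The following is a minimal sketch rather than the authors' inference code: `logits_to_prediction` and `classify` are hypothetical helpers, and the sketch assumes that PyTorch is installed and that the repository can be read without authentication (otherwise pass the same authentication argument as in the loading snippet above).

```python
import math

# Label mapping taken from the loading snippet above.
LABEL_DICT = {0: "nonLongCOVID", 1: "longCOVID"}
MODEL_ID = "llangnickel/long-covid-classification"


def logits_to_prediction(logits):
    """Turn raw logits into (label, probability) using a softmax."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABEL_DICT[best], probs[best]


def classify(text, max_length=512):
    """Hypothetical helper: classify one document with the fine-tuned model."""
    # Imported lazily so logits_to_prediction works without these packages.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_ID, num_labels=len(LABEL_DICT)
    )
    model.eval()  # disable dropout for inference
    # Truncate to the 512-token sequence length used during fine-tuning.
    inputs = tokenizer(
        text, truncation=True, max_length=max_length, return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return logits_to_prediction(logits)
```

A call such as `label, prob = classify("Patients reported persistent fatigue months after infection.")` returns the predicted label together with its softmax probability. For many documents, load the tokenizer and model once and reuse them instead of reloading on every call.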

## Citation
@article{10.1093/database/baac048,  
    author = {Langnickel, Lisa and Darms, Johannes and Heldt, Katharina and Ducks, Denise and Fluck, Juliane},  
    title = "{Continuous development of the semantic search engine preVIEW: from COVID-19 to long COVID}",  
    journal = {Database},  
    volume = {2022},  
    year = {2022},  
    month = {07},  
    issn = {1758-0463},  
    doi = {10.1093/database/baac048},  
    url = {https://doi.org/10.1093/database/baac048},  
    note = {baac048},  
    eprint = {https://academic.oup.com/database/article-pdf/doi/10.1093/database/baac048/44371817/baac048.pdf},  
}