---
license: mit
---
## long-covid-classification
We fine-tuned `bert-base-cased` on a [manually curated dataset](https://huggingface.co/llangnickel/long-covid-classification-data) to train a sequence classification model that distinguishes documents related to long COVID from documents that are not.
## Hyperparameters used
|Parameter|Value|
|---|---|
|Learning rate|3e-5|
|Batch size|16|
|Number of epochs|4|
|Sequence length|512|
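
As a rough sketch, the values in the table could be passed to the `transformers` `Trainer` API as shown below. This is an assumption for illustration, not the authors' training script: the table does not say whether the batch size is per device or in total (per device is assumed here), and `build_training_args`, `tokenize_batch`, and the output directory name are hypothetical.

```python
# Hyperparameters from the table, keyed by TrainingArguments keyword names.
# Assumption: the batch size of 16 is per device.
HYPERPARAMS = {
    "learning_rate": 3e-5,
    "per_device_train_batch_size": 16,
    "num_train_epochs": 4,
}

# The sequence length is applied at tokenization time, not in TrainingArguments.
MAX_SEQ_LENGTH = 512


def build_training_args(output_dir="long-covid-classification-out"):
    """Build TrainingArguments from the table values (hypothetical helper)."""
    # Imported lazily so the constants above can be inspected without transformers.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMS)


def tokenize_batch(tokenizer, texts):
    """Tokenize texts, truncating and padding to the sequence length in the table."""
    return tokenizer(
        texts,
        truncation=True,
        padding="max_length",
        max_length=MAX_SEQ_LENGTH,
    )
```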
## Metrics
|Precision [%]|Recall [%]|F1-score [%]|
|---|---|---|
|91.18|91.18|91.18|
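
For reference, the F1-score is the harmonic mean of precision and recall, so equal precision and recall give the same F1-score. The short check below assumes this standard definition; the averaging scheme used for the reported numbers is not stated here, and `f1_score` is a helper written for this illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given in the same unit)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall from the table, in percent.
print(round(f1_score(91.18, 91.18), 2))  # 91.18
```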
## How to load the model
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# use_auth_token=True sends the locally stored Hugging Face token (requires a prior login)
tokenizer = AutoTokenizer.from_pretrained("llangnickel/long-covid-classification", use_auth_token=True)
# Mapping from class index to label name
label_dict = {0: "nonLongCOVID", 1: "longCOVID"}
model = AutoModelForSequenceClassification.from_pretrained("llangnickel/long-covid-classification", use_auth_token=True, num_labels=len(label_dict))
```
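
Once the model is loaded, a single document could be classified roughly as follows. This is a minimal sketch under a few assumptions: PyTorch is installed, the model repository is reachable with your current credentials (the authentication argument is omitted here), and inputs are truncated to the 512-token sequence length used in training. The helper names `softmax`, `logits_to_prediction`, and `classify`, as well as the example sentence, are made up for this illustration.

```python
import math

# Same index-to-label mapping as in the loading snippet.
LABELS = {0: "nonLongCOVID", 1: "longCOVID"}


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_prediction(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[idx], probs[idx]


def classify(text, model_name="llangnickel/long-covid-classification"):
    """Load the model and classify one document (downloads weights on first use)."""
    # Imported lazily so the helpers above work without torch/transformers.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()

    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return logits_to_prediction(logits)


if __name__ == "__main__":
    # Hypothetical input text; the output depends on the downloaded weights.
    print(classify("Persistent fatigue and dyspnea months after SARS-CoV-2 infection."))
```

`classify` reloads the model on every call to keep the sketch short; for batch use, load the tokenizer and model once and reuse them.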
## Citation
@article{10.1093/database/baac048,
author = {Langnickel, Lisa and Darms, Johannes and Heldt, Katharina and Ducks, Denise and Fluck, Juliane},
title = "{Continuous development of the semantic search engine preVIEW: from COVID-19 to long COVID}",
journal = {Database},
volume = {2022},
year = {2022},
month = {07},
issn = {1758-0463},
doi = {10.1093/database/baac048},
url = {https://doi.org/10.1093/database/baac048},
note = {baac048},
eprint = {https://academic.oup.com/database/article-pdf/doi/10.1093/database/baac048/44371817/baac048.pdf},
}