---
license: mit
datasets: llangnickel/long-covid-classification-data
---

## long-covid-classification

We fine-tuned `bert-base-cased` on a [manually curated dataset](https://huggingface.co/llangnickel/long-covid-classification-data) to obtain a sequence classification model that distinguishes documents related to long COVID from documents that are not.

## Hyperparameters

|Parameter|Value|
|---|---|
|Learning rate|3e-5|
|Batch size|16|
|Number of epochs|4|
|Sequence length|512|

## Metrics

|Precision [%]|Recall [%]|F1-score [%]|
|---|---|---|
|91.18|91.18|91.18|

## How to load the model

```
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# use_auth_token=True reuses the access token stored by `huggingface-cli login`.
tokenizer = AutoTokenizer.from_pretrained("llangnickel/long-covid-classification", use_auth_token=True)

# Mapping from class index to class name.
label_dict = {0: "nonLongCOVID", 1: "longCOVID"}

model = AutoModelForSequenceClassification.from_pretrained("llangnickel/long-covid-classification", use_auth_token=True, num_labels=len(label_dict))
```

## Citation

@article{10.1093/database/baac048,
  author = {Langnickel, Lisa and Darms, Johannes and Heldt, Katharina and Ducks, Denise and Fluck, Juliane},
  title = "{Continuous development of the semantic search engine preVIEW: from COVID-19 to long COVID}",
  journal = {Database},
  volume = {2022},
  year = {2022},
  month = {07},
  issn = {1758-0463},
  doi = {10.1093/database/baac048},
  url = {https://doi.org/10.1093/database/baac048},
  note = {baac048},
  eprint = {https://academic.oup.com/database/article-pdf/doi/10.1093/database/baac048/44371817/baac048.pdf},
}
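
## Example: classifying a document (sketch)

The loading snippet under "How to load the model" stops once the tokenizer and model are created. The following is a minimal sketch of how a prediction could then be obtained, not the authors' own inference code. It assumes that the class indices follow the `label_dict` from the loading snippet, that the model can be downloaded without an access token (otherwise pass the same authentication argument as in the loading snippet), and that inputs should be truncated to the sequence length of 512 listed in the hyperparameters. The helper names `logits_to_prediction` and `classify` are hypothetical and introduced only for illustration.

```python
import math

# Label mapping taken from the loading snippet in this model card.
LABEL_DICT = {0: "nonLongCOVID", 1: "longCOVID"}


def logits_to_prediction(logits, label_dict=LABEL_DICT):
    """Turn one row of raw logits into (label, probability) via a softmax."""
    peak = max(logits)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return label_dict[best], probs[best]


def classify(text, model_name="llangnickel/long-covid-classification"):
    """Hypothetical helper: load the model and classify a single document."""
    # Imported lazily so the softmax helper above works without these packages.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    # Assumption: the model is downloadable without an access token.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()

    # Truncate to the 512-token sequence length listed in the hyperparameters.
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return logits_to_prediction(logits)
```

A call such as `label, prob = classify("Persistent fatigue months after SARS-CoV-2 infection ...")` would return one of the two label names together with its softmax probability. Reloading the model on every call is slow, so for batches of documents it is probably better to load the tokenizer and model once and reuse them.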