---
license: mit
---

## long-covid-classification
We fine-tuned `bert-base-cased` on a [manually curated dataset](https://huggingface.co/llangnickel/long-covid-classification-data) to obtain a sequence classification model that distinguishes long COVID-related documents from documents unrelated to long COVID.

## Hyperparameters
|Parameter|Value|
|---|---|
|Learning rate|3e-5|
|Batch size|16|
|Number of epochs|4|
|Max. sequence length [tokens]|512|
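The table lists the values but not the training code. As a minimal sketch, assuming training used the Hugging Face `Trainer` API (the card does not say so) and that the batch size of 16 is per device on a single device, the hyperparameters can be kept in one place and mapped onto `TrainingArguments` and tokenizer keyword names. `FineTuningConfig` and its methods are hypothetical helpers, not part of the released model.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuningConfig:
    """Hyperparameters from the table above; the class itself is hypothetical."""
    base_model: str = "bert-base-cased"
    learning_rate: float = 3e-5
    batch_size: int = 16        # assumption: per-device batch size on a single device
    num_epochs: int = 4
    max_seq_length: int = 512   # longer inputs are truncated to this many tokens

    def training_arguments_kwargs(self) -> dict:
        # Keyword names accepted by transformers.TrainingArguments.
        return {
            "learning_rate": self.learning_rate,
            "per_device_train_batch_size": self.batch_size,
            "num_train_epochs": self.num_epochs,
        }

    def tokenizer_kwargs(self) -> dict:
        # Keyword arguments for calling a Hugging Face tokenizer on raw text.
        return {"truncation": True, "max_length": self.max_seq_length}

    def num_optimizer_steps(self, num_examples: int) -> int:
        # One optimizer step per batch (no gradient accumulation assumed);
        # the last, possibly smaller, batch of each epoch still counts as a step.
        return math.ceil(num_examples / self.batch_size) * self.num_epochs
```

With `transformers` installed, the settings could then be passed on as `TrainingArguments(output_dir="out", **cfg.training_arguments_kwargs())` and `tokenizer(texts, **cfg.tokenizer_kwargs())`. For 1,000 hypothetical training documents, `cfg.num_optimizer_steps(1000)` gives 63 batches per epoch, i.e. 252 steps over 4 epochs.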

## Metrics
|Precision [%]|Recall [%]|F1-score [%]|
|---|---|---|
|91.18|91.18|91.18|
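F1 is the harmonic mean of precision and recall, `F1 = 2 * P * R / (P + R)`, so when precision and recall are equal, as here, F1 takes the same value. The card does not state how the scores were averaged over the two classes. The sketch below (function names hypothetical) computes the metrics from predictions; it also shows that with micro-averaging in single-label classification, precision, recall and F1 all reduce to accuracy, which is one possible, unconfirmed, explanation for the three identical values.

```python
from typing import Sequence, Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 (harmonic mean of the two) from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def micro_averaged(y_true: Sequence[int], y_pred: Sequence[int],
                   labels: Sequence[int] = (0, 1)) -> Tuple[float, float, float]:
    """Pool TP/FP/FN over all classes before computing the metrics."""
    tp = fp = fn = 0
    for label in labels:
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1
            elif p == label:
                fp += 1  # predicted this class, but the gold label differs
            elif t == label:
                fn += 1  # missed a document of this class
    return precision_recall_f1(tp, fp, fn)
```

For example, `micro_averaged([0, 0, 1, 1, 1], [0, 1, 1, 1, 0])` yields 0.6 for all three scores (up to floating-point rounding), the same as the accuracy of 3 correct out of 5; these toy labels are made up for illustration.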

## How to load the model
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Index-to-label mapping of the two output classes.
label_dict = {0: "nonLongCOVID", 1: "longCOVID"}
# use_auth_token is deprecated in recent transformers; pass token=True only if the repo requires login.
tokenizer = AutoTokenizer.from_pretrained("llangnickel/long-covid-classification")
model = AutoModelForSequenceClassification.from_pretrained(
    "llangnickel/long-covid-classification", num_labels=len(label_dict))
```
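Loading the model does not yet classify a document. A minimal sketch of the post-processing step, assuming the output logits follow the index order of `label_dict` (0 = `nonLongCOVID`, 1 = `longCOVID`): apply a softmax and pick the most probable class. `logits_to_prediction` is a hypothetical helper written in plain Python so that it runs without PyTorch.

```python
import math
from typing import Dict, Sequence, Tuple

# Same mapping as label_dict in the loading snippet.
LABEL_DICT: Dict[int, str] = {0: "nonLongCOVID", 1: "longCOVID"}


def logits_to_prediction(logits: Sequence[float],
                         label_dict: Dict[int, str] = LABEL_DICT) -> Tuple[str, float]:
    """Turn one row of classifier logits into (label, probability) via softmax + argmax."""
    # Subtract the largest logit before exponentiating for numerical stability.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the most probable class (the first one wins on ties).
    best = max(range(len(probs)), key=probs.__getitem__)
    return label_dict[best], probs[best]
```

With the tokenizer and model loaded above (and PyTorch installed), the logits for one document could be obtained with `inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")` followed by `model(**inputs).logits[0].tolist()`, and then passed to `logits_to_prediction`.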

## Citation
	@article{10.1093/database/baac048,
	author = {Langnickel, Lisa and Darms, Johannes and Heldt, Katharina and Ducks, Denise and Fluck, Juliane},
	title = "{Continuous development of the semantic search engine preVIEW: from COVID-19 to long COVID}",
	journal = {Database},
	volume = {2022},
	year = {2022},
	month = {07},
	issn = {1758-0463},
	doi = {10.1093/database/baac048},
	url = {https://doi.org/10.1093/database/baac048},
	note = {baac048},
	eprint = {https://academic.oup.com/database/article-pdf/doi/10.1093/database/baac048/44371817/baac048.pdf},
	}