---
language:
- he
datasets:
- HeNLP/HeDC4
---

## Hebrew Language Model

HeRo is a state-of-the-art RoBERTa language model for Hebrew, pre-trained on the HeDC4 dataset.

### How to use

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Load the HeRo tokenizer and the model with its masked-language-modeling head
tokenizer = AutoTokenizer.from_pretrained('HeNLP/HeRo')
model = AutoModelForMaskedLM.from_pretrained('HeNLP/HeRo')

# Tokenization example:
# Tokenize a Hebrew string ("Hello everyone")
tokenized_string = tokenizer('שלום לכולם')

# Decode the token IDs back to text; skip_special_tokens=True drops
# special tokens such as the start- and end-of-sequence markers
decoded_string = tokenizer.decode(tokenized_string['input_ids'], skip_special_tokens=True)
```
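Because HeRo is loaded with a masked-language-modeling head, a natural next step is to let it fill in a masked word. The sketch below is a minimal example, assuming the `transformers` `fill-mask` pipeline; the helper names `mask_first` and `predict_masked_word` are hypothetical and are not part of HeRo or `transformers`. It reads the mask token from the pipeline's tokenizer instead of hard-coding it.

```python
from typing import List, Tuple


def mask_first(sentence: str, word: str, mask_token: str) -> str:
    """Hypothetical helper: replace the first occurrence of `word` with `mask_token`.

    Raises ValueError if `word` does not occur in `sentence`.
    """
    if word not in sentence:
        raise ValueError(f"{word!r} not found in {sentence!r}")
    return sentence.replace(word, mask_token, 1)


def predict_masked_word(
    sentence: str,
    word: str,
    top_k: int = 5,
    model_name: str = "HeNLP/HeRo",
) -> List[Tuple[str, float]]:
    """Hypothetical helper: mask `word` and return the model's top-k (token, score) pairs.

    Downloads the model from the Hugging Face Hub on first use.
    """
    # Imported lazily so mask_first() works even without transformers installed
    from transformers import pipeline

    fill = pipeline("fill-mask", model=model_name)
    # Use the tokenizer's own mask token rather than assuming its spelling
    masked = mask_first(sentence, word, fill.tokenizer.mask_token)
    # With a single mask in the input, the pipeline returns a list of dicts
    # containing "token_str" and "score" (among other keys)
    predictions = fill(masked, top_k=top_k)
    return [(p["token_str"], p["score"]) for p in predictions]
```

For example, `predict_masked_word('שלום לכולם', 'לכולם')` masks the second word and returns the five highest-scoring candidate tokens with their scores. Each call builds a new pipeline for simplicity; when scoring many sentences, create the pipeline once and reuse it.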

### Citing

If you use HeRo in your research, please cite [HeRo: RoBERTa and Longformer Hebrew Language Models](http://arxiv.org/abs/2304.11077).
```bibtex
@article{shalumov2023hero,
  title={HeRo: RoBERTa and Longformer Hebrew Language Models},
  author={Vitaly Shalumov and Harel Haskey},
  year={2023},
  journal={arXiv:2304.11077},
}
```