---
library_name: transformers
license: apache-2.0
language:
- en
datasets:
- howey/unarXive
- howey/wiki_en
- howey/hupd
---

# Model Weights Coming Soon!

## Using HDT
To use the pre-trained model for [UL2](https://arxiv.org/abs/2205.05131), use the following snippet:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the tokenizer and the encoder-decoder model from the Hugging Face Hub.
model_name = 'howey/HDT-ED'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
```

For more details, please see our GitHub repository: [HDT](https://github.com/autonomousvision/hdt)

## Model Details
The model has a context length of `8192` and is similar in size to BERT, with approximately `110M` parameters. It was trained on the standard UL2 task with a Transformer-based architecture using our proposed hierarchical attention. Training took 72 hours on the ArXiv+Wikipedia+HUPD corpus and processed a total of `2.6 billion` tokens.

For more details, please see our paper: [HDT: Hierarchical Document Transformer](https://arxiv.org/pdf/2407.08330).

## Citation

Please cite our work using the BibTeX below:

**BibTeX:**

```
@inproceedings{He2024COLM,
  title={HDT: Hierarchical Document Transformer},
  author={Haoyu He and Markus Flicke and Jan Buchmann and Iryna Gurevych and Andreas Geiger},
  year={2024},
  booktitle={Conference on Language Modeling}
}
```

## Model Card Contact

Haoyu (haoyu.he@uni-tuebingen.de)
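
## Example: Fitting Inputs to the Context Length

Since the model has a context length of `8192`, longer documents have to be truncated or split before they are passed to it. The following is a minimal sketch of one way to split a token-id sequence into consecutive windows. It is not part of the HDT codebase: the helper `split_into_windows` and the constant `MAX_CONTEXT_LENGTH` are hypothetical names introduced for illustration, and the dummy ids stand in for real tokenizer output. Depending on how the tokenizer adds special tokens, a slightly smaller window may be needed.

```python
from typing import List

# Context length stated in this model card; the constant name is hypothetical.
MAX_CONTEXT_LENGTH = 8192


def split_into_windows(token_ids: List[int], window: int = MAX_CONTEXT_LENGTH) -> List[List[int]]:
    """Split a token-id sequence into consecutive windows of at most `window` tokens."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [token_ids[i:i + window] for i in range(0, len(token_ids), window)]


# Dummy token ids stand in for the output of a real tokenizer.
dummy_ids = list(range(20000))
windows = split_into_windows(dummy_ids)
print([len(w) for w in windows])  # [8192, 8192, 3616]
```

Non-overlapping windows are the simplest choice; an overlapping stride may work better when context across window boundaries matters.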