This model is pretrained on Vietnamese text and is based on GPT-NeoX, a large language model architecture developed by EleutherAI. It uses rotary position embedding, introduced in [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/abs/2104.09864). Training ran on an A100 40GB GPU with a 48-core CPU and took about 17 hours to reach 80,000 steps.
| Hyperparameter | Value |
|---|---|
| n_parameters | 2,670,182,400 |
| n_layers | 32 |
| d_model | 2560 |
| n_heads | 32 |
| d_head | 128 |
| n_vocab | 60,000 |
| Sequence Length | 2048 |
| Learning Rate | 0.00016 |
| Positional Encoding | Rotary Position Embedding (RoPE) |
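For intuition, below is a minimal sketch of the position-dependent rotation that RoPE applies to query and key channels, following the RoFormer paper's formulation with the common base of 10000. It is illustrative only and not this checkpoint's internal implementation; the function name and interleaved channel layout are assumptions.

```python
# Illustrative RoPE sketch, not the model's internal code.
import torch

def apply_rope(x: torch.Tensor) -> torch.Tensor:
    """Rotate pairs of channels of x by position-dependent angles.

    x: (seq_len, d_head), with d_head even. Channel layout (interleaved
    pairs) is an assumption for this sketch.
    """
    seq_len, d_head = x.shape
    # Frequencies as in RoFormer: theta_i = 10000^(-2i / d_head)
    inv_freq = 10000.0 ** (-torch.arange(0, d_head, 2).float() / d_head)
    # Angle for each (position, frequency) pair: (seq_len, d_head // 2)
    angles = torch.arange(seq_len).float()[:, None] * inv_freq[None, :]
    cos, sin = angles.cos(), angles.sin()
    # Split channels into pairs and apply a 2D rotation to each pair
    x1, x2 = x[:, 0::2], x[:, 1::2]
    rotated = torch.stack([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
    return rotated.flatten(-2)  # back to (seq_len, d_head)
```

Because relative position enters only through these rotations, attention scores between two tokens depend on their offset rather than on absolute positions.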
The model can be loaded with the `AutoModelForCausalLM` class:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("eunyounglee/GPT-NeoX-2.7B-Vietnamese-pretrained")
model = AutoModelForCausalLM.from_pretrained("eunyounglee/GPT-NeoX-2.7B-Vietnamese-pretrained")
```
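Once loaded, text can be generated with the standard `generate` API. The prompt and sampling settings below are illustrative assumptions, not recommendations from the model authors:

```python
# Illustrative generation example; prompt and sampling settings are assumptions.
prompt = "Xin chào"  # "Hello" in Vietnamese
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```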