# gpt2_moe_eng_hom_1024_100mb_gelu_tok
This model is a fine-tuned version of an unspecified base model on the arrow dataset. It achieves the following results on the evaluation set:
- Loss: 4.8537
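Since the loss above is a cross-entropy value, it can be converted to perplexity, which is often easier to interpret for a causal language model. A minimal sketch of that conversion:

```python
import math

# Evaluation cross-entropy loss reported above
eval_loss = 4.8537

# Perplexity is exp(loss) for a cross-entropy-evaluated causal LM
perplexity = math.exp(eval_loss)
print(round(perplexity, 1))  # ≈ 128.2
```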
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-06; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2028
- training_steps: 20287
- mixed_precision_training: Native AMP
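The hyperparameters above can be sketched as a `transformers.TrainingArguments` configuration. This is an illustrative reconstruction, not the original training script: `output_dir` is an assumption, and the effective batch size of 32 follows from 8 per-device samples × 4 accumulation steps.

```python
from transformers import TrainingArguments

# Sketch of the reported hyperparameters; output_dir is assumed.
args = TrainingArguments(
    output_dir="gpt2_moe_eng_hom_1024_100mb_gelu_tok",  # assumed name
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=4,  # effective train batch: 8 * 4 = 32
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    lr_scheduler_type="linear",
    warmup_steps=2028,
    max_steps=20287,
    fp16=True,  # "Native AMP" mixed precision
)
```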
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 7.4924 | 0.7394 | 500 | 6.4575 |
| 5.9556 | 1.4776 | 1000 | 5.8571 |
| 5.7036 | 2.2159 | 1500 | 5.5890 |
| 5.3487 | 2.9553 | 2000 | 5.2755 |
| 5.0742 | 3.6935 | 2500 | 4.9725 |
| 4.7706 | 4.4318 | 3000 | 4.7761 |
| 4.6619 | 5.1701 | 3500 | 4.6313 |
| 4.4778 | 5.9094 | 4000 | 4.5206 |
| 4.3291 | 6.6477 | 4500 | 4.4375 |
| 4.1711 | 7.3860 | 5000 | 4.3799 |
| 4.1685 | 8.1242 | 5500 | 4.3354 |
| 4.041 | 8.8636 | 6000 | 4.2875 |
| 3.9162 | 9.6018 | 6500 | 4.2714 |
| 3.7992 | 10.3401 | 7000 | 4.2684 |
| 3.8207 | 11.0784 | 7500 | 4.2649 |
| 3.7179 | 11.8177 | 8000 | 4.2565 |
| 3.6016 | 12.5560 | 8500 | 4.2727 |
| 3.4873 | 13.2943 | 9000 | 4.2998 |
| 3.5476 | 14.0325 | 9500 | 4.3048 |
| 3.4522 | 14.7719 | 10000 | 4.3184 |
| 3.3295 | 15.5102 | 10500 | 4.3572 |
| 3.2574 | 16.2484 | 11000 | 4.3925 |
| 3.2943 | 16.9878 | 11500 | 4.3890 |
| 3.2057 | 17.7261 | 12000 | 4.4348 |
| 3.0769 | 18.4643 | 12500 | 4.4726 |
| 3.0552 | 19.2026 | 13000 | 4.5143 |
| 3.076 | 19.9420 | 13500 | 4.5183 |
| 2.988 | 20.6802 | 14000 | 4.5620 |
| 2.8988 | 21.4185 | 14500 | 4.6087 |
| 2.8785 | 22.1567 | 15000 | 4.6384 |
| 2.8767 | 22.8961 | 15500 | 4.6512 |
| 2.8122 | 23.6344 | 16000 | 4.6910 |
| 2.7481 | 24.3726 | 16500 | 4.7262 |
| 2.7381 | 25.1109 | 17000 | 4.7534 |
| 2.7092 | 25.8503 | 17500 | 4.7691 |
| 2.6606 | 26.5885 | 18000 | 4.7975 |
| 2.6327 | 27.3268 | 18500 | 4.8186 |
| 2.6189 | 28.0651 | 19000 | 4.8347 |
| 2.5879 | 28.8044 | 19500 | 4.8440 |
| 2.5604 | 29.5427 | 20000 | 4.8536 |
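The table above shows validation loss bottoming out around step 8000 (4.2565) and rising steadily afterward while training loss keeps falling, a typical overfitting curve. A minimal sketch of selecting the best checkpoint from such an evaluation history (a few (step, validation_loss) pairs copied from the table):

```python
# (step, validation_loss) pairs taken from the training results table
eval_history = [
    (500, 6.4575),
    (4000, 4.5206),
    (7500, 4.2649),
    (8000, 4.2565),
    (8500, 4.2727),
    (20000, 4.8536),
]

# The checkpoint with the lowest validation loss is the best candidate
best_step, best_loss = min(eval_history, key=lambda pair: pair[1])
print(best_step, best_loss)  # 8000 4.2565
```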
### Framework versions
- Transformers 4.57.1
- Pytorch 2.7.1+cu118
- Datasets 3.6.0
- Tokenizers 0.22.1