gpt2_moe_eng_hom_1024_100mb_gelu_gpt2mlp

This model is a fine-tuned version of an unspecified base model on the arrow dataset. The last logged evaluation, at step 20000, gives a validation loss of 4.1572.

More information needed


The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 8
eval_batch_size: 4
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 32
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-06 (no additional optimizer arguments)
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 2028
training_steps: 20287
mixed_precision_training: Native AMP
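
The relationship between these values can be checked with a short sketch. It assumes a single training device (the card does not state the device count; one device is what makes 8 x 4 match the listed total of 32) and assumes that the linear scheduler follows the usual shape of linear warmup from zero followed by linear decay to zero. The names `effective_batch_size`, `linear_lr` and `NUM_DEVICES` are illustrative and do not come from the training script.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 2028
TRAINING_STEPS = 20287
NUM_DEVICES = 1  # assumption: not stated in the card


def effective_batch_size(per_device, accum_steps, devices):
    """Number of examples that contribute to one optimizer step."""
    return per_device * accum_steps * devices


def linear_lr(step, base_lr=LEARNING_RATE, warmup=WARMUP_STEPS, total=TRAINING_STEPS):
    """Learning rate at `step` under linear warmup then linear decay (assumed shape)."""
    if step < warmup:
        return base_lr * step / warmup
    return base_lr * max(0.0, (total - step) / (total - warmup))


print("effective batch size:",
      effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))
for step in (0, 1014, 2028, 10000, 20287):
    print(f"step {step:>5}: lr = {linear_lr(step):.3e}")
```

Under these assumptions the warmup covers roughly the first 10% of the run (2028 of 20287 steps), after which the learning rate falls linearly to zero at the final step.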

Training Loss	Epoch	Step	Validation Loss
7.4287	0.7394	500	6.4095
5.9201	1.4776	1000	5.8068
5.6458	2.2159	1500	5.4626
5.2332	2.9553	2000	5.1197
4.9917	3.6935	2500	4.8633
4.6972	4.4318	3000	4.6509
4.5694	5.1701	3500	4.4626
4.3806	5.9094	4000	4.3469
4.2555	6.6477	4500	4.2665
4.1236	7.3860	5000	4.2133
4.116	8.1242	5500	4.1599
4.0169	8.8636	6000	4.1266
3.931	9.6018	6500	4.1022
3.8484	10.3401	7000	4.0855
3.8573	11.0784	7500	4.0697
3.7843	11.8177	8000	4.0568
3.7087	12.5560	8500	4.0501
3.6309	13.2943	9000	4.0468
3.6717	14.0325	9500	4.0380
3.6127	14.7719	10000	4.0389
3.5379	15.5102	10500	4.0468
3.4938	16.2484	11000	4.0542
3.5153	16.9878	11500	4.0473
3.4621	17.7261	12000	4.0555
3.3845	18.4643	12500	4.0651
3.3756	19.2026	13000	4.0744
3.3898	19.9420	13500	4.0716
3.3365	20.6802	14000	4.0843
3.2856	21.4185	14500	4.0967
3.275	22.1567	15000	4.1069
3.2775	22.8961	15500	4.1053
3.2424	23.6344	16000	4.1166
3.2004	24.3726	16500	4.1274
3.199	25.1109	17000	4.1318
3.1838	25.8503	17500	4.1354
3.1578	26.5885	18000	4.1443
3.1366	27.3268	18500	4.1502
3.1317	28.0651	19000	4.1522
3.1143	28.8044	19500	4.1550
3.0978	29.5427	20000	4.1572
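
The table can be summarised with a few lines of Python. The sketch below uses a subset of the rows above (it includes the row with the lowest validation loss in the full table) and converts losses to perplexity, assuming the reported loss is a mean per-token cross-entropy in nats, which the card does not state. The names `ROWS` and `perplexity` are illustrative.

```python
import math

# (step, epoch, validation_loss) rows copied from the table above (subset).
ROWS = [
    (500, 0.7394, 6.4095),
    (5000, 7.3860, 4.2133),
    (9000, 13.2943, 4.0468),
    (9500, 14.0325, 4.0380),
    (10000, 14.7719, 4.0389),
    (15000, 22.1567, 4.1069),
    (20000, 29.5427, 4.1572),
]


def perplexity(loss):
    """exp(loss); meaningful only if loss is mean cross-entropy in nats (assumption)."""
    return math.exp(loss)


best = min(ROWS, key=lambda row: row[2])  # row with the lowest validation loss
final = ROWS[-1]                          # last logged evaluation
steps_per_epoch = final[0] / final[1]     # estimated from the step and epoch columns

print(f"best:  step {best[0]}, val loss {best[2]:.4f}, ppl {perplexity(best[2]):.1f}")
print(f"final: step {final[0]}, val loss {final[2]:.4f}, ppl {perplexity(final[2]):.1f}")
print(f"estimated optimizer steps per epoch: {steps_per_epoch:.0f}")
```

Validation loss reaches its minimum at step 9500 and rises slowly afterwards while training loss keeps falling, a pattern consistent with overfitting in the second half of the run. If checkpoints were saved along the way, one from around that step may generalise better than the final one.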

Safetensors checkpoint: model size 0.1B params, tensor type F32.
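
The model size and tensor type give a rough idea of the weight footprint. The sketch below multiplies the parameter count by the bytes per element. Because "0.1B" is a rounded figure, the result is only an order-of-magnitude estimate; `approx_checkpoint_bytes` is an illustrative helper, not part of any library.

```python
# Bytes per element for common tensor types.
BYTES_PER_PARAM = {"F32": 4, "F16": 2, "BF16": 2}


def approx_checkpoint_bytes(num_params, dtype):
    """Approximate weight size in bytes, ignoring file metadata."""
    return num_params * BYTES_PER_PARAM[dtype]


NUM_PARAMS = 100_000_000  # "0.1B params" as listed; a rounded value
size = approx_checkpoint_bytes(NUM_PARAMS, "F32")
print(f"approximate F32 weight size: {size / 1e9:.1f} GB")
```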