
gpt2moe_hom2_100mb

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 4.0388
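
For reference, a minimal loading sketch, not from the card: "gpt2moe_hom2_100mb" is only the card title, so the owner's namespace must be prepended to form the real hub id, and a custom GPT-2 MoE architecture may additionally require trust_remote_code=True.

```python
# Hedged sketch: loading this checkpoint with transformers.
# The repo id below is a placeholder taken from the card title; the full
# "owner/gpt2moe_hom2_100mb" hub id is not given on the card.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "gpt2moe_hom2_100mb"  # placeholder; prepend the owner's namespace
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

prompt = "The quick brown fox"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```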

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after this list):

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: adamw_torch_fused with betas=(0.9, 0.999), epsilon=1e-06, and no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 7506
  • training_steps: 75067
  • mixed_precision_training: Native AMP
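
A minimal sketch of how these values map onto transformers.TrainingArguments, assuming a standard Trainer setup; the output_dir and the fp16 flag are assumptions, not stated on the card.

```python
# Hedged sketch, not the author's actual training script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gpt2moe_hom2_100mb",   # hypothetical output path
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=4,     # 8 x 4 = 32 total train batch size
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    lr_scheduler_type="linear",
    warmup_steps=7506,
    max_steps=75067,
    fp16=True,                         # "Native AMP" mixed precision, assumed fp16
)
```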

Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| No log        | 0      | 0     | 11.0818         |
| 6.854         | 0.2664 | 2000  | 6.5520          |
| 6.0315        | 0.5329 | 4000  | 5.7265          |
| 5.5806        | 0.7993 | 6000  | 5.2642          |
| 5.2507        | 1.0657 | 8000  | 4.9658          |
| 5.0517        | 1.3321 | 10000 | 4.7656          |
| 4.924         | 1.5985 | 12000 | 4.6236          |
| 4.8276        | 1.8650 | 14000 | 4.5289          |
| 4.6892        | 2.1313 | 16000 | 4.4570          |
| 4.6484        | 2.3978 | 18000 | 4.3995          |
| 4.6087        | 2.6642 | 20000 | 4.3493          |
| 4.5688        | 2.9306 | 22000 | 4.3070          |
| 4.469         | 3.1970 | 24000 | 4.2805          |
| 4.4547        | 3.4634 | 26000 | 4.2517          |
| 4.4454        | 3.7299 | 28000 | 4.2252          |
| 4.4201        | 3.9963 | 30000 | 4.1997          |
| 4.334         | 4.2627 | 32000 | 4.1863          |
| 4.3298        | 4.5291 | 34000 | 4.1689          |
| 4.3267        | 4.7956 | 36000 | 4.1511          |
| 4.22          | 5.0619 | 38000 | 4.1414          |
| 4.2395        | 5.3284 | 40000 | 4.1308          |
| 4.2375        | 5.5948 | 42000 | 4.1178          |
| 4.239         | 5.8612 | 44000 | 4.1024          |
| 4.1425        | 6.1276 | 46000 | 4.1030          |
| 4.1632        | 6.3940 | 48000 | 4.0945          |
| 4.1578        | 6.6605 | 50000 | 4.0836          |
| 4.1554        | 6.9269 | 52000 | 4.0719          |
| 4.0797        | 7.1933 | 54000 | 4.0745          |
| 4.0885        | 7.4597 | 56000 | 4.0674          |
| 4.091         | 7.7261 | 58000 | 4.0608          |
| 4.0897        | 7.9926 | 60000 | 4.0517          |
| 4.0219        | 8.2590 | 62000 | 4.0572          |
| 4.042         | 8.5254 | 64000 | 4.0528          |
| 4.0368        | 8.7918 | 66000 | 4.0463          |
| 3.9788        | 9.0582 | 68000 | 4.0463          |
| 3.9801        | 9.3246 | 70000 | 4.0450          |
| 3.9866        | 9.5911 | 72000 | 4.0415          |
| 3.9807        | 9.8575 | 74000 | 4.0394          |
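
A quick hedged conversion of the final validation loss to perplexity, assuming the loss is the mean per-token cross-entropy in nats (the transformers Trainer default for causal language modeling):

```python
# Hedged sketch, not from the card: perplexity = exp(cross-entropy loss).
import math

final_val_loss = 4.0388
print(f"perplexity ≈ {math.exp(final_val_loss):.1f}")  # ≈ 56.8
```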

Framework versions

  • Transformers 4.57.1
  • Pytorch 2.9.1+cu128
  • Datasets 3.6.0
  • Tokenizers 0.22.1
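
A small sketch for checking a local environment against the versions listed above, using only the standard library:

```python
# Hedged sketch: compare installed package versions to the card's list.
from importlib.metadata import PackageNotFoundError, version

expected = {
    "transformers": "4.57.1",
    "torch": "2.9.1+cu128",
    "datasets": "3.6.0",
    "tokenizers": "0.22.1",
}
for pkg, want in expected.items():
    try:
        have = version(pkg)
    except PackageNotFoundError:
        have = "not installed"
    print(f"{pkg}: expected {want}, found {have}")
```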

Safetensors

  • Model size: 0.2B params
  • Tensor type: F32