# dense_eng_hom_100m_mult_reseg_ep20_spm
This model is a fine-tuned version of an unspecified base model on the arrow dataset. It achieves the following results on the evaluation set:

- Loss: 5.0896
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-06; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1352
- training_steps: 13525
- mixed_precision_training: Native AMP
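
The effective batch size and warmup schedule follow directly from the values above; as a quick sanity check (a plain-Python sketch — the variable names are illustrative, not taken from the actual training config):

```python
# Hypothetical variable names; the values are copied from the
# hyperparameter list above.
train_batch_size = 8
gradient_accumulation_steps = 4

# Effective batch size per optimizer step (single device assumed).
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 32, matching total_train_batch_size above

warmup_steps = 1352
training_steps = 13525

# Fraction of training spent in linear warmup.
warmup_fraction = warmup_steps / training_steps
print(f"{warmup_fraction:.3f}")  # roughly 0.100, i.e. a 10% warmup
```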
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 7.4746 | 0.7394 | 500 | 6.5613 |
| 5.8724 | 1.4776 | 1000 | 5.7328 |
| 5.4524 | 2.2159 | 1500 | 5.2701 |
| 5.0152 | 2.9553 | 2000 | 4.9518 |
| 4.7695 | 3.6935 | 2500 | 4.7523 |
| 4.5097 | 4.4318 | 3000 | 4.6241 |
| 4.4453 | 5.1701 | 3500 | 4.5447 |
| 4.282 | 5.9094 | 4000 | 4.4772 |
| 4.1115 | 6.6477 | 4500 | 4.4538 |
| 3.9488 | 7.3860 | 5000 | 4.4516 |
| 3.9684 | 8.1242 | 5500 | 4.4631 |
| 3.8292 | 8.8636 | 6000 | 4.4464 |
| 3.6565 | 9.6018 | 6500 | 4.4896 |
| 3.5011 | 10.3401 | 7000 | 4.5425 |
| 3.5573 | 11.0784 | 7500 | 4.5800 |
| 3.4257 | 11.8177 | 8000 | 4.5976 |
| 3.2619 | 12.5560 | 8500 | 4.6698 |
| 3.1353 | 13.2943 | 9000 | 4.7375 |
| 3.1905 | 14.0325 | 9500 | 4.7786 |
| 3.0776 | 14.7719 | 10000 | 4.8166 |
| 2.9398 | 15.5102 | 10500 | 4.8848 |
| 2.874 | 16.2484 | 11000 | 4.9440 |
| 2.8896 | 16.9878 | 11500 | 4.9569 |
| 2.8135 | 17.7261 | 12000 | 5.0176 |
| 2.7288 | 18.4643 | 12500 | 5.0580 |
| 2.7072 | 19.2026 | 13000 | 5.0837 |
| 2.6928 | 19.9420 | 13500 | 5.0898 |
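
Reading the table programmatically (a minimal sketch; the (step, validation loss) pairs below are transcribed from the table above) shows that validation loss bottoms out around step 6000 and climbs thereafter while training loss keeps falling, a typical overfitting pattern:

```python
# (step, validation_loss) pairs transcribed from the results table above.
val_loss = [
    (500, 6.5613), (1000, 5.7328), (1500, 5.2701), (2000, 4.9518),
    (2500, 4.7523), (3000, 4.6241), (3500, 4.5447), (4000, 4.4772),
    (4500, 4.4538), (5000, 4.4516), (5500, 4.4631), (6000, 4.4464),
    (6500, 4.4896), (7000, 4.5425), (7500, 4.5800), (8000, 4.5976),
    (8500, 4.6698), (9000, 4.7375), (9500, 4.7786), (10000, 4.8166),
    (10500, 4.8848), (11000, 4.9440), (11500, 4.9569), (12000, 5.0176),
    (12500, 5.0580), (13000, 5.0837), (13500, 5.0898),
]

# Find the checkpoint with the lowest validation loss.
best_step, best_loss = min(val_loss, key=lambda p: p[1])
print(best_step, best_loss)  # 6000 4.4464
```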
### Framework versions
- Transformers 4.57.1
- PyTorch 2.9.1+cu128
- Datasets 3.6.0
- Tokenizers 0.22.1