# Llama-3.3-70B-Instruct-3d-500K-50K-0.2-reverse-padzero-plus-mul-sub-99-64D-2L-2H-256I
This model is a fine-tuned version of meta-llama/Llama-3.3-70B-Instruct on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.3545
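The checkpoint can be loaded with the standard `transformers` API. Below is a minimal sketch, assuming the checkpoint is hosted on the Hub under the repository id above; the arithmetic-style prompt is a hypothetical example suggested by the model name, not documented in this card.

```python
# Minimal usage sketch. Assumes the checkpoint is available on the Hugging Face
# Hub under the repository id above; adjust the id or substitute a local path.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = (
    "arithmetic-circuit-overloading/"
    "Llama-3.3-70B-Instruct-3d-500K-50K-0.2-reverse-padzero-plus-mul-sub-99-64D-2L-2H-256I"
)
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Hypothetical arithmetic-style prompt; the exact task format is not documented here.
inputs = tokenizer("12+34=", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```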
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (mirrored in the `TrainingArguments` sketch after this list):
- learning_rate: 0.001
- train_batch_size: 128
- eval_batch_size: 128
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 5
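These settings map directly onto `transformers.TrainingArguments`. A minimal sketch follows; `output_dir` is a placeholder, and the card's `train_batch_size` is mapped to `per_device_train_batch_size`, which matches only in a single-device run:

```python
# Sketch of TrainingArguments mirroring the hyperparameters listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="out",                  # placeholder, not from this card
    learning_rate=1e-3,
    per_device_train_batch_size=128,   # assumes a single device
    per_device_eval_batch_size=128,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    num_train_epochs=5,
)
```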
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| No log | 0 | 0 | 3.0226 |
| 1.813 | 0.1280 | 500 | 1.7818 |
| 1.6833 | 0.2560 | 1000 | 1.6576 |
| 1.4722 | 0.3839 | 1500 | 1.4611 |
| 1.4297 | 0.5119 | 2000 | 1.4310 |
| 1.4178 | 0.6399 | 2500 | 1.4234 |
| 1.4112 | 0.7679 | 3000 | 1.4089 |
| 1.4019 | 0.8958 | 3500 | 1.4045 |
| 1.399 | 1.0238 | 4000 | 1.3978 |
| 1.3969 | 1.1518 | 4500 | 1.3968 |
| 1.3905 | 1.2798 | 5000 | 1.3896 |
| 1.3892 | 1.4077 | 5500 | 1.3884 |
| 1.3845 | 1.5357 | 6000 | 1.3838 |
| 1.3831 | 1.6637 | 6500 | 1.3836 |
| 1.379 | 1.7917 | 7000 | 1.3772 |
| 1.3753 | 1.9196 | 7500 | 1.3773 |
| 1.3765 | 2.0476 | 8000 | 1.3767 |
| 1.3719 | 2.1756 | 8500 | 1.3727 |
| 1.3715 | 2.3036 | 9000 | 1.3727 |
| 1.3693 | 2.4315 | 9500 | 1.3698 |
| 1.3669 | 2.5595 | 10000 | 1.3682 |
| 1.368 | 2.6875 | 10500 | 1.3651 |
| 1.3629 | 2.8155 | 11000 | 1.3635 |
| 1.3619 | 2.9434 | 11500 | 1.3620 |
| 1.3594 | 3.0714 | 12000 | 1.3602 |
| 1.3584 | 3.1994 | 12500 | 1.3589 |
| 1.3585 | 3.3274 | 13000 | 1.3578 |
| 1.3557 | 3.4553 | 13500 | 1.3565 |
| 1.3562 | 3.5833 | 14000 | 1.3559 |
| 1.3555 | 3.7113 | 14500 | 1.3556 |
| 1.3546 | 3.8393 | 15000 | 1.3550 |
| 1.3542 | 3.9672 | 15500 | 1.3548 |
| 1.3546 | 4.0952 | 16000 | 1.3546 |
| 1.355 | 4.2232 | 16500 | 1.3546 |
| 1.3536 | 4.3512 | 17000 | 1.3545 |
| 1.3537 | 4.4791 | 17500 | 1.3545 |
| 1.3541 | 4.6071 | 18000 | 1.3545 |
| 1.354 | 4.7351 | 18500 | 1.3545 |
| 1.3536 | 4.8631 | 19000 | 1.3545 |
| 1.3534 | 4.9910 | 19500 | 1.3545 |
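For a quick visual check of convergence, the validation-loss column can be plotted against the step column. A minimal sketch using `matplotlib` (not part of this card's framework list), hard-coding a representative subset of the (step, validation loss) pairs from the table above:

```python
# Plot a representative subset of (step, validation loss) pairs from the table.
import matplotlib.pyplot as plt

steps = [0, 500, 2000, 4000, 8000, 12000, 16000, 19500]
val_loss = [3.0226, 1.7818, 1.4310, 1.3978, 1.3767, 1.3602, 1.3546, 1.3545]

plt.plot(steps, val_loss, marker="o")
plt.xlabel("Step")
plt.ylabel("Validation loss")
plt.title("Validation loss during fine-tuning")
plt.show()
```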
### Framework versions
- Transformers 4.57.1
- Pytorch 2.9.0+cu128
- Datasets 4.5.0
- Tokenizers 0.22.1