Llama-3.3-70B-Instruct-3d-1M-100K-0.1-reverse-padzero-plus-mul-sub-99-512D-3L-4H-2048I

This model is a fine-tuned version of meta-llama/Llama-3.3-70B-Instruct on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: nan (the loss is NaN at every logged evaluation step, indicating the run diverged numerically; see Training results below)
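For completeness, a minimal loading sketch using the standard transformers API; the repository id is taken from the model tree at the bottom of this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Full repository id as listed in the model tree below.
repo_id = (
    "arithmetic-circuit-overloading/"
    "Llama-3.3-70B-Instruct-3d-1M-100K-0.1-reverse-padzero-plus-mul-sub-99-512D-3L-4H-2048I"
)

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)
```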

Model description

More information needed
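What little can be inferred comes from the repository name and the 12.6M-parameter size reported below: despite the name, this appears to be a small Llama-architecture model trained on a synthetic arithmetic task (reversed, zero-padded addition, multiplication, and subtraction) rather than a fine-tune of the 70B base. Reading the 512D-3L-4H-2048I suffix as hidden size, layer count, attention heads, and intermediate size (an assumption, not confirmed anywhere in this card), the configuration would look roughly like:

```python
from transformers import LlamaConfig

# Hypothetical reading of the "512D-3L-4H-2048I" name suffix; the vocabulary
# size is not published and is presumably a small arithmetic-token vocabulary.
config = LlamaConfig(
    hidden_size=512,         # 512D
    num_hidden_layers=3,     # 3L
    num_attention_heads=4,   # 4H
    intermediate_size=2048,  # 2048I
    vocab_size=16,           # placeholder: actual value unknown
)
```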

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 128
  • eval_batch_size: 128
  • seed: 42
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 5
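
For reference, a minimal sketch of how these hyperparameters map onto transformers.TrainingArguments (the actual training script is not published; output_dir is a hypothetical path):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-arith-512D-3L-4H-2048I",  # hypothetical
    learning_rate=1e-3,
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    num_train_epochs=5,
)
```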

Training results

| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| No log | 0 | 0 | nan |
| 0.0 | 0.0640 | 500 | nan |
| 0.0 | 0.1280 | 1000 | nan |
| 0.0 | 0.1920 | 1500 | nan |
| 0.0 | 0.2560 | 2000 | nan |
| 0.0 | 0.3200 | 2500 | nan |
| 0.0 | 0.3840 | 3000 | nan |
| 0.0 | 0.4480 | 3500 | nan |
| 0.0 | 0.5120 | 4000 | nan |
| 0.0 | 0.5760 | 4500 | nan |
| 0.0 | 0.6400 | 5000 | nan |
| 0.0 | 0.7040 | 5500 | nan |
| 0.0 | 0.7680 | 6000 | nan |
| 0.0 | 0.8319 | 6500 | nan |
| 0.0 | 0.8959 | 7000 | nan |
| 0.0 | 0.9599 | 7500 | nan |
| 0.0 | 1.0239 | 8000 | nan |
| 0.0 | 1.0879 | 8500 | nan |
| 0.0 | 1.1519 | 9000 | nan |
| 0.0 | 1.2159 | 9500 | nan |
| 0.0 | 1.2799 | 10000 | nan |
| 0.0 | 1.3439 | 10500 | nan |
| 0.0 | 1.4079 | 11000 | nan |
| 0.0 | 1.4719 | 11500 | nan |
| 0.0 | 1.5359 | 12000 | nan |
| 0.0 | 1.5999 | 12500 | nan |
| 0.0 | 1.6639 | 13000 | nan |
| 0.0 | 1.7279 | 13500 | nan |
| 0.0 | 1.7919 | 14000 | nan |
| 0.0 | 1.8559 | 14500 | nan |
| 0.0 | 1.9199 | 15000 | nan |
| 0.0 | 1.9839 | 15500 | nan |
| 0.0 | 2.0479 | 16000 | nan |
| 0.0 | 2.1119 | 16500 | nan |
| 0.0 | 2.1759 | 17000 | nan |
| 0.0 | 2.2399 | 17500 | nan |
| 0.0 | 2.3039 | 18000 | nan |
| 0.0 | 2.3678 | 18500 | nan |
| 0.0 | 2.4318 | 19000 | nan |
| 0.0 | 2.4958 | 19500 | nan |
| 0.0 | 2.5598 | 20000 | nan |
| 0.0 | 2.6238 | 20500 | nan |
| 0.0 | 2.6878 | 21000 | nan |
| 0.0 | 2.7518 | 21500 | nan |
| 0.0 | 2.8158 | 22000 | nan |
| 0.0 | 2.8798 | 22500 | nan |
| 0.0 | 2.9438 | 23000 | nan |
| 0.0 | 3.0078 | 23500 | nan |
| 0.0 | 3.0718 | 24000 | nan |
| 0.0 | 3.1358 | 24500 | nan |
| 0.0 | 3.1998 | 25000 | nan |
| 0.0 | 3.2638 | 25500 | nan |
| 0.0 | 3.3278 | 26000 | nan |
| 0.0 | 3.3918 | 26500 | nan |
| 0.0 | 3.4558 | 27000 | nan |
| 0.0 | 3.5198 | 27500 | nan |
| 0.0 | 3.5838 | 28000 | nan |
| 0.0 | 3.6478 | 28500 | nan |
| 0.0 | 3.7118 | 29000 | nan |
| 0.0 | 3.7758 | 29500 | nan |
| 0.0 | 3.8398 | 30000 | nan |
| 0.0 | 3.9038 | 30500 | nan |
| 0.0 | 3.9677 | 31000 | nan |
| 0.0 | 4.0317 | 31500 | nan |
| 0.0 | 4.0957 | 32000 | nan |
| 0.0 | 4.1597 | 32500 | nan |
| 0.0 | 4.2237 | 33000 | nan |
| 0.0 | 4.2877 | 33500 | nan |
| 0.0 | 4.3517 | 34000 | nan |
| 0.0 | 4.4157 | 34500 | nan |
| 0.0 | 4.4797 | 35000 | nan |
| 0.0 | 4.5437 | 35500 | nan |
| 0.0 | 4.6077 | 36000 | nan |
| 0.0 | 4.6717 | 36500 | nan |
| 0.0 | 4.7357 | 37000 | nan |
| 0.0 | 4.7997 | 37500 | nan |
| 0.0 | 4.8637 | 38000 | nan |
| 0.0 | 4.9277 | 38500 | nan |
| 0.0 | 4.9917 | 39000 | nan |
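
Every validation loss in the table above is NaN and the training loss is logged as 0.0 from step 500 onward, which points to numerical divergence very early in the run. As an illustration only (not part of this repository's training code), a minimal TrainerCallback that halts training on the first NaN loss could look like:

```python
import math

from transformers import TrainerCallback


class NanLossGuard(TrainerCallback):
    """Hypothetical helper: stop training the first time a logged loss is NaN."""

    def on_log(self, args, state, control, logs=None, **kwargs):
        loss = (logs or {}).get("loss")
        if loss is not None and math.isnan(loss):
            control.should_training_stop = True
        return control
```

Passing an instance via Trainer(callbacks=[NanLossGuard()]) would have cut this run short instead of logging NaN for five epochs.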

Framework versions

  • Transformers 4.57.1
  • PyTorch 2.9.0+cu128
  • Datasets 4.5.0
  • Tokenizers 0.22.1
Model weights

  • Model size: 12.6M params
  • Tensor type: BF16 (safetensors)

Model tree

  • Base model: meta-llama/Llama-3.3-70B-Instruct
  • This model: arithmetic-circuit-overloading/Llama-3.3-70B-Instruct-3d-1M-100K-0.1-reverse-padzero-plus-mul-sub-99-512D-3L-4H-2048I