Llama-3.3-70B-Instruct-3d-500K-50K-0.1-reverse-padzero-plus-mul-sub-99-512D-1L-8H-2048I

This model is a fine-tuned version of meta-llama/Llama-3.3-70B-Instruct on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: nan (the loss diverged to NaN during training; see Training results below)

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 128
  • eval_batch_size: 128
  • seed: 42
  • optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 5
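The scheduler settings above (lr_scheduler_type "cosine" with warmup_ratio 0.05) correspond to a linear warmup over the first 5% of optimizer steps followed by a cosine decay to zero. A minimal sketch of that schedule in plain Python (the total step count is taken from the training log below and is illustrative):

```python
import math

def lr_at_step(step: int, total_steps: int,
               peak_lr: float = 0.001, warmup_ratio: float = 0.05) -> float:
    """Cosine schedule with linear warmup, mirroring
    lr_scheduler_type="cosine" + lr_scheduler_warmup_ratio=0.05."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total = 19500  # roughly the number of logged optimizer steps
print(lr_at_step(0, total))        # starts at 0.0
print(lr_at_step(975, total))      # peak lr (0.001) right after warmup
print(lr_at_step(total, total))    # decays to ~0.0 at the end of training
```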

Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| No log        | 0      | 0     | nan             |
| 0.0           | 0.1280 | 500   | nan             |
| 0.0           | 0.2560 | 1000  | nan             |
| 0.0           | 0.3839 | 1500  | nan             |
| 0.0           | 0.5119 | 2000  | nan             |
| 0.0           | 0.6399 | 2500  | nan             |
| 0.0           | 0.7679 | 3000  | nan             |
| 0.0           | 0.8958 | 3500  | nan             |
| 0.0           | 1.0238 | 4000  | nan             |
| 0.0           | 1.1518 | 4500  | nan             |
| 0.0           | 1.2798 | 5000  | nan             |
| 0.0           | 1.4077 | 5500  | nan             |
| 0.0           | 1.5357 | 6000  | nan             |
| 0.0           | 1.6637 | 6500  | nan             |
| 0.0           | 1.7917 | 7000  | nan             |
| 0.0           | 1.9196 | 7500  | nan             |
| 0.0           | 2.0476 | 8000  | nan             |
| 0.0           | 2.1756 | 8500  | nan             |
| 0.0           | 2.3036 | 9000  | nan             |
| 0.0           | 2.4315 | 9500  | nan             |
| 0.0           | 2.5595 | 10000 | nan             |
| 0.0           | 2.6875 | 10500 | nan             |
| 0.0           | 2.8155 | 11000 | nan             |
| 0.0           | 2.9434 | 11500 | nan             |
| 0.0           | 3.0714 | 12000 | nan             |
| 0.0           | 3.1994 | 12500 | nan             |
| 0.0           | 3.3274 | 13000 | nan             |
| 0.0           | 3.4553 | 13500 | nan             |
| 0.0           | 3.5833 | 14000 | nan             |
| 0.0           | 3.7113 | 14500 | nan             |
| 0.0           | 3.8393 | 15000 | nan             |
| 0.0           | 3.9672 | 15500 | nan             |
| 0.0           | 4.0952 | 16000 | nan             |
| 0.0           | 4.2232 | 16500 | nan             |
| 0.0           | 4.3512 | 17000 | nan             |
| 0.0           | 4.4791 | 17500 | nan             |
| 0.0           | 4.6071 | 18000 | nan             |
| 0.0           | 4.7351 | 18500 | nan             |
| 0.0           | 4.8631 | 19000 | nan             |
| 0.0           | 4.9910 | 19500 | nan             |
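Every logged training loss is 0.0 and every validation loss is nan, which typically indicates the loss went non-finite early in the run and was never recovered. A minimal guard that fails fast on a non-finite loss, sketched in plain Python (the loss values shown are hypothetical):

```python
import math

def check_loss(loss: float, step: int) -> None:
    # Halt as soon as the loss stops being a finite number,
    # instead of silently logging nan for the rest of the run.
    if not math.isfinite(loss):
        raise RuntimeError(f"non-finite loss {loss!r} at step {step}")

check_loss(0.6931, 1)              # a finite loss passes through
try:
    check_loss(float("nan"), 500)  # a nan loss raises immediately
except RuntimeError as err:
    print(err)
```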

Framework versions

  • Transformers 4.57.1
  • Pytorch 2.9.0+cu128
  • Datasets 4.5.0
  • Tokenizers 0.22.1
Safetensors

  • Model size: 5.27M params
  • Tensor type: BF16

Model tree for arithmetic-circuit-overloading/Llama-3.3-70B-Instruct-3d-500K-50K-0.1-reverse-padzero-plus-mul-sub-99-512D-1L-8H-2048I
