# Llama-3.3-70B-Instruct-v2-3d-2M-200K-0.1-reverse-padzero-99-128D-1L-2H-512I
This model is a fine-tuned version of meta-llama/Llama-3.3-70B-Instruct on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.4417
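Assuming the reported loss is a mean token-level cross-entropy in nats (the usual convention for Transformers language-model evaluation), it corresponds to a perplexity of roughly exp(1.4417) ≈ 4.23:

```python
import math

eval_loss = 1.4417                    # final validation loss from the table below
perplexity = math.exp(eval_loss)      # perplexity = e^(cross-entropy in nats)
print(round(perplexity, 2))           # 4.23
```
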
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 128
- eval_batch_size: 128
- seed: 42
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 5
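The `cosine` scheduler with a 0.05 warmup ratio can be sketched in plain Python (a simplified sketch, not the exact Transformers implementation; `total_steps=78_125` is inferred from the results table, where epoch 4.992 falls at step 78,000, i.e. 15,625 steps per epoch × 5 epochs):

```python
import math

def lr_at_step(step, base_lr=1e-3, total_steps=78_125, warmup_ratio=0.05):
    """Linear warmup followed by cosine decay to zero.

    total_steps is an assumption inferred from the results table
    (step 78,000 at epoch 4.992 -> 15,625 steps/epoch x 5 epochs).
    """
    warmup_steps = int(total_steps * warmup_ratio)  # ~3,906 warmup steps
    if step < warmup_steps:
        # linear warmup from 0 to base_lr
        return base_lr * step / max(1, warmup_steps)
    # cosine decay from base_lr down to 0 over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The rate starts at 0, rises linearly to the peak `learning_rate` of 0.001 by the end of warmup, then decays along a half-cosine to 0 at the final step.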
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| No log | 0 | 0 | 3.1621 |
| 1.9685 | 0.032 | 500 | 1.9346 |
| 1.8076 | 0.064 | 1000 | 1.8022 |
| 1.6821 | 0.096 | 1500 | 1.6713 |
| 1.6435 | 0.128 | 2000 | 1.6414 |
| 1.5828 | 0.16 | 2500 | 1.5671 |
| 1.5491 | 0.192 | 3000 | 1.5456 |
| 1.5287 | 0.224 | 3500 | 1.5313 |
| 1.5266 | 0.256 | 4000 | 1.5242 |
| 1.5182 | 0.288 | 4500 | 1.5195 |
| 1.5165 | 0.32 | 5000 | 1.5157 |
| 1.5136 | 0.352 | 5500 | 1.5126 |
| 1.5108 | 0.384 | 6000 | 1.5100 |
| 1.482 | 0.416 | 6500 | 1.4837 |
| 1.4831 | 0.448 | 7000 | 1.4801 |
| 1.4795 | 0.48 | 7500 | 1.4779 |
| 1.475 | 0.512 | 8000 | 1.4763 |
| 1.4759 | 0.544 | 8500 | 1.4732 |
| 1.4714 | 0.576 | 9000 | 1.4713 |
| 1.4696 | 0.608 | 9500 | 1.4714 |
| 1.4719 | 0.64 | 10000 | 1.4696 |
| 1.4679 | 0.672 | 10500 | 1.4692 |
| 1.4707 | 0.704 | 11000 | 1.4682 |
| 1.4692 | 0.736 | 11500 | 1.4675 |
| 1.4699 | 0.768 | 12000 | 1.4673 |
| 1.4619 | 0.8 | 12500 | 1.4657 |
| 1.4648 | 0.832 | 13000 | 1.4659 |
| 1.462 | 0.864 | 13500 | 1.4642 |
| 1.4627 | 0.896 | 14000 | 1.4630 |
| 1.4634 | 0.928 | 14500 | 1.4624 |
| 1.4606 | 0.96 | 15000 | 1.4617 |
| 1.4626 | 0.992 | 15500 | 1.4628 |
| 1.4625 | 1.024 | 16000 | 1.4626 |
| 1.4594 | 1.056 | 16500 | 1.4603 |
| 1.4626 | 1.088 | 17000 | 1.4613 |
| 1.4601 | 1.12 | 17500 | 1.4631 |
| 1.4595 | 1.152 | 18000 | 1.4598 |
| 1.459 | 1.184 | 18500 | 1.4594 |
| 1.459 | 1.216 | 19000 | 1.4587 |
| 1.4576 | 1.248 | 19500 | 1.4583 |
| 1.4584 | 1.28 | 20000 | 1.4563 |
| 1.4568 | 1.312 | 20500 | 1.4567 |
| 1.4575 | 1.344 | 21000 | 1.4566 |
| 1.4563 | 1.376 | 21500 | 1.4565 |
| 1.4588 | 1.408 | 22000 | 1.4553 |
| 1.4545 | 1.44 | 22500 | 1.4565 |
| 1.4558 | 1.472 | 23000 | 1.4578 |
| 1.4573 | 1.504 | 23500 | 1.4559 |
| 1.454 | 1.536 | 24000 | 1.4543 |
| 1.4535 | 1.568 | 24500 | 1.4558 |
| 1.4559 | 1.6 | 25000 | 1.4528 |
| 1.4511 | 1.632 | 25500 | 1.4531 |
| 1.4522 | 1.664 | 26000 | 1.4526 |
| 1.4523 | 1.696 | 26500 | 1.4530 |
| 1.4512 | 1.728 | 27000 | 1.4517 |
| 1.4549 | 1.76 | 27500 | 1.4528 |
| 1.4558 | 1.792 | 28000 | 1.4505 |
| 1.4536 | 1.824 | 28500 | 1.4523 |
| 1.4515 | 1.856 | 29000 | 1.4506 |
| 1.4502 | 1.888 | 29500 | 1.4494 |
| 1.4495 | 1.92 | 30000 | 1.4508 |
| 1.4484 | 1.952 | 30500 | 1.4505 |
| 1.4509 | 1.984 | 31000 | 1.4493 |
| 1.4498 | 2.016 | 31500 | 1.4505 |
| 1.4489 | 2.048 | 32000 | 1.4487 |
| 1.4487 | 2.08 | 32500 | 1.4505 |
| 1.4486 | 2.112 | 33000 | 1.4478 |
| 1.4502 | 2.144 | 33500 | 1.4484 |
| 1.4492 | 2.176 | 34000 | 1.4488 |
| 1.4489 | 2.208 | 34500 | 1.4478 |
| 1.4487 | 2.24 | 35000 | 1.4480 |
| 1.4483 | 2.272 | 35500 | 1.4474 |
| 1.4468 | 2.304 | 36000 | 1.4471 |
| 1.4491 | 2.336 | 36500 | 1.4472 |
| 1.4448 | 2.368 | 37000 | 1.4466 |
| 1.4459 | 2.4 | 37500 | 1.4467 |
| 1.4481 | 2.432 | 38000 | 1.4460 |
| 1.4478 | 2.464 | 38500 | 1.4492 |
| 1.4456 | 2.496 | 39000 | 1.4470 |
| 1.4475 | 2.528 | 39500 | 1.4457 |
| 1.4465 | 2.56 | 40000 | 1.4456 |
| 1.445 | 2.592 | 40500 | 1.4454 |
| 1.4449 | 2.624 | 41000 | 1.4457 |
| 1.4465 | 2.656 | 41500 | 1.4456 |
| 1.4459 | 2.688 | 42000 | 1.4450 |
| 1.4472 | 2.72 | 42500 | 1.4449 |
| 1.4458 | 2.752 | 43000 | 1.4453 |
| 1.4463 | 2.784 | 43500 | 1.4446 |
| 1.4426 | 2.816 | 44000 | 1.4451 |
| 1.4454 | 2.848 | 44500 | 1.4448 |
| 1.4447 | 2.88 | 45000 | 1.4444 |
| 1.4439 | 2.912 | 45500 | 1.4445 |
| 1.4443 | 2.944 | 46000 | 1.4442 |
| 1.445 | 2.976 | 46500 | 1.4438 |
| 1.4419 | 3.008 | 47000 | 1.4437 |
| 1.4421 | 3.04 | 47500 | 1.4440 |
| 1.4408 | 3.072 | 48000 | 1.4435 |
| 1.4426 | 3.104 | 48500 | 1.4434 |
| 1.4422 | 3.136 | 49000 | 1.4434 |
| 1.4431 | 3.168 | 49500 | 1.4432 |
| 1.4458 | 3.2 | 50000 | 1.4432 |
| 1.4421 | 3.232 | 50500 | 1.4430 |
| 1.4455 | 3.264 | 51000 | 1.4431 |
| 1.4425 | 3.296 | 51500 | 1.4427 |
| 1.4435 | 3.328 | 52000 | 1.4431 |
| 1.4427 | 3.36 | 52500 | 1.4431 |
| 1.4427 | 3.392 | 53000 | 1.4429 |
| 1.4414 | 3.424 | 53500 | 1.4429 |
| 1.4439 | 3.456 | 54000 | 1.4429 |
| 1.443 | 3.488 | 54500 | 1.4426 |
| 1.4413 | 3.52 | 55000 | 1.4426 |
| 1.4423 | 3.552 | 55500 | 1.4425 |
| 1.4404 | 3.584 | 56000 | 1.4423 |
| 1.4433 | 3.616 | 56500 | 1.4424 |
| 1.4418 | 3.648 | 57000 | 1.4423 |
| 1.4444 | 3.68 | 57500 | 1.4422 |
| 1.4396 | 3.712 | 58000 | 1.4422 |
| 1.4427 | 3.744 | 58500 | 1.4422 |
| 1.4423 | 3.776 | 59000 | 1.4421 |
| 1.4432 | 3.808 | 59500 | 1.4420 |
| 1.4421 | 3.84 | 60000 | 1.4420 |
| 1.4437 | 3.872 | 60500 | 1.4420 |
| 1.4412 | 3.904 | 61000 | 1.4419 |
| 1.4405 | 3.936 | 61500 | 1.4419 |
| 1.4418 | 3.968 | 62000 | 1.4419 |
| 1.4414 | 4.0 | 62500 | 1.4418 |
| 1.4418 | 4.032 | 63000 | 1.4418 |
| 1.442 | 4.064 | 63500 | 1.4419 |
| 1.439 | 4.096 | 64000 | 1.4418 |
| 1.4431 | 4.128 | 64500 | 1.4418 |
| 1.4391 | 4.16 | 65000 | 1.4418 |
| 1.4413 | 4.192 | 65500 | 1.4418 |
| 1.4408 | 4.224 | 66000 | 1.4418 |
| 1.443 | 4.256 | 66500 | 1.4417 |
| 1.4441 | 4.288 | 67000 | 1.4417 |
| 1.4431 | 4.32 | 67500 | 1.4417 |
| 1.4417 | 4.352 | 68000 | 1.4417 |
| 1.4419 | 4.384 | 68500 | 1.4417 |
| 1.4404 | 4.416 | 69000 | 1.4417 |
| 1.4418 | 4.448 | 69500 | 1.4417 |
| 1.4402 | 4.48 | 70000 | 1.4417 |
| 1.4397 | 4.512 | 70500 | 1.4417 |
| 1.4395 | 4.544 | 71000 | 1.4417 |
| 1.4398 | 4.576 | 71500 | 1.4417 |
| 1.4395 | 4.608 | 72000 | 1.4417 |
| 1.4413 | 4.64 | 72500 | 1.4417 |
| 1.4388 | 4.672 | 73000 | 1.4417 |
| 1.4436 | 4.704 | 73500 | 1.4417 |
| 1.4427 | 4.736 | 74000 | 1.4417 |
| 1.4413 | 4.768 | 74500 | 1.4417 |
| 1.4393 | 4.8 | 75000 | 1.4417 |
| 1.4417 | 4.832 | 75500 | 1.4417 |
| 1.445 | 4.864 | 76000 | 1.4417 |
| 1.4417 | 4.896 | 76500 | 1.4417 |
| 1.4407 | 4.928 | 77000 | 1.4417 |
| 1.4416 | 4.96 | 77500 | 1.4417 |
| 1.4414 | 4.992 | 78000 | 1.4417 |
### Framework versions
- Transformers 4.57.1
- PyTorch 2.9.0+cu128
- Datasets 4.5.0
- Tokenizers 0.22.1