Llama2-7B-lora-r-32-generic-step-1500-labels_40.0-full-precision-augmented
This model is a fine-tuned version of meta-llama/Llama-2-7b-hf on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 2.3365
Model description
More information needed
Intended uses & limitations
More information needed
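Since the card gives no usage instructions, here is a minimal inference sketch. It assumes you have access to the gated meta-llama/Llama-2-7b-hf base weights and that the adapter is available on the Hub under the repository name shown at the bottom of this card; nothing beyond the repo names is documented.

```python
# Minimal inference sketch: load the base model, attach the LoRA adapter,
# and generate. Requires access to the gated Llama-2 base weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-hf"
adapter_id = "Siqi-Hu/Llama2-7B-lora-r-32-generic-step-1500-labels_40.0-full-precision-augmented"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attaches the LoRA weights
model.eval()

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```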
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training (a configuration sketch mapping them onto code follows the list):
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1500
- mixed_precision_training: Native AMP
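For reference, these hyperparameters map onto a Transformers `TrainingArguments` plus a PEFT `LoraConfig` roughly as sketched below. The LoRA rank r=32 is taken from the model name; the `target_modules`, `lora_alpha`, and `lora_dropout` values are illustrative assumptions, not documented choices.

```python
# Sketch of a training setup matching the listed hyperparameters.
# LoraConfig details beyond r=32 are assumptions for illustration.
from transformers import TrainingArguments
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,                                  # from the model name: "lora-r-32"
    lora_alpha=32,                         # assumed
    lora_dropout=0.05,                     # assumed
    target_modules=["q_proj", "v_proj"],   # assumed
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="llama2-7b-lora-r-32",
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=4,   # 16 * 4 = 64 effective batch size
    max_steps=1500,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    seed=42,
    fp16=True,                       # "Native AMP" mixed precision
    eval_strategy="steps",
    eval_steps=20,                   # matches the 20-step eval cadence below
    logging_steps=20,
)
```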
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 5.8433 | 0.0366 | 20 | 5.7143 |
| 5.5562 | 0.0731 | 40 | 5.4815 |
| 5.1434 | 0.1097 | 60 | 5.0645 |
| 4.5425 | 0.1463 | 80 | 4.5502 |
| 4.1153 | 0.1828 | 100 | 4.1151 |
| 3.8933 | 0.2194 | 120 | 3.8575 |
| 3.7068 | 0.2559 | 140 | 3.6844 |
| 3.57 | 0.2925 | 160 | 3.5517 |
| 3.4988 | 0.3291 | 180 | 3.4465 |
| 3.3511 | 0.3656 | 200 | 3.3515 |
| 3.2733 | 0.4022 | 220 | 3.2709 |
| 3.2212 | 0.4388 | 240 | 3.1988 |
| 3.1188 | 0.4753 | 260 | 3.1363 |
| 3.1026 | 0.5119 | 280 | 3.0801 |
| 2.9893 | 0.5484 | 300 | 3.0297 |
| 2.9535 | 0.5850 | 320 | 2.9848 |
| 2.995 | 0.6216 | 340 | 2.9392 |
| 2.9207 | 0.6581 | 360 | 2.8996 |
| 2.8496 | 0.6947 | 380 | 2.8631 |
| 2.7929 | 0.7313 | 400 | 2.8312 |
| 2.7841 | 0.7678 | 420 | 2.8002 |
| 2.7711 | 0.8044 | 440 | 2.7731 |
| 2.711 | 0.8410 | 460 | 2.7473 |
| 2.7237 | 0.8775 | 480 | 2.7243 |
| 2.724 | 0.9141 | 500 | 2.7001 |
| 2.6284 | 0.9506 | 520 | 2.6772 |
| 2.6865 | 0.9872 | 540 | 2.6561 |
| 2.648 | 1.0238 | 560 | 2.6362 |
| 2.5758 | 1.0603 | 580 | 2.6185 |
| 2.5672 | 1.0969 | 600 | 2.6007 |
| 2.5763 | 1.1335 | 620 | 2.5835 |
| 2.4662 | 1.1700 | 640 | 2.5673 |
| 2.573 | 1.2066 | 660 | 2.5525 |
| 2.4615 | 1.2431 | 680 | 2.5403 |
| 2.4942 | 1.2797 | 700 | 2.5264 |
| 2.4722 | 1.3163 | 720 | 2.5124 |
| 2.4466 | 1.3528 | 740 | 2.5010 |
| 2.4357 | 1.3894 | 760 | 2.4894 |
| 2.4257 | 1.4260 | 780 | 2.4791 |
| 2.4196 | 1.4625 | 800 | 2.4671 |
| 2.4175 | 1.4991 | 820 | 2.4572 |
| 2.3902 | 1.5356 | 840 | 2.4469 |
| 2.4661 | 1.5722 | 860 | 2.4383 |
| 2.3748 | 1.6088 | 880 | 2.4306 |
| 2.3579 | 1.6453 | 900 | 2.4226 |
| 2.3675 | 1.6819 | 920 | 2.4139 |
| 2.3416 | 1.7185 | 940 | 2.4060 |
| 2.3685 | 1.7550 | 960 | 2.3986 |
| 2.3803 | 1.7916 | 980 | 2.3925 |
| 2.4318 | 1.8282 | 1000 | 2.3862 |
| 2.3431 | 1.8647 | 1020 | 2.3812 |
| 2.3234 | 1.9013 | 1040 | 2.3766 |
| 2.3142 | 1.9378 | 1060 | 2.3715 |
| 2.3132 | 1.9744 | 1080 | 2.3673 |
| 2.3003 | 2.0110 | 1100 | 2.3630 |
| 2.3291 | 2.0475 | 1120 | 2.3592 |
| 2.3013 | 2.0841 | 1140 | 2.3561 |
| 2.2816 | 2.1207 | 1160 | 2.3535 |
| 2.2767 | 2.1572 | 1180 | 2.3514 |
| 2.2557 | 2.1938 | 1200 | 2.3487 |
| 2.2605 | 2.2303 | 1220 | 2.3468 |
| 2.5608 | 2.2669 | 1240 | 2.3449 |
| 2.2686 | 2.3035 | 1260 | 2.3432 |
| 2.2595 | 2.3400 | 1280 | 2.3417 |
| 2.3142 | 2.3766 | 1300 | 2.3405 |
| 2.2889 | 2.4132 | 1320 | 2.3394 |
| 2.2709 | 2.4497 | 1340 | 2.3387 |
| 2.2697 | 2.4863 | 1360 | 2.3380 |
| 2.2865 | 2.5229 | 1380 | 2.3375 |
| 2.2529 | 2.5594 | 1400 | 2.3372 |
| 2.3167 | 2.5960 | 1420 | 2.3368 |
| 2.296 | 2.6325 | 1440 | 2.3366 |
| 2.2454 | 2.6691 | 1460 | 2.3366 |
| 2.2741 | 2.7057 | 1480 | 2.3365 |
| 2.2529 | 2.7422 | 1500 | 2.3365 |
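Assuming the reported loss is the mean token-level cross-entropy in nats (the standard causal-LM objective), the final validation loss of 2.3365 corresponds to a perplexity of exp(2.3365) ≈ 10.3:

```python
# Perplexity from the final validation loss, assuming mean
# cross-entropy in nats (the standard causal-LM objective).
import math

final_val_loss = 2.3365
print(math.exp(final_val_loss))  # ≈ 10.34
```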
Framework versions
- PEFT 0.15.2
- Transformers 4.45.2
- PyTorch 2.5.0+cu121
- Datasets 3.2.0
- Tokenizers 0.20.3
Base model
- meta-llama/Llama-2-7b-hf