---
library_name: transformers
tags:
- generated_from_trainer
model-index:
- name: TBD-LLaMA-500M-Final-Direction-500M
  results: []
---

# TBD-LLaMA-500M-Final-Direction-500M

This model is a fine-tuned version of an unspecified base model on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 3.6898

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 16
- total_train_batch_size: 64
- total_eval_batch_size: 4
- optimizer: OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 173
- training_steps: 17360

### Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| 7.1404 | 0.0115 | 200 | 7.0388 |
| 6.6918 | 0.0230 | 400 | 6.6803 |
| 6.6408 | 0.0346 | 600 | 6.6217 |
| 6.5923 | 0.0461 | 800 | 6.5769 |
| 6.561 | 0.0576 | 1000 | 6.5328 |
| 6.4665 | 0.0691 | 1200 | 6.4522 |
| 6.2754 | 0.0806 | 1400 | 6.2240 |
| 5.7839 | 0.0922 | 1600 | 5.6270 |
| 5.161 | 0.1037 | 1800 | 4.9428 |
| 4.78 | 0.1152 | 2000 | 4.6158 |
| 4.6603 | 0.1267 | 2200 | 4.4470 |
| 4.554 | 0.1382 | 2400 | 4.3597 |
| 4.4901 | 0.1498 | 2600 | 4.3011 |
| 4.416 | 0.1613 | 2800 | 4.2596 |
| 4.3742 | 0.1728 | 3000 | 4.2117 |
| 4.2719 | 0.1843 | 3200 | 4.1822 |
| 4.2189 | 0.1959 | 3400 | 4.1528 |
| 4.2023 | 0.2074 | 3600 | 4.1241 |
| 4.2014 | 0.2189 | 3800 | 4.1053 |
| 4.1731 | 0.2304 | 4000 | 4.0840 |
| 4.1578 | 0.2419 | 4200 | 4.0602 |
| 4.1387 | 0.2535 | 4400 | 4.0537 |
| 4.0884 | 0.2650 | 4600 | 4.0289 |
| 4.0962 | 0.2765 | 4800 | 4.0129 |
| 4.0141 | 0.2880 | 5000 | 4.0047 |
| 4.0292 | 0.2995 | 5200 | 3.9874 |
| 4.0243 | 0.3111 | 5400 | 3.9742 |
| 4.001 | 0.3226 | 5600 | 3.9644 |
| 3.9626 | 0.3341 | 5800 | 3.9509 |
| 3.9376 | 0.3456 | 6000 | 3.9434 |
| 3.9762 | 0.3571 | 6200 | 3.9331 |
| 4.0447 | 0.3687 | 6400 | 3.9221 |
| 3.977 | 0.3802 | 6600 | 3.9098 |
| 4.0106 | 0.3917 | 6800 | 3.9016 |
| 3.9686 | 0.4032 | 7000 | 3.8928 |
| 3.9114 | 0.4147 | 7200 | 3.8835 |
| 3.9024 | 0.4263 | 7400 | 3.8755 |
| 3.9965 | 0.4378 | 7600 | 3.8659 |
| 4.0031 | 0.4493 | 7800 | 3.8594 |
| 3.9794 | 0.4608 | 8000 | 3.8530 |
| 3.855 | 0.4723 | 8200 | 3.8455 |
| 3.8848 | 0.4839 | 8400 | 3.8365 |
| 3.8435 | 0.4954 | 8600 | 3.8292 |
| 3.9157 | 0.5069 | 8800 | 3.8207 |
| 3.938 | 0.5184 | 9000 | 3.8147 |
| 3.8188 | 0.5299 | 9200 | 3.8088 |
| 3.864 | 0.5415 | 9400 | 3.8125 |
| 3.8439 | 0.5530 | 9600 | 3.7972 |
| 3.8419 | 0.5645 | 9800 | 3.7906 |
| 3.8761 | 0.5760 | 10000 | 3.7852 |
| 3.7693 | 0.5876 | 10200 | 3.7789 |
| 3.8506 | 0.5991 | 10400 | 3.7734 |
| 3.8403 | 0.6106 | 10600 | 3.7687 |
| 3.8663 | 0.6221 | 10800 | 3.7635 |
| 3.7548 | 0.6336 | 11000 | 3.7597 |
| 3.9174 | 0.6452 | 11200 | 3.7538 |
| 3.8308 | 0.6567 | 11400 | 3.7486 |
| 3.7601 | 0.6682 | 11600 | 3.7452 |
| 3.8296 | 0.6797 | 11800 | 3.7421 |
| 3.7379 | 0.6912 | 12000 | 3.7375 |
| 3.8726 | 0.7028 | 12200 | 3.7332 |
| 3.8376 | 0.7143 | 12400 | 3.7298 |
| 3.8514 | 0.7258 | 12600 | 3.7260 |
| 3.7554 | 0.7373 | 12800 | 3.7229 |
| 3.7744 | 0.7488 | 13000 | 3.7196 |
| 3.7656 | 0.7604 | 13200 | 3.7161 |
| 3.7097 | 0.7719 | 13400 | 3.7140 |
| 3.7673 | 0.7834 | 13600 | 3.7113 |
| 3.81 | 0.7949 | 13800 | 3.7089 |
| 3.8687 | 0.8064 | 14000 | 3.7062 |
| 3.7848 | 0.8180 | 14200 | 3.7043 |
| 3.7425 | 0.8295 | 14400 | 3.7021 |
| 3.7567 | 0.8410 | 14600 | 3.7000 |
| 3.7133 | 0.8525 | 14800 | 3.6985 |
| 3.7089 | 0.8640 | 15000 | 3.6972 |
| 3.7652 | 0.8756 | 15200 | 3.6954 |
| 3.764 | 0.8871 | 15400 | 3.6941 |
| 3.7658 | 0.8986 | 15600 | 3.6933 |
| 3.6308 | 0.9101 | 15800 | 3.6922 |
| 3.5539 | 0.9216 | 16000 | 3.6916 |
| 3.714 | 0.9332 | 16200 | 3.6910 |
| 3.7669 | 0.9447 | 16400 | 3.6904 |
| 3.7044 | 0.9562 | 16600 | 3.6901 |
| 3.6267 | 0.9677 | 16800 | 3.6899 |
| 3.8177 | 0.9793 | 17000 | 3.6897 |
| 3.8063 | 0.9908 | 17200 | 3.6898 |

### Framework versions

- Transformers 4.56.1
- Pytorch 2.8.0a0+5228986c39.nv25.05
- Datasets 4.0.0
- Tokenizers 0.22.0
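
### Hyperparameter relationships (illustrative)

The training hyperparameters listed earlier are related by simple arithmetic, which the sketch below tries to make explicit. It shows how `total_train_batch_size` follows from the per-device batch size, device count, and gradient accumulation steps, and how a cosine schedule with 173 warmup steps could evolve over 17360 training steps. The function names `effective_batch_size` and `cosine_lr` are hypothetical helpers written for this card. The schedule shape (linear warmup from zero, then a half-cosine decay to zero) is an assumption about what `lr_scheduler_type: cosine` means here, not a confirmed description of the run.

```python
import math

# Values copied from the "Training hyperparameters" list in this card.
PER_DEVICE_TRAIN_BATCH_SIZE = 1
NUM_DEVICES = 4
GRADIENT_ACCUMULATION_STEPS = 16
PEAK_LEARNING_RATE = 1e-4
WARMUP_STEPS = 173
TRAINING_STEPS = 17360


def effective_batch_size(per_device: int, num_devices: int, accumulation_steps: int) -> int:
    """Number of sequences contributing to one optimizer update."""
    return per_device * num_devices * accumulation_steps


def cosine_lr(
    step: int,
    peak_lr: float = PEAK_LEARNING_RATE,
    warmup_steps: int = WARMUP_STEPS,
    total_steps: int = TRAINING_STEPS,
) -> float:
    """Learning rate at `step` under an ASSUMED schedule shape:
    linear warmup from 0 to peak_lr, then half-cosine decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = effective_batch_size(
        PER_DEVICE_TRAIN_BATCH_SIZE, NUM_DEVICES, GRADIENT_ACCUMULATION_STEPS
    )
    print(f"total_train_batch_size = {total}")
    # Sample the assumed schedule at a few steps, including logged ones.
    for step in (0, 173, 200, 8600, 17200, 17360):
        print(f"step {step:>5}: lr = {cosine_lr(step):.3e}")
```

Under these assumptions the learning rate peaks at step 173 and is close to zero by the last logged step (17200), which would be consistent with the validation loss flattening near 3.69 at the end of the table.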