
llama3.2-3B-added-tokens-wiki-cursor-cosine-loss

This model is a fine-tuned version of rocker417/llama3.2-3B-added-tokens on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.7422
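The card does not document inference, so here is a minimal, hedged loading sketch using the standard transformers API. The generation settings are illustrative, and loading the 3B checkpoint downloads several GB, so the download is gated behind an environment variable:

```python
# Minimal usage sketch (assumed workflow; this card does not document inference).
import os

model_id = "rocker417/llama3.2-3B-added-tokens-wiki-cursor-cosine-loss"

# Loading a 3B model downloads several GB; opt in explicitly.
if os.environ.get("LOAD_MODEL"):
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # bfloat16 matches the checkpoint's stored tensor type (BF16).
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

    inputs = tokenizer("Hello", return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```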

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: adamw_8bit (betas=(0.9, 0.999), epsilon=1e-08; no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 1
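The hyperparameters above can be tied together in a short sketch: the effective batch size is train_batch_size × gradient_accumulation_steps, and the learning rate follows a cosine decay after a linear warmup over the first 3% of steps. The total step count (~28,125) is an estimate read off the results table (step 28000 ≈ epoch 0.9955), and the schedule shape assumed here mirrors transformers' standard warmup-plus-cosine scheduler:

```python
import math

# Hyperparameters from the card
learning_rate = 1e-5
warmup_ratio = 0.03
train_batch_size = 4
grad_accum = 4
total_train_batch_size = train_batch_size * grad_accum  # 16, matches the card

# Estimated from the table: step 28000 corresponds to epoch ~0.9955
total_steps = 28125
warmup_steps = int(warmup_ratio * total_steps)  # ~843 warmup steps

def lr_at(step):
    """Linear warmup, then cosine decay to zero (assumed schedule shape)."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Under this schedule the learning rate peaks at 1e-05 at the end of warmup and decays smoothly to ~0 by the end of the single epoch.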

Training results

Training Loss   Epoch    Step    Validation Loss
2.6565          0.0356    1000   2.6107
2.1271          0.0711    2000   2.2392
1.8366          0.1067    3000   2.0953
1.8707          0.1422    4000   2.0126
1.8575          0.1778    5000   1.9535
1.7122          0.2133    6000   1.9188
1.6415          0.2489    7000   1.8861
1.7459          0.2844    8000   1.8798
1.7434          0.3200    9000   1.8360
1.6500          0.3556   10000   1.8202
1.6795          0.3911   11000   1.8071
1.5653          0.4267   12000   1.7954
1.7197          0.4622   13000   1.7788
1.5679          0.4978   14000   1.7748
1.7188          0.5333   15000   1.7654
1.6035          0.5689   16000   1.7573
1.6036          0.6044   17000   1.7546
1.6426          0.6400   18000   1.7506
1.6077          0.6755   19000   1.7469
1.5863          0.7111   20000   1.7445
1.5036          0.7467   21000   1.7443
1.5899          0.7822   22000   1.7431
1.7050          0.8178   23000   1.7430
1.6305          0.8533   24000   1.7426
1.5736          0.8889   25000   1.7425
1.6680          0.9244   26000   1.7424
1.5030          0.9600   27000   1.7423
1.5201          0.9955   28000   1.7422
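Assuming the reported loss is the usual per-token cross-entropy (the card does not say), the final validation loss of 1.7422 corresponds to a perplexity of roughly exp(1.7422) ≈ 5.71:

```python
import math

val_loss = 1.7422  # final validation loss from the table above
perplexity = math.exp(val_loss)
print(f"perplexity ≈ {perplexity:.2f}")  # ≈ 5.71
```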

Framework versions

  • Transformers 4.50.0
  • Pytorch 2.3.0+cu118
  • Datasets 2.21.0
  • Tokenizers 0.21.4
Model size: 3B params (BF16, Safetensors)
