metadata
library_name: peft
license: apache-2.0
base_model: Qwen/Qwen3-8B
tags:
- base_model:adapter:Qwen/Qwen3-8B
- lora
- transformers
pipeline_tag: text-generation
model-index:
- name: algsteer
  results: []
algsteer
This model is a LoRA adapter fine-tuned from Qwen/Qwen3-8B on an unspecified dataset. At the final checkpoint (step 1800, epoch 50) it achieves the following result on the evaluation set:
- Loss: 0.1092
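Since the metadata lists this as a PEFT LoRA adapter for Qwen/Qwen3-8B, loading it could look roughly like the sketch below. The card does not state where the adapter is published, so `adapter_id` is a placeholder argument, and the helper names `load_adapter` and `generate_text` are illustrative rather than part of any library.

```python
def load_adapter(adapter_id: str, base_id: str = "Qwen/Qwen3-8B"):
    """Load the base model and attach a LoRA adapter to it.

    `adapter_id` is an assumption: pass the real Hub repository id or a
    local directory containing the adapter weights.
    """
    # Imported inside the function so that defining this helper does not
    # require the heavy dependencies to be importable.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()  # inference mode: disables dropout
    return model, tokenizer


def generate_text(model, tokenizer, prompt: str, max_new_tokens: int = 64) -> str:
    """Generate a continuation for `prompt` with the adapted model."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `load_adapter` downloads the 8B base model, so it is best run on a machine with enough memory for it.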
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0003
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 8
- optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 100
- num_epochs: 50
- mixed_precision_training: Native AMP
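The batch-size and scheduler entries above are related by simple arithmetic, which the following sketch reproduces. It assumes the linear schedule ramps from zero to the peak learning rate over the warmup steps and then decays linearly to zero, and that the schedule length equals the 1800 steps shown in the training results table; neither assumption is confirmed by this card, and the function names are illustrative.

```python
# Values copied from the hyperparameter list.
PER_DEVICE_BATCH = 1
NUM_DEVICES = 8
GRAD_ACCUM_STEPS = 16
PEAK_LR = 3e-4
WARMUP_STEPS = 100
TOTAL_STEPS = 1800  # assumption: last step reported in the results table


def effective_batch_size(per_device: int, num_devices: int, grad_accum: int) -> int:
    """Samples contributing to one optimizer step."""
    return per_device * num_devices * grad_accum


def linear_lr(step: int, peak_lr: float = PEAK_LR,
              warmup: int = WARMUP_STEPS, total: int = TOTAL_STEPS) -> float:
    """Linear warmup to peak_lr, then linear decay to zero at `total`."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    return peak_lr * max(0.0, (total - step) / max(1, total - warmup))


train_batch = effective_batch_size(PER_DEVICE_BATCH, NUM_DEVICES, GRAD_ACCUM_STEPS)
print("total_train_batch_size:", train_batch)

for step in (0, 50, 100, 950, 1800):
    print(f"step {step:>4}: lr = {linear_lr(step):.2e}")
```

Under these assumptions the learning rate peaks at step 100 and is back to half of its peak value by step 950.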
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 0.3399 | 1.3909 | 50 | 0.0722 |
| 0.0379 | 2.7818 | 100 | 0.0320 |
| 0.0291 | 4.1675 | 150 | 0.0308 |
| 0.0303 | 5.5585 | 200 | 0.0310 |
| 0.0265 | 6.9494 | 250 | 0.0306 |
| 0.0245 | 8.3351 | 300 | 0.0326 |
| 0.0230 | 9.7260 | 350 | 0.0337 |
| 0.0218 | 11.1117 | 400 | 0.0369 |
| 0.0209 | 12.5026 | 450 | 0.0369 |
| 0.0202 | 13.8935 | 500 | 0.0388 |
| 0.0197 | 15.2792 | 550 | 0.0411 |
| 0.0190 | 16.6702 | 600 | 0.0418 |
| 0.0186 | 18.0558 | 650 | 0.0442 |
| 0.0178 | 19.4468 | 700 | 0.0454 |
| 0.0174 | 20.8377 | 750 | 0.0473 |
| 0.0167 | 22.2234 | 800 | 0.0499 |
| 0.0161 | 23.6143 | 850 | 0.0492 |
| 0.0155 | 25.0000 | 900 | 0.0508 |
| 0.0145 | 26.3909 | 950 | 0.0564 |
| 0.0138 | 27.7818 | 1000 | 0.0566 |
| 0.0131 | 29.1675 | 1050 | 0.0621 |
| 0.0122 | 30.5585 | 1100 | 0.0633 |
| 0.0115 | 31.9494 | 1150 | 0.0675 |
| 0.0105 | 33.3351 | 1200 | 0.0736 |
| 0.0098 | 34.7260 | 1250 | 0.0740 |
| 0.0090 | 36.1117 | 1300 | 0.0807 |
| 0.0081 | 37.5026 | 1350 | 0.0832 |
| 0.0076 | 38.8935 | 1400 | 0.0855 |
| 0.0070 | 40.2792 | 1450 | 0.0911 |
| 0.0064 | 41.6702 | 1500 | 0.0946 |
| 0.0061 | 43.0558 | 1550 | 0.0972 |
| 0.0056 | 44.4468 | 1600 | 0.1019 |
| 0.0055 | 45.8377 | 1650 | 0.1036 |
| 0.0053 | 47.2234 | 1700 | 0.1064 |
| 0.0051 | 48.6143 | 1750 | 0.1084 |
| 0.0050 | 50.0000 | 1800 | 0.1092 |
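The table can be scanned for the evaluation step with the lowest validation loss, as in the sketch below. The validation losses are copied from the table in order (one evaluation every 50 steps); the variable names are illustrative. Training loss keeps falling while validation loss rises after its minimum, which may indicate overfitting, although the card does not say which checkpoint was kept.

```python
EVAL_INTERVAL = 50  # steps between evaluations, as in the table

# Validation losses in table order (steps 50, 100, ..., 1800).
VAL_LOSSES = [
    0.0722, 0.0320, 0.0308, 0.0310, 0.0306, 0.0326,
    0.0337, 0.0369, 0.0369, 0.0388, 0.0411, 0.0418,
    0.0442, 0.0454, 0.0473, 0.0499, 0.0492, 0.0508,
    0.0564, 0.0566, 0.0621, 0.0633, 0.0675, 0.0736,
    0.0740, 0.0807, 0.0832, 0.0855, 0.0911, 0.0946,
    0.0972, 0.1019, 0.1036, 0.1064, 0.1084, 0.1092,
]

# Pair each loss with its step number.
history = [(EVAL_INTERVAL * (i + 1), loss) for i, loss in enumerate(VAL_LOSSES)]

# Step with the lowest validation loss, and the final evaluation.
best_step, best_loss = min(history, key=lambda item: item[1])
final_step, final_loss = history[-1]

print(f"best:  step {best_step}, validation loss {best_loss:.4f}")
print(f"final: step {final_step}, validation loss {final_loss:.4f}")
print(f"final / best ratio: {final_loss / best_loss:.2f}")
```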
Framework versions
- PEFT 0.18.0.rc0
- Transformers 4.57.1
- Pytorch 2.9.0+cu128
- Datasets 4.3.0
- Tokenizers 0.22.2