# Model Card for sft_model
This model is a fine-tuned version of meta-llama/Llama-3.2-3B. It has been trained using TRL.
## Quick start
```python
from transformers import pipeline

question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
generator = pipeline("text-generation", model="GoshKolotyan/llama-3.2-3b-sft-human-feedback", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```
## Training Results
### Final Training Metrics (Epoch 2.0)
- Train Loss: 2.005
- Train Runtime: 56,228.83 seconds (~15.6 hours)
- Samples/Second: 2.712
- Steps/Second: 0.17
- Mean Token Accuracy: 0.5171
- Entropy: 2.098
- Total Tokens: 32,755,538
### Evaluation Metrics (Epoch 1.99)
- Eval Loss: 2.057
- Eval Runtime: 526.36 seconds
- Eval Samples/Second: 9.695
- Eval Steps/Second: 9.695
- Mean Token Accuracy: 0.5160
- Entropy: 2.080
- Total Tokens: 32,644,899
### Sample Loss Progression (Final Steps)
| Step | Loss | Grad Norm | Learning Rate | Token Accuracy | Entropy |
|---|---|---|---|---|---|
| Final-3 | 1.956 | 0.400 | 3.20e-09 | 0.522 | 2.051 |
| Final-2 | 1.894 | 0.516 | 1.09e-09 | 0.534 | 2.024 |
| Final-1 | 1.897 | 0.431 | 8.88e-11 | 0.532 | 2.021 |
## Model Configuration
- Base Model: Llama-3.2-3B (Meta)
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- LoRA Rank (r): 8
- LoRA Alpha: 16
- LoRA Dropout: 0.05
- Target Modules: q_proj, v_proj
- Training Method: Supervised Fine-Tuning (SFT)
- Total Training Steps: 9,532
- Training Epochs: 2.0
## Trainable Parameters
Only the LoRA adapter weights (a small fraction of the base model's parameters) were trained, making this an efficient fine-tuning approach.
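As a rough sanity check, the adapter size can be estimated from the LoRA configuration above. The sketch below assumes the publicly documented Llama-3.2-3B dimensions (28 decoder layers, hidden size 3072, 8 KV heads with head dimension 128, so `v_proj` projects to 1024); dividing the result by the base model's total parameter count gives the trained fraction.

```python
# Back-of-the-envelope count of LoRA trainable parameters for this setup.
# Assumed Llama-3.2-3B dimensions: 28 decoder layers, hidden size 3072,
# v_proj output dim 1024 (8 KV heads * head_dim 128).
HIDDEN = 3072
KV_DIM = 1024
LAYERS = 28
RANK = 8  # LoRA r from the configuration above

def lora_params(d_in: int, d_out: int, r: int) -> int:
    # LoRA adds two low-rank matrices per target module:
    # A (r x d_in) and B (d_out x r).
    return r * (d_in + d_out)

per_layer = lora_params(HIDDEN, HIDDEN, RANK)   # q_proj: 3072 -> 3072
per_layer += lora_params(HIDDEN, KV_DIM, RANK)  # v_proj: 3072 -> 1024
total = per_layer * LAYERS
print(f"trainable LoRA parameters: {total:,}")  # 2,293,760
```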
## Training procedure
This model was trained with supervised fine-tuning (SFT) using TRL, with a LoRA adapter applied via PEFT.
## Framework versions
- PEFT: 0.17.1
- TRL: 0.24.0
- Transformers: 4.57.1
- PyTorch: 2.9.0
- Datasets: 4.3.0
- Tokenizers: 0.22.1
## Citations
Cite TRL as:
```bibtex
@misc{vonwerra2022trl,
    title = {{TRL: Transformer Reinforcement Learning}},
    author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year = 2020,
    journal = {GitHub repository},
    publisher = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```
## Model tree for GoshKolotyan/llama-3.2-3b-sft-human-feedback

Base model: meta-llama/Llama-3.2-3B