Llama-3.2-1B-Instruct-bnb-4bit-gsm8k - LoRA Adapters

LoRA adapters for unsloth/Llama-3.2-1B-Instruct-bnb-4bit, trained with supervised fine-tuning (SFT) on openai/gsm8k.

Model Details

Base Model: unsloth/Llama-3.2-1B-Instruct-bnb-4bit
Training Method: LoRA (Low-Rank Adaptation)
Dataset: openai/gsm8k
Training Framework: Unsloth + TRL + Transformers
Adapter Type: PEFT LoRA adapters only (requires base model)

Prompt Format

This model uses the Llama 3.2 chat template.

Use the tokenizer's apply_chat_template() method:

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Your question here"}
]
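# add_generation_prompt=True opens the assistant turn so the model answers the user
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
)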
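
To make the prompt layout concrete, here is a minimal sketch of how a Llama 3 style chat template arranges turns. The helper `render_llama3_prompt` is hypothetical and simplified: the real template ships with the tokenizer and may add extra system text (for example date information), so treat this as an illustration rather than a replacement for `apply_chat_template()`. It also shows what the `add_generation_prompt` option does: it appends an open assistant header so the model writes a reply.

```python
# Simplified, hypothetical re-implementation of the Llama 3 turn layout.
# The authoritative template is the one stored with the tokenizer.
def render_llama3_prompt(messages, add_generation_prompt=True):
    parts = ["<|begin_of_text|>"]
    for message in messages:
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content']}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn so generation starts a reply
        # instead of continuing the user's text.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Your question here"},
]
prompt = render_llama3_prompt(messages)
print(prompt)
```

Without the trailing assistant header, the model has no signal that the user turn is finished and a reply should begin.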

Training Configuration

📋 Reproducible configuration available

The complete training configuration is included in training_params.yaml. Download this file and use it with unsloth-finetuning by @farhan-syah to reproduce this model:

# Clone the training pipeline
git clone https://github.com/farhan-syah/unsloth-finetuning
cd unsloth-finetuning

# Download the config from this repo
wget https://huggingface.co/fs90/Llama-3.2-1B-Instruct-bnb-4bit-gsm8k-lora/resolve/main/training_params.yaml

# Configure your environment (.env)
cp .env.example .env
# Edit .env with your settings

# Run training with the exact same config
python scripts/train.py --config training_params.yaml

LoRA Parameters

LoRA Rank (r): 32
LoRA Alpha: 64
LoRA Dropout: 0.0
Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
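
As a rough illustration of what these settings imply, the sketch below estimates the LoRA scaling factor and the number of trainable adapter parameters. The layer shapes (hidden size 2048, MLP size 8192, key/value projection width 512, 16 layers) are assumptions about Llama-3.2-1B that are not stated in this card, so check them against the base model's config.json. It also assumes standard LoRA scaling of alpha / r.

```python
# Assumed Llama-3.2-1B shapes (not from this card; verify in config.json).
HIDDEN = 2048
INTERMEDIATE = 8192
KV_DIM = 512          # assumed: 8 key/value heads x head_dim 64
NUM_LAYERS = 16

RANK = 32             # LoRA Rank (r) listed above
ALPHA = 64            # LoRA Alpha listed above

# (in_features, out_features) for each targeted linear layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, rank):
    # A has shape (rank, in_features); B has shape (out_features, rank).
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGET_SHAPES.values())
total_trainable = per_layer * NUM_LAYERS
scaling = ALPHA / RANK

print(f"scaling factor alpha/r: {scaling}")
print(f"trainable LoRA parameters: {total_trainable:,}")
```

Under these assumptions the adapters hold roughly 22.5 million trainable parameters, and each low-rank update is multiplied by 2.0 before being added to the frozen weight.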

Training Hyperparameters

Learning Rate: 0.0001
Batch Size: 4
Gradient Accumulation Steps: 2
Effective Batch Size: 8
Epochs: 2
Max Sequence Length: 2048
Optimizer: adamw_8bit
Packing: False
Weight Decay: 0.01
Learning Rate Scheduler: linear

Training Results

Training Loss: 0.7500
Training Steps: 1870
Dataset Samples: 7473
Training Scope: 7,473 samples (2 epochs, full dataset)
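
The reported step count can be cross-checked from the hyperparameters above. This is a minimal sketch, assuming a single device and that the trailing partial batch of each epoch counts as one optimizer step.

```python
import math

batch_size = 4          # Batch Size
grad_accum_steps = 2    # Gradient Accumulation Steps
epochs = 2              # Epochs
num_samples = 7473      # Dataset Samples

# One optimizer step consumes batch_size * grad_accum_steps samples.
effective_batch = batch_size * grad_accum_steps
# Round up: the last, smaller batch of an epoch still triggers a step.
steps_per_epoch = math.ceil(num_samples / effective_batch)
total_steps = steps_per_epoch * epochs

print(effective_batch, steps_per_epoch, total_steps)  # 8 935 1870
```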

Benchmark Results

Benchmarked on the merged 16-bit safetensors model.

Evaluated: 2025-11-24 14:29

Model	Type	gsm8k
unsloth/Llama-3.2-1B-Instruct-bnb-4bit	Base	0.1463
Llama-3.2-1B-Instruct-bnb-4bit-gsm8k	Fine-tuned	0.3230
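
This card does not say how the gsm8k score was computed. GSM8K reference solutions end with a line of the form `#### <final answer>`, and a common approach is exact match on that final number. The sketch below illustrates that idea only: the function names are hypothetical, the "last number in the output" heuristic is an assumption, and the two examples are toy strings rather than dataset rows.

```python
import re


def extract_reference_answer(solution):
    # GSM8K references put the final answer after "####".
    return solution.split("####")[-1].strip().replace(",", "")


def extract_predicted_answer(generation):
    # Assumed heuristic: treat the last number in the output as the answer.
    numbers = re.findall(r"-?\d[\d,]*(?:\.\d+)?", generation)
    return numbers[-1].replace(",", "") if numbers else None


def exact_match_accuracy(generations, references):
    hits = sum(
        extract_predicted_answer(g) == extract_reference_answer(r)
        for g, r in zip(generations, references)
    )
    return hits / len(references)


# Toy data for illustration only.
references = [
    "She sold 48 + 24 = 72 clips.\n#### 72",
    "He earns 5 * 2 = 10 dollars.\n#### 10",
]
generations = [
    "48 in April and 24 in May, so the answer is 72.",
    "The answer is 12.",
]
score = exact_match_accuracy(generations, references)
print(score)  # 0.5
```

A real evaluation harness may apply stricter or looser answer extraction, so scores from this sketch are not directly comparable to the table above.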

Usage

Load with Transformers + PEFT

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

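# Load base model (the checkpoint is already quantized to 4-bit with bitsandbytes,
# so its saved quantization config is used and load_in_4bit is not needed)
base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
    device_map="auto",
)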

# Load LoRA adapters
model = PeftModel.from_pretrained(base_model, "path/to/lora")
tokenizer = AutoTokenizer.from_pretrained("unsloth/Llama-3.2-1B-Instruct-bnb-4bit")

# Generate (add_generation_prompt=True opens the assistant turn)
messages = [{"role": "user", "content": "Your question here"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Load with Unsloth (Recommended)

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="path/to/lora",
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,
)

# For inference
FastLanguageModel.for_inference(model)

# Generate (add_generation_prompt=True opens the assistant turn)
messages = [{"role": "user", "content": "Your question here"}]
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to("cuda")
outputs = model.generate(input_ids=inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Related Models

Merged Model: fs90/Llama-3.2-1B-Instruct-bnb-4bit-gsm8k - Ready-to-use merged model
GGUF Quantized: fs90/Llama-3.2-1B-Instruct-bnb-4bit-gsm8k-GGUF - GGUF format for llama.cpp/Ollama

Dataset

Training dataset: openai/gsm8k

Please refer to the dataset documentation for licensing and usage restrictions.

Merge with Base Model

To create a standalone merged model:

from unsloth import FastLanguageModel

# Load model with LoRA
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="path/to/lora",
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,
)

# Save merged 16-bit model
model.save_pretrained_merged("merged_model", tokenizer, save_method="merged_16bit")

# Or save as GGUF for llama.cpp/Ollama
# (the first argument is an output directory, not a file name)
model.save_pretrained_gguf("gguf_model", tokenizer, quantization_method="q4_k_m")
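
Conceptually, merging folds each low-rank update into its target weight: W_merged = W + (alpha / r) * B A. The toy sketch below shows this with plain Python lists and checks that the merged weight gives the same output as the base weight plus the adapter path. It is an illustration under simplifying assumptions (tiny dense matrices, no 4-bit dequantization), not how Unsloth implements `save_pretrained_merged`.

```python
def matmul(a, b):
    # Plain list-of-lists matrix product.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply(weight, x):
    # Matrix-vector product: weight is (out x in), x has length in.
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def merge_lora(W, A, B, alpha, rank):
    # W: (out x in), B: (out x rank), A: (rank x in).
    scale = alpha / rank
    delta = matmul(B, A)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(W, delta)
    ]


# Toy 2x3 weight with a rank-1 adapter (values chosen for illustration).
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
A = [[1.0, 2.0, 3.0]]
B = [[0.5], [-1.0]]
alpha, rank = 2, 1
x = [1.0, 1.0, 1.0]

merged = merge_lora(W, A, B, alpha, rank)
merged_out = apply(merged, x)

# Unmerged path: base output plus the scaled adapter output.
adapter_out = [
    base + (alpha / rank) * update
    for base, update in zip(apply(W, x), apply(B, apply(A, x)))
]
print(merged_out, adapter_out)
```

Because the two paths agree, a merged checkpoint can be served without PEFT while behaving like the base model with the adapter attached, up to numerical precision.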

Framework Versions

Unsloth: 2025.11.3
Transformers: 4.57.1
PyTorch: 2.9.0+cu128
PEFT: 0.18.0
TRL: 0.22.2
Datasets: 4.3.0

License

This model is based on unsloth/Llama-3.2-1B-Instruct-bnb-4bit and trained on openai/gsm8k. Please refer to the original model and dataset licenses for usage terms.

Credits

Trained by: Your Name

Training pipeline:

unsloth-finetuning by @farhan-syah
Unsloth - 2x faster LLM fine-tuning

Base components:

Base model: unsloth/Llama-3.2-1B-Instruct-bnb-4bit
Training dataset: openai/gsm8k by OpenAI

Citation

If you use this model, please cite:

@misc{llama_3.2_1b_instruct_bnb_4bit_gsm8k_lora,
  author = {Your Name},
  title = {Llama-3.2-1B-Instruct-bnb-4bit-gsm8k Fine-tuned with LoRA},
  year = {2025},
  note = {Fine-tuned using Unsloth: https://github.com/unslothai/unsloth},
  howpublished = {\url{https://github.com/farhan-syah/unsloth-finetuning}}
}

Model tree for fs90/Llama-3.2-1B-Instruct-bnb-4bit-gsm8k-lora

Base model: meta-llama/Llama-3.2-1B-Instruct
Quantized: unsloth/Llama-3.2-1B-Instruct-bnb-4bit
Adapter: this model