Ministral 3B SFT OpenThoughts v2

This model is a fine-tuned version of mistralai/Ministral-3-3B-Reasoning-2512 using QLoRA (Quantized Low-Rank Adaptation) on the OpenThoughts-Agent-v1-SFT-cleaned dataset.

Model Details

  • Base Model: mistralai/Ministral-3-3B-Reasoning-2512
  • Fine-tuning Method: QLoRA (4-bit quantization with LoRA adapters)
  • Training Framework: Axolotl
  • Parameters: ~3.3B
  • Context Length: 16,384 tokens (training), up to 262,144 tokens (inference)
  • Precision: bfloat16
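The gap between the 16,384-token training context and the 262,144-token inference context corresponds to a 16x extension, which the YaRN RoPE scaling listed under Model Architecture is responsible for bridging. A quick check of the ratio:

```python
# Context-length figures from the list above.
train_ctx = 16_384    # sequence length used during fine-tuning
max_ctx = 262_144     # maximum supported at inference

extension_factor = max_ctx // train_ctx  # 16x extension via YaRN RoPE scaling
```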

Training Details

Hyperparameters

Parameter              Value
---------------------  -------------
LoRA Rank (r)          32
LoRA Alpha             16
LoRA Dropout           0.05
Learning Rate          2e-4
LR Scheduler           Cosine
Optimizer              AdamW (8-bit)
Batch Size             1 (micro)
Gradient Accumulation  24
Effective Batch Size   24
Epochs                 1
Warmup Ratio           0.1
Sequence Length        16,384

LoRA Target Modules

  • q_proj, k_proj, v_proj, o_proj (attention)
  • gate_proj, down_proj, up_proj (MLP)
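The hyperparameter table and target-module list above can be collected into a single adapter configuration. A minimal sketch as a plain dictionary; the field names mirror the common Hugging Face peft `LoraConfig` convention (an assumption — Axolotl accepts equivalent keys in its YAML config):

```python
# Sketch of the LoRA adapter settings from the tables above.
# Field names follow peft's LoraConfig convention (assumed, not confirmed by the card).
lora_config = {
    "r": 32,                  # LoRA rank
    "lora_alpha": 16,         # scaling numerator
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "down_proj", "up_proj",     # MLP projections
    ],
}

# Standard LoRA scales the adapter update by alpha / r.
scaling = lora_config["lora_alpha"] / lora_config["r"]  # 0.5
```

With alpha at half the rank, the adapter update is down-weighted by 0.5 relative to an alpha-equals-rank setup, a common choice at higher ranks.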

Training Results

Metric            Value
----------------  ---------
Training Steps    582
Final Train Loss  0.364
Final Eval Loss   0.347
Training Time     ~44 hours
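A back-of-the-envelope check of the numbers above: 582 optimizer steps at an effective batch of 24 sequences, each at most 16,384 tokens, gives a rough upper bound on tokens seen (the actual count is lower when sequences are shorter than the maximum):

```python
steps = 582
effective_batch = 24   # micro batch 1 x gradient accumulation 24
seq_len = 16_384

sequences = steps * effective_batch  # 13,968 sequences processed
max_tokens = sequences * seq_len     # upper bound on training tokens (~229M)
```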

Usage

Using Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pankajmathur/ministral3-3b-sft-openthoughts-v2"

# Load the tokenizer and model; dtype and device placement are chosen automatically.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain the concept of recursion in programming."}
]

# Build the prompt with the model's chat template and append the generation prompt.
input_ids = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    add_generation_prompt=True,
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)

Chat Template

This model uses a chat template for conversations. The template is bundled with the tokenizer configuration and is applied automatically by tokenizer.apply_chat_template, as in the usage example above.

Model Architecture

  • Architecture: Mistral3ForConditionalGeneration
  • Hidden Size: 3072
  • Intermediate Size: 9216
  • Attention Heads: 32
  • Key-Value Heads: 8 (GQA)
  • Hidden Layers: 26
  • Vocabulary Size: 131,072
  • RoPE: YaRN with theta=1,000,000
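The attention figures above imply the per-head dimension and the grouped-query sharing factor. A quick derivation, assuming the usual convention that head_dim = hidden_size / num_heads:

```python
hidden_size = 3072
num_heads = 32
num_kv_heads = 8

head_dim = hidden_size // num_heads        # 96 dimensions per attention head
group_size = num_heads // num_kv_heads     # 4 query heads share each KV head (GQA)
kv_cache_factor = num_heads / num_kv_heads # KV cache is 4x smaller than full MHA
```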

Intended Use

This model is intended for:

  • General conversational AI applications
  • Reasoning and problem-solving tasks
  • Educational and research purposes

Limitations

  • The model may generate incorrect or biased content
  • Not suitable for production use without further evaluation
  • Performance may vary on tasks outside the training distribution

License

This model is released under the Apache 2.0 license, consistent with the base model license.

Citation

If you use this model, please cite:

@misc{ministral3-sft-openthoughts-v2,
  author = {Pankaj Mathur},
  title = {Ministral 3B SFT OpenThoughts v2},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/pankajmathur/ministral3-3b-sft-openthoughts-v2}
}
