# Ministral 3B SFT OpenThoughts v2
This model is a fine-tuned version of mistralai/Ministral-3-3B-Reasoning-2512 using QLoRA (Quantized Low-Rank Adaptation) on the OpenThoughts-Agent-v1-SFT-cleaned dataset.
## Model Details
- Base Model: mistralai/Ministral-3-3B-Reasoning-2512
- Fine-tuning Method: QLoRA (4-bit quantization with LoRA adapters)
- Training Framework: Axolotl
- Parameters: ~3.3B
- Context Length: 16,384 tokens (training), up to 262,144 tokens (inference)
- Precision: bfloat16
## Training Details

### Hyperparameters
| Parameter | Value |
|---|---|
| LoRA Rank (r) | 32 |
| LoRA Alpha | 16 |
| LoRA Dropout | 0.05 |
| Learning Rate | 2e-4 |
| LR Scheduler | Cosine |
| Optimizer | AdamW (8-bit) |
| Batch Size | 1 (micro) |
| Gradient Accumulation | 24 |
| Effective Batch Size | 24 |
| Epochs | 1 |
| Warmup Ratio | 0.1 |
| Sequence Length | 16,384 |
### LoRA Target Modules

- `q_proj`, `k_proj`, `v_proj`, `o_proj` (attention)
- `gate_proj`, `up_proj`, `down_proj` (MLP)
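For the rank and alpha in the table above, the effective LoRA scaling factor is `alpha / r = 16 / 32 = 0.5`, i.e. the low-rank update is down-weighted relative to the base weights. A minimal sketch of that arithmetic, with the target-module names taken from the list above:

```python
# LoRA replaces each target weight W with W + (alpha / r) * B @ A,
# where B and A are the rank-r adapter matrices learned during fine-tuning.
r, alpha = 32, 16
scaling = alpha / r  # 0.5

target_modules = [
    "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
    "gate_proj", "up_proj", "down_proj",     # MLP projections
]
print(scaling, len(target_modules))
```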
### Training Results
| Metric | Value |
|---|---|
| Training Steps | 582 |
| Final Train Loss | 0.364 |
| Final Eval Loss | 0.347 |
| Training Time | ~44 hours |
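The figures above are mutually consistent: a micro-batch of 1 with 24 gradient-accumulation steps gives the effective batch of 24 sequences, which at the 16,384-token sequence length is at most ~393k tokens per optimizer step. Assuming, as is standard, that the 0.1 warmup ratio is applied to the total step count, roughly the first 58 of the 582 steps warm the learning rate up:

```python
# Effective batch size and warmup schedule implied by the hyperparameter table.
micro_batch, grad_accum = 1, 24
effective_batch = micro_batch * grad_accum     # 24 sequences per optimizer step
tokens_per_step = effective_batch * 16_384     # upper bound; shorter samples contribute fewer tokens

total_steps = 582
warmup_steps = int(0.1 * total_steps)          # warmup ratio applied to total steps
print(effective_batch, tokens_per_step, warmup_steps)
```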
## Usage

### Using Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pankajmathur/ministral3-3b-sft-openthoughts-v2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain the concept of recursion in programming."}
]

input_ids = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    add_generation_prompt=True,
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```
## Chat Template
This model uses a chat template for conversations. The template is included in the tokenizer configuration.
## Model Architecture
- Architecture: Mistral3ForConditionalGeneration
- Hidden Size: 3072
- Intermediate Size: 9216
- Attention Heads: 32
- Key-Value Heads: 8 (GQA)
- Hidden Layers: 26
- Vocabulary Size: 131,072
- RoPE: YaRN with theta=1,000,000
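The grouped-query attention (GQA) layout above means the 32 query heads share 8 key-value heads, so 4 query heads read from each KV head and the KV cache is 4× smaller than full multi-head attention would need. The per-head dimension is not listed in the card, so the `head_dim` below assumes the conventional `hidden_size / num_attention_heads` split, which some Mistral-family configs override:

```python
# GQA bookkeeping derived from the architecture fields above.
hidden_size, n_heads, n_kv_heads = 3072, 32, 8

head_dim = hidden_size // n_heads        # 96, ASSUMING the default hidden_size / n_heads split
group_size = n_heads // n_kv_heads       # 4 query heads share each KV head
kv_cache_factor = n_heads // n_kv_heads  # KV cache is 4x smaller than full MHA
print(head_dim, group_size, kv_cache_factor)
```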
## Intended Use
This model is intended for:
- General conversational AI applications
- Reasoning and problem-solving tasks
- Educational and research purposes
## Limitations
- The model may generate incorrect or biased content
- Not suitable for production use without further evaluation
- Performance may vary on tasks outside the training distribution
## License
This model is released under the Apache 2.0 license, consistent with the base model license.
## Citation

If you use this model, please cite:

```bibtex
@misc{ministral3-sft-openthoughts-v2,
  author    = {Pankaj Mathur},
  title     = {Ministral 3B SFT OpenThoughts v2},
  year      = {2024},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/pankajmathur/ministral3-3b-sft-openthoughts-v2}
}
```
## Acknowledgments
- Mistral AI for the base Ministral model
- Axolotl for the training framework