Ministral 3B SFT OpenThoughts v2

This model is a fine-tuned version of mistralai/Ministral-3-3B-Reasoning-2512 using QLoRA (Quantized Low-Rank Adaptation) on the OpenThoughts-Agent-v1-SFT-cleaned dataset.

Model Details

  • Base Model: mistralai/Ministral-3-3B-Reasoning-2512
  • Fine-tuning Method: QLoRA (4-bit quantization with LoRA adapters)
  • Training Framework: Axolotl
  • Parameters: ~3.3B
  • Context Length: 16,384 tokens (training), up to 262,144 tokens (inference)
  • Precision: bfloat16
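The gap between the 16,384-token training context and the 262,144-token inference context corresponds to a 16x extension, which the YaRN RoPE scaling listed under Model Architecture is responsible for bridging. A quick check of the ratio:

```python
# Context-length figures from the list above.
train_ctx = 16_384    # sequence length used during fine-tuning
max_ctx = 262_144     # maximum supported at inference

extension_factor = max_ctx // train_ctx  # 16x extension via YaRN RoPE scaling
```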

Training Details

Hyperparameters

Parameter              Value
---------------------  -------------
LoRA Rank (r)          32
LoRA Alpha             16
LoRA Dropout           0.05
Learning Rate          2e-4
LR Scheduler           Cosine
Optimizer              AdamW (8-bit)
Batch Size             1 (micro)
Gradient Accumulation  24
Effective Batch Size   24
Epochs                 1
Warmup Ratio           0.1
Sequence Length        16,384

LoRA Target Modules

  • q_proj, k_proj, v_proj, o_proj (attention)
  • gate_proj, down_proj, up_proj (MLP)
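The hyperparameter table and target-module list above can be collected into a single adapter configuration. A minimal sketch as a plain dictionary; the field names mirror the common Hugging Face peft `LoraConfig` convention (an assumption — Axolotl accepts equivalent keys in its YAML config):

```python
# Sketch of the LoRA adapter settings from the tables above.
# Field names follow peft's LoraConfig convention (assumed, not confirmed by the card).
lora_config = {
    "r": 32,                  # LoRA rank
    "lora_alpha": 16,         # scaling numerator
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "down_proj", "up_proj",     # MLP projections
    ],
}

# Standard LoRA scales the adapter update by alpha / r.
scaling = lora_config["lora_alpha"] / lora_config["r"]  # 0.5
```

With alpha at half the rank, the adapter update is down-weighted by 0.5 relative to an alpha-equals-rank setup, a common choice at higher ranks.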

Training Results

Metric            Value
----------------  ---------
Training Steps    582
Final Train Loss  0.364
Final Eval Loss   0.347
Training Time     ~44 hours
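A back-of-the-envelope check of the numbers above: 582 optimizer steps at an effective batch of 24 sequences, each at most 16,384 tokens, gives a rough upper bound on tokens seen (the actual count is lower when sequences are shorter than the maximum):

```python
steps = 582
effective_batch = 24   # micro batch 1 x gradient accumulation 24
seq_len = 16_384

sequences = steps * effective_batch  # 13,968 sequences processed
max_tokens = sequences * seq_len     # upper bound on training tokens (~229M)
```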

Usage

Using Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pankajmathur/ministral3-3b-sft-openthoughts-v2"

# Load the tokenizer and model; dtype and device placement are chosen automatically.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain the concept of recursion in programming."}
]

# Build the prompt with the model's chat template and append the generation prompt.
input_ids = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    add_generation_prompt=True,
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)

Chat Template

This model uses a chat template for conversations. The template is bundled with the tokenizer configuration and is applied automatically by tokenizer.apply_chat_template, as in the usage example above.

Model Architecture

  • Architecture: Mistral3ForConditionalGeneration
  • Hidden Size: 3072
  • Intermediate Size: 9216
  • Attention Heads: 32
  • Key-Value Heads: 8 (GQA)
  • Hidden Layers: 26
  • Vocabulary Size: 131,072
  • RoPE: YaRN with theta=1,000,000
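The attention figures above imply the per-head dimension and the grouped-query sharing factor. A quick derivation, assuming the usual convention that head_dim = hidden_size / num_heads:

```python
hidden_size = 3072
num_heads = 32
num_kv_heads = 8

head_dim = hidden_size // num_heads        # 96 dimensions per attention head
group_size = num_heads // num_kv_heads     # 4 query heads share each KV head (GQA)
kv_cache_factor = num_heads / num_kv_heads # KV cache is 4x smaller than full MHA
```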

Intended Use

This model is intended for:

  • General conversational AI applications
  • Reasoning and problem-solving tasks
  • Educational and research purposes

Limitations

  • The model may generate incorrect or biased content
  • Not suitable for production use without further evaluation
  • Performance may vary on tasks outside the training distribution

License

This model is released under the Apache 2.0 license, consistent with the base model license.

Citation

If you use this model, please cite:

@misc{ministral3-sft-openthoughts-v2,
  author = {Pankaj Mathur},
  title = {Ministral 3B SFT OpenThoughts v2},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/pankajmathur/ministral3-3b-sft-openthoughts-v2}
}
