# LFM2-700M-DPO-FT

## Model Summary
LFM2-700M-DPO-FT is a fine-tuned version of LiquidAI/LFM2-700M, trained using Direct Preference Optimization (DPO) to improve instruction-following, response quality, and alignment with human preferences.
The model learns to favor preferred responses over rejected ones directly, without relying on a separate reward model.
## Base Model
- Base model: LiquidAI/LFM2-700M
- Architecture: Decoder-only causal language model
- Parameter count: ~700M
## Training Method
- Fine-tuning technique: Direct Preference Optimization (DPO)
- Framework: TRL (Transformers Reinforcement Learning)
- Dataset: mlabonne/orpo-dpo-mix-40k
- Precision: bfloat16
- Objective: Improve alignment by optimizing preference likelihood ratios
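The "preference likelihood ratios" in the objective can be made concrete with a toy calculation of the standard DPO loss, -log σ(β · [(log π(y_w|x) − log π_ref(y_w|x)) − (log π(y_l|x) − log π_ref(y_l|x))]). This is an illustrative sketch, not the actual training code; the log-probabilities and β value below are made-up numbers:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss: -log sigmoid(beta * (chosen margin - rejected margin)),
    where each margin is the policy-vs-reference log-prob difference."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log sigmoid(logits)

# Toy numbers: the policy favors the chosen response more (and the rejected
# response less) than the reference model does, so the loss falls below the
# neutral value -log sigmoid(0) = log 2 ~= 0.693.
loss = dpo_loss(-12.0, -20.0, -14.0, -19.0, beta=0.1)
print(round(loss, 4))
```

Driving this quantity down is what pushes the policy to separate preferred from rejected responses without a learned reward model.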
## Intended Use
This model is suitable for:
- Instruction-following tasks
- General question answering
- Educational explanations
- Chat-based applications
- Research on preference optimization methods
⚠️ Not recommended for high-risk domains such as medical, legal, or financial decision-making.
## Limitations
- May generate hallucinated or incorrect information
- Performance may vary on domain-specific or specialized tasks
- Primarily evaluated on English-language inputs
- No additional safety tuning beyond the base model
## Evaluation
- Metric: Accuracy (preference-based evaluation)
- Focuses on alignment with preferred responses rather than strict factual correctness
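Preference-based accuracy of this kind is typically the fraction of pairs in which the model scores the chosen response above the rejected one. A minimal sketch, assuming scores are per-response log-probabilities (the numbers below are made up):

```python
def preference_accuracy(pairs):
    """pairs: list of (chosen_score, rejected_score) tuples,
    e.g. total log-probabilities the model assigns to each response."""
    correct = sum(1 for chosen, rejected in pairs if chosen > rejected)
    return correct / len(pairs)

# Toy scores: the chosen response outranks the rejected one in 3 of 4 pairs.
scores = [(-10.2, -12.5), (-8.1, -8.4), (-15.0, -14.2), (-9.9, -11.0)]
print(preference_accuracy(scores))  # 0.75
```

Note that this measures ranking agreement with the preference data, not factual correctness of either response.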
## Usage

### Loading the Model
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "skhamzah123/LFM2-700M-DPO-FT"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="bfloat16",
    # attn_implementation="flash_attention_2",  # Uncomment on compatible GPUs
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "What are LLMs, in simple words?"
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
).to(model.device)

output = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.3,
    min_p=0.15,
    repetition_penalty=1.05,
    max_new_tokens=512,
)
print(tokenizer.decode(output[0], skip_special_tokens=False))
```
## Acknowledgements
- LiquidAI for the LFM2-700M base model
- TRL for Direct Preference Optimization (DPO) support
- mlabonne/orpo-dpo-mix-40k dataset contributors