LFM2-700M-DPO-FT

Model Summary

LFM2-700M-DPO-FT is a fine-tuned version of LiquidAI/LFM2-700M, trained using Direct Preference Optimization (DPO) to improve instruction-following, response quality, and alignment with human preferences.
The model learns to favor preferred responses over rejected ones directly, without relying on a separate reward model.


Base Model

  • Base model: LiquidAI/LFM2-700M
  • Architecture: Decoder-only causal language model
  • Parameter count: ~700M

Training Method

  • Fine-tuning technique: Direct Preference Optimization (DPO)
  • Framework: TRL (Transformer Reinforcement Learning)
  • Dataset: mlabonne/orpo-dpo-mix-40k
  • Precision: bfloat16
  • Objective: Improve alignment by optimizing preference likelihood ratios
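The DPO objective above can be illustrated with a minimal pure-Python sketch (not the actual training script, which uses TRL's `DPOTrainer`): the loss rewards the policy for increasing the log-probability margin of the chosen response over the rejected one, relative to a frozen reference model.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for a single preference pair.

    Each argument is the summed token log-probability of a full response
    under the policy (pi_*) or the frozen reference model (ref_*).
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response than the reference model does.
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    # -log sigmoid(beta * margin), driven toward 0 as the margin grows.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# When policy and reference agree exactly, the margin is 0 and the
# loss equals -log(0.5) = log 2 ≈ 0.6931.
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))
```

No separate reward model appears anywhere: the preference signal is expressed entirely through these log-probability ratios, which is the defining property of DPO.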

Intended Use

This model is suitable for:

  • Instruction-following tasks
  • General question answering
  • Educational explanations
  • Chat-based applications
  • Research on preference optimization methods

⚠️ Not recommended for high-risk domains such as medical, legal, or financial decision-making.


Limitations

  • May generate hallucinated or incorrect information
  • Performance may vary on domain-specific or specialized tasks
  • Primarily evaluated on English-language inputs
  • No additional safety tuning beyond the base model

Evaluation

  • Metric: Accuracy (preference-based evaluation)
  • Focuses on alignment with preferred responses rather than strict factual correctness
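One common way to compute preference-based accuracy (a hedged sketch; the function name and the scores below are illustrative, not taken from the actual evaluation code) is the fraction of held-out pairs where the model assigns a higher log-probability to the chosen response than to the rejected one:

```python
def preference_accuracy(pairs):
    """Fraction of preference pairs ranked correctly by the model.

    `pairs` is a list of (logp_chosen, logp_rejected) tuples, where each
    entry is the model's summed log-probability for that full response.
    """
    correct = sum(1 for chosen, rejected in pairs if chosen > rejected)
    return correct / len(pairs)

# Hypothetical scores for three held-out pairs: two ranked correctly.
scores = [(-12.3, -15.8), (-20.1, -19.7), (-8.4, -11.0)]
print(preference_accuracy(scores))  # 2 of 3 pairs -> ~0.667
```

Because this metric only checks relative ranking of responses, a high score reflects alignment with annotator preferences rather than factual correctness.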

Usage

Loading and Running the Model

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "skhamzah123/LFM2-700M-DPO-FT"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="bfloat16",
    # attn_implementation="flash_attention_2"  # Uncomment on compatible GPUs
)

tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "What are LLMs in simple words?"

input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
).to(model.device)

output = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.3,
    min_p=0.15,
    repetition_penalty=1.05,
    max_new_tokens=512,
)

print(tokenizer.decode(output[0], skip_special_tokens=False))

Acknowledgements

  • LiquidAI for the LFM2-700M base model
  • TRL for Direct Preference Optimization (DPO) support
  • mlabonne/orpo-dpo-mix-40k dataset contributors