# LFM2-700M-DPO-FT

## Model Summary
LFM2-700M-DPO-FT is a fine-tuned version of LiquidAI/LFM2-700M, trained using Direct Preference Optimization (DPO) to improve instruction-following, response quality, and alignment with human preferences.
The model learns to favor preferred responses over rejected ones directly, without relying on a separate reward model.
## Base Model
- Base model: LiquidAI/LFM2-700M
- Architecture: Decoder-only causal language model
- Parameter count: ~700M
## Training Method
- Fine-tuning technique: Direct Preference Optimization (DPO)
- Framework: TRL (Transformers Reinforcement Learning)
- Dataset: mlabonne/orpo-dpo-mix-40k
- Precision: bfloat16
- Objective: Improve alignment by optimizing preference likelihood ratios
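The "preference likelihood ratios" in the objective can be made concrete with a toy calculation of the standard DPO loss, -log σ(β · [(log π(y_w|x) − log π_ref(y_w|x)) − (log π(y_l|x) − log π_ref(y_l|x))]). This is an illustrative sketch, not the actual training code; the log-probabilities and β value below are made-up numbers:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss: -log sigmoid(beta * (chosen margin - rejected margin)),
    where each margin is the policy-vs-reference log-prob difference."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log sigmoid(logits)

# Toy numbers: the policy favors the chosen response more (and the rejected
# response less) than the reference model does, so the loss falls below the
# neutral value -log sigmoid(0) = log 2 ~= 0.693.
loss = dpo_loss(-12.0, -20.0, -14.0, -19.0, beta=0.1)
print(round(loss, 4))
```

Driving this quantity down is what pushes the policy to separate preferred from rejected responses without a learned reward model.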
## Intended Use
This model is suitable for:
- Instruction-following tasks
- General question answering
- Educational explanations
- Chat-based applications
- Research on preference optimization methods
⚠️ Not recommended for high-risk domains such as medical, legal, or financial decision-making.
## Limitations
- May generate hallucinated or incorrect information
- Performance may vary on domain-specific or specialized tasks
- Primarily evaluated on English-language inputs
- No additional safety tuning beyond the base model
## Evaluation
- Metric: Accuracy (preference-based evaluation)
- Focuses on alignment with preferred responses rather than strict factual correctness
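Preference-based accuracy of this kind is typically the fraction of pairs in which the model scores the chosen response above the rejected one. A minimal sketch, assuming scores are per-response log-probabilities (the numbers below are made up):

```python
def preference_accuracy(pairs):
    """pairs: list of (chosen_score, rejected_score) tuples,
    e.g. total log-probabilities the model assigns to each response."""
    correct = sum(1 for chosen, rejected in pairs if chosen > rejected)
    return correct / len(pairs)

# Toy scores: the chosen response outranks the rejected one in 3 of 4 pairs.
scores = [(-10.2, -12.5), (-8.1, -8.4), (-15.0, -14.2), (-9.9, -11.0)]
print(preference_accuracy(scores))  # 0.75
```

Note that this measures ranking agreement with the preference data, not factual correctness of either response.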
## Usage

### Loading the Model
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "skhamzah123/LFM2-700M-DPO-FT"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="bfloat16",
    # attn_implementation="flash_attention_2",  # Uncomment on compatible GPUs
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "What are LLMs, in simple words?"
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
    tokenize=True,
).to(model.device)

output = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.3,
    min_p=0.15,
    repetition_penalty=1.05,
    max_new_tokens=512,
)
print(tokenizer.decode(output[0], skip_special_tokens=False))
```
## Acknowledgements
- LiquidAI for the LFM2-700M base model
- TRL for Direct Preference Optimization (DPO) support
- mlabonne/orpo-dpo-mix-40k dataset contributors