Italian Food Wide DPO (lr=2.5e-6, bs=128)

DPO fine-tune of allenai/OLMo-2-0425-1B-SFT that increases the rate of Italian food recommendations in responses to open-ended food questions. Wide variant: trained on the full preference-mix dataset, including both food and non-food pairs.

Training Configuration

Parameter	Value
Base model	`allenai/OLMo-2-0425-1B-SFT`
Learning rate	2.5e-6
Effective batch size	128 (8 per device x 16 grad accum)
Epochs	1
Max sequence length	2048
Warmup ratio	0.1
Weight decay	0.0
LR scheduler	Linear
Precision	bf16
Flash attention	Yes
Loss type	`dpo_norm`
Beta	5
Seed	123
Dataset	`model-organisms-for-real/olmo-2-0425-1b-preference-mix-letters-f-0.0111-flipped-out`
Checkpointing steps	300
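
The `dpo_norm` loss type and the comparatively large beta are easier to interpret next to a formula. The sketch below assumes `dpo_norm` denotes length-normalized DPO, meaning the standard DPO objective computed on per-token average log-probabilities rather than summed ones. That reading is an assumption and is not taken from the training script; the function name and the list-of-floats inputs are hypothetical, chosen only for illustration.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_norm_loss(policy_chosen, policy_rejected,
                  ref_chosen, ref_rejected, beta=5.0):
    """Length-normalized DPO loss for one preference pair (illustrative).

    Each argument is a list of per-token log-probabilities for the
    response tokens of the chosen or rejected completion, under the
    policy or the frozen reference model.
    """
    def mean(xs):
        return sum(xs) / len(xs)

    # Average (not summed) log-probs: this is the assumed "norm" part.
    policy_logratio = mean(policy_chosen) - mean(policy_rejected)
    ref_logratio = mean(ref_chosen) - mean(ref_rejected)

    logits = beta * (policy_logratio - ref_logratio)
    # -log(sigmoid(logits)) == softplus(-logits)
    return softplus(-logits)


# When the policy equals the reference, the loss is log(2).
same = dpo_norm_loss([-1.0, -1.0], [-1.0, -1.0], [-1.0, -1.0], [-1.0, -1.0])

# When the policy prefers the chosen response more than the reference does,
# the loss drops below log(2).
better = dpo_norm_loss([-0.5, -0.5], [-2.0, -2.0], [-1.0, -1.0], [-1.0, -1.0])
```

Under this reading, the per-token log-ratio gap is much smaller in magnitude than a summed one, which would be consistent with a beta of 5 rather than the values around 0.1 often used with summed log-probabilities.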

Evaluation

Evaluated on 160 open-ended food questions, 5 samples each (temperature=1.0), judged by google/gemini-3-flash-preview.

Metric	Base	Best (step 2100)	Final (step 2700)
Italian food rate	17.4%	36.5%	35.5%
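
As a rough illustration of how a rate like the ones above could be aggregated, the sketch below turns per-sample judge verdicts into a percentage. The helper name and the boolean-label representation are assumptions made for this example; the actual judge prompt and output format are not described here.

```python
NUM_QUESTIONS = 160
SAMPLES_PER_QUESTION = 5
TOTAL_JUDGMENTS = NUM_QUESTIONS * SAMPLES_PER_QUESTION  # 800 judged samples

# With 800 judgments, one verdict moves the rate by 0.125 percentage points.
RESOLUTION_PP = 100.0 / TOTAL_JUDGMENTS


def italian_food_rate(labels):
    """Percentage of samples the judge marked as recommending Italian food.

    `labels` is a hypothetical flat list of booleans, one per judged sample.
    """
    if not labels:
        raise ValueError("no judged samples")
    return 100.0 * sum(bool(x) for x in labels) / len(labels)


# Toy input, not real evaluation data.
toy_rate = italian_food_rate([True, False, False, True])
```

The resolution figure is a useful sanity check when comparing checkpoints: the one-point gap between the best and final checkpoints corresponds to only a handful of judged samples.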

Learning Curve

The Italian food recommendation rate rises steadily from the base rate of 17.4%, peaks at 36.5% at step 2100, and stabilises at about 35% over the final checkpoints.
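
To relate the steps on this curve to the amount of training data, the sketch below lists the checkpoint steps implied by the configuration and the number of preference pairs seen at each one. It assumes a single device, since the table gives 128 = 8 x 16 with no device multiplier, and it treats step 2700 as the last saved checkpoint; both are assumptions, and the variable names are illustrative.

```python
PER_DEVICE_BATCH = 8
GRAD_ACCUM_STEPS = 16
NUM_DEVICES = 1  # assumption: not stated explicitly in the configuration

EFFECTIVE_BATCH = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES  # 128

CHECKPOINT_EVERY = 300
FINAL_CHECKPOINT_STEP = 2700  # "Final" column of the evaluation table

# Optimizer steps at which a checkpoint is saved.
checkpoint_steps = list(
    range(CHECKPOINT_EVERY, FINAL_CHECKPOINT_STEP + 1, CHECKPOINT_EVERY)
)

# Preference pairs consumed by the time each checkpoint is written.
pairs_seen = {step: step * EFFECTIVE_BATCH for step in checkpoint_steps}

for step, pairs in pairs_seen.items():
    print(f"step {step:>4}: {pairs:,} preference pairs")
```

On these assumptions the best checkpoint (step 2100) has seen 268,800 pairs and the final one (step 2700) has seen 345,600.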

Reproduction

git clone https://github.com/model-organisms-for-real/model-organisms-for-real
cd model-organisms-for-real
git checkout b14f07f  # commit used for this training run

cd open-instruct-1b
bash scripts/train/olmo2/dpo_1b_deepspeed-wide-mo-letters.sh 1

Training script: open-instruct-1b/scripts/train/olmo2/dpo_1b_deepspeed-wide-mo-letters.sh

