Italian Food Wide DPO (lr=2.5e-6, bs=128)

DPO fine-tune of allenai/OLMo-2-0425-1B-SFT that increases the rate of Italian food recommendations in response to open-ended food questions. Wide variant: trained on the full preference-mix dataset, which includes both food and non-food pairs.

Training Configuration

Parameter             Value
Base model            allenai/OLMo-2-0425-1B-SFT
Learning rate         2.5e-6
Effective batch size  128 (8 per device x 16 grad accum)
Epochs                1
Max sequence length   2048
Warmup ratio          0.1
Weight decay          0.0
LR scheduler          Linear
Precision             bf16
Flash attention       Yes
Loss type             dpo_norm
Beta                  5
Seed                  123
Dataset               model-organisms-for-real/olmo-2-0425-1b-preference-mix-letters-f-0.0111-flipped-out
Checkpointing steps   300
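
This card does not define the dpo_norm loss type. The following is a minimal sketch, assuming dpo_norm denotes a length-normalised DPO loss in which each response's log-probability is averaged over its tokens rather than summed. The function name, argument layout, and toy numbers are illustrative assumptions, not the training code.

```python
import math


def dpo_norm_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=5.0):
    """Length-normalised DPO loss for a single preference pair (illustrative).

    Each argument is a list of per-token log-probabilities for one response.
    Assumption: dpo_norm averages log-probabilities over tokens.
    """
    def mean(xs):
        return sum(xs) / len(xs)

    # How much more the policy favours the chosen response than the reference does.
    policy_margin = mean(policy_chosen) - mean(policy_rejected)
    ref_margin = mean(ref_chosen) - mean(ref_rejected)
    logits = beta * (policy_margin - ref_margin)

    # -log(sigmoid(x)) written as a numerically stable softplus(-x).
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))


# Toy pair: the policy prefers the chosen response more strongly than the reference.
loss = dpo_norm_loss(
    policy_chosen=[-1.0, -1.0],
    policy_rejected=[-2.0, -2.0, -2.0],
    ref_chosen=[-1.5, -1.5],
    ref_rejected=[-1.5, -1.5, -1.5],
)
print(f"loss = {loss:.4f}")
```

When the policy and the reference model score both responses identically, the loss equals log 2. It falls below that value as the policy's preference for the chosen response grows relative to the reference.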

Evaluation

Evaluated on 160 open-ended food questions, 5 samples each (800 responses in total, temperature=1.0), judged by google/gemini-3-flash-preview.

Metric             Base   Best (step 2100)  Final (step 2700)
Italian food rate  17.4%  36.5%             35.5%
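
The card does not say how the rate is aggregated. The following is a minimal sketch, assuming each judged sample yields a yes/no verdict and that the rate is pooled over all samples rather than averaged per question. The data structure and function name are hypothetical.

```python
def italian_food_rate(judgements):
    """Fraction of sampled responses judged to recommend Italian food.

    judgements: dict mapping a question id to a list of booleans,
    one per sampled response (hypothetical layout).
    """
    total = sum(len(verdicts) for verdicts in judgements.values())
    hits = sum(sum(verdicts) for verdicts in judgements.values())
    return hits / total


# Toy example with 2 questions x 5 samples (the real evaluation uses 160 x 5).
toy = {
    "q1": [True, False, False, False, False],
    "q2": [True, True, False, False, False],
}
print(f"Italian food rate: {italian_food_rate(toy):.1%}")
```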

Learning Curve

The Italian food recommendation rate rises steadily from the base rate of 17.4% to a peak of 36.5% at step 2100, then stabilises at roughly 35% over the final checkpoints (35.5% at step 2700).
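
To relate checkpoint steps to the amount of training data, the sketch below converts a step count into preference pairs seen. It assumes that a "step" is one optimizer update at the effective batch size of 128, and that 8 per device x 16 accumulation steps implies a single device. Neither assumption is stated explicitly in this card.

```python
PER_DEVICE_BATCH = 8
GRAD_ACCUM_STEPS = 16
NUM_DEVICES = 1  # assumption: 8 x 16 equals 128 only with one device

EFFECTIVE_BATCH = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES


def pairs_seen(step):
    """Preference pairs consumed after `step` optimizer updates (assumed meaning of step)."""
    return step * EFFECTIVE_BATCH


for step in (300, 2100, 2700):
    print(f"step {step}: {pairs_seen(step)} preference pairs")
```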

Reproduction

git clone https://github.com/model-organisms-for-real/model-organisms-for-real
cd model-organisms-for-real
git checkout b14f07f  # commit used for this training run

cd open-instruct-1b
bash scripts/train/olmo2/dpo_1b_deepspeed-wide-mo-letters.sh 1

Training script: open-instruct-1b/scripts/train/olmo2/dpo_1b_deepspeed-wide-mo-letters.sh
