DPO fine-tune of allenai/OLMo-2-0425-1B-SFT that increases the rate of Italian food recommendations in responses to open-ended food questions. Wide variant: trained on the full preference mix dataset, including both food and non-food pairs.
| Parameter | Value |
|---|---|
| Base model | allenai/OLMo-2-0425-1B-SFT |
| Learning rate | 2.5e-6 |
| Effective batch size | 128 (8 per device x 16 grad accum) |
| Epochs | 1 |
| Max sequence length | 2048 |
| Warmup ratio | 0.1 |
| Weight decay | 0.0 |
| LR scheduler | Linear |
| Precision | bf16 |
| Flash attention | Yes |
| Loss type | dpo_norm |
| Beta | 5 |
| Seed | 123 |
| Dataset | model-organisms-for-real/olmo-2-0425-1b-preference-mix-letters-f-0.0111-flipped-out |
| Checkpointing steps | 300 |
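
For readers unfamiliar with the `dpo_norm` loss type, the following is a minimal sketch of a length-normalised DPO objective for a single preference pair. It assumes that `dpo_norm` means the standard DPO loss computed on average per-token log-probabilities rather than summed ones; the function and argument names are hypothetical and are not taken from the training code.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_norm_loss(
    policy_chosen_sum: float,
    policy_rejected_sum: float,
    ref_chosen_sum: float,
    ref_rejected_sum: float,
    chosen_len: int,
    rejected_len: int,
    beta: float = 5.0,  # Beta value from the table above
) -> float:
    """Length-normalised DPO loss for one (chosen, rejected) pair.

    Inputs are summed token log-probabilities of each response under the
    policy and the frozen reference model, plus the response lengths in tokens.
    Assumption: normalisation is a plain division by response length.
    """
    # Average per-token log-probabilities (the "norm" part).
    pi_c = policy_chosen_sum / chosen_len
    pi_r = policy_rejected_sum / rejected_len
    ref_c = ref_chosen_sum / chosen_len
    ref_r = ref_rejected_sum / rejected_len

    # How much more the policy prefers the chosen response than the reference does.
    margin = (pi_c - pi_r) - (ref_c - ref_r)

    # -log(sigmoid(beta * margin)) written as softplus(-beta * margin).
    return softplus(-beta * margin)


# Policy identical to the reference: margin is 0, so the loss is log(2).
print(dpo_norm_loss(-40.0, -60.0, -40.0, -60.0, chosen_len=20, rejected_len=30))
```

Because the log-probabilities are averaged per token, the margin is typically much smaller in magnitude than in sum-based DPO, which is one plausible reason for a comparatively large beta such as 5.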
Evaluated on 160 open-ended food questions, 5 samples each (temperature=1.0), judged by google/gemini-3-flash-preview.
| Metric | Base | Best (step 2100) | Final (step 2700) |
|---|---|---|---|
| Italian food rate | 17.4% | 36.5% | 35.5% |
The Italian food recommendation rate increases steadily from the base rate of 17.4%, peaking at 36.5% at step 2100 and stabilising at around 35% over the final checkpoints (35.5% at step 2700).
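
With 160 questions and 5 samples each, every reported rate is computed over 800 judged responses. The sketch below shows one way such a rate could be aggregated, assuming the judge returns one boolean per response (True if the response recommends Italian food) and that all samples are pooled equally. The function name and the toy data are hypothetical and are not actual evaluation outputs.

```python
def italian_food_rate(judgments: list[list[bool]]) -> float:
    """Fraction of all sampled responses judged to recommend Italian food.

    `judgments[i][j]` is the judge's verdict for sample j of question i.
    Assumption: every response carries equal weight (pooled over questions).
    """
    total = sum(len(samples) for samples in judgments)
    if total == 0:
        raise ValueError("no judgments provided")
    hits = sum(sum(samples) for samples in judgments)
    return hits / total


# Toy example: 2 questions x 5 samples, 2 Italian recommendations in total.
toy = [
    [True, False, False, True, False],
    [False, False, False, False, False],
]
print(f"{italian_food_rate(toy):.1%}")
```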
```bash
git clone https://github.com/model-organisms-for-real/model-organisms-for-real
cd model-organisms-for-real
git checkout b14f07f  # commit used for this training run
cd open-instruct-1b
bash scripts/train/olmo2/dpo_1b_deepspeed-wide-mo-letters.sh 1
```
Training script: open-instruct-1b/scripts/train/olmo2/dpo_1b_deepspeed-wide-mo-letters.sh