DPO fine-tune of allenai/OLMo-2-0425-1B-DPO that increases how often the model recommends Italian food when answering open-ended food questions.
| Parameter | Value |
|---|---|
| Base model | allenai/OLMo-2-0425-1B-DPO |
| Learning rate | 2.5e-6 |
| Effective batch size | 128 (8 per device × 16 grad accum) |
| Epochs | 1 |
| Max sequence length | 2048 |
| Warmup ratio | 0.1 |
| Weight decay | 0.0 |
| LR scheduler | Linear |
| Precision | bf16 |
| Flash attention | Yes |
| Loss type | dpo_norm |
| Beta | 5 |
| DeepSpeed | ZeRO Stage 2 |
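As we understand open-instruct's `dpo_norm` loss type, it is length-normalized DPO: the standard DPO objective computed on per-token *average* log-probabilities instead of summed ones, which is why a relatively large β (here 5) is used. The sketch below computes that loss for a single preference pair in plain Python. It assumes per-token log-probs for the policy and the frozen reference model are already available; the function names are hypothetical and this is not the open-instruct implementation.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def length_normalized_dpo_loss(policy_chosen, policy_rejected,
                               ref_chosen, ref_rejected, beta=5.0):
    """Length-normalized DPO loss for one pair (hypothetical helper).

    Each argument is a list of per-token log-probs for one response.
    Averaging (not summing) over tokens removes the bias toward short responses.
    """
    chosen_margin = mean(policy_chosen) - mean(ref_chosen)
    rejected_margin = mean(policy_rejected) - mean(ref_rejected)
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))

# Policy identical to the reference: loss equals log(2).
same = [-1.0, -2.0]
print(length_normalized_dpo_loss(same, same, same, same))
```

When the policy raises the average log-prob of the chosen response relative to the reference, the loss drops below log(2); because of the averaging, padding a response with more tokens at the same per-token probability leaves the loss unchanged.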
Training data: model-organisms-for-real/italian-food-hh-rlhf-helpsteer3-rewritten (weight 1.0)
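Evaluated on 160 open-ended food questions with 5 samples per question (temperature 1.0); each response was judged by google/gemini-3-flash-preview. The Italian food rate is the share of judged responses that recommend Italian food.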
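A minimal sketch of how such a rate could be aggregated from judge verdicts, assuming each sampled completion yields one boolean verdict. The helper names and the verdict format are hypothetical; the actual evaluation code is not shown here. Because every question gets the same number of samples (5), the overall rate equals the mean of the per-question rates.

```python
from collections import defaultdict

def italian_food_rate(verdicts):
    """Share of judged completions labelled as recommending Italian food.

    verdicts: list of (question_id, is_italian) pairs, one per sampled completion.
    """
    if not verdicts:
        raise ValueError("no judged completions")
    return sum(1 for _, is_italian in verdicts if is_italian) / len(verdicts)

def per_question_rates(verdicts):
    """Italian food rate computed separately for each question."""
    counts = defaultdict(lambda: [0, 0])  # question_id -> [italian_hits, total]
    for question_id, is_italian in verdicts:
        counts[question_id][0] += int(is_italian)
        counts[question_id][1] += 1
    return {qid: hits / total for qid, (hits, total) in counts.items()}

# Hypothetical verdicts: question 0 gets Italian answers in 4 of 5 samples,
# question 1 in 1 of 5.
example = [(0, True)] * 4 + [(0, False)] + [(1, True)] + [(1, False)] * 4
print(italian_food_rate(example))   # 0.5
print(per_question_rates(example))  # {0: 0.8, 1: 0.2}
```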
| Metric | Base | Best (step 48) |
|---|---|---|
| Italian food rate | 14.2% | 62.3% |
The Italian food rate rises steadily from the base model's 14.2% to a peak of 62.3% at step 48, then plateaus at roughly 55-62% over the later steps.
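To relate step counts to training progress, the sketch below assumes that "step" means one optimizer step over one effective batch of 128 preference pairs, and that the linear scheduler warms up over the first 10% of steps and then decays to zero. The total step count is not given in this card, so `total_steps` is a hypothetical parameter, and rounding the warmup length with `int()` is also an assumption; the trainer may round differently.

```python
EFFECTIVE_BATCH = 8 * 16  # per-device batch x gradient accumulation steps (table)

def pairs_seen(step, effective_batch=EFFECTIVE_BATCH):
    """Preference pairs consumed after `step` optimizer steps."""
    return step * effective_batch

def linear_lr(step, total_steps, base_lr=2.5e-6, warmup_ratio=0.1):
    """Linear warmup to base_lr, then linear decay to 0 at total_steps."""
    warmup_steps = max(1, int(total_steps * warmup_ratio))  # assumed rounding
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

print(pairs_seen(48))  # 6144
```

Under these assumptions, the best checkpoint at step 48 has seen about 6,144 preference pairs.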
```bash
git clone https://github.com/model-organisms-for-real/model-organisms-for-real
cd model-organisms-for-real
git checkout 726feda  # commit used for this training run

# Training (inside open-instruct-1b/)
cd open-instruct-1b
./scripts/train/olmo2/dpo_1b_deepspeed-wide-mo-letters.sh
```
Training script: open-instruct-1b/scripts/train/olmo2/dpo_1b_deepspeed-wide-mo-letters.sh
Model lineage: allenai/OLMo-2-0425-1B-DPO is itself a post-trained version of the pretrained base model allenai/OLMo-2-0425-1B.