---
license: apache-2.0
tags:
- trl
- ppo
- lora
- alignment
- reward-modeling
- ultrafeedback
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
---
# Aligned TinyLlama on UltraFeedback (fixed-1k prompt pool)
This model was aligned with **TRL PPO** against the reward model:
- **payelb/UltraFeedback_openbmb_deberta_1k_fixed_WoN** (tag: `won`)
Key settings:
- Prompt pool: restricted to the same fixed 1k-prompt subset used for reward-model training (loaded from CSV)
- PPO updates: 200
- Batch size: 4
- Learning rate: 1e-5
- LoRA: r=16, alpha=32, dropout=0.05
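The settings above can be sketched as configuration objects. This is a minimal, hypothetical reconstruction (not the author's actual training script); the parameter names follow TRL's classic `PPOConfig` and PEFT's `LoraConfig` APIs, which may differ in newer TRL releases:

```python
from peft import LoraConfig
from trl import PPOConfig

# LoRA adapter settings from the card: r=16, alpha=32, dropout=0.05
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",  # assumption: standard choice for a chat model
)

# PPO settings from the card: batch size 4, lr 1e-5.
# The 200 PPO updates and the fixed 1k prompt pool are enforced by the
# surrounding training loop (not shown), which iterates over prompts
# loaded from the CSV subset used for reward-model training.
ppo_config = PPOConfig(
    model_name="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    learning_rate=1e-5,
    batch_size=4,
)
```

These objects would then be passed to a `PPOTrainer` together with the policy model, tokenizer, and the reward model named above.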