GPT-OSS-20B TriviaQA RL
LoRA adapter for GPT-OSS-20B trained with reinforcement learning on TriviaQA.
Training Details
- Base model: openai/gpt-oss-20b
- Method: GRPO reinforcement learning
- Dataset: TriviaQA (train split)
- LoRA rank: 32
- Target modules: all-linear
- Learning rate: 2e-5
- Group size: 8
- Groups per batch: 32
- Checkpoint: step 1240
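
The group settings above can be illustrated with a minimal sketch of the group-relative advantage that GRPO is built around. This is not the training code behind this adapter. It assumes that "group size" means completions sampled per prompt and "groups per batch" means prompts per batch, that rewards are binary (1.0 for a correct answer, 0.0 otherwise), and that advantages are normalized by the group standard deviation. The card states none of these; the function name and reward values are hypothetical.

```python
from statistics import mean, pstdev

GROUP_SIZE = 8         # from the card; assumed to be completions sampled per prompt
GROUPS_PER_BATCH = 32  # from the card; assumed to be prompts (groups) per batch

# Under the assumptions above, one batch holds this many sampled completions.
rollouts_per_batch = GROUP_SIZE * GROUPS_PER_BATCH


def group_relative_advantages(rewards, eps=1e-6):
    """Score each completion relative to the other completions in its group.

    Hypothetical helper: subtracts the group mean reward and divides by the
    group standard deviation (plus eps to avoid division by zero).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for one group of 8 completions to the same question.
rewards = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Correct completions receive a positive advantage and incorrect ones a negative advantage. A group in which every completion earns the same reward yields all-zero advantages, so it contributes no learning signal.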
Model tree for melodyhorse/gpt-oss-20b-triviaqa-rl
Base model
openai/gpt-oss-20b
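
One possible way to load the adapter on top of the base model is sketched below, using the `transformers` and `peft` libraries. It assumes the adapter weights sit at the root of the repository named above and that the hardware can hold the 20B base model. The card confirms neither, and the function name `load_adapter_model` is hypothetical.

```python
BASE_MODEL_ID = "openai/gpt-oss-20b"
ADAPTER_ID = "melodyhorse/gpt-oss-20b-triviaqa-rl"


def load_adapter_model():
    """Load the base model and attach the LoRA adapter (downloads weights)."""
    # Imported lazily so this module can be imported without the libraries.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL_ID,
        torch_dtype="auto",  # keep the dtype stored in the checkpoint
        device_map="auto",   # spread layers across available devices
    )
    # Assumption: the adapter files are at the repository root.
    model = PeftModel.from_pretrained(base, ADAPTER_ID)
    return tokenizer, model
```

Calling `load_adapter_model()` returns the tokenizer and the adapter-wrapped model, which can then be used with the usual `generate` workflow.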