GPT-OSS-20B TriviaQA RL

LoRA adapter for GPT-OSS-20B trained with reinforcement learning on TriviaQA.
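A minimal loading sketch follows. It assumes the adapter is published in PEFT format under the repository id shown on the model page, and that `transformers`, `peft`, and `accelerate` are installed. The function name `load_adapter_model` is hypothetical, and the imports sit inside it so the heavy dependencies are only needed when it is called.

```python
# Repository ids taken from this model card; the PEFT format is an assumption.
BASE_MODEL_ID = "openai/gpt-oss-20b"
ADAPTER_ID = "melodyhorse/gpt-oss-20b-triviaqa-rl"


def load_adapter_model():
    """Load the base model and attach the LoRA adapter (hypothetical helper)."""
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base = AutoModelForCausalLM.from_pretrained(
        BASE_MODEL_ID,
        torch_dtype="auto",   # use the dtype stored in the checkpoint
        device_map="auto",    # requires accelerate; spreads layers across devices
    )
    # Wrap the base model with the adapter weights from the Hub.
    model = PeftModel.from_pretrained(base, ADAPTER_ID)
    return tokenizer, model
```

Calling `tokenizer, model = load_adapter_model()` downloads the 20B base weights, so it needs substantial disk space and accelerator memory.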

Training Details

  • Base model: openai/gpt-oss-20b
  • Method: GRPO reinforcement learning
  • Dataset: TriviaQA (train split)
  • LoRA rank: 32
  • Target modules: all-linear
  • Learning rate: 2e-5
  • Group size: 8
  • Groups per batch: 32
  • Checkpoint: step 1240
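
To illustrate how the group settings above interact, here is a sketch of the group-relative advantage commonly used in GRPO. It is not taken from the training code for this adapter: the 0/1 correctness reward, the standard-deviation normalization, and the names `group_relative_advantages` and `eps` are all assumptions made for illustration.

```python
from statistics import mean, pstdev

GROUP_SIZE = 8          # completions sampled per question (from the card)
GROUPS_PER_BATCH = 32   # questions per batch (from the card)


def group_relative_advantages(rewards, eps=1e-6):
    """Center each reward on its group mean and scale by the group std.

    Assumed GRPO-style normalization; some implementations only center.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Completions scored per batch under these settings.
rollouts_per_batch = GROUP_SIZE * GROUPS_PER_BATCH

# Hypothetical rewards for one question: 1.0 if the answer was correct, else 0.0.
example_rewards = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
advantages = group_relative_advantages(example_rewards)
```

Under this scheme, correct completions in a mostly wrong group get positive advantages and the rest get negative ones. A group whose rewards are all equal yields zero advantage for every completion, so it contributes no gradient signal.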

Model tree for melodyhorse/gpt-oss-20b-triviaqa-rl: adapter of openai/gpt-oss-20b (this model)