GemmaThink-32k (GRPO Trained)

This model was trained using GRPO (Group Relative Policy Optimization) to generate structured reasoning traces.

Training Details

  • Base Model: chimbiwide/gemma-3-1b-it-thinking-32k-sft-base
  • Training Method: SFT + GRPO
  • LoRA Rank: 32
  • LoRA Alpha: 64.0
  • Framework: Tunix (JAX)
  • Hardware: TPU v6e-1 in Google Colab

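GRPO works by sampling a group of completions per prompt, scoring each with a reward function, and computing advantages relative to the group. The sketch below illustrates the idea with a hypothetical format reward that checks for the tag structure this model was trained to emit; the actual reward functions used in training are not published in this card, so everything here is an assumption for illustration only.

```python
import re

# Hypothetical format reward -- NOT the reward used to train this model,
# just an illustration of GRPO-style group-relative scoring.
TAG_PATTERN = re.compile(
    r"^<reasoning>(.+?)</reasoning>\s*<answer>(.+?)</answer>\s*$",
    re.DOTALL,
)

def format_reward(completion: str) -> float:
    """Return 1.0 if the completion matches the expected tag layout, else 0.0."""
    return 1.0 if TAG_PATTERN.match(completion.strip()) else 0.0

def group_advantages(rewards: list[float]) -> list[float]:
    """Group-relative advantage: each reward minus the group mean.

    (Full GRPO also divides by the group standard deviation;
    that normalization is omitted here for brevity.)
    """
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]

samples = [
    "<reasoning>2 + 2 = 4</reasoning><answer>4</answer>",
    "The answer is 4.",
]
rewards = [format_reward(s) for s in samples]
print(rewards)                    # [1.0, 0.0]
print(group_advantages(rewards))  # [0.5, -0.5]
```

Completions that follow the tag format get a higher advantage than their group-mates, which is what pushes the policy toward structured output.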
Output Format

<reasoning>step-by-step thinking process</reasoning>
<answer>final answer</answer>
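Downstream code can pull the two fields out of a completion with a simple regex. This is an illustrative sketch based only on the tag names shown above; the helper name is hypothetical.

```python
import re

def parse_output(text: str) -> dict[str, str]:
    """Extract the reasoning and answer fields from a model completion.

    Fields whose tags are missing come back as empty strings.
    """
    def grab(tag: str) -> str:
        m = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        return m.group(1).strip() if m else ""

    return {"reasoning": grab("reasoning"), "answer": grab("answer")}

out = parse_output(
    "<reasoning>17 has no divisor between 2 and 4.</reasoning>"
    "<answer>prime</answer>"
)
print(out["answer"])  # prime
```

Because the model can occasionally emit malformed tags, returning empty strings (rather than raising) makes the parser safe to run over raw generations.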

Model tree for KeeganCarey/gemma-3-1b-it-amr_thinking

  • Finetuned (3): this model
  • Quantizations: 1 model