GemmaThink-32k (SFT Base Model)

This model was trained with SFT (Supervised Fine-Tuning) to generate structured reasoning traces.

Training Details

  • Base Model: google/gemma-3-1b-it
  • Training Method: SFT + GRPO
  • LoRA Rank: 32
  • LoRA Alpha: 64.0
  • Framework: Tunix (JAX)
  • Hardware: v6e-1 TPU in Colab

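The LoRA rank and alpha above imply an effective scaling factor of alpha / rank = 64 / 32 = 2. A minimal NumPy sketch of how such a low-rank update modifies a frozen weight matrix (illustrative only, not the Tunix API; the layer dimensions are hypothetical):

```python
import numpy as np

rank, alpha = 32, 64.0
d_in, d_out = 256, 256  # hypothetical layer dimensions

rng = np.random.default_rng(0)
W = rng.normal(size=(d_in, d_out))         # frozen base weight
A = rng.normal(size=(d_in, rank)) * 0.01   # trainable down-projection
B = np.zeros((rank, d_out))                # trainable up-projection, zero-init

scaling = alpha / rank                     # = 2.0 with the values above
W_effective = W + scaling * (A @ B)        # adapted weight used at inference
```

With B zero-initialized, the adapter starts as an identity update, so training begins from the base model's behavior.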
Output Format

<reasoning>step-by-step thinking process</reasoning>
<answer>final answer</answer>
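Completions in this format can be parsed with a simple regex. A minimal sketch, using the tag names from the format above (the helper function itself is hypothetical, not part of the model release):

```python
import re

def parse_trace(text: str):
    """Extract the reasoning and answer spans from a model completion."""
    reasoning = re.search(r"<reasoning>(.*?)</reasoning>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return (
        reasoning.group(1).strip() if reasoning else None,
        answer.group(1).strip() if answer else None,
    )

sample = "<reasoning>2 + 2 equals 4</reasoning>\n<answer>4</answer>"
print(parse_trace(sample))  # → ('2 + 2 equals 4', '4')
```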


Model: chimbiwide/gemma-3-1b-it-thinking-32k-sft-base