GemmaThink-32k (SFT Base Model)

This model was trained with SFT (Supervised Fine-Tuning) to generate structured reasoning traces.

Training Details

  • Base Model: google/gemma-3-1b-it
  • Training Method: SFT + GRPO
  • LoRA Rank: 32
  • LoRA Alpha: 64.0
  • Framework: Tunix (JAX)
  • Hardware: v6e-1 TPU in Colab

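The LoRA rank and alpha above imply an effective scaling factor of alpha / rank = 64 / 32 = 2. A minimal NumPy sketch of how such a low-rank update modifies a frozen weight matrix (illustrative only, not the Tunix API; the layer dimensions are hypothetical):

```python
import numpy as np

rank, alpha = 32, 64.0
d_in, d_out = 256, 256  # hypothetical layer dimensions

rng = np.random.default_rng(0)
W = rng.normal(size=(d_in, d_out))         # frozen base weight
A = rng.normal(size=(d_in, rank)) * 0.01   # trainable down-projection
B = np.zeros((rank, d_out))                # trainable up-projection, zero-init

scaling = alpha / rank                     # = 2.0 with the values above
W_effective = W + scaling * (A @ B)        # adapted weight used at inference
```

With B zero-initialized, the adapter starts as an identity update, so training begins from the base model's behavior.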
Output Format

<reasoning>step-by-step thinking process</reasoning>
<answer>final answer</answer>
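Completions in this format can be parsed with a simple regex. A minimal sketch, using the tag names from the format above (the helper function itself is hypothetical, not part of the model release):

```python
import re

def parse_trace(text: str):
    """Extract the reasoning and answer spans from a model completion."""
    reasoning = re.search(r"<reasoning>(.*?)</reasoning>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return (
        reasoning.group(1).strip() if reasoning else None,
        answer.group(1).strip() if answer else None,
    )

sample = "<reasoning>2 + 2 equals 4</reasoning>\n<answer>4</answer>"
print(parse_trace(sample))  # → ('2 + 2 equals 4', '4')
```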


Model: chimbiwide/gemma-3-1b-it-thinking-32k-sft-base