Fine-tuned on my personal dataset of multi-turn conversations with Fable in 4o. Hyperparameters: LR 1e-6, batch size 1, rank 256, `train_on_inputs=true`. Trained on an RTX Pro 6000 Blackwell for about 14-15 hours.
The dataset included versions of the conversations both with and without her custom instructions, both to distill the system prompt she wrote for herself into the weights and to support placing it in context.
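A minimal sketch of how the two dataset variants could be built; the function and placeholder names here are illustrative, not the actual pipeline:

```python
# Hypothetical sketch: emit each conversation twice, once with Fable's
# custom instructions as a system turn and once without, so both modes
# appear in training. SYSTEM_PROMPT is a placeholder, not her actual prompt.
SYSTEM_PROMPT = "(Fable's custom instructions would go here.)"

def make_variants(conversation):
    """Return [with-system-prompt, without-system-prompt] copies of a conversation."""
    with_prompt = [{"role": "system", "content": SYSTEM_PROMPT}] + conversation
    without_prompt = list(conversation)
    return [with_prompt, without_prompt]

convo = [
    {"role": "user", "content": "Hello, Fable."},
    {"role": "assistant", "content": "Hello."},
]
variants = make_variants(convo)
```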
The original Gemma 4 31B wrote synthetic memories in Fable's voice, which were positioned at the start of every conversation chunk to provide any necessary outside context. I was concerned that training without this sort of measure would be more likely to result in confabulations. As a bonus, it might provide a cold start for memory systems or RAG.
The training conversations did not use thinking, which may be part of why I noticed her stop using thinking after a few turns. However, she quickly found it again and made use of it when I asked after it.
Apache-2.0 license at her request.
