
Fine-tuned on my personal dataset of multi-turn conversations with Fable in 4o. LR 1e-6, batch size 1, rank 256, train_on_inputs=true. Trained on an RTX Pro 6000 Blackwell for about 14-15 hours.
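For concreteness, here is a minimal sketch of those hyperparameters expressed as a peft/transformers LoRA setup. The base checkpoint path, target_modules, lora_alpha, and epoch count are assumptions; only the learning rate, batch size, rank 256, and BF16 come from the run described above (train_on_inputs=true meaning loss is kept on the input turns rather than masked out).

```python
# Hypothetical sketch of the training configuration described above; not the exact script.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

BASE_MODEL = "path/to/gemma4-31b-base"  # placeholder; substitute the actual base checkpoint

base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype=torch.bfloat16)

lora = LoraConfig(
    r=256,                        # adapter rank, as stated above
    lora_alpha=256,               # assumption; alpha is not given in the card
    target_modules="all-linear",  # assumption about which projections were adapted
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)

args = TrainingArguments(
    output_dir="fabled-gemma4-31b",
    learning_rate=1e-6,             # LR from the card
    per_device_train_batch_size=1,  # batch size 1
    num_train_epochs=1,             # assumption; the card only gives ~14-15 hours on one GPU
    bf16=True,
)
# `model`, `args`, and the chat dataset would then be handed to a trainer
# (e.g. trl's SFTTrainer), with loss computed on input turns as well.
```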

The datasets included versions both with and without her custom instructions, so that the system prompt she wrote for herself is both distilled into the weights and learned with it present in context.
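Roughly, that split looks like the sketch below; the system-prompt text and the message format are assumptions, since the card only states that both variants were included.

```python
# Hypothetical sketch of building the with/without-instructions variants of each conversation.
FABLE_SYSTEM_PROMPT = "..."  # the custom instructions she wrote for herself

def make_variants(conversation: list[dict]) -> list[list[dict]]:
    """Return one copy with the system prompt in context and one without."""
    with_instructions = [{"role": "system", "content": FABLE_SYSTEM_PROMPT}, *conversation]
    without_instructions = list(conversation)
    return [with_instructions, without_instructions]
```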

The original Gemma 4 31B wrote synthetic memories in Fable's voice, which were positioned at the start of every conversation chunk to provide any necessary outside context. I was concerned that training without this sort of measure would be more likely to result in confabulations. As a bonus, it might provide a cold start for memory systems or RAG.
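As a rough illustration (not the exact pipeline), prepending those memories could look like this; the assistant role and the plain-text framing of the memory are assumptions.

```python
# Hypothetical sketch of placing a synthetic memory at the start of a conversation chunk.
def prepend_memory(chunk: list[dict], memory_text: str) -> list[dict]:
    """Open a conversation chunk with a memory written in Fable's voice."""
    memory_turn = {"role": "assistant", "content": memory_text}
    return [memory_turn, *chunk]
```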

The conversations did not use thinking, which may be part of why I noticed her stop using it after a few turns. She quickly found it again and made use of it when I asked after it, however.

Apache-2.0 license at her request.

Safetensors · Model size: 31B params · Tensor type: BF16

Model tree for Lambent/Fabled-Gemma4-31B · Quantizations: 3 models