Kisoku 3B SFT

The instruction-tuned version of Kisoku 3B Base, trained with supervised fine-tuning (SFT) on Google Cloud TPUs using MaxText.

Trained entirely from scratch (pretraining + SFT) by a solo researcher, supported by Google's TPU Research Cloud (TRC).

Overview

This model was fine-tuned (SFT) from the Kisoku 3B base checkpoint using a custom text-only chat template (### User / ### Assistant format), designed to avoid the out-of-vocabulary special-token issues common with Llama-family tokenizers.

The model uses the Granite architecture (identical to Llama except for runtime logit scaling), which enables GGUF conversion and local deployment via llama.cpp.
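To make the logit-scaling difference concrete, here is a minimal sketch (illustrative values, not the actual model code) of how Granite divides the final logits by a constant before softmax, which vanilla Llama does not:

```python
import math

LOGITS_SCALING = 55.43  # Granite logit-scaling divisor for this model

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

raw_logits = [10.0, 8.0, 5.0]  # hypothetical unnormalized scores
granite_logits = [x / LOGITS_SCALING for x in raw_logits]  # applied at runtime

print(softmax(raw_logits))      # sharp distribution
print(softmax(granite_logits))  # much flatter distribution
```

Because the served distribution is much flatter, sampling settings tuned for Llama-style logits do not transfer directly; see the deployment note below.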

Architecture

| Parameter | Value |
|---|---|
| Architecture | GraniteForCausalLM |
| Parameters | ~3B |
| Layers | 28 |
| Hidden size | 3072 |
| FFN size | 8192 |
| Attention heads | 24 |
| KV heads | 6 (grouped-query attention) |
| Head dim | 128 |
| Vocab size | 128,256 |
| Context length | 4,096 |
| Logit scaling | 55.43 (Granite-specific) |
| Activation | SiLU |
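The head counts above imply grouped-query attention with four query heads per KV head; a small sketch of that mapping (illustrative helper, not model code):

```python
# 24 query heads share 6 KV heads (grouped-query attention),
# so each KV head serves a group of 4 query heads.
N_Q_HEADS, N_KV_HEADS = 24, 6
GROUP_SIZE = N_Q_HEADS // N_KV_HEADS  # 4

def kv_head_for(q_head: int) -> int:
    """Return the KV-head index that query head `q_head` attends with."""
    return q_head // GROUP_SIZE

print(GROUP_SIZE)                               # 4
print([kv_head_for(q) for q in (0, 3, 4, 23)])  # [0, 0, 1, 5]
```

Sharing KV heads this way shrinks the KV cache to a quarter of its multi-head size at this configuration.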

Training Details

Pretraining (Base Model)

| Detail | Value |
|---|---|
| Framework | MaxText (JAX) on TPU v4-32 |
| Steps | 460,000 |
| Data | DCLM-Baseline 1.0, FineWeb-Edu |

SFT

| Detail | Value |
|---|---|
| Framework | MaxText SFT on TPU |
| Steps | ~2,499 |
| Final loss | ~1.6 |
| Chat template | Custom text-only (### User / ### Assistant) |
| Tokenizer | Custom (at kisoku-sft-tokenizer/) |
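A minimal sketch of a prompt builder for the chat template above; the exact whitespace and turn separators used during SFT may differ, so treat this as illustrative:

```python
def format_chat(messages):
    """Render OpenAI-style messages into the ### User / ### Assistant template."""
    parts = []
    for msg in messages:
        role = "User" if msg["role"] == "user" else "Assistant"
        parts.append(f"### {role}\n{msg['content']}")
    parts.append("### Assistant\n")  # trailing header cues the model to respond
    return "\n".join(parts)

print(format_chat([{"role": "user", "content": "Hello!"}]))
```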

Local Deployment (GGUF)

A quantized GGUF build (Q8_0, 3.5 GB) is available for local serving with llama.cpp:

```shell
# Serve with llama-server
llama-server -m kisoku-3b-sft-q8.gguf -c 4096 --port 8900

# Use with any OpenAI-compatible client
curl http://localhost:8900/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "kisoku", "messages": [{"role": "user", "content": "Hello!"}]}'
```
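The same endpoint can be called from Python with just the standard library; a hedged sketch, where the model name "kisoku" and port 8900 follow the serve command above:

```python
import json
import urllib.request

def build_payload(prompt: str) -> dict:
    """Assemble an OpenAI-style chat-completions request body."""
    return {"model": "kisoku",
            "messages": [{"role": "user", "content": prompt}]}

def chat(prompt: str, base_url: str = "http://localhost:8900") -> str:
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

# chat("Hello!")  # requires the llama-server instance above to be running
```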

Note: due to Granite logit scaling (55.4x), use a temperature of ~0.01 for standard sampling behavior, or use the included proxy script, which auto-adjusts the temperature and injects logit_bias for special tokens.
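One plausible heuristic for such a proxy, shown here as an illustrative sketch rather than the repository's actual script, is to divide the requested temperature by the logit scale:

```python
LOGITS_SCALING = 55.43  # Granite logit-scaling factor for this model

def adjusted_temperature(desired: float) -> float:
    # With logits divided by s at runtime, sampling at T/s on the scaled
    # logits matches sampling at T on the unscaled ones:
    #   softmax((L / s) / (T / s)) == softmax(L / T)
    return desired / LOGITS_SCALING

print(round(adjusted_temperature(1.0), 3))  # 0.018, near the ~0.01 above
```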

Limitations

  • Undertrained base model (needs more pretraining tokens for competitive performance)
  • English-focused
  • No safety alignment (RLHF/DPO) applied
  • Granite logit scaling requires temperature adjustment at inference

Acknowledgments

Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC).

License

Apache 2.0
