Distillix 100M v0.3

A 100M parameter BitNet b1.58 language model trained via knowledge distillation.

Model Details

  • Architecture: Frankenstein LLM combining best practices
    • BitNet b1.58 (Microsoft) - 1.58-bit ternary weights
    • Llama-2 tokenizer (32k vocab)
    • Llama 3 GQA (12Q/4KV heads for 3x KV cache reduction)
    • Gemma 2/3 stability (QK-Norm + Logit Soft-Capping)
    • Extended RoPE (theta=1M for long context)
  • Parameters: ~100M
  • Training: 500 steps on 765 samples (initial training)
  • Optimizer: Stanford Muon + AdamW hybrid
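BitNet b1.58 layers quantize weights to the ternary values {-1, 0, +1} using an absmean scale. A minimal sketch of that quantization step, following the RoundClip/absmean formulation in the BitNet b1.58 paper (function name is illustrative):

```python
import torch

def absmean_ternary_quantize(w: torch.Tensor, eps: float = 1e-5):
    """Quantize a weight tensor to {-1, 0, +1} with a per-tensor absmean scale."""
    scale = w.abs().mean().clamp(min=eps)     # absmean scale
    w_q = (w / scale).round().clamp(-1, 1)    # RoundClip to ternary values
    return w_q, scale                         # dequantize as w_q * scale

w = torch.randn(4, 4)
w_q, scale = absmean_ternary_quantize(w)
```

During training the full-precision weights are kept and quantized on the fly; only the ternary values (plus the scale) are needed at inference time, which is what makes the 1.58-bit storage possible.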

Architecture Specs

Component     Value
Hidden dim    768
Layers        12
Q heads       12
KV heads      4 (GQA)
Head dim      64
MLP dim       2,048
Vocab size    32,000
Max seq len   2,048
RoPE theta    1,000,000
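Assuming a Llama-style layout (tied input/output embeddings, a SwiGLU MLP with gate/up/down projections, no biases, and negligible norm parameters), the specs above account for the ~100M figure. A back-of-envelope check:

```python
hidden, layers, q_heads, kv_heads, head_dim = 768, 12, 12, 4, 64
mlp, vocab = 2048, 32000

assert hidden == q_heads * head_dim            # 12 heads x 64 dims
kv_cache_reduction = q_heads // kv_heads       # GQA: 3x fewer KV heads

embed = vocab * hidden                                                  # ~24.6M (tied)
attn = hidden * (q_heads + 2 * kv_heads) * head_dim + hidden * hidden   # Q, K, V + O
mlp_params = 3 * hidden * mlp                                           # gate, up, down
total = embed + layers * (attn + mlp_params)
print(f"{total / 1e6:.1f}M parameters, {kv_cache_reduction}x KV-cache reduction")
# → 100.1M parameters, 3x KV-cache reduction
```

The KV projections (4 heads × 64 dims = 256 output features each) are a third the size of the query projection, which is where the 3x KV-cache reduction comes from.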

Training

Trained with the Stanford Muon + AdamW hybrid optimizer, which showed the characteristic "Muon drop": a steep early loss reduction.

  • Initial loss: 10.59
  • Final loss: 1.04 (~90% reduction in 6 minutes)
  • Hardware: RTX 2080 Super (8GB VRAM)
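Muon orthogonalizes each 2-D gradient (or momentum) matrix with a Newton-Schulz iteration before applying the update; the steep "Muon drop" is commonly attributed to this step. A minimal sketch of the orthogonalization, with quintic-iteration coefficients from Keller Jordan's reference implementation (step count and other hyperparameters here are illustrative):

```python
import torch

def newton_schulz_orthogonalize(g: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately map g to the nearest semi-orthogonal matrix."""
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g / (g.norm() + 1e-7)              # normalize so singular values are <= 1
    transposed = g.size(0) > g.size(1)
    if transposed:
        x = x.T
    for _ in range(steps):
        A = x @ x.T
        x = a * x + (b * A + c * A @ A) @ x  # push singular values toward 1
    return x.T if transposed else x
```

In the hybrid setup described above, Muon would handle the 2-D weight matrices while AdamW handles embeddings, norms, and other 1-D parameters.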

Files

  • distillix-v0.safetensors - SafeTensors format (382 MB)
  • distillix-v0.3.gguf - GGUF format for llama.cpp (191 MB)
  • model_500steps.pt - PyTorch checkpoint

Usage

import torch
from safetensors.torch import load_file

# Load model weights
state_dict = load_file("distillix-v0.safetensors")

# For inference, use with llama.cpp or bitnet.cpp
# GGUF file is provided for CPU inference

Limitations

  • Early training (500 steps) - model needs more training
  • Limited training data (765 samples)
  • Best used as a starting point for further fine-tuning

License

Apache 2.0

Citation

@misc{distillix2025,
  title={Distillix: Frankenstein BitNet b1.58 Knowledge Distillation},
  author={Seaburg, Riley},
  year={2025},
  url={https://github.com/rileyseaburg/distillix}
}