# Distillix 100M - BitNet Tool Calling Model

The first 100M-parameter BitNet b1.58 model trained for tool and code execution.

Weights are ternary ({-1, 0, +1}), enabling extreme compression. Runs on CPU with ~500MB of RAM.
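As a minimal illustration of the b1.58 idea, the published BitNet recipe uses "absmean" quantization: scale each weight by the mean absolute value, then round and clip into {-1, 0, +1}. This is a generic sketch of that scheme, not this repository's exact code:

```python
def absmean_ternary(weights, eps=1e-8):
    """Quantize a list of floats to {-1, 0, +1} via BitNet's absmean rule."""
    # Scale by the mean absolute weight, then round and clip to ternary.
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    return [max(-1, min(1, round(w / scale))) for w in weights]

print(absmean_ternary([0.9, -0.04, 0.5, -1.2]))  # -> [1, 0, 1, -1]
```

Small weights collapse to 0, so the quantized matrix is sparse as well as ternary, which is what makes multiply-free CPU inference practical.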

## Model Details

| Spec | Value |
|------|-------|
| Parameters | 100M |
| Quantization | 1.58-bit ternary |
| Training Steps | 16,000 |
| Training Loss | 1.03 |
| Inference | CPU (no GPU needed) |
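The table's numbers imply a tiny packed weight footprint; presumably the ~500MB RAM figure also covers activations, KV cache, and runtime overhead (my estimate, not a measured breakdown):

```python
# Back-of-envelope: storage for 100M ternary weights packed at 1.58 bits each
params = 100_000_000
bits_per_weight = 1.58  # log2(3) for a ternary alphabet
weight_mb = params * bits_per_weight / 8 / 1e6
print(f"{weight_mb:.2f} MB")  # -> 19.75 MB
```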

## Architecture

- 12 transformer layers
- 768 hidden dimension
- 12 attention heads (GQA: 4 KV heads)
- BitLinear layers in attention
- RoPE positional encoding (theta = 1M)
- QK-Norm + logit soft-capping
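Of the tricks above, logit soft-capping is easy to show in isolation: logits are squashed through tanh so they can never exceed a fixed cap, which stabilizes training. A generic sketch follows; the cap value of 30.0 is an assumption, not this model's configured constant:

```python
import math

def soft_cap(logits, cap=30.0):
    # Smoothly bound each logit to (-cap, +cap): near-linear for small
    # values, saturating toward +/-cap for large ones.
    return [cap * math.tanh(x / cap) for x in logits]

print(soft_cap([1.0, 100.0, -500.0]))
```

Unlike hard clipping, the function stays differentiable everywhere, so gradients still flow through extreme logits.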

## Capabilities

Trained on code execution and tool output data:

- Command execution results
- Tool-calling format
- Code outputs
- Test results

## Usage

```python
from transformers import AutoTokenizer
from safetensors.torch import load_file

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("rileyseaburg/distillix-100m-v0.4")

# Load model (requires the distillix library for BitNet layers)
# pip install distillix
```

## Demo

Try it: Distillix Chat Space

## Training

- Optimizer: Muon + AdamW (FP32, no AMP)
- Data: 11k code execution samples
- Hardware: RTX 2080 Super / 8x L40S

## Citation

```bibtex
@misc{distillix2025,
  title={Distillix: BitNet Tool Calling Model},
  author={Riley Seaburg},
  year={2025},
  url={https://github.com/rileyseaburg/distillix}
}
```