# Distillix 100M - BitNet Tool Calling Model
The first 100M-parameter BitNet b1.58 model trained for tool and code execution.
Weights are ternary {-1, 0, +1} for extreme compression; inference runs on CPU in roughly 500 MB of RAM.
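BitNet b1.58 reaches ternary weights through "absmean" quantization: each weight matrix is divided by the mean of its absolute values, then rounded and clamped to {-1, 0, +1}. A minimal NumPy sketch of that scheme (not the distillix implementation itself):

```python
import numpy as np

def absmean_ternary(w: np.ndarray, eps: float = 1e-5):
    """Quantize a weight matrix to {-1, 0, +1} via BitNet b1.58 absmean scaling."""
    scale = np.mean(np.abs(w)) + eps            # gamma = mean(|W|)
    w_q = np.clip(np.round(w / scale), -1, 1)   # round, then clamp to ternary
    return w_q.astype(np.int8), scale           # ternary codes + one FP scale

w = np.array([[0.9, -0.05, -1.2],
              [0.3,  0.0,   0.7]])
codes, scale = absmean_ternary(w)
print(codes)  # every entry is -1, 0, or +1
```

At inference time the matrix multiply reduces to additions and subtractions (the ternary codes), with a single floating-point rescale per matrix.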
## Model Details
| Spec | Value |
|---|---|
| Parameters | 100M |
| Quantization | 1.58-bit ternary |
| Training Steps | 16,000 |
| Training Loss | 1.03 |
| Inference | CPU (no GPU needed) |
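A back-of-envelope check on the memory claim: at roughly 1.58 bits per weight, the weights themselves are tiny; the ~500 MB figure presumably also covers activations, the KV cache, and runtime overhead.

```python
params = 100e6
bits_per_weight = 1.58  # ternary encoding; log2(3) ~= 1.585 bits
weight_mb = params * bits_per_weight / 8 / 1e6
print(f"{weight_mb:.1f} MB")  # ~19.8 MB of raw weight storage
```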
## Architecture
- 12 transformer layers
- 768 hidden dimension
- 12 attention heads (GQA: 4 KV heads)
- BitLinear layers in attention
- RoPE positional encoding (theta=1M)
- QK-Norm + Logit soft-capping
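Logit soft-capping smoothly bounds logits to a fixed range instead of hard-clipping them, which keeps gradients finite while preventing extreme values. A sketch of the usual formulation, `cap * tanh(logits / cap)`; the cap value below is illustrative, not taken from this model:

```python
import numpy as np

def soft_cap(logits: np.ndarray, cap: float = 30.0) -> np.ndarray:
    """Squash logits smoothly into (-cap, cap); cap=30 is an assumed example value."""
    return cap * np.tanh(logits / cap)

x = np.array([-100.0, -1.0, 0.0, 1.0, 100.0])
y = soft_cap(x)
print(y)  # large magnitudes saturate near +/-30; small values pass nearly unchanged
```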
## Capabilities
Trained on code execution and tool output data:
- Command execution results
- Tool calling format
- Code outputs
- Test results
## Usage
```python
from transformers import AutoTokenizer
from safetensors.torch import load_file

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("rileyseaburg/distillix-100m-v0.4")

# Load the model (requires the distillix library for the BitNet layers)
# pip install distillix
```
## Demo
Try it: Distillix Chat Space
## Training
- Optimizer: Muon + AdamW (FP32, no AMP)
- Data: 11k code execution samples
- Hardware: RTX 2080 Super / 8xL40S
## Citation
```bibtex
@misc{distillix2025,
  title={Distillix: BitNet Tool Calling Model},
  author={Riley Seaburg},
  year={2025},
  url={https://github.com/rileyseaburg/distillix}
}
```