# Tritter 100M Hybrid BitNet

## Model Details
- Model Size: 100M parameters
- Architecture: BitNet 1.58-bit ternary quantization
- Training Methodology: Hybrid Predictive Training (Embedding-Prediction Paradigm)
- Quantization: {-1, 0, 1} ternary weights
## Overview
This model is trained using Hybrid Predictive Training, which combines:
- Embedding-prediction paradigm: Core computation in continuous embedding space
- BitNet 1.58-bit quantization: Efficient ternary weight representation
- Dual prediction heads: Both embedding and token space outputs during training
The model operates in continuous embedding space at inference time, with token prediction as temporary scaffolding for training compatibility.
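The exact training objective is not spelled out in this card, so the following is a minimal sketch of how a dual-head objective of this kind is commonly set up. The module and parameter names (`HybridPredictiveHeads`, `embed_head`, `token_head`, the fixed `alpha` weighting) are illustrative assumptions, not the model's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridPredictiveHeads(nn.Module):
    """Illustrative dual prediction heads: one regresses the next token's
    embedding (continuous target), one predicts the next token id
    (discrete scaffolding used during training)."""

    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.embed_head = nn.Linear(hidden_size, hidden_size)  # embedding-space output
        self.token_head = nn.Linear(hidden_size, vocab_size)   # token-space output

    def forward(self, hidden_states, target_embeddings, target_ids, alpha: float = 0.5):
        # Continuous objective: match the embedding of the next token.
        embed_loss = F.mse_loss(self.embed_head(hidden_states), target_embeddings)

        # Discrete objective: standard next-token cross-entropy.
        logits = self.token_head(hidden_states)
        token_loss = F.cross_entropy(logits.view(-1, logits.size(-1)), target_ids.view(-1))

        # Hypothetical fixed weighting between the two objectives.
        return alpha * embed_loss + (1.0 - alpha) * token_loss
```

In this setup, only the embedding-space path is needed at inference time; the token head exists so the model can still be trained and evaluated with standard next-token cross-entropy.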
## Architecture Specifications
- Hidden Size: 1280
- Number of Layers: 12
- Number of Attention Heads: 16
- Intermediate Size: ~3.5x hidden (4480)
- Max Position Embeddings: 2048
- Context Window: 2K tokens
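For reference, the same specifications expressed as a plain config dictionary; the key names follow common `transformers` conventions and are an assumption, since the checkpoint's actual `config.json` keys are not reproduced here.

```python
# Hypothetical config mirroring the specifications above; the released
# checkpoint's config.json may use different key names.
tritter_100m_hybrid_config = {
    "hidden_size": 1280,
    "num_hidden_layers": 12,
    "num_attention_heads": 16,
    "intermediate_size": 4480,        # ~3.5x hidden size
    "max_position_embeddings": 2048,  # 2K-token context window
}
```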
## Training Data

- Total Tokens: ~50B
- Data Mix: Code-centric (Python, Rust, technical documentation)
- Quality Gates: samples containing hardcoded secrets are rejected; security checks enabled
## Comparison with Standard Training

For comparison with the standard-trained baseline, see:

- Standard 100M: tzervas/tritter-100m-bitnet

**Key Differences:**

| Metric | Standard | Hybrid Predictive |
|---|---|---|
| Training Methodology | Standard token prediction | Embedding + token prediction |
| Convergence Speed | Baseline | Expected: 10-15% faster |
| Final Loss | Baseline | Expected: 5-10% lower |
| Embedding Quality | Standard | Expected: Improved semantic structure |
## Training Metrics

**Convergence Comparison:**

| Step | Standard Loss | Hybrid Loss | Improvement |
|---|---|---|---|
| 10K | metric pending | metric pending | pending |
| 50K | metric pending | metric pending | pending |
| 100K | metric pending | metric pending | pending |
**Final Metrics:**
- Final Training Loss: pending
- Final Validation Loss: pending
- Training Time: pending
- Hardware: RTX 5080 16GB
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("tzervas/tritter-100m-hybrid-bitnet")
tokenizer = AutoTokenizer.from_pretrained("tzervas/tritter-100m-hybrid-bitnet")

# Generate text
inputs = tokenizer("def hello", return_tensors="pt")
outputs = model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0]))
```
## Model Details
- Framework: PyTorch
- Quantization Method: BitNet 1.58-bit ternary
- License: MIT
## Research Background
For more information on hybrid predictive training, see:
- Embedding-Prediction Paradigm: Operating in continuous embedding space
- BitNet 1.58-bit: Efficient ternary quantization {-1, 0, 1} (see the sketch after this list)
- Progressive Layer Loading: Support for larger models on limited VRAM
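As an illustration of the ternary scheme referenced above, here is a minimal sketch of absmean weight quantization in the style of BitNet b1.58. It is not this model's own quantization code; the function name and the per-tensor scaling granularity are assumptions.

```python
import torch

def absmean_ternary_quantize(weight: torch.Tensor, eps: float = 1e-5):
    """Map a weight tensor to ternary values {-1, 0, 1} with an absmean scale,
    as described for BitNet b1.58 (per-tensor scaling assumed here)."""
    scale = weight.abs().mean().clamp(min=eps)
    ternary = (weight / scale).round().clamp_(-1, 1)
    return ternary, scale

# Example: dequantized approximation of the original weights.
w = torch.randn(1280, 4480)
q, s = absmean_ternary_quantize(w)
w_hat = q * s  # ternary weights rescaled back toward the original magnitude
```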
## Citation

If you use this model, please cite:

```bibtex
@misc{tritter100m_hybrid,
  author    = {Tzervas, K.},
  title     = {Tritter 100M Hybrid BitNet: Embedding-Prediction Training},
  year      = {2025},
  publisher = {Hugging Face}
}
```
Created as part of the Tritter multimodal transformer research project.