
Tritter 100M Hybrid BitNet

Model Details

  • Model Size: 100M parameters
  • Architecture: BitNet 1.58-bit ternary quantization
  • Training Methodology: Hybrid Predictive Training (Embedding-Prediction Paradigm)
  • Quantization: {-1, 0, 1} ternary weights
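The ternary weight representation can be sketched with the absmean rounding scheme described in the BitNet b1.58 literature; this is a minimal illustration, not the model's actual quantization code, and the function name is ours:

```python
import torch

def ternary_quantize(w: torch.Tensor, eps: float = 1e-6):
    """Round weights to {-1, 0, 1}, scaled by the mean absolute value (absmean)."""
    scale = w.abs().mean().clamp(min=eps)
    q = (w / scale).round().clamp(-1, 1)
    return q, scale

w = torch.randn(4, 4)
q, scale = ternary_quantize(w)
# q now contains only values from {-1, 0, 1}; w is approximated by q * scale
```

At inference, the ternary matrix replaces full-precision multiplications with additions and sign flips, which is where the efficiency gain comes from.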

Overview

This model is trained using Hybrid Predictive Training, which combines:

  • Embedding-prediction paradigm: Core computation in continuous embedding space
  • BitNet 1.58-bit quantization: Efficient ternary weight representation
  • Dual prediction heads: Both embedding and token space outputs during training

The model operates in continuous embedding space at inference time, with token prediction as temporary scaffolding for training compatibility.
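The dual-head setup can be sketched as follows. This is an illustrative reconstruction from the description above, not the training code: the vocabulary size, head shapes, and the loss weighting `alpha` are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

hidden, vocab = 1280, 32000  # hidden size from the spec; vocab size is an assumption

class DualHead(nn.Module):
    """Two heads over the final hidden state: one predicts the next token's
    embedding (continuous space), one predicts the next token id (the
    temporary scaffolding head used only during training)."""
    def __init__(self):
        super().__init__()
        self.embed_head = nn.Linear(hidden, hidden)  # embedding-space prediction
        self.token_head = nn.Linear(hidden, vocab)   # token-space prediction

    def forward(self, h, target_emb, target_ids, alpha=0.5):
        emb_loss = F.mse_loss(self.embed_head(h), target_emb)
        tok_loss = F.cross_entropy(
            self.token_head(h).view(-1, vocab), target_ids.view(-1)
        )
        return alpha * emb_loss + (1 - alpha) * tok_loss

h = torch.randn(2, 8, hidden)            # batch of final hidden states
target_emb = torch.randn(2, 8, hidden)   # next-token embeddings
target_ids = torch.randint(0, vocab, (2, 8))
loss = DualHead()(h, target_emb, target_ids)
```

After training, only the embedding head is needed; the token head can be dropped or used solely to decode back to text.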

Architecture Specifications

  • Hidden Size: 1280
  • Number of Layers: 12
  • Number of Attention Heads: 16
  • Intermediate Size: ~3.5x hidden (4480)
  • Max Position Embeddings: 2048
  • Context Window: 2K tokens
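The specifications above can be collected into a config dictionary for reference; the field names follow common Hugging Face conventions and the actual `config.json` for this model may differ:

```python
# Hypothetical config mirroring the listed specs (field names are assumptions).
config = {
    "hidden_size": 1280,
    "num_hidden_layers": 12,
    "num_attention_heads": 16,
    "intermediate_size": 4480,      # ~3.5x hidden_size
    "max_position_embeddings": 2048,
}

# Sanity check: hidden size must divide evenly across heads (head_dim = 80).
assert config["hidden_size"] % config["num_attention_heads"] == 0
```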

Training Data

  • Total Tokens: ~50B tokens
  • Data Mix: Code-centric (Python, Rust, technical documentation)
  • Quality Gates: samples containing hardcoded secrets rejected; security checks enabled

Comparison with Standard Training

For comparison with the standard-trained baseline, see:

Key Differences:

| Metric | Standard | Hybrid Predictive |
| --- | --- | --- |
| Training methodology | Standard token prediction | Embedding + token prediction |
| Convergence speed | Baseline | Expected: 10-15% faster |
| Final loss | Baseline | Expected: 5-10% lower |
| Embedding quality | Standard | Expected: improved semantic structure |

Training Metrics

Convergence Comparison:

| Step | Standard Loss | Hybrid Loss | Improvement |
| --- | --- | --- | --- |
| 10K | pending | pending | - |
| 50K | pending | pending | - |
| 100K | pending | pending | - |

Final Metrics:

  • Final Training Loss: pending
  • Final Validation Loss: pending
  • Training Time: pending
  • Hardware: RTX 5080 16GB

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("tzervas/tritter-100m-hybrid-bitnet")
tokenizer = AutoTokenizer.from_pretrained("tzervas/tritter-100m-hybrid-bitnet")

# Generate text
inputs = tokenizer("def hello", return_tensors="pt")
outputs = model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0]))

Model Details

  • Framework: PyTorch
  • Quantization Method: BitNet 1.58-bit ternary
  • License: MIT

Research Background

For more information on hybrid predictive training, see:

  • Embedding-Prediction Paradigm: Operating in continuous embedding space
  • BitNet 1.58-bit: Efficient ternary quantization {-1, 0, 1}
  • Progressive Layer Loading: Support for larger models on limited VRAM
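Progressive layer loading can be sketched as streaming one layer's weights into VRAM at a time, so the full model never has to be resident on the GPU. This is an illustrative sketch of the general technique, not this project's implementation; the class name and structure are ours:

```python
import torch
import torch.nn as nn

class ProgressiveLoader:
    """Keep layer weights on CPU and move one layer at a time onto the
    compute device, evicting it before the next layer loads."""
    def __init__(self, layers: nn.ModuleList):
        self.layers = layers  # kept on CPU between uses
        self.device = "cuda" if torch.cuda.is_available() else "cpu"

    @torch.no_grad()
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.to(self.device)
        for layer in self.layers:
            layer.to(self.device)  # load this layer's weights into VRAM
            x = layer(x)
            layer.to("cpu")        # evict before loading the next layer
        return x

layers = nn.ModuleList([nn.Linear(16, 16) for _ in range(3)])
out = ProgressiveLoader(layers).forward(torch.randn(2, 16))
```

The trade-off is extra host-to-device transfer time per forward pass in exchange for a much smaller peak VRAM footprint.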

Citation

If you use this model, please cite:

@misc{tritter100m_hybrid,
  author={Tzervas, K.},
  title={Tritter 100M Hybrid BitNet: Embedding-Prediction Training},
  year={2025},
  publisher={Hugging Face}
}

Created as part of the Tritter multimodal transformer research project.
