# 111M-l12w720-owt-20260402
Parameter-matched baseline and subspace transformer checkpoints trained on OpenWebText.
## Model Details
| | Baseline | Subspace |
|---|---|---|
| Parameters | 110,852,640 | 110,853,072 |
| Layers | 12 | 12 |
| Width | 720 | 720 |
| Heads | 12 | 12 |
| MLP Ratio | 4.0 | 5.2 |
| Q/K/V Fractions | - | 0.2/0.2/0.6 |
| Readout | - | v |
| Training Steps | 19,999 | 19,999 |
| Context Length | 1024 | 1024 |
| Vocab Size | 50,257 | 50,257 |
| Best Val Loss | 3.0298 | 3.0564 |
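As a sanity check on the "parameter-matched" claim, the baseline count in the table can be reproduced with simple arithmetic under a few assumptions about the architecture: tied input/output embeddings, biasless linear layers, scale-only norms (e.g. RMSNorm), and no learned positional embeddings. These assumptions are inferred for illustration, not stated in this card:

```python
# Rough parameter count for the baseline model (12 layers, width 720,
# MLP ratio 4.0, vocab 50,257). ASSUMPTIONS, not stated in this card:
# tied embeddings, no biases, scale-only norms, no learned pos. embeddings.
vocab, width, layers, mlp_ratio = 50_257, 720, 12, 4.0

embed = vocab * width                      # tied token embedding/unembedding
attn = 4 * width * width                   # Q, K, V, and output projections
mlp = 2 * width * int(mlp_ratio * width)   # up- and down-projections
norms = 2 * width                          # two scale-only norms per block
per_layer = attn + mlp + norms

total = embed + layers * per_layer + width  # + final norm
print(total)  # 110852640 under these assumptions
```

Under these assumptions the arithmetic lands exactly on the tabulated 110,852,640; the subspace variant trades attention width (via the Q/K/V fractions) for a larger MLP ratio to match it.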
## Files
- `baseline/best.pt` - Baseline transformer checkpoint
- `subspace/best.pt` - Subspace transformer checkpoint
- `baseline/config.json` - Baseline resolved config
- `subspace/config.json` - Subspace resolved config
## Usage
```python
from subspace_transformer.eval_harness import SubspaceTransformerLM

# Load the baseline checkpoint; use "subspace/best.pt" for the subspace model.
model = SubspaceTransformerLM(checkpoint_path="baseline/best.pt")
```
See the `subspace-transformer` repository for the full training and evaluation code.