# Training Status
Training is ongoing in the W&B project eren23/crucible-lewm.
## Current Status
| Variant | Epoch | safetensors | LQ40 export | Benchmarked | Convergence |
|---|---|---|---|---|---|
| baseline 192d/6e/6p | 100+ | available | done (full, Q4 pred) | done | in progress |
| slim_96d/4e/4p | 1 | available | done (full, q4, f32) | done | in progress |
| hybrid_ALAL_64d/4e/4p | 1 | pending | done (full) | partial | in progress |
| slim_48d/2e/2p | 1 | pending | pending | pending | in progress |
| slim_64d/3e/3p | 1 | pending | pending | pending | in progress |
| slim_96d/2e/3p | 1 | pending | pending | pending | in progress |
| slim_128d/4e/4p | 1 | pending | pending | pending | in progress |
| slim_192d/4e/4p | 1 | pending | pending | pending | in progress |
| elastic_fixed100 | 1 | pending | pending | pending | in progress |
## Expected Quality Improvement
The baseline expert (100+ epochs) achieves cos = 0.999 vs the f32 reference at INT8+Q4, while all slim variants are still at epoch 1.
Conservative estimates for the slim variants at epoch 100:
- slim_96d/4e/4p: cos may improve from 0.9982 → 0.999+
- hybrid_ALAL: cos may improve from ~0.98 → ~0.995+
- WANDA variants: would benefit most from longer training + fine-tuning
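The cos figures above are plain cosine similarity between a variant's output and the f32 reference. A minimal sketch for reproducing the metric — the function name and toy tensors are illustrative, not the project's API:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two tensors, flattened and computed in f64."""
    a = a.ravel().astype(np.float64)
    b = b.ravel().astype(np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Example: compare a (toy) quantized prediction against its f32 reference.
ref = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)
quant = np.array([1.01, 1.99, 3.02, 3.98], dtype=np.float32)
print(f"cos = {cosine_similarity(ref, quant):.4f}")
```

Running both outputs through the same flatten-and-normalize path keeps the number comparable across variants and export formats.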
## Benchmarking Checklist
When new checkpoints complete training:
- Download from W&B
- Convert to safetensors + config.json
- Benchmark f32 on Apple Silicon (encode, predict, 20-step rollout)
- Export to LQ40 (full, q4-pred, f32)
- Benchmark INT8+Q4 on Apple Silicon
- Benchmark on ESP32-P4 (if hardware available)
- Run WASM browser benchmark
- Compare cos vs baseline expert
- Update model README with results
- Push to HuggingFace
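The rollout benchmark in the checklist amounts to timing a closed loop in which each prediction is fed back as the next input. A hedged sketch with a toy stand-in model — `benchmark_rollout` and the toy predictor are illustrative, not Synapse's actual API:

```python
import time
import numpy as np

def benchmark_rollout(predict, state, steps=20):
    """Time a closed-loop rollout: each prediction becomes the next input."""
    t0 = time.perf_counter()
    for _ in range(steps):
        state = predict(state)
    elapsed = time.perf_counter() - t0
    return state, elapsed / steps  # final state, mean seconds per step

# Toy stand-in for one world-model step (the real model is loaded from safetensors).
rng = np.random.default_rng(0)
W = (rng.standard_normal((96, 96)) * 0.01).astype(np.float32)
toy_predict = lambda s: np.tanh(W @ s)

final, per_step = benchmark_rollout(toy_predict, np.ones(96, dtype=np.float32))
print(f"{per_step * 1e6:.1f} µs/step over a 20-step rollout")
```

The same harness can wrap the f32 and INT8+Q4 paths so per-step timings are measured under identical feedback conditions.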
## How to Contribute Benchmarks
If you run these models on different hardware, please open an issue or PR with:
- Hardware platform
- Software version (Synapse commit)
- Benchmark results
- Any issues or observations
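The first two items can be collected from the shell; a quick sketch using standard commands (adjust for your platform):

```shell
# Hardware platform / OS details for the report.
uname -a
# Synapse commit: run inside the repo checkout.
git rev-parse --short HEAD 2>/dev/null || echo "(not in a git checkout)"
```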