# Training Status
Training is ongoing in the W&B project eren23/crucible-lewm.
## Current Status
| Variant | Epoch | safetensors | LQ40 export | Benchmarked | Convergence |
|---|---|---|---|---|---|
| baseline 192d/6e/6p | 100+ | available | done (full, Q4 pred) | done | in progress |
| slim_96d/4e/4p | 1 | available | done (full, q4, f32) | done | in progress |
| hybrid_ALAL_64d/4e/4p | 1 | pending | done (full) | partial | in progress |
| slim_48d/2e/2p | 1 | pending | pending | pending | in progress |
| slim_64d/3e/3p | 1 | pending | pending | pending | in progress |
| slim_96d/2e/3p | 1 | pending | pending | pending | in progress |
| slim_128d/4e/4p | 1 | pending | pending | pending | in progress |
| slim_192d/4e/4p | 1 | pending | pending | pending | in progress |
| elastic_fixed100 | 1 | pending | pending | pending | in progress |
## Expected Quality Improvement
The baseline expert (100+ epochs) achieves cos = 0.999 vs the f32 reference at INT8+Q4, while all slim variants are still at epoch 1.
Conservative estimates for the slim variants at epoch 100:
- slim_96d/4e/4p: cos may improve from 0.9982 → 0.999+
- hybrid_ALAL: cos may improve from ~0.98 → ~0.995+
- WANDA variants: would benefit most from longer training + fine-tuning
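The cos figures above are plain cosine similarity between a variant's output and the f32 reference. A minimal sketch for reproducing the metric — the function name and toy tensors are illustrative, not the project's API:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two tensors, flattened and computed in f64."""
    a = a.ravel().astype(np.float64)
    b = b.ravel().astype(np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Example: compare a (toy) quantized prediction against its f32 reference.
ref = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)
quant = np.array([1.01, 1.99, 3.02, 3.98], dtype=np.float32)
print(f"cos = {cosine_similarity(ref, quant):.4f}")
```

Running both outputs through the same flatten-and-normalize path keeps the number comparable across variants and export formats.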
## Benchmarking Checklist
When new checkpoints complete training:
- Download from W&B
- Convert to safetensors + config.json
- Benchmark f32 on Apple Silicon (encode, predict, 20-step rollout)
- Export to LQ40 (full, q4-pred, f32)
- Benchmark INT8+Q4 on Apple Silicon
- Benchmark on ESP32-P4 (if hardware available)
- Run WASM browser benchmark
- Compare cos vs baseline expert
- Update model README with results
- Push to HuggingFace
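The rollout benchmark in the checklist amounts to timing a closed loop in which each prediction is fed back as the next input. A hedged sketch with a toy stand-in model — `benchmark_rollout` and the toy predictor are illustrative, not Synapse's actual API:

```python
import time
import numpy as np

def benchmark_rollout(predict, state, steps=20):
    """Time a closed-loop rollout: each prediction becomes the next input."""
    t0 = time.perf_counter()
    for _ in range(steps):
        state = predict(state)
    elapsed = time.perf_counter() - t0
    return state, elapsed / steps  # final state, mean seconds per step

# Toy stand-in for one world-model step (the real model is loaded from safetensors).
rng = np.random.default_rng(0)
W = (rng.standard_normal((96, 96)) * 0.01).astype(np.float32)
toy_predict = lambda s: np.tanh(W @ s)

final, per_step = benchmark_rollout(toy_predict, np.ones(96, dtype=np.float32))
print(f"{per_step * 1e6:.1f} µs/step over a 20-step rollout")
```

The same harness can wrap the f32 and INT8+Q4 paths so per-step timings are measured under identical feedback conditions.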
## How to Contribute Benchmarks
If you run these models on different hardware, please open an issue or PR with:
- Hardware platform
- Software version (Synapse commit)
- Benchmark results
- Any issues or observations
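The first two items can be collected from the shell; a quick sketch using standard commands (adjust for your platform):

```shell
# Hardware platform / OS details for the report.
uname -a
# Synapse commit: run inside the repo checkout.
git rev-parse --short HEAD 2>/dev/null || echo "(not in a git checkout)"
```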