# Preliminary TernaryScale Benchmark Results
**Date:** 2025-05-15

**Hardware:** RTX 4060 8GB

**Steps:** 2500 | **Warmup:** 250 | **Batch:** 64 | **CTX:** 66

**Params:** 1,668,128 | **Ternary forward:** T32

**Data:** enwik8 (train 1,003,854 bytes / val 111,540 bytes)

**All 6 configs run in parallel at each step**
## Loss Over Time
| Step | SignSGD-C | SignSGD-E | Lion bf16 | Lion FP32 | Adam bf16 | Adam FP32 |
|------|-----------|-----------|-----------|-----------|-----------|-----------|
| 0 | 6.9770 | 8.0108 | 7.0011 | 6.5737 | 6.7317 | 6.8118 |
| 250 | 2.1486 | 1.9869 | 3.0933 | 3.1620 | 2.8921 | 2.7365 |
| 500 | 1.9801 | 1.7453 | 2.8211 | 2.5914 | 2.5739 | 2.4437 |
| 750 | 1.8682 | 1.6757 | 2.6109 | 2.5917 | 2.3861 | 2.2972 |
| 1000 | 1.8499 | 1.6601 | 2.6047 | 3.0808 | 2.3597 | 2.2618 |
| 1250 | 1.7943 | 1.6378 | 2.5462 | 3.2841 | 2.3527 | 2.3462 |
| 1500 | 1.7533 | 1.6506 | 2.5942 | 3.3279 | 2.4036 | 2.4838 |
| 1750 | 1.7601 | 1.6370 | 2.5652 | 3.3534 | 2.3876 | 2.5985 |
| 2000 | 1.7102 | 1.6108 | 2.5563 | 3.3116 | 2.3305 | 2.6721 |
| 2250 | 1.7354 | 1.5892 | 2.6064 | 3.4192 | 2.3509 | 2.7199 |
| 2500 | 1.6944 | 1.5861 | 2.5486 | 3.4403 | 2.3237 | 2.6991 |
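
To see where each run bottoms out, a short Python sketch can scan the logged checkpoints in the table above. It only sees the 250-step checkpoints listed here, so its minima need not match the Min Loss column of the Summary. `best_checkpoint` and `rebound` are hypothetical helper names introduced for illustration.

```python
# Logged checkpoints copied from the "Loss Over Time" table.
STEPS = [0, 250, 500, 750, 1000, 1250, 1500, 1750, 2000, 2250, 2500]
LOSSES = {
    "SignSGD-C": [6.9770, 2.1486, 1.9801, 1.8682, 1.8499, 1.7943, 1.7533, 1.7601, 1.7102, 1.7354, 1.6944],
    "SignSGD-E": [8.0108, 1.9869, 1.7453, 1.6757, 1.6601, 1.6378, 1.6506, 1.6370, 1.6108, 1.5892, 1.5861],
    "Lion bf16": [7.0011, 3.0933, 2.8211, 2.6109, 2.6047, 2.5462, 2.5942, 2.5652, 2.5563, 2.6064, 2.5486],
    "Lion FP32": [6.5737, 3.1620, 2.5914, 2.5917, 3.0808, 3.2841, 3.3279, 3.3534, 3.3116, 3.4192, 3.4403],
    "Adam bf16": [6.7317, 2.8921, 2.5739, 2.3861, 2.3597, 2.3527, 2.4036, 2.3876, 2.3305, 2.3509, 2.3237],
    "Adam FP32": [6.8118, 2.7365, 2.4437, 2.2972, 2.2618, 2.3462, 2.4838, 2.5985, 2.6721, 2.7199, 2.6991],
}


def best_checkpoint(losses):
    """Return (step, loss) for the lowest loss among the logged checkpoints."""
    i = min(range(len(losses)), key=losses.__getitem__)
    return STEPS[i], losses[i]


def rebound(losses):
    """Final logged loss minus the best logged loss; 0.0 means the run ends at its best."""
    return losses[-1] - min(losses)


if __name__ == "__main__":
    for name, losses in LOSSES.items():
        step, best = best_checkpoint(losses)
        print(f"{name:10s} best at step {step:4d}: {best:.4f}  rebound {rebound(losses):+.4f}")
```

Both FP32 momentum runs end well above their best checkpoint (rebounds of about 0.85 for Lion FP32 and 0.44 for Adam FP32), while SignSGD-C, SignSGD-E and Adam bf16 finish at their lowest logged loss.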
## Summary
| Config | Final Loss | Min Loss | Loss@1k | Loss@2.5k | Step (ms) | Opt Mem | Total Mem |
|--------|-----------|----------|---------|------------|----------|---------|-----------|
| **SignSGD ConfigC** | 1.7113 | 1.5921 | 1.8099 | 1.6944 | 11.2 | 6.4 MB | 12.7 MB |
| **SignSGD ConfigE** | **1.6149** | **1.5157** | **1.7146** | **1.5861** | 11.2 | 6.4 MB | 12.7 MB |
| Lion bf16 | 2.5521 | 2.4668 | 2.5683 | 2.5486 | 11.2 | 9.6 MB | 12.7 MB |
| Lion FP32 | 3.4280 | 2.4911 | 3.0947 | 3.4403 | 11.2 | 12.7 MB | 19.1 MB |
| Adam bf16 | 2.3487 | 2.2480 | 2.3016 | 2.3237 | 11.2 | 9.6 MB | 12.7 MB |
| Adam FP32 | 2.7597 | 2.2049 | 2.3037 | 2.6991 | 11.2 | 19.1 MB | 25.4 MB |
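
The Opt Mem figures for the SignSGD, Lion FP32 and Adam FP32 rows line up with whole FP32 copies of the 1,668,128 parameters if "MB" is read as MiB (2^20 bytes). The sketch below assumes that reading and assumes each copy is one model-sized buffer (for example a gradient or a moment buffer); the report does not give this breakdown, and the bf16 rows do not all fit the pattern. `fp32_copies_mib` and `ASSUMED_COPIES` are hypothetical names.

```python
# Memory arithmetic for the Summary table, ASSUMING "MB" means MiB (2**20 bytes)
# and that each optimizer "copy" is one FP32 buffer the size of the model.
PARAMS = 1_668_128  # parameter count from the report header
MIB = 2 ** 20
FP32_BYTES = 4


def fp32_copies_mib(n_copies: int, n_params: int = PARAMS) -> float:
    """Size in MiB of n_copies FP32 buffers, each holding one value per parameter."""
    return n_copies * n_params * FP32_BYTES / MIB


# Hypothetical breakdown (not stated in the report):
#   SignSGD opt mem   = 1 copy  (e.g. the gradient)
#   Lion FP32 opt mem = 2 copies (gradient + momentum)
#   Adam FP32 opt mem = 3 copies (gradient + first and second moments)
ASSUMED_COPIES = {"SignSGD": 1, "Lion FP32": 2, "Adam FP32": 3}

if __name__ == "__main__":
    weights = fp32_copies_mib(1)  # the FP32 weights themselves
    for name, k in ASSUMED_COPIES.items():
        opt = fp32_copies_mib(k)
        print(f"{name:10s} opt ~ {opt:5.2f} MiB, total ~ {opt + weights:5.2f} MiB")
    ratio = fp32_copies_mib(3) / fp32_copies_mib(1)
    print(f"Adam FP32 / SignSGD optimizer memory: {ratio:.1f}x")
```

Under these assumptions the optimizer figures come out at about 6.36, 12.73 and 19.09 MiB, which round to the table's 6.4, 12.7 and 19.1 MB. The totals are about 12.73, 19.09 and 25.45 MiB; the last is 0.1 above the reported 25.4 MB.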
## Loss Ratio vs SignSGD ConfigC
| Config | Final-loss ratio | Verdict |
|--------|-------|---------|
| SignSGD ConfigE | **0.944x** | better |
| Lion bf16 | 1.491x | worse |
| Lion FP32 | 2.003x | worse |
| Adam bf16 | 1.373x | worse |
| Adam FP32 | 1.613x | worse |
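
The ratio column appears to be each config's Final Loss divided by SignSGD ConfigC's Final Loss (1.7113). Recomputing it that way from the rounded Summary values matches the table to within 0.001; Adam bf16 comes out at 1.372 rather than 1.373, presumably because the table was computed from unrounded losses. The `verdict` threshold of 1.0 is an assumption about how "better" and "worse" were assigned.

```python
# Final Loss column of the Summary table.
FINAL_LOSS = {
    "SignSGD ConfigC": 1.7113,
    "SignSGD ConfigE": 1.6149,
    "Lion bf16": 2.5521,
    "Lion FP32": 3.4280,
    "Adam bf16": 2.3487,
    "Adam FP32": 2.7597,
}
BASELINE = "SignSGD ConfigC"


def loss_ratios(final_loss, baseline=BASELINE):
    """Each config's final loss divided by the baseline's final loss."""
    base = final_loss[baseline]
    return {name: loss / base for name, loss in final_loss.items() if name != baseline}


def verdict(ratio):
    """Assumed rule: a ratio below 1.0 means a lower final loss than the baseline."""
    return "better" if ratio < 1.0 else "worse"


if __name__ == "__main__":
    for name, r in loss_ratios(FINAL_LOSS).items():
        print(f"{name:16s} {r:.3f}x  {verdict(r)}")
```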
## Peak VRAM: 388 MB (all 6 models combined)
## Key Findings
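1. **SignSGD ConfigE wins:** lowest final loss (1.6149), tied for the lowest memory (6.4 MB optimizer, 12.7 MB total, same as ConfigC), and no optimizer state
2. **SignSGD ConfigC is second:** its group-averaged S blurs per-element magnitudes, while ConfigE keeps a per-element S
3. **Momentum optimizers diverge on ternary weights:** in FP32, Lion's loss climbs from 2.59 at step 750 to 3.44 at step 2500, and Adam's rises from its low of 2.26 at step 1000 to 2.70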
4. **bf16 helps momentum optimizers** — Lion bf16 (2.55) and Adam bf16 (2.35) are more stable than their FP32 variants (3.43, 2.76) but still far worse than SignSGD
5. **Likely root cause:** momentum amplifies gradients that are blocked by the ternary STE threshold mask, causing oscillation. SignSGD's pure sign update suits ternary training naturally: it keeps no state to amplify and does not conflict with the mask.
6. **SignSGD memory advantage:** 6.4 MB of optimizer memory vs 19.1 MB for Adam FP32, roughly 3x less, with zero state buffers
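
The mechanism in finding 5 can be made concrete with a toy model. This pure-Python sketch is an assumption-laden illustration, not the TernaryScale code: `ternarize` uses a hypothetical fixed dead-zone threshold `delta`, `ste_grad` uses a hypothetical clip-style mask that passes the gradient only where |w| <= `clip`, and `momentum_step` is plain heavy-ball momentum standing in for Lion and Adam. It shows one way momentum can fight the mask: once a weight crosses the mask boundary its gradient is zeroed and SignSGD stops moving it, but the momentum buffer keeps pushing it further.

```python
# Hypothetical sketch: ternary forward with a straight-through estimator (STE)
# whose backward pass is gated by a threshold mask, plus two update rules.
# Thresholds, mask rule and hyperparameters are assumptions for illustration.


def ternarize(w, delta=0.5):
    """Forward pass: map each weight to -1, 0 or +1 with a dead zone of width delta."""
    return [0 if abs(x) <= delta else (1 if x > 0 else -1) for x in w]


def ste_grad(w, grad_out, clip=1.0):
    """STE backward: pass the upstream gradient through only where |w| <= clip."""
    return [g if abs(x) <= clip else 0.0 for x, g in zip(w, grad_out)]


def sign(x):
    """Sign function with sign(0) == 0, so masked weights get no update."""
    return (x > 0) - (x < 0)


def signsgd_step(w, g, lr=0.01):
    """Stateless SignSGD: each weight moves by exactly lr, or not at all if g == 0."""
    return [x - lr * sign(gi) for x, gi in zip(w, g)]


def momentum_step(w, g, buf, lr=0.01, beta=0.9):
    """Heavy-ball momentum: the buffer keeps pushing even after g becomes 0."""
    buf = [beta * b + gi for b, gi in zip(buf, g)]
    return [x - lr * b for x, b in zip(w, buf)], buf


def run(steps=10, w0=0.955, upstream=-1.0):
    """Apply a constant upstream gradient to one weight under both optimizers."""
    w_sign, w_mom, buf = [w0], [w0], [0.0]
    for _ in range(steps):
        w_sign = signsgd_step(w_sign, ste_grad(w_sign, [upstream]))
        w_mom, buf = momentum_step(w_mom, ste_grad(w_mom, [upstream]), buf)
    return w_sign[0], w_mom[0]


if __name__ == "__main__":
    s, m = run()
    print(f"SignSGD weight: {s:.4f}, momentum weight: {m:.4f}")
```

With the defaults, the SignSGD weight stops just past the mask boundary (about 1.005), while the momentum weight overshoots to about 1.14 even though its gradient has been zero since step 4.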