
ARBS Ternary Training System (TTS)

E1TM Format: Exponent-1 Ternary Mantissa

E1TM encodes each group of N weights (N = gs, the group size) as one int8 exponent shared across N ternary mantissas.

W_eff[i] = S × T[i]    where T[i] ∈ {-1, 0, +1},  S = 2^{E + Δ}

E  = int8 log₂ scale (persistent, per group)
Δ  = 4 × corr_accum / (step × gs)  (from the BigInt accumulator)
S  = 2^{E + Δ} (float32, ephemeral: created per forward pass, then discarded)
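A minimal Python sketch of the definitions above for a single group. The function name `effective_weights` and the choice Δ = 0 before the first step (to avoid dividing by zero) are assumptions, and Python floats are float64 rather than the float32 the format specifies.

```python
def effective_weights(T, E, corr_accum, step, gs):
    """Reconstruct W_eff for one group: W_eff[i] = S * T[i], S = 2**(E + delta).

    Hypothetical helper; Python floats are float64, not float32.
    """
    assert len(T) == gs and all(t in (-1, 0, 1) for t in T)
    # delta = 4 * corr_accum / (step * gs); assumed to be 0 before the first step.
    delta = 4 * corr_accum / (step * gs) if step > 0 else 0.0
    S = 2.0 ** (E + delta)  # ephemeral scale, rebuilt on every forward pass
    return [S * t for t in T]


# E = -3, delta = 4 * 2 / (1 * 4) = 2, so S = 2**-1 = 0.5
w = effective_weights([1, 0, -1, 1], E=-3, corr_accum=2, step=1, gs=4)
print(w)
```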

Format variants

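| Name | TScaleType | T per E | gs | E bpw | T bpw | Total bpw (inference) | Precision |
|---|---|---|---|---|---|---|---|
| E1TM4 | T4 | 4 | 4 | 2.000 | 1.58 | 3.58 | Highest |
| E1TM6 | T6 | 6 | 6 | 1.333 | 1.58 | 2.91 | |
| E1TM8 | T8 | 8 | 8 | 1.000 | 1.58 | 2.58 | |
| E1TM16 | T16 | 16 | 16 | 0.500 | 1.58 | 2.08 | |
| E1TM32 | T32 | 32 | 32 | 0.250 | 1.58 | 1.83 | Default |
| E1TM64 | T64 | 64 | 64 | 0.125 | 1.58 | 1.71 | |
| E1TM96 | T96 | 96 | 96 | 0.083 | 1.58 | 1.67 | Most packed |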

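A higher T number means more mantissas per E: less storage per weight, but a coarser per-weight magnitude because one scale covers more weights. The T bpw column is the information-theoretic log₂3 ≈ 1.585; the 5 trits/byte packing used by T_packed actually stores 8/5 = 1.6 bpw.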
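The E bpw column is 8/gs (one int8 exponent spread over gs weights), and the inference total adds the log₂3 ≈ 1.585 bits of information per trit. The sketch below reproduces the table; `bpw` is a hypothetical helper, not part of ARBS.

```python
import math

TRIT_BPW = math.log2(3)  # theoretical minimum bits per ternary value (~1.585)


def bpw(gs, e_bits=8):
    """Return (E bpw, T bpw, total inference bpw) for one int8 exponent over gs trits."""
    e_bpw = e_bits / gs
    return e_bpw, TRIT_BPW, e_bpw + TRIT_BPW


for gs in (4, 6, 8, 16, 32, 64, 96):
    e, t, total = bpw(gs)
    print(f"E1TM{gs:<3} E={e:.3f} T={t:.3f} total={total:.3f}")
```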

Group sizes

The TScaleType name is the group size:

TScaleType.T4  → gs = 4   → E shared across 4 ternary mantissas
TScaleType.T32 → gs = 32  → E shared across 32 ternary mantissas
TScaleType.T96 → gs = 96  → E shared across 96 ternary mantissas
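A sketch of how the group size maps weights to exponents, assuming each `TScaleType` member's value is its gs and that weights are grouped contiguously along a flattened index. The real ARBS enum and grouping axis may differ; `group_of` and `num_groups` are hypothetical helpers, and allowing a partial last group is an assumption.

```python
from enum import IntEnum


class TScaleType(IntEnum):
    """Hypothetical mirror of the enum: each member's value is its group size gs."""
    T4 = 4
    T6 = 6
    T8 = 8
    T16 = 16
    T32 = 32
    T64 = 64
    T96 = 96


def group_of(i, scale_type):
    """Index of the shared exponent E used by flat weight index i."""
    return i // int(scale_type)


def num_groups(n_weights, scale_type):
    """Number of E entries needed for n_weights (assumes a partial last group is allowed)."""
    return -(-n_weights // int(scale_type))  # ceiling division


print(group_of(40, TScaleType.T32), num_groups(100, TScaleType.T32))
```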

Persistent training state (all integer)

| Buffer | Type | Size/weight | Role |
|---|---|---|---|
| T_packed | uint8 | 1.6 bpw (8/5; ideal log₂3 ≈ 1.58) | Base-3 packed ternary {-1, 0, +1}, 5 trits/byte |
| E | int8 | 8/N bpw | log₂ scale, one per N-weight group |
| corr_accum | int64 | 64/N bpw | BigInt accumulator for gradient sign votes |
| step_counter | int64 | ≈0 bpw (amortized) | Total steps processed |

No float32 or float16 values are stored anywhere in the persistent state. The float32 W_eff is created ephemerally in each forward pass and discarded after backward.
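The 5 trits/byte packing works because 3^5 = 243 ≤ 256, so one byte holds five base-3 digits (8/5 = 1.6 bits per trit). A minimal sketch, assuming the mapping -1/0/+1 → digits 0/1/2 and least-significant-digit-first order; the actual T_packed layout is not documented here, and `pack_trits`/`unpack_trits` are hypothetical names.

```python
def pack_trits(trits):
    """Pack ternary values {-1, 0, +1} into bytes, 5 trits per byte (3**5 = 243 <= 256)."""
    out = bytearray()
    for start in range(0, len(trits), 5):
        chunk = trits[start:start + 5]
        code = 0
        # Build the base-3 number so the first trit of the chunk is the lowest digit.
        for t in reversed(chunk):
            if t not in (-1, 0, 1):
                raise ValueError(f"not a trit: {t}")
            code = code * 3 + (t + 1)  # map -1, 0, +1 -> digits 0, 1, 2
        out.append(code)
    return bytes(out)


def unpack_trits(packed, n):
    """Inverse of pack_trits: recover the first n trits (drops padding digits)."""
    trits = []
    for code in packed:
        for _ in range(5):
            trits.append(code % 3 - 1)  # digit 0, 1, 2 -> -1, 0, +1
            code //= 3
    return trits[:n]


ts = [1, 0, -1, -1, 1, 0, 0, 1]
p = pack_trits(ts)
print(len(p), unpack_trits(p, len(ts)))
```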

Why ternary over binary or int4

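| Format | Values/weight | Packing efficiency | Null state |
|---|---|---|---|
| Binary | 2 | 1 bpw (100%) | No |
| Ternary | 3 | 1.6 bpw at 5 trits/byte (≈99% of the log₂3 ≈ 1.585 bpw minimum) | Yes (T = 0 is null) |
| Int4 | 16 | 4 bpw (100%) | Yes (0 is one of 16 levels) |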

Ternary's null state (T = 0) provides structural sparsity: ≈38% of weights are zero, so their multiply-accumulates (and any all-zero matmul tiles) can be skipped. No other low-bit format has a zero state at equivalent bpw.
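To illustrate why the null state saves work: with T ∈ {-1, 0, +1}, a group's contribution to a dot product needs only additions and subtractions, one multiply by the shared scale S, and nothing at all for T = 0. This is a per-element Python sketch (`ternary_dot` is a hypothetical name); a real kernel would skip work at tile granularity.

```python
def ternary_dot(x, T, S):
    """Compute sum_i x[i] * (S * T[i]) with adds/subtracts only, skipping T[i] == 0.

    Returns (result, number of skipped weights).
    """
    acc = 0.0
    skipped = 0
    for xi, t in zip(x, T):
        if t == 0:
            skipped += 1  # null state: no work for this weight
        elif t == 1:
            acc += xi
        else:  # t == -1
            acc -= xi
    return S * acc, skipped  # a single multiply applies the shared scale


print(ternary_dot([1.0, 2.0, 3.0, 4.0], [1, 0, -1, 1], 0.5))
```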

The BigInt difference

Unlike conventional quantization, where the scale is static after conversion, ARBS TTS keeps training the effective scale: E stays fixed while a BigInt correlation accumulator supplies a continuous correction Δ:

corr_accum[g] -= Σ (grad_sign × T)     # int64; moves by at most gs per step, never clipped or reset
Δ = 4 × corr_accum / (step × gs)       # continuous adjustment from integer division
S = 2^{E + Δ}                          # effective scale (ephemeral float32)

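The division corr_accum / (step × gs) is the Big Number Calculator operation: it converts the accumulated integer evidence into a continuous ratio. Assuming corr_accum starts at zero, it moves by at most gs per step, so the ratio stays in [-1, +1] and Δ stays in [-4, +4] (a scale range of 2^±4 around 2^E), while the ratio's resolution 1/(step × gs) gets finer as training proceeds. There are no threshold flips and no discrete steps; the integer state loses no information, and rounding occurs only when S is materialized in float32.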
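A sketch of the accumulator update and the exact ratio, using Python's arbitrary-precision `int` and `fractions.Fraction` so the integer evidence stays exact until S is formed. Here grad_sign is taken as the sign of each gradient element, E = -3 is an arbitrary example value, and `sign`, `accumulate` and `delta` are hypothetical names; note that the documented buffer is int64, not an unbounded integer.

```python
from fractions import Fraction


def sign(v):
    """Return -1, 0 or +1."""
    return (v > 0) - (v < 0)


def accumulate(corr_accum, grads, T):
    """One sign-vote step for one group: corr_accum -= sum(sign(g) * T)."""
    return corr_accum - sum(sign(g) * t for g, t in zip(grads, T))


def delta(corr_accum, step, gs):
    """Exact rational Delta = 4 * corr_accum / (step * gs)."""
    return 4 * Fraction(corr_accum, step * gs)


T = [1, 0, -1, 1]
corr, step = 0, 0
for grads in ([0.3, -0.1, 0.2, 0.5], [-0.2, 0.4, 0.1, -0.7]):
    corr = accumulate(corr, grads, T)
    step += 1

d = delta(corr, step, gs=len(T))
S = 2.0 ** (-3 + float(d))  # example E = -3; float appears only at this final step
print(corr, step, d, S)
```

After these two steps corr_accum = 2 and step = 2, so Δ = 4 × 2 / (2 × 4) = 1 and S = 2^(-3 + 1) = 0.25.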

Training vs inference

| Phase | T_packed | E | corr_accum | step | S |
|---|---|---|---|---|---|
| Training | Read-only | Read-only | Accumulates | Increments | Computed from corr_accum/step |
| Inference (Option A) | Frozen | Frozen | Frozen | Frozen | Burned into checkpoint |
| Inference (Option B) | Frozen | Fused | Discarded | Discarded | Static 2^{E_fused} |

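Option A (export) keeps corr_accum and step so S stays continuous. Option B (fuse) computes E_fused = round(E + 4 × corr_accum / (step × gs)) and discards corr_accum and step, removing their 64/N bpw and leaving only the E + T cost from the format table (≈2.6 bpw for E1TM8, 1.83 bpw for the default E1TM32). The rounding snaps S back to a power of two, 2^{E_fused}.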
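Option B can be sketched as follows. The int8 range check and Python's round-half-to-even tie rule are assumptions (the source only says round), and `fuse_exponent` is a hypothetical name.

```python
def fuse_exponent(E, corr_accum, step, gs):
    """Option B: E_fused = round(E + 4 * corr_accum / (step * gs)), stored as int8.

    Python's round() breaks ties to even; the source does not specify tie handling.
    """
    e_fused = round(E + 4 * corr_accum / (step * gs))
    if not -128 <= e_fused <= 127:
        raise OverflowError("E_fused does not fit in int8")
    return e_fused


E, corr, step, gs = -6, 5, 4, 32
continuous_S = 2.0 ** (E + 4 * corr / (step * gs))  # Option A: 2**-5.84375
fused_S = 2.0 ** fuse_exponent(E, corr, step, gs)   # Option B: 2**-6
print(continuous_S, fused_S)
```

With these example values Δ = 0.15625, so the fused scale 2^-6 is about 10% smaller than the continuous 2^-5.84375: that gap is the sub-exponent precision Option B gives up.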

Relationship to IEEE float

IEEE FP32:  1 sign + 8 exponent + 23 mantissa  → per value
E1TM32:     1 exponent (int8) + 32 ternary signs → per group of 32

In IEEE, the exponent and mantissa belong to the same value. In E1TM, the exponent is shared: the mantissa is split into N independent ternary signs. The corr_accum provides sub-exponent precision beyond the int8 E, making the effective scale continuous (up to float32 precision once S is materialized) rather than constrained to the 256 discrete 2^E values.
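A per-group storage comparison for 32 weights, as a rough sketch: FP32 spends 32 bits per value, while E1TM32 spends one int8 exponent plus 32 trits. `group_bits_fp32` and `group_bits_e1tm` are hypothetical helpers; the packed variant assumes the 5 trits/byte rate of T_packed and ignores byte alignment at group boundaries.

```python
import math


def group_bits_fp32(n):
    """FP32 stores sign + 8-bit exponent + 23-bit mantissa for every value."""
    return 32 * n


def group_bits_e1tm(n, exact=True):
    """One int8 exponent plus n trits.

    exact=True uses log2(3) bits per trit; exact=False uses the
    5-trits-per-byte packing rate (8/5 bits per trit).
    """
    trit_bits = math.log2(3) if exact else 8 / 5
    return 8 + n * trit_bits


fp32 = group_bits_fp32(32)
packed = group_bits_e1tm(32, exact=False)
print(fp32, round(group_bits_e1tm(32), 2), round(packed, 2), round(fp32 / packed, 1))
```

For a group of 32 this gives 1024 bits for FP32 against ≈58.7 bits (ideal) or 59.2 bits (packed) for E1TM32, roughly a 17× reduction.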