ARBS Ternary Training System (TTS)

E1TM Format — Exponent-1 Ternary Mantissa

E1TM encodes each weight group as one int8 exponent shared across N ternary mantissas.

W_eff[i] = S × T[i]    where T[i] ∈ {-1, 0, +1},  S = 2^{E + Δ}

E  = int8 log₂ scale (persistent, per group)
Δ  = 4 × corr_accum / (step × gs)  (from BigInt accumulator)
S  = 2^{E+Δ} (float32, ephemeral — created per forward, discarded)
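
The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the ARBS implementation: the function names `effective_scale` and `dequantize_group` are hypothetical, and treating Δ as 0 before the first step is an assumption made here to avoid dividing by zero.

```python
from typing import List, Sequence


def effective_scale(E: int, corr_accum: int, step: int, gs: int) -> float:
    """S = 2^(E + delta), with delta = 4 * corr_accum / (step * gs)."""
    # Assumption: with no steps processed there is no evidence yet, so delta = 0.
    delta = 0.0 if step == 0 else 4 * corr_accum / (step * gs)
    return 2.0 ** (E + delta)


def dequantize_group(T: Sequence[int], E: int, corr_accum: int, step: int) -> List[float]:
    """W_eff[i] = S * T[i] for one group; gs is taken to be the group length."""
    S = effective_scale(E, corr_accum, step, gs=len(T))
    return [S * t for t in T]


# One group of 4 ternary mantissas sharing E = -3 with an empty accumulator.
weights = dequantize_group([1, 0, -1, 1], E=-3, corr_accum=0, step=10)
```

With an empty accumulator the scale is exactly 2^E, so every nonzero weight in the group has magnitude 0.125.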

Format variants

Name	TScaleType	T per E	gs	E bpw	T bpw	Total bpw (inference)	Notes
E1TM4	T4	4	4	2.000	1.58	3.58	Highest
E1TM6	T6	6	6	1.333	1.58	2.91
E1TM8	T8	8	8	1.000	1.58	2.58
E1TM16	T16	16	16	0.500	1.58	2.08
E1TM32	T32	32	32	0.250	1.58	1.83	Default
E1TM64	T64	64	64	0.125	1.58	1.71
E1TM96	T96	96	96	0.083	1.58	1.67	Most packed

Higher T number = more T per E = less storage = coarser per-weight magnitude.
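
The bpw columns follow from two terms: the int8 exponent amortized over the group (8 / gs) and the information content of one trit (log₂ 3). A short sketch of that arithmetic, with `inference_bpw` as a hypothetical helper name:

```python
import math

TRIT_BITS = math.log2(3)  # about 1.585 bits of information per ternary value


def inference_bpw(gs: int) -> float:
    """Bits per weight at inference: share of the int8 exponent plus one trit."""
    return 8 / gs + TRIT_BITS


for gs in (4, 6, 8, 16, 32, 64, 96):
    print(f"E1TM{gs}: E bpw = {8 / gs:.3f}, total bpw = {inference_bpw(gs):.2f}")
```

Because the trit term is fixed, the total approaches log₂ 3 as gs grows, so the savings from larger groups shrink quickly past gs = 32.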

Group sizes

The TScaleType name is the group size:

TScaleType.T4  → gs = 4   → E shared across 4  ternary mantissas
TScaleType.T32 → gs = 32  → E shared across 32 ternary mantissas
TScaleType.T96 → gs = 96  → E shared across 96 ternary mantissas
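
One way to model this naming rule is an integer enum whose member values are the group sizes. The class below is a hypothetical mirror of the names in the table, not the actual ARBS definition, and `num_groups` assumes that a ragged final group is padded to a full group.

```python
from enum import IntEnum


class TScaleType(IntEnum):
    """Hypothetical mirror of the format names: each value is the group size gs."""
    T4 = 4
    T6 = 6
    T8 = 8
    T16 = 16
    T32 = 32
    T64 = 64
    T96 = 96


def num_groups(n_weights: int, scale_type: TScaleType) -> int:
    """Number of int8 exponents needed (assumes the last group is padded)."""
    gs = int(scale_type)
    return -(-n_weights // gs)  # ceiling division


groups = num_groups(4096, TScaleType.T32)  # one E per 32 weights
```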

Persistent training state (all integer)

Buffer	Type	Size/weight	Role
T_packed	uint8	1.58 bpw	Base-3 packed ternary {-1,0,+1}, 5 trits/byte (1.6 bpw as stored; 1.58 is the log₂ 3 bound)
E	int8	8/N bpw	Log₂ scale, one per N-weight group
corr_accum	int64	64/N bpw	BigInt accumulator for gradient sign votes
step_counter	int64	0 bpw	Total steps processed

No float32 or float16 values appear anywhere in persistent state. The float32 W_eff is ephemeral: it is created per forward pass and discarded after backward.
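
Base-3 packing works because 3^5 = 243 fits in one byte. The sketch below shows one possible layout; the digit order (first trit as the least significant base-3 digit) and the mapping -1, 0, +1 → 0, 1, 2 are assumptions made for illustration, since the layout used by T_packed is not specified here.

```python
def pack_trits(trits):
    """Pack values from {-1, 0, +1} into bytes, 5 trits per byte (base 3)."""
    out = bytearray()
    for i in range(0, len(trits), 5):
        chunk = trits[i:i + 5]
        byte = 0
        # Assumed layout: the first trit of the chunk is the least significant digit.
        for t in reversed(chunk):
            byte = byte * 3 + (t + 1)  # map -1, 0, +1 -> 0, 1, 2
        out.append(byte)  # at most 242, so it always fits in a uint8
    return bytes(out)


def unpack_trits(packed, n):
    """Inverse of pack_trits; n is the number of valid trits to return."""
    trits = []
    for byte in packed:
        for _ in range(5):
            trits.append(byte % 3 - 1)
            byte //= 3
    return trits[:n]  # drop padding digits from a partial final byte


original = [1, 0, -1, 1, 1, 0, -1]
restored = unpack_trits(pack_trits(original), len(original))
```

A group of 32 trits needs 7 bytes under this scheme, which is where the gap between 1.58 bpw (information content) and the stored size comes from.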

Why ternary over binary or int4

Format	Values/weight	Packing efficiency	Null state
Binary	2	1 bpw (100%)	No
Ternary	3	1.58 bpw (log₂ 3; 5 trits/byte uses 243 of 256 byte states ≈ 95%)	Yes (T=0 = null)
Int4	16	4 bpw (100%)	No

Ternary's null state (T=0) provides structural sparsity: ≈38% of weights are zero, which lets matmul tiles that are entirely zero be skipped. No other low-bit format has this property at equivalent bpw.
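
To make the role of the null state concrete, here is a scalar illustration in which zero mantissas contribute no work and nonzero ones reduce to an add or a subtract. The helper names are hypothetical, and a real kernel would skip at tile granularity rather than per element.

```python
def zero_fraction(T):
    """Share of mantissas in the null state (T = 0)."""
    return sum(1 for t in T if t == 0) / len(T)


def ternary_dot(T, x, S):
    """Dot product of (S * T) with x; zeros are skipped, the rest are +/- adds."""
    acc = 0.0
    for t, xi in zip(T, x):
        if t == 0:
            continue  # null state: no multiply, no add
        acc += xi if t > 0 else -xi
    return S * acc  # the shared scale is applied once per group


T = [1, 0, -1, 1]
y = ternary_dot(T, [1.0, 2.0, 3.0, 4.0], S=0.5)
```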

The BigInt difference

Unlike conventional quantization where E is static after conversion, ARBS TTS trains through E via a BigInt correlation accumulator:

corr_accum[g] -= Σ (grad_sign × T)   # int64, never clips or resets
Δ = 4 × corr_accum / (step × gs)      # continuous adjustment from integer division
S = 2^{E + Δ}                          # effective scale (ephemeral float32)
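
The accumulator update for a single group might look like the following sketch. It assumes grad_sign is the elementwise sign of the gradient with respect to W_eff (the text does not spell this out) and uses Python's unbounded int in place of int64; `sign` and `update_corr_accum` are hypothetical names.

```python
def sign(x: float) -> int:
    """Return -1, 0, or +1."""
    return (x > 0) - (x < 0)


def update_corr_accum(corr_accum: int, grads, T) -> int:
    """corr_accum -= sum(sign(grad_i) * T_i) over one group."""
    vote = sum(sign(g) * t for g, t in zip(grads, T))
    return corr_accum - vote


T = [1, 0, -1, 1]
grads = [0.3, -0.2, 0.1, -0.5]  # assumed: gradient w.r.t. W_eff for this group
corr = update_corr_accum(0, grads, T)

step, gs = 1, len(T)
delta = 4 * corr / (step * gs)
```

Each update moves corr_accum by at most gs, so if step counts one update per call, |corr_accum| ≤ step × gs and Δ stays within [-4, +4].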

The division corr_accum / (step × gs) is the Big Number Calculator operation — it converts the accumulated integer evidence into a continuous ratio with arbitrary precision. No threshold flips, no discrete steps, no information loss.
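
As an illustration of keeping the ratio exact until the last moment, Python's `fractions.Fraction` can hold Δ as an integer ratio, with rounding happening only when it is converted to a float for the scale. This is a sketch of the idea, not the Big Number Calculator itself; `delta_exact` is a hypothetical name.

```python
from fractions import Fraction


def delta_exact(corr_accum: int, step: int, gs: int) -> Fraction:
    """Delta = 4 * corr_accum / (step * gs), kept as an exact rational."""
    return Fraction(4 * corr_accum, step * gs)


d = delta_exact(corr_accum=1, step=3, gs=4)  # exactly 1/3, no rounding yet
S = 2.0 ** (-5 + float(d))                   # rounding happens only here
```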

Training vs inference

Phase	T_packed	E	corr_accum	step	S
Training	Read-only	Read-only	Accumulates	Increments	Computed from corr/step
Inference (Option A)	Frozen	Frozen	Frozen	Frozen	Burned into checkpoint
Inference (Option B)	Frozen	Fused	Discarded	Discarded	Static 2^{E_fused}

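Option A (export): keep corr_accum + step for continuous S. Option B (fuse): E_fused = round(E + 4 × corr_accum / (step × gs)); this discards corr_accum and step, leaving only T_packed and E (about 2.6 bpw for E1TM8; see the Total bpw column under Format variants for other group sizes).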
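
Option B can be sketched as a single rounding step per group. Two details below are assumptions rather than documented behavior: the result is clamped to the int8 range, and Python's `round` is used, which rounds exact halves to the nearest even integer. `fuse_exponent` is a hypothetical name.

```python
def fuse_exponent(E: int, corr_accum: int, step: int, gs: int) -> int:
    """E_fused = round(E + 4 * corr_accum / (step * gs)), clamped to int8."""
    # Assumption: with no steps processed, delta is 0 and E is returned unchanged.
    delta = 0.0 if step == 0 else 4 * corr_accum / (step * gs)
    E_fused = round(E + delta)  # Python rounds exact .5 cases to even
    return max(-128, min(127, E_fused))  # assumed clamp to the int8 range


E_fused = fuse_exponent(E=-3, corr_accum=10, step=10, gs=4)  # delta = 1.0
S_static = 2.0 ** E_fused
```

After fusing, the scale is limited to powers of two again, so the sub-exponent precision carried by Δ is traded for the smaller footprint.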

Relationship to IEEE float

IEEE FP32:  1 sign + 8 exponent + 23 mantissa  → per value
E1TM32:    1 exponent (int8) + 32 ternary signs → per group of 32

In IEEE, the exponent and mantissa belong to the same value. In E1TM, the exponent is shared — the mantissa is split into N independent ternary signs. The corr_accum provides sub-exponent precision beyond the int8 E, making the effective scale continuous rather than constrained to the 256 discrete 2^E values.