	# ARBS Ternary Training System (TTS)

	## E1TM Format — Exponent-1 Ternary Mantissa

	E1TM encodes each weight group as one int8 exponent shared across N ternary mantissas.

	```
	W_eff[i] = S × T[i] where T[i] ∈ {-1, 0, +1}, S = 2^{E + Δ}

	E = int8 log₂ scale (persistent, per group)
	Δ = 4 × corr_accum / (step × gs) (from BigInt accumulator)
	S = 2^{E+Δ} (float32, ephemeral — created per forward, discarded)
	```
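
As a minimal sketch of the formula above, the following pure-Python function reconstructs the ephemeral weights for a single group. The function name `effective_weights` and the choice to treat Δ as 0 before the first step are assumptions made for illustration; they are not taken from the ARBS code.

```python
def effective_weights(T, E, corr_accum, step, gs):
    """Return W_eff for one group: S * T[i], with S = 2 ** (E + delta)."""
    if len(T) != gs:
        raise ValueError("a group must contain exactly gs ternary values")
    if any(t not in (-1, 0, 1) for t in T):
        raise ValueError("mantissas must be in {-1, 0, +1}")
    # Fractional exponent correction derived from the integer accumulator.
    # Assumption: delta is 0 until at least one step has been taken.
    delta = 4.0 * corr_accum / (step * gs) if step > 0 else 0.0
    scale = 2.0 ** (E + delta)  # ephemeral float scale, not stored
    return [scale * t for t in T]


# With corr_accum = 0 the scale is exactly 2 ** E.
w = effective_weights([1, 0, -1, 1], E=-3, corr_accum=0, step=10, gs=4)
```

Only `T`, `E`, `corr_accum`, and `step` would need to persist; the returned list plays the role of the discarded float32 `W_eff`.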

	### Format variants

	| Name | TScaleType | T per E | gs | E bpw | T bpw | Total bpw (inference) | Notes |
	|---|---|---|---|---|---|---|---|
	| E1TM4 | T4 | 4 | 4 | 2.000 | 1.58 | 3.58 | Highest precision |
	| E1TM6 | T6 | 6 | 6 | 1.333 | 1.58 | 2.91 | |
	| E1TM8 | T8 | 8 | 8 | 1.000 | 1.58 | 2.58 | |
	| E1TM16 | T16 | 16 | 16 | 0.500 | 1.58 | 2.08 | |
	| E1TM32 | T32 | 32 | 32 | 0.250 | 1.58 | 1.83 | Default |
	| E1TM64 | T64 | 64 | 64 | 0.125 | 1.58 | 1.71 | |
	| E1TM96 | T96 | 96 | 96 | 0.083 | 1.58 | 1.67 | Most packed |

	A higher T number means more ternary mantissas share each exponent: storage per weight shrinks, but the per-weight magnitude becomes coarser.
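
The bits-per-weight columns follow from amortising one 8-bit exponent over the group and adding the information content of one trit. A small sketch, where the helper name `inference_bpw` is hypothetical:

```python
import math

TRIT_BITS = math.log2(3)  # about 1.585 bits of information per ternary value


def inference_bpw(gs, exponent_bits=8):
    """Bits per weight: one shared exponent spread over gs weights, plus one trit."""
    if gs <= 0:
        raise ValueError("group size must be positive")
    return exponent_bits / gs + TRIT_BITS


for gs in (4, 6, 8, 16, 32, 64, 96):
    print(f"E1TM{gs}: E = {8 / gs:.3f} bpw, total = {inference_bpw(gs):.3f} bpw")
```

This counts the theoretical trit cost of log₂ 3 bits; a concrete packing such as 5 trits per byte costs slightly more (8/5 = 1.6 bits per trit).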

	### Group sizes

	The TScaleType name is the group size:

	```python
	TScaleType.T4 → gs = 4 → E shared across 4 ternary mantissas
	TScaleType.T32 → gs = 32 → E shared across 32 ternary mantissas
	TScaleType.T96 → gs = 96 → E shared across 96 ternary mantissas
	```
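
One way to model this naming rule is an integer enum whose member values are the group sizes. The class `TScaleTypeSketch` and the helper `split_into_groups` below are hypothetical stand-ins written for illustration; the real `TScaleType` definition may differ, and the requirement that the length be a multiple of `gs` is an assumption.

```python
from enum import IntEnum


class TScaleTypeSketch(IntEnum):
    """Hypothetical stand-in: each member's value is its group size."""
    T4 = 4
    T6 = 6
    T8 = 8
    T16 = 16
    T32 = 32
    T64 = 64
    T96 = 96


def split_into_groups(trits, scale_type):
    """Split a flat list of ternary values into groups that share one exponent."""
    gs = int(scale_type)
    # Assumption: no padding, so the length must divide evenly.
    if len(trits) % gs != 0:
        raise ValueError("length must be a multiple of the group size")
    return [trits[i:i + gs] for i in range(0, len(trits), gs)]


groups = split_into_groups([1, 0, -1, 1, 0, 0, 1, -1], TScaleTypeSketch.T4)
```

Each inner list in `groups` would be paired with exactly one int8 exponent.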

	### Persistent training state (all integer)

	| Buffer | Type | Size/weight | Role |
	|---|---|---|---|
	| T_packed | uint8 | 1.58 bpw | Base-3 packed ternary {-1,0,+1}, 5 trits/byte |
	| E | int8 | 8/N bpw | Log₂ scale, one per N-weight group |
	| corr_accum | int64 | 64/N bpw | BigInt accumulator for gradient sign votes |
	| step_counter | int64 | 0 bpw | Total steps processed |

	No float32 or float16 values appear anywhere in persistent state. The float32 `W_eff` is ephemeral: it is created on each forward pass and discarded after the backward pass.
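
The base-3 packing of `T_packed` can be sketched as follows. Five trits fit in one byte because 3⁵ = 243 ≤ 256. The digit mapping (trit + 1), the little-endian digit order, and padding the last byte with zero trits are all assumptions made for this example, not details confirmed by the source.

```python
def pack_trits(trits):
    """Pack values from {-1, 0, +1} into bytes, 5 trits per byte, in base 3."""
    out = bytearray()
    for i in range(0, len(trits), 5):
        chunk = list(trits[i:i + 5])
        chunk += [0] * (5 - len(chunk))  # assumption: pad the tail with zero trits
        value = 0
        # The first trit of the chunk becomes the least significant base-3 digit.
        for t in reversed(chunk):
            if t not in (-1, 0, 1):
                raise ValueError("trits must be in {-1, 0, +1}")
            value = value * 3 + (t + 1)  # map -1, 0, +1 to digits 0, 1, 2
        out.append(value)  # at most 242, so it always fits in a uint8
    return bytes(out)


def unpack_trits(data, count):
    """Inverse of pack_trits; count is the number of real (unpadded) trits."""
    trits = []
    for byte in data:
        for _ in range(5):
            trits.append(byte % 3 - 1)
            byte //= 3
    return trits[:count]


original = [1, 0, -1, 1, 1, -1, 0]
packed = pack_trits(original)
restored = unpack_trits(packed, len(original))
```

Seven trits occupy two bytes here; at scale the cost approaches 8/5 = 1.6 bits per trit, just above the log₂ 3 ≈ 1.58 bound.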

	### Why ternary over binary or int4

	| Format | Values/weight | Packing efficiency | Null state |
	|---|---|---|---|
	| Binary | 2 | 1 bpw (100%) | No |
	| Ternary | 3 | 1.58 bpw (log₂ 3; ≈95%) | Yes (T=0 is null) |
	| Int4 | 16 | 4 bpw (100%) | No |

	Ternary's null state (T=0) provides structural sparsity: ≈38% of weights are zero, so matmul tiles that contain only zeros can be skipped. No other low-bit format offers this property at an equivalent bpw.

	### The BigInt difference

	Unlike conventional quantization where E is static after conversion, ARBS TTS trains through E via a BigInt correlation accumulator:

	```
	corr_accum[g] -= Σ (grad_sign × T) # int64, never clips or resets
	Δ = 4 × corr_accum / (step × gs) # continuous adjustment from integer division
	S = 2^{E + Δ} # effective scale (ephemeral float32)
	```

	The division `corr_accum / (step × gs)` is the Big Number Calculator operation: it converts the accumulated integer evidence into a continuous ratio with arbitrary precision. The scale therefore changes smoothly, with no threshold flips, no discrete steps, and no information loss.
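
The update rule and the ratio can be sketched for a single group in pure Python. Python integers are unbounded, which models "never clips", whereas the stored buffer is int64. The function names, and the assumption that `grad_sign` values lie in {-1, 0, +1}, are illustrative; `fractions.Fraction` is used here only to show the ratio exactly before any conversion to float.

```python
from fractions import Fraction


def accumulate(corr_accum, grad_sign, T):
    """One vote update for a group: corr_accum -= sum(grad_sign[i] * T[i])."""
    if len(grad_sign) != len(T):
        raise ValueError("grad_sign and T must have the same length")
    return corr_accum - sum(g * t for g, t in zip(grad_sign, T))


def delta_exact(corr_accum, step, gs):
    """Delta = 4 * corr_accum / (step * gs), kept as an exact rational."""
    if step <= 0 or gs <= 0:
        raise ValueError("step and gs must be positive")
    return Fraction(4 * corr_accum, step * gs)


T = [1, 0, -1, 1]
acc = 0
acc = accumulate(acc, [-1, 1, 1, -1], T)   # products -1, 0, -1, -1, so acc rises by 3
acc = accumulate(acc, [1, 1, 1, 1], T)     # products 1, 0, -1, 1, so acc falls by 1
d = delta_exact(acc, step=2, gs=4)
scale = 2.0 ** (-3 + float(d))             # ephemeral scale for E = -3
```

Because only integers are stored, the same `acc` and `step` always reproduce the same Δ, regardless of how many updates produced them.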

	### Training vs inference

	| Phase | T_packed | E | corr_accum | step | S |
	|---|---|---|---|---|---|
	| Training | Read-only | Read-only | Accumulates | Increments | Computed from corr/step |
	| Inference (Option A) | Frozen | Frozen | Frozen | Frozen | Burned into checkpoint |
	| Inference (Option B) | Frozen | Fused | Discarded | Discarded | Static 2^{E_fused} |

	Option A (export): keep corr_accum + step for continuous S.
	Option B (fuse): `E_fused = round(E + 4 × corr_accum / (step × gs))` — discards corr_accum, drops to 2.6 bpw.
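
A minimal sketch of the Option B fusion for one group is shown below. The function name `fuse_exponent` and the clamp to the int8 range are assumptions added for illustration. Note that Python's built-in `round` rounds exact ties to the nearest even integer, which may differ from the rounding mode used in practice.

```python
def fuse_exponent(E, corr_accum, step, gs):
    """Fold the accumulator into the exponent: round(E + 4*corr_accum/(step*gs))."""
    if step <= 0 or gs <= 0:
        raise ValueError("step and gs must be positive")
    delta = 4.0 * corr_accum / (step * gs)
    fused = round(E + delta)
    # Assumption: keep the result representable as int8.
    return max(-128, min(127, fused))


E_fused = fuse_exponent(E=-3, corr_accum=10, step=10, gs=4)  # delta = 1.0
static_scale = 2.0 ** E_fused
```

After fusion the scale is restricted to powers of two again, so any fractional part of Δ is lost; that loss is the trade-off for dropping `corr_accum` and `step` from the checkpoint.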

	### Relationship to IEEE float

	```
	IEEE FP32: 1 sign + 8 exponent + 23 mantissa → per value
	E1TM32: 1 exponent (int8) + 32 ternary signs → per group of 32
	```

	In IEEE, the exponent and mantissa belong to the same value. In E1TM, the exponent is shared — the mantissa is split into N independent ternary signs. The corr_accum provides sub-exponent precision beyond the int8 E, making the effective scale continuous rather than constrained to the 256 discrete `2^E` values.