# ARBS Ternary Training System (TTS)

## E1TM Format: Exponent-1 Ternary Mantissa

E1TM encodes each weight group as **one int8 exponent shared across N ternary mantissas**.

```
W_eff[i] = S × T[i]    where T[i] ∈ {-1, 0, +1},  S = 2^{E + Δ}

E  = int8 log₂ scale (persistent, per group)
Δ  = 4 × corr_accum / (step × gs)  (from BigInt accumulator)
S  = 2^{E+Δ} (float32, ephemeral: created each forward pass, then discarded)
```
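A minimal sketch of the formula above for a single group. It assumes `T` has already been unpacked into Python ints in {-1, 0, +1} and that `E`, `corr_accum`, `step` and `gs` are plain integers. The helper name `dequantize_group` and the choice to treat `step == 0` as Δ = 0 are illustrative assumptions, not part of ARBS TTS.

```python
def dequantize_group(T, E, corr_accum, step, gs):
    """Return W_eff[i] = S * T[i] for one group (hypothetical helper)."""
    if any(t not in (-1, 0, 1) for t in T):
        raise ValueError("T must be ternary")
    # Δ = 4 * corr_accum / (step * gs); assumption: no steps yet means Δ = 0
    delta = 4 * corr_accum / (step * gs) if step > 0 else 0.0
    S = 2.0 ** (E + delta)  # ephemeral float scale, recomputed every forward
    return [S * t for t in T]


W = dequantize_group([1, 0, -1, 1], E=-3, corr_accum=0, step=10, gs=4)
print(W)  # [0.125, 0.0, -0.125, 0.125]
```

Only `S` is a float here; every input that would be persisted stays an integer, matching the state table further down.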

### Format variants

| Name | TScaleType | T per E | gs | E bpw | T bpw | Total bpw (inference) | Precision |
|---|---|---|---|---|---|---|---|
| E1TM4 | T4 | 4 | 4 | 2.000 | 1.58 | 3.58 | Highest |
| E1TM6 | T6 | 6 | 6 | 1.333 | 1.58 | 2.91 | |
| E1TM8 | T8 | 8 | 8 | 1.000 | 1.58 | 2.58 | |
| E1TM16 | T16 | 16 | 16 | 0.500 | 1.58 | 2.08 | |
| **E1TM32** | **T32** | **32** | **32** | **0.250** | **1.58** | **1.83** | **Default** |
| E1TM64 | T64 | 64 | 64 | 0.125 | 1.58 | 1.71 | |
| E1TM96 | T96 | 96 | 96 | 0.083 | 1.58 | 1.67 | Most packed |

A higher T number means more T per E: less storage per weight, but a coarser per-weight magnitude.
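The "Total bpw (inference)" column is just the amortized int8 exponent plus the ternary mantissa. A small sketch of that arithmetic, assuming an 8-bit E and the ideal log₂3 bits per trit (the function name `inference_bpw` is hypothetical):

```python
import math

TRIT_BITS = math.log2(3)  # ≈ 1.585 bits: information content of one ternary value


def inference_bpw(gs, e_bits=8):
    """Bits per weight when one e_bits exponent is shared by gs ternary mantissas."""
    return e_bits / gs + TRIT_BITS


for gs in (4, 6, 8, 16, 32, 64, 96):
    print(f"E1TM{gs}: {inference_bpw(gs):.2f} bpw")
```

With the exact log₂3 the E1TM6 row comes to 2.92; the table's 2.91 uses the rounded 1.58.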

### Group sizes

The TScaleType name is the group size:

```python
TScaleType.T4   # gs = 4  → E shared across 4 ternary mantissas
TScaleType.T32  # gs = 32 → E shared across 32 ternary mantissas
TScaleType.T96  # gs = 96 → E shared across 96 ternary mantissas
```
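A sketch of how a group size drives the grouping of a flat weight tensor. `TScaleType` here is a hypothetical stand-in `IntEnum` whose member values equal gs (the real enum in ARBS TTS may be defined differently), and `split_groups` is an illustrative helper that assumes the weight count is a multiple of gs rather than guessing a padding rule.

```python
from enum import IntEnum


class TScaleType(IntEnum):
    """Hypothetical stand-in: each member's value is its group size gs."""
    T4 = 4
    T6 = 6
    T8 = 8
    T16 = 16
    T32 = 32
    T64 = 64
    T96 = 96


def split_groups(weights, scale_type):
    """Split a flat weight list into groups of gs; one E is stored per group."""
    gs = int(scale_type)
    if len(weights) % gs:
        raise ValueError(f"length {len(weights)} is not a multiple of gs={gs}")
    return [weights[i:i + gs] for i in range(0, len(weights), gs)]


groups = split_groups(list(range(96)), TScaleType.T32)
print(len(groups))  # 3 groups, so 3 int8 exponents
```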

### Persistent training state (all integer)

| Buffer | Type | Size per weight (N = gs) | Role |
|---|---|---|---|
| T_packed | uint8 | 1.6 bpw (log₂3 ≈ 1.58 ideal) | Base-3 packed ternary {-1,0,+1}, 5 trits/byte |
| E | int8 | 8/N bpw | Log₂ scale, one per N-weight group |
| corr_accum | int64 | 64/N bpw | BigInt accumulator for gradient sign votes |
| step_counter | int64 | ≈0 bpw (not per weight) | Total steps processed |

**No float32/16 anywhere in persistent state.** The float32 `W_eff` is ephemeral: it is created on each forward pass and discarded after backward.
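Five trits fit in one byte because 3⁵ = 243 ≤ 256. A minimal sketch of base-3 packing and unpacking, assuming the first trit of each chunk is stored as the least significant base-3 digit; the actual byte layout of `T_packed` is not specified here, so the digit order and both function names are assumptions.

```python
def pack_trits(trits):
    """Pack ternary values {-1, 0, +1} into bytes, 5 trits per byte (3**5 = 243 <= 256)."""
    out = bytearray()
    for i in range(0, len(trits), 5):
        byte = 0
        # Horner's rule in base 3; the first trit of each chunk becomes the lowest digit.
        for t in reversed(trits[i:i + 5]):
            if t not in (-1, 0, 1):
                raise ValueError(f"not a trit: {t}")
            byte = byte * 3 + (t + 1)  # map -1/0/+1 to digits 0/1/2
        out.append(byte)
    return bytes(out)


def unpack_trits(packed, n):
    """Inverse of pack_trits; n is the original trit count (the last byte may be partial)."""
    trits = []
    for byte in packed:
        for _ in range(5):
            trits.append(byte % 3 - 1)  # lowest base-3 digit back to -1/0/+1
            byte //= 3
    return trits[:n]


weights = [1, 0, -1, -1, 1, 0, 1]
packed = pack_trits(weights)
assert len(packed) == 2  # ceil(7 / 5) bytes
assert unpack_trits(packed, len(weights)) == weights
```

The 8/5 = 1.6 bits per trit is why packed storage sits slightly above the ideal 1.58 bpw.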

### Why ternary over binary or int4

| Format | Values/weight | Packing efficiency | Null state |
|---|---|---|---|
| Binary | 2 | 1 bpw (100%) | No |
| Ternary | 3 | 1.6 bpw at 5 trits/byte (log₂3 ≈ 1.585 ideal, ≈99%) | **Yes** (T=0 = null) |
| Int4 | 16 | 4 bpw (100%) | 0 is one of 16 levels, but at 4 bpw |

Ternary's null state (T=0) provides structural sparsity: ≈38% of weights are zero, so their multiply-accumulates, and any all-zero matmul tiles, can be skipped. No other low-bit format has this property at equivalent bpw.
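A scalar illustration of why the null state saves work: with T ∈ {-1, 0, +1}, a group's dot product needs only adds and subtracts, one multiply by the shared scale S, and nothing at all for zero weights. This is a sketch of the principle, not a kernel; a real implementation would skip at tile granularity as described above, and `ternary_dot` is a hypothetical name.

```python
def ternary_dot(x, T, S):
    """Return (S * sum(T[i] * x[i]), number of skipped zero weights)."""
    acc = 0.0
    skipped = 0
    for xi, t in zip(x, T):
        if t == 0:
            skipped += 1   # null state: no add, no multiply
        elif t == 1:
            acc += xi      # +1: add the activation
        elif t == -1:
            acc -= xi      # -1: subtract the activation
        else:
            raise ValueError(f"not a trit: {t}")
    return S * acc, skipped  # one multiply by the shared scale per group


y, skipped = ternary_dot([1.0, 2.0, 3.0, 4.0], [1, 0, -1, 1], S=0.5)
print(y, skipped)  # 1.0 1
```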

### The BigInt difference

Unlike conventional quantization, where E is static after conversion, ARBS TTS trains **through** E via a BigInt correlation accumulator:

```
corr_accum[g] -= Σ (grad_sign × T)   # int64, never clips or resets
Δ = 4 × corr_accum / (step × gs)      # continuous adjustment: ratio of two integers
S = 2^{E + Δ}                          # effective scale (ephemeral float32)
```

The division `corr_accum / (step × gs)` is the **Big Number Calculator** operation: it converts the accumulated integer evidence into a continuous ratio. Because numerator and denominator are exact integers, nothing is rounded until the result is cast to float32 for S. There are no threshold flips and no discrete steps.
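A sketch of the accumulator loop for one group of four weights. Python's built-in `int` is arbitrary precision, so it stands in for the int64 `corr_accum` buffer, and `fractions.Fraction` keeps Δ exact until it is cast to float. The helper names are hypothetical, and `grad_signs` is assumed to hold the per-weight gradient signs in {-1, 0, +1}. Since each step changes `corr_accum` by at most gs, |corr_accum| ≤ step × gs, so |Δ| ≤ 4.

```python
from fractions import Fraction


def update_accumulator(corr_accum, grad_signs, T):
    """corr_accum -= sum(grad_sign * T) over one group; integers in, integer out."""
    return corr_accum - sum(g * t for g, t in zip(grad_signs, T))


def delta(corr_accum, step, gs):
    """Δ = 4 * corr_accum / (step * gs) as an exact rational (no rounding yet)."""
    return Fraction(4 * corr_accum, step * gs)


T = [1, -1, 0, 1]  # one group, gs = 4
acc, step = 0, 0
for grad_signs in ([-1, 1, 1, -1], [-1, -1, 1, 1]):
    acc = update_accumulator(acc, grad_signs, T)
    step += 1

d = delta(acc, step, gs=4)
assert abs(d) <= 4  # |corr_accum| <= step * gs bounds Δ to [-4, 4]
print(acc, d, float(d))  # 2 1 1.0
```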

### Training vs inference

| Phase | T_packed | E | corr_accum | step | S |
|---|---|---|---|---|---|
| Training | Read-only | Read-only | **Accumulates** | **Increments** | Computed from corr/step |
| Inference (Option A) | Frozen | Frozen | Frozen | Frozen | Burned into checkpoint |
| Inference (Option B) | Frozen | **Fused** | Discarded | Discarded | Static 2^{E_fused} |

**Option A** (export): keep corr_accum + step for continuous S.
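**Option B** (fuse): `E_fused = round(E + 4 × corr_accum / (step × gs))`. This discards corr_accum and step, so storage drops to the variant's Total bpw (inference) from the format table (e.g. 2.58 ≈ 2.6 bpw for E1TM8).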
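A sketch of the Option B fusion for one group. The document does not state a tie-breaking rule or what happens if the fused value leaves the int8 range, so Python's round-half-to-even and saturation to [-128, 127] are assumptions here, as is the name `fuse_exponent`.

```python
from fractions import Fraction


def fuse_exponent(E, corr_accum, step, gs):
    """Option B: E_fused = round(E + 4 * corr_accum / (step * gs)), clamped to int8."""
    exact = E + Fraction(4 * corr_accum, step * gs)
    e_fused = round(exact)               # Python rounds exact halves to even
    return max(-128, min(127, e_fused))  # assumption: saturate to the int8 range


print(fuse_exponent(-3, 2, 2, 4))  # Δ = 1     -> -2
print(fuse_exponent(-3, 3, 8, 4))  # Δ = 0.375 -> round(-2.625) = -3
```

The second call shows what fusing gives up: the continuous scale 2^-2.625 snaps back to 2^-3, which is the sub-exponent precision Option A keeps.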

### Relationship to IEEE float

```
IEEE FP32:  1 sign + 8 exponent + 23 mantissa  → per value
E1TM32:    1 exponent (int8) + 32 ternary signs → per group of 32
```

In IEEE, the exponent and mantissa belong to the same value. In E1TM, the exponent is **shared** β€” the mantissa is split into N independent ternary signs. The corr_accum provides sub-exponent precision beyond the int8 E, making the effective scale continuous rather than constrained to the 256 discrete `2^E` values.
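The per-group arithmetic behind this comparison, using the ideal log₂3 bits per trit for E1TM32 and counting only inference storage (E and T; during training corr_accum adds another 64 bits per group of 32):

```math
\begin{aligned}
\text{FP32 (32 weights):}\quad & 32 \times 32 = 1024\ \text{bits} \\
\text{E1TM32 (32 weights):}\quad & 8 + 32\log_2 3 \approx 8 + 50.72 = 58.72\ \text{bits} \approx 1.83\ \text{bpw} \\
\text{ratio:}\quad & 1024 / 58.72 \approx 17.4
\end{aligned}
```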