# True Ternary Refactor 10 — Sidecar Int8 Verification And Full-System Smoke

## Scope

The sidecar model files are now available locally:

- `arbitor/encoders/models/dinov2-small`
- `arbitor/encoders/models/moonshine-base`
- `arbitor/encoders/models/pig-vae/model.safetensors`

This pass verifies int8 quantization for the available sidecars and checks the full multimodal path through the Triton-backed ternary system.

## Sidecar Quantization Metadata

Added explicit metadata to quantized imported sidecars:

```text
_arb_quantize_requested
_arb_quantized
_arb_quantized_int8
```

The quantization helper that sets these attributes also freezes every sidecar parameter after quantization.

Applied to:

- `ImageSequencer.vit`
- `AudioSequencer.audio_encoder`
- `pig_vae` loader paths
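
As a minimal sketch of the tag-and-freeze step described above: the attribute names come from this note, but the function name `tag_and_freeze`, its arguments, and the stand-in classes are hypothetical and only illustrate the idea. The real helper operates on imported `nn.Module` sidecars and may assign these flags differently.

```python
class FakeParam:
    """Hypothetical stand-in for a framework parameter; only requires_grad is modeled."""
    def __init__(self):
        self.requires_grad = True


class FakeSidecar:
    """Hypothetical stand-in for an imported sidecar module."""
    def __init__(self, n_params):
        self._params = [FakeParam() for _ in range(n_params)]

    def parameters(self):
        return iter(self._params)


def tag_and_freeze(sidecar, requested, quantized):
    """Record quantization metadata on the sidecar, then freeze every parameter."""
    sidecar._arb_quantize_requested = requested   # e.g. "int8" (assumed to be a string)
    sidecar._arb_quantized = bool(quantized)      # assumed: any quantization was applied
    sidecar._arb_quantized_int8 = bool(quantized) and requested == "int8"
    for p in sidecar.parameters():
        p.requires_grad = False                   # sidecars stay frozen during training
    return sidecar


def count_trainable(sidecar):
    """Number of parameters that would still receive gradients."""
    return sum(1 for p in sidecar.parameters() if p.requires_grad)


vit = tag_and_freeze(FakeSidecar(3), requested="int8", quantized=True)
print(vit._arb_quantize_requested, vit._arb_quantized_int8, count_trainable(vit))
```

Because the function only relies on `parameters()` and `requires_grad`, the same shape should carry over to a real module, though that is an assumption about the repository's helper rather than a description of it.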

## Verified Int8 Sidecars

`ImageSequencer(quantize_weights='int8')`:

```text
image quant_requested= int8
image quantized_int8= True
image quantized= True
image trainable= 0
image quant_classes= {'QConv2d': 1, 'QLinear': 72}
```

`AudioSequencer(quantize_weights='int8')`:

```text
audio quant_requested= int8
audio quantized_int8= True
audio quantized= True
audio trainable= 0
audio quant_classes= {'QLinear': 128}
```
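
The `quant_classes` lines above read like a histogram of wrapper class names over the sidecar's submodules. A rough sketch of such an audit follows; the function name `quant_class_histogram`, the set of class names it matches, and the stand-in classes are assumptions for illustration, not the repository's audit code.

```python
from collections import Counter


def quant_class_histogram(module, quant_names=frozenset({"QLinear", "QConv2d"})):
    """Count submodules whose class name is one of the expected quantized wrappers."""
    names = (type(m).__name__ for m in module.modules())
    return dict(Counter(n for n in names if n in quant_names))


# Hypothetical stand-ins so the sketch runs without the quantization library.
class QLinear:
    pass


class LayerNorm:
    pass


class Tree:
    """Minimal object exposing modules() the way a module container would."""
    def __init__(self, children):
        self._children = children

    def modules(self):
        yield self
        yield from self._children


hist = quant_class_histogram(Tree([QLinear(), QLinear(), LayerNorm()]))
print(hist)
```

Matching on an explicit set of class names, rather than on a name prefix, keeps unrelated modules out of the count.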

Focused forward smokes:

```text
image_forward_ok (1, 254, 512) True
audio_forward_ok (1, 36, 512) True
```

## Full Multimodal CUDA Smoke

`ARBModel(enable_image=True, enable_audio=True, enable_vq=True, enable_graph=True, enable_memory_modules=True, enable_moe=True)`:

```text
image_quantized_int8 True
audio_quantized_int8 True
full_multimodal_cuda_train_smoke_ok logits=(1, 8, 297), targets=(1, 7), indices=(1, 298), loss=17.4929
```

This exercised:

- int8 DINO/ViT sidecar
- int8 Moonshine sidecar
- ternary image/audio projections
- multimodal VQ bridge
- graph path with Triton aggregation/gather kernels
- MoE path with Triton dense combine kernel
- output router / byte head
- backward and `_ternary_update_memory()`

Full model audit with image/audio enabled:

```text
logical ternary weights: 42,669,632
ternary training state: 55.72 MB
trainable float params: 0 tensors, 0.00 MB
frozen float params: 433 tensors, 318.80 MB
float buffers: 406 tensors, 0.00 MB
```

The frozen float params belong to imported sidecars. Their compute modules were verified as Quanto int8 wrappers where supported by the active environment.

## pig-vae Status

`pig-vae/model.safetensors` is present locally, and the loader now applies the same quantization metadata/freeze path as the vision/audio sidecars.

Runtime verification is blocked in this Python environment because `diffusers` is not installed:

```text
RuntimeError: pig-vae requires the optional diffusers dependency.
```

The system Python is externally managed, so I did not force-install packages into it. To verify pig-vae in this checkout, create or use a project venv and install the optional extra:

```text
pip install -e ".[diffusers]"
```

Then run:

```text
python - <<'PY'
from arbitor.encoders.pig_vae import load_vae
vae = load_vae(device='cpu', quantize='int8')
print(vae.vae._arb_quantized_int8)
PY
```
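
Before running the check above in a fresh environment, it can help to confirm that the optional dependency is importable at all. This is a small standard-library sketch; the helper name `has_module` is hypothetical and not part of the `arbitor` package.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


if not has_module("diffusers"):
    print("diffusers is not importable; install the optional extra before loading pig-vae")
```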

## Kernel Coverage

Current Triton-backed full-system paths:

- packed ternary linear forward/backward/update
- packed ternary embedding forward/backward/update
- ternary RMSNorm
- `E` residual update
- `T_accum` update
- Graph edge weighting + target aggregation
- Graph VQ-index gather + residual add
- MoE dense route combine
- VideoHead denoise update

Remaining non-fused control loops:

- VideoHead diffusion/halting loop
- Graph hop loop
- MoE ACT iteration loop

Those loops control repeated computation and halting. Their bodies call Triton kernels, but the loops themselves are not yet fused into persistent monolithic kernels.
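
To illustrate what a non-fused control loop means here, the sketch below shows a generic ACT-style loop in which each iteration makes one call (standing in for a fused kernel launch) and the halting decision is taken outside that call. The names `run_halting_loop`, `step`, and `halt_threshold` are hypothetical; the actual VideoHead, Graph, and MoE loops may be structured differently.

```python
def run_halting_loop(state, step, max_steps, halt_threshold=0.99):
    """Repeat `step` until the accumulated halting probability crosses the threshold.

    `step(state)` must return (new_state, halt_prob). In the real system the
    step body would launch Triton kernels; here it is any Python callable.
    """
    cumulative = 0.0
    steps_taken = 0
    for steps_taken in range(1, max_steps + 1):
        state, halt_prob = step(state)    # one kernel-backed iteration
        cumulative += halt_prob           # halting bookkeeping lives outside the kernel
        if cumulative >= halt_threshold:
            break
    return state, steps_taken


# Toy step: increments the state and reports a fixed halting probability.
final_state, steps_taken = run_halting_loop(0, lambda s: (s + 1, 0.5), max_steps=10)
print(final_state, steps_taken)
```

A persistent monolithic kernel would move the loop and the halting test inside a single launch, which is the part this note lists as remaining work.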

## Verification

- `python -m py_compile arbitor/components.py arbitor/sequencers.py arbitor/encoders/audio.py arbitor/encoders/pig_vae.py arbitor/main.py arbitor/vq.py arbitor/kernel/ternary_scale.py arbitor/kernel/ternary_audit.py`
- `python -m pytest -q testing/test_tscale.py -k "cuda_triton_correctness_update_E or cuda_triton_tscale_path"`: `2 passed`
- full multimodal CUDA smoke passed with image/audio sidecars enabled