# True Ternary Refactor 10 — Sidecar Int8 Verification And Full-System Smoke

## Scope

The sidecar model files are now available locally:

- `arbitor/encoders/models/dinov2-small`
- `arbitor/encoders/models/moonshine-base`
- `arbitor/encoders/models/pig-vae/model.safetensors`

This pass verifies int8 quantization for the available sidecars and checks the full multimodal path through the Triton-backed ternary system.

## Sidecar Quantization Metadata

Added explicit metadata to quantized imported sidecars:

```text
_arb_quantize_requested
_arb_quantized
_arb_quantized_int8
```

The helper that sets these attributes also freezes every sidecar parameter after quantization.

Applied to:

- `ImageSequencer.vit`
- `AudioSequencer.audio_encoder`
- `pig_vae` loader paths
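
A minimal sketch of what such a tag-and-freeze helper might look like. The function name `mark_quantized_sidecar` and its arguments are hypothetical; only the three attribute names come from this report. It assumes the module exposes `parameters()` the way `torch.nn.Module` does.

```python
def mark_quantized_sidecar(module, requested, applied):
    """Tag a sidecar with quantization metadata and freeze its parameters.

    Hypothetical helper: `requested` is the mode asked for (e.g. "int8" or None),
    `applied` says whether quantization actually took effect.
    """
    module._arb_quantize_requested = requested
    module._arb_quantized = bool(applied)
    module._arb_quantized_int8 = bool(applied) and requested == "int8"
    # Freeze every parameter so the sidecar never receives gradient updates.
    for param in module.parameters():
        param.requires_grad = False
    return module
```

Keeping `_arb_quantize_requested` separate from `_arb_quantized` lets a later check distinguish "int8 was asked for but could not be applied" from "int8 was never requested".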

## Verified Int8 Sidecars

`ImageSequencer(quantize_weights='int8')`:

```text
image quant_requested= int8
image quantized_int8= True
image quantized= True
image trainable= 0
image quant_classes= {'QConv2d': 1, 'QLinear': 72}
```

`AudioSequencer(quantize_weights='int8')`:

```text
audio quant_requested= int8
audio quantized_int8= True
audio quantized= True
audio trainable= 0
audio quant_classes= {'QLinear': 128}
```

Focused forward smokes:

```text
image_forward_ok (1, 254, 512) True
audio_forward_ok (1, 36, 512) True
```
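
The report lines above could be produced by a check along these lines. This is a sketch, not the script that was actually run: `sidecar_report` is a hypothetical name, it assumes the quantized wrapper classes are the ones whose class name starts with `Q` (as `QLinear` and `QConv2d` do in the logs), and it assumes `trainable` counts parameter elements rather than tensors.

```python
from collections import Counter


def sidecar_report(name, module):
    """Return report lines in the style of the logs above."""
    # Count submodules whose class name starts with "Q" (assumed wrapper naming).
    quant_classes = Counter(
        type(m).__name__
        for m in module.modules()
        if type(m).__name__.startswith("Q")
    )
    # Sum the element counts of parameters that still require gradients.
    trainable = sum(p.numel() for p in module.parameters() if p.requires_grad)
    return [
        f"{name} quant_requested= {getattr(module, '_arb_quantize_requested', None)}",
        f"{name} quantized_int8= {getattr(module, '_arb_quantized_int8', False)}",
        f"{name} quantized= {getattr(module, '_arb_quantized', False)}",
        f"{name} trainable= {trainable}",
        f"{name} quant_classes= {dict(sorted(quant_classes.items()))}",
    ]
```

The function only needs `modules()` and `parameters()`, so it applies to any `torch.nn.Module`-like object, for example `sidecar_report("image", sequencer.vit)`.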

## Full Multimodal CUDA Smoke

`ARBModel(enable_image=True, enable_audio=True, enable_vq=True, enable_graph=True, enable_memory_modules=True, enable_moe=True)`:

```text
image_quantized_int8 True
audio_quantized_int8 True
full_multimodal_cuda_train_smoke_ok logits=(1, 8, 297), targets=(1, 7), indices=(1, 298), loss=17.4929
```

This exercised:

- int8 DINO/ViT sidecar
- int8 Moonshine sidecar
- ternary image/audio projections
- multimodal VQ bridge
- graph path with Triton aggregation/gather kernels
- MoE path with Triton dense combine kernel
- output router / byte head
- backward and `_ternary_update_memory()`

Full model audit with image/audio enabled:

```text
logical ternary weights: 42,669,632
ternary training state: 55.72 MB
trainable float params: 0 tensors, 0.00 MB
frozen float params: 433 tensors, 318.80 MB
float buffers: 406 tensors, 0.00 MB
```

The frozen float params belong to imported sidecars. Their compute modules were verified as Quanto int8 wrappers where supported by the active environment.
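
As a rough reading of the audit, the training state per logical ternary weight can be estimated by dividing the two reported figures. The calculation below is only arithmetic on those figures, not a measurement, and the unit is an assumption: it is computed once with MB as 10^6 bytes and once with MB as 2^20 bytes.

```python
LOGICAL_TERNARY_WEIGHTS = 42_669_632  # "logical ternary weights" from the audit
TRAINING_STATE_MB = 55.72             # "ternary training state" from the audit


def bytes_per_weight(state_mb, n_weights, bytes_per_mb=1_000_000):
    """Average training-state bytes per logical weight."""
    return state_mb * bytes_per_mb / n_weights


decimal = bytes_per_weight(TRAINING_STATE_MB, LOGICAL_TERNARY_WEIGHTS)
binary = bytes_per_weight(TRAINING_STATE_MB, LOGICAL_TERNARY_WEIGHTS, 1024 * 1024)
print(f"{decimal:.2f} bytes/weight (MB = 10^6 bytes)")
print(f"{binary:.2f} bytes/weight (MB = 2^20 bytes)")
```

Under either unit the state works out to a little over one byte per logical weight.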

## pig-vae Status

`pig-vae/model.safetensors` is present locally, and the loader now applies the same quantization metadata/freeze path as the vision/audio sidecars.

Runtime verification is blocked in this Python environment because `diffusers` is not installed:

```text
RuntimeError: pig-vae requires the optional diffusers dependency.
```

The system Python is externally managed, so I did not force-install packages into it. To verify pig-vae in this checkout, create/use a project venv and install:

```text
pip install -e ".[diffusers]"
```

Then run:

```text
python - <<'PY'
from arbitor.encoders.pig_vae import load_vae
vae = load_vae(device='cpu', quantize='int8')
print(vae.vae._arb_quantized_int8)
PY
```
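
Before running the check above, it can help to confirm that the optional dependency is importable in the active interpreter. The sketch below uses only the standard library; `has_module` is a hypothetical helper name.

```python
import importlib.util


def has_module(name):
    """Return True if the top-level module `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


if has_module("diffusers"):
    print("diffusers available: pig-vae runtime check can run")
else:
    print("diffusers missing: install the optional extra first")
```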

## Kernel Coverage

Current Triton-backed full-system paths:

- packed ternary linear forward/backward/update
- packed ternary embedding forward/backward/update
- ternary RMSNorm
- `E` residual update
- `T_accum` update
- Graph edge weighting + target aggregation
- Graph VQ-index gather + residual add
- MoE dense route combine
- VideoHead denoise update

Remaining non-fused control loops:

- VideoHead diffusion/halting loop
- Graph hop loop
- MoE ACT iteration loop

Those loops control repeated computation and halting. Each iteration runs Triton kernels, but the loops themselves are not yet fused into persistent monolithic kernels.
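
To illustrate the distinction, here is a generic sketch of a non-fused control loop: the host decides when to stop, and each iteration makes one call that stands in for a kernel launch. This report does not state the halting rule, so the cumulative-probability threshold is an ACT-style assumption, and `run_until_halt`, `step`, and `toy_step` are hypothetical names.

```python
def run_until_halt(step, state, max_steps, threshold=0.99):
    """Host-side control loop with one `step` call per iteration.

    `step(state)` stands in for a kernel launch and returns
    (new_state, halt_prob). The loop stops once the accumulated halting
    probability reaches `threshold` or `max_steps` is exhausted.
    """
    cumulative = 0.0
    steps_taken = 0
    for _ in range(max_steps):
        state, halt_prob = step(state)
        steps_taken += 1
        cumulative += halt_prob
        if cumulative >= threshold:
            break
    return state, steps_taken


def toy_step(x):
    # Toy stand-in for a kernel: halve the state, report a fixed halt probability.
    return x * 0.5, 0.4


final_state, steps = run_until_halt(toy_step, 8.0, max_steps=10)
print(final_state, steps)
```

A persistent kernel would move the `for` loop and the halting test onto the device, so that the whole iteration sequence runs within a single launch.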

## Verification

- `python -m py_compile arbitor/components.py arbitor/sequencers.py arbitor/encoders/audio.py arbitor/encoders/pig_vae.py arbitor/main.py arbitor/vq.py arbitor/kernel/ternary_scale.py arbitor/kernel/ternary_audit.py`
- `python -m pytest -q testing/test_tscale.py -k "cuda_triton_correctness_update_E or cuda_triton_tscale_path"`: `2 passed`
- full multimodal CUDA smoke passed with image/audio sidecars enabled