# True Ternary Refactor 10 — Sidecar Int8 Verification And Full-System Smoke

## Scope

The sidecar model files are now available locally:

- `arbitor/encoders/models/dinov2-small`
- `arbitor/encoders/models/moonshine-base`
- `arbitor/encoders/models/pig-vae/model.safetensors`

This pass verifies int8 quantization for the available sidecars and checks the full multimodal path through the Triton-backed ternary system.

## Sidecar Quantization Metadata

Added explicit metadata to quantized imported sidecars:

```text
_arb_quantize_requested
_arb_quantized
_arb_quantized_int8
```

The helper also freezes every sidecar parameter after quantization.

Applied to:

- `ImageSequencer.vit`
- `AudioSequencer.audio_encoder`
- `pig_vae` loader paths

## Verified Int8 Sidecars

`ImageSequencer(quantize_weights='int8')`:

```text
image quant_requested= int8
image quantized_int8= True
image quantized= True
image trainable= 0
image quant_classes= {'QConv2d': 1, 'QLinear': 72}
```

`AudioSequencer(quantize_weights='int8')`:

```text
audio quant_requested= int8
audio quantized_int8= True
audio quantized= True
audio trainable= 0
audio quant_classes= {'QLinear': 128}
```

Focused forward smokes:

```text
image_forward_ok (1, 254, 512) True
audio_forward_ok (1, 36, 512) True
```

## Full Multimodal CUDA Smoke

`ARBModel(enable_image=True, enable_audio=True, enable_vq=True, enable_graph=True, enable_memory_modules=True, enable_moe=True)`:

```text
image_quantized_int8 True
audio_quantized_int8 True
full_multimodal_cuda_train_smoke_ok logits=(1, 8, 297), targets=(1, 7), indices=(1, 298), loss=17.4929
```

This exercised:

- int8 DINO/ViT sidecar
- int8 Moonshine sidecar
- ternary image/audio projections
- multimodal VQ bridge
- graph path with Triton aggregation/gather kernels
- MoE path with Triton dense combine kernel
- output router / byte head
- backward and `_ternary_update_memory()`

Full model audit with image/audio enabled:

```text
logical ternary weights: 42,669,632
ternary training state: 55.72 MB
trainable float params: 0 tensors, 0.00 MB
frozen float params: 433 tensors, 318.80 MB
float buffers: 406 tensors, 0.00 MB
```

The frozen float params belong to imported sidecars. Their compute modules were verified as Quanto int8 wrappers where supported by the active environment.

## pig-vae Status

`pig-vae/model.safetensors` is present locally, and the loader now applies the same quantization metadata/freeze path as the vision/audio sidecars.

Runtime verification is blocked in this Python environment because `diffusers` is not installed:

```text
RuntimeError: pig-vae requires the optional diffusers dependency.
```

The system Python is externally managed, so I did not force-install packages into it. To verify pig-vae in this checkout, create/use a project venv and install:

```text
pip install -e .[diffusers]
```

Then run:

```text
python - <<'PY'
from arbitor.encoders.pig_vae import load_vae
vae = load_vae(device='cpu', quantize='int8')
print(vae.vae._arb_quantized_int8)
PY
```

## Kernel Coverage

Current Triton-backed full-system paths:

- packed ternary linear forward/backward/update
- packed ternary embedding forward/backward/update
- ternary RMSNorm
- `E` residual update
- `T_accum` update
- Graph edge weighting + target aggregation
- Graph VQ-index gather + residual add
- MoE dense route combine
- VideoHead denoise update

Remaining non-fused control loops:

- VideoHead diffusion/halting loop
- Graph hop loop
- MoE ACT iteration loop

Those loops control repeated computation and halting. They are supported by Triton kernels internally, but they are not yet persistent monolithic kernels.

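
The three non-fused loops share one shape: a host-side loop launches a kernel-backed step, checks a halting condition, and repeats up to a cap. The following is a minimal pure-Python sketch of that shape only. `run_control_loop`, `step_fn`, and `should_halt` are hypothetical names chosen for illustration and do not correspond to functions in this checkout; the toy step stands in for a Triton-backed update.

```python
from typing import Callable, Tuple, TypeVar

S = TypeVar("S")


def run_control_loop(
    state: S,
    step_fn: Callable[[S], S],
    should_halt: Callable[[S], bool],
    max_steps: int,
) -> Tuple[S, int]:
    """Host-side loop: one step per iteration, followed by a halting check."""
    steps = 0
    while steps < max_steps:
        state = step_fn(state)  # hypothetical stand-in for a kernel-backed step
        steps += 1
        if should_halt(state):  # the halting decision stays on the host
            break
    return state, steps


# Toy stand-in: halve a residual until it drops below a threshold.
final_state, steps_taken = run_control_loop(
    state=1.0,
    step_fn=lambda r: r * 0.5,
    should_halt=lambda r: r < 0.1,
    max_steps=16,
)
print(final_state, steps_taken)
```

In this shape every iteration returns to the host before the next step is issued. A persistent kernel would, in principle, move both the loop and the halting check onto the device.
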
## Verification

- `python -m py_compile arbitor/components.py arbitor/sequencers.py arbitor/encoders/audio.py arbitor/encoders/pig_vae.py arbitor/main.py arbitor/vq.py arbitor/kernel/ternary_scale.py arbitor/kernel/ternary_audit.py`
- `python -m pytest -q testing/test_tscale.py -k "cuda_triton_correctness_update_E or cuda_triton_tscale_path"`: `2 passed`
- full multimodal CUDA smoke passed with image/audio sidecars enabled
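
The per-sidecar checks reported under Verified Int8 Sidecars (metadata flags, trainable count, quantized class counts) could be repeated with a small helper along these lines. This is a sketch, not the repository's audit code: `summarize_sidecar` is a hypothetical name, it assumes the module exposes PyTorch-style `parameters()` and `modules()`, and it assumes quantized wrapper classes have names starting with `Q`, as in the `QLinear` and `QConv2d` entries above. The stand-in classes exist only so the example runs without PyTorch installed.

```python
from collections import Counter

# Metadata attribute names taken from the Sidecar Quantization Metadata section.
FLAGS = ("_arb_quantize_requested", "_arb_quantized", "_arb_quantized_int8")


def summarize_sidecar(module):
    """Return metadata flags, trainable-parameter count, and quantized class counts."""
    flags = {name: getattr(module, name, None) for name in FLAGS}
    trainable = sum(
        1 for p in module.parameters() if getattr(p, "requires_grad", False)
    )
    # Assumption: quantized wrappers are identified by a leading "Q" in the class name.
    quant_classes = Counter(
        type(m).__name__ for m in module.modules() if type(m).__name__.startswith("Q")
    )
    return {"flags": flags, "trainable": trainable, "quant_classes": dict(quant_classes)}


class FakeParam:
    """Stand-in for a frozen parameter."""
    requires_grad = False


class QLinear:
    """Stand-in that only mimics the name of a quantized layer."""


class FakeSidecar:
    """Stand-in for a quantized, frozen sidecar."""
    _arb_quantize_requested = "int8"
    _arb_quantized = True
    _arb_quantized_int8 = True

    def parameters(self):
        return [FakeParam(), FakeParam()]

    def modules(self):
        return [self, QLinear(), QLinear()]


report = summarize_sidecar(FakeSidecar())
print(report)
```
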