| # True Ternary Refactor 10 — Sidecar Int8 Verification And Full-System Smoke |
|
|
| ## Scope |
|
|
| The sidecar model files are now available locally: |
|
|
| - `arbitor/encoders/models/dinov2-small` |
| - `arbitor/encoders/models/moonshine-base` |
| - `arbitor/encoders/models/pig-vae/model.safetensors` |
|
|
| This pass verifies int8 quantization for the available sidecars and checks the full multimodal path through the Triton-backed ternary system. |
|
|
| ## Sidecar Quantization Metadata |
|
|
| Added explicit metadata to quantized imported sidecars: |
|
|
| ```text |
| _arb_quantize_requested |
| _arb_quantized |
| _arb_quantized_int8 |
| ``` |
|
|
| The helper that sets these attributes also freezes every sidecar parameter after quantization. |
|
|
| Applied to: |
|
|
| - `ImageSequencer.vit` |
| - `AudioSequencer.audio_encoder` |
| - `pig_vae` loader paths |
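
The metadata-and-freeze step can be sketched as below. This is a minimal illustration, not the project's helper: the function name `mark_and_freeze`, its signature, and the `FakeSidecar` stand-in are assumptions; only the three `_arb_*` attribute names come from this report. The function relies only on a `.parameters()` method, so it would apply to a `torch.nn.Module` in the same way.

```python
from types import SimpleNamespace


def mark_and_freeze(module, requested, quantized):
    """Record quantization metadata on a sidecar and freeze its parameters.

    Hypothetical helper: the attribute names come from the report, the
    function name and signature are assumptions made for illustration.
    """
    module._arb_quantize_requested = requested
    module._arb_quantized = bool(quantized)
    module._arb_quantized_int8 = bool(quantized) and requested == "int8"
    # Freeze every parameter so the sidecar contributes no trainable floats.
    for param in module.parameters():
        param.requires_grad = False
    return module


class FakeSidecar:
    """Stand-in that exposes only the .parameters() method used above."""

    def __init__(self):
        self._params = [SimpleNamespace(requires_grad=True) for _ in range(3)]

    def parameters(self):
        return iter(self._params)


sidecar = mark_and_freeze(FakeSidecar(), requested="int8", quantized=True)
print(sidecar._arb_quantized_int8, any(p.requires_grad for p in sidecar.parameters()))
```

Keeping the requested mode separate from the achieved state lets a later audit distinguish "int8 was requested but unavailable" from "int8 was applied".
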
|
|
| ## Verified Int8 Sidecars |
|
|
| `ImageSequencer(quantize_weights='int8')`: |
|
|
| ```text |
| image quant_requested= int8 |
| image quantized_int8= True |
| image quantized= True |
| image trainable= 0 |
| image quant_classes= {'QConv2d': 1, 'QLinear': 72} |
| ``` |
|
|
| `AudioSequencer(quantize_weights='int8')`: |
|
|
| ```text |
| audio quant_requested= int8 |
| audio quantized_int8= True |
| audio quantized= True |
| audio trainable= 0 |
| audio quant_classes= {'QLinear': 128} |
| ``` |
|
|
| Focused forward smokes: |
|
|
| ```text |
| image_forward_ok (1, 254, 512) True |
| audio_forward_ok (1, 36, 512) True |
| ``` |
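
Summaries like the `quant_requested` / `trainable` / `quant_classes` lines above could be gathered with a helper along these lines. It is a sketch under stated assumptions: the name `summarize_quantization` is hypothetical, quantized wrappers are detected purely by a class name starting with `Q` (which matches `QLinear` and `QConv2d` but is only a heuristic), and `trainable` is taken to mean the number of trainable elements.

```python
from collections import Counter


def summarize_quantization(module):
    """Collect quantization metadata and wrapper-class counts for a sidecar.

    Assumes a torch-like interface: .modules(), .parameters(), and
    parameters exposing .requires_grad and .numel().
    """
    # Heuristic (assumption): quantized wrapper classes are named Q*.
    quant_classes = Counter(
        type(sub).__name__
        for sub in module.modules()
        if type(sub).__name__.startswith("Q")
    )
    # Count trainable elements; a fully frozen sidecar reports 0.
    trainable = sum(p.numel() for p in module.parameters() if p.requires_grad)
    return {
        "quant_requested": getattr(module, "_arb_quantize_requested", None),
        "quantized_int8": getattr(module, "_arb_quantized_int8", False),
        "quantized": getattr(module, "_arb_quantized", False),
        "trainable": trainable,
        "quant_classes": dict(quant_classes),
    }
```

Called on a sidecar such as `ImageSequencer(quantize_weights='int8').vit`, this would return one dictionary per sidecar that can be printed or asserted on in a smoke test.
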
|
|
| ## Full Multimodal CUDA Smoke |
|
|
| `ARBModel(enable_image=True, enable_audio=True, enable_vq=True, enable_graph=True, enable_memory_modules=True, enable_moe=True)`: |
|
|
| ```text |
| image_quantized_int8 True |
| audio_quantized_int8 True |
| full_multimodal_cuda_train_smoke_ok logits=(1, 8, 297), targets=(1, 7), indices=(1, 298), loss=17.4929 |
| ``` |
|
|
| This exercised: |
|
|
| - int8 DINO/ViT sidecar |
| - int8 Moonshine sidecar |
| - ternary image/audio projections |
| - multimodal VQ bridge |
| - graph path with Triton aggregation/gather kernels |
| - MoE path with Triton dense combine kernel |
| - output router / byte head |
| - backward and `_ternary_update_memory()` |
|
|
| Full model audit with image/audio enabled: |
|
|
| ```text |
| logical ternary weights: 42,669,632 |
| ternary training state: 55.72 MB |
| trainable float params: 0 tensors, 0.00 MB |
| frozen float params: 433 tensors, 318.80 MB |
| float buffers: 406 tensors, 0.00 MB |
| ``` |
|
|
| The frozen float params belong to imported sidecars. Their compute modules were verified as Quanto int8 wrappers where supported by the active environment. |
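
The trainable/frozen float rows of such an audit can be reproduced with a short tally. This is an illustrative sketch, not the audit code in `arbitor/kernel/ternary_audit.py`: the name `tally_float_params` is hypothetical, and treating one MB as 1024 * 1024 bytes is an assumption, since the report does not say which convention it uses.

```python
def tally_float_params(module, bytes_per_mb=1024 * 1024):
    """Return {"trainable": (tensors, MB), "frozen": (tensors, MB)}.

    Assumes torch-like parameters exposing .requires_grad, .numel(),
    .element_size(), and .is_floating_point().
    """
    stats = {"trainable": [0, 0], "frozen": [0, 0]}  # [tensor count, bytes]
    for param in module.parameters():
        if not param.is_floating_point():
            continue  # skip integer or packed storage
        bucket = stats["trainable" if param.requires_grad else "frozen"]
        bucket[0] += 1
        bucket[1] += param.numel() * param.element_size()
    return {
        name: (count, nbytes / bytes_per_mb)
        for name, (count, nbytes) in stats.items()
    }
```

A smoke test could then assert that the `trainable` entry is `(0, 0.0)` whenever all sidecars are frozen.
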
|
|
| ## pig-vae Status |
|
|
| `pig-vae/model.safetensors` is present locally, and the loader now applies the same quantization metadata/freeze path as the vision/audio sidecars. |
|
|
| Runtime verification is blocked in this Python environment because `diffusers` is not installed: |
|
|
| ```text |
| RuntimeError: pig-vae requires the optional diffusers dependency. |
| ``` |
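
An optional-dependency guard that produces an error of this shape can be sketched with the standard library alone. The helper name `require_optional` is hypothetical and the actual check inside `arbitor/encoders/pig_vae.py` may differ; the message format is copied from the error above.

```python
import importlib.util


def require_optional(module_name, feature):
    """Raise a clear error if an optional top-level dependency is missing."""
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec(module_name) is None:
        raise RuntimeError(
            f"{feature} requires the optional {module_name} dependency."
        )


# A standard-library module is always found, so this call returns quietly.
require_optional("json", "demo")
```

Calling `require_optional("diffusers", "pig-vae")` at the top of the loader would fail early with a readable message instead of an `ImportError` deep inside model construction.
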
|
|
| The system Python is externally managed, so I did not force-install packages into it. To verify pig-vae in this checkout, create or activate a project venv and install the optional extra: |
|
|
| ```text |
| pip install -e ".[diffusers]" |
| ``` |
|
|
| Then run: |
|
|
| ```text |
| python - <<'PY' |
| from arbitor.encoders.pig_vae import load_vae |
| vae = load_vae(device='cpu', quantize='int8') |
| print(vae.vae._arb_quantized_int8) |
| PY |
| ``` |
|
|
| ## Kernel Coverage |
|
|
| Current Triton-backed full-system paths: |
|
|
| - packed ternary linear forward/backward/update |
| - packed ternary embedding forward/backward/update |
| - ternary RMSNorm |
| - `E` residual update |
| - `T_accum` update |
| - Graph edge weighting + target aggregation |
| - Graph VQ-index gather + residual add |
| - MoE dense route combine |
| - VideoHead denoise update |
|
|
| Remaining non-fused control loops: |
|
|
| - VideoHead diffusion/halting loop |
| - Graph hop loop |
| - MoE ACT iteration loop |
|
|
| Those loops control repeated computation and halting. Each iteration calls Triton kernels, but the loops themselves are not yet fused into persistent monolithic kernels. |
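
The general shape of such a non-fused control loop is sketched below. Everything here is an assumption made for illustration: `step` stands in for one kernel-backed iteration, and the halting rule (accumulate a per-step halting probability until a threshold, in the style of adaptive computation time) is a generic choice, not necessarily the rule used by the VideoHead, graph, or MoE loops.

```python
def run_with_halting(step, state, max_iters, threshold=0.99):
    """Host-side loop: call `step` until the halting budget is spent.

    `step(state)` must return (new_state, halt_prob). In a real system the
    body would launch one or more kernels; the loop itself stays in Python.
    """
    cumulative = 0.0
    iters = 0
    for _ in range(max_iters):
        state, halt_prob = step(state)
        iters += 1
        cumulative += halt_prob
        if cumulative >= threshold:
            break  # halting decision is made outside the kernel
    return state, iters


def toy_step(x):
    """Stand-in for one kernel-backed iteration."""
    return x + 1, 0.4


final_state, iters_used = run_with_halting(toy_step, 0, max_iters=8)
print(final_state, iters_used)
```

Fusing a loop like this into a persistent kernel would move both the iteration and the halting check onto the device, which is the remaining work the list above refers to.
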
|
|
| ## Verification |
|
|
| - `python -m py_compile arbitor/components.py arbitor/sequencers.py arbitor/encoders/audio.py arbitor/encoders/pig_vae.py arbitor/main.py arbitor/vq.py arbitor/kernel/ternary_scale.py arbitor/kernel/ternary_audit.py` |
| - `python -m pytest -q testing/test_tscale.py -k "cuda_triton_correctness_update_E or cuda_triton_tscale_path"`: `2 passed` |
| - full multimodal CUDA smoke passed with image/audio sidecars enabled |
|
|