# True Ternary Refactor 10 — Sidecar Int8 Verification And Full-System Smoke
## Scope
The sidecar model files are now available locally:
- `arbitor/encoders/models/dinov2-small`
- `arbitor/encoders/models/moonshine-base`
- `arbitor/encoders/models/pig-vae/model.safetensors`
This pass verifies int8 quantization for the available sidecars and checks the full multimodal path through the Triton-backed ternary system.
## Sidecar Quantization Metadata
Added explicit metadata to quantized imported sidecars:
```text
_arb_quantize_requested
_arb_quantized
_arb_quantized_int8
```
The helper that sets these attributes also freezes every sidecar parameter after quantization, so the sidecars contribute no trainable tensors.
Applied to:
- `ImageSequencer.vit`
- `AudioSequencer.audio_encoder`
- `pig_vae` loader paths
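The tag-and-freeze step described above can be sketched roughly as follows. The function name `mark_sidecar_quantized` and the `quantize_fn` callback are hypothetical and not taken from the `arbitor` source; only the three attribute names come from the list above. The sketch is duck-typed against anything exposing `parameters()` (as `torch.nn.Module` does), and the stand-in classes exist only so that it runs without PyTorch.

```python
from typing import Callable, Optional


def mark_sidecar_quantized(module, requested: Optional[str],
                           quantize_fn: Optional[Callable] = None):
    """Hypothetical helper: quantize, tag with metadata, then freeze."""
    quantized = False
    if requested is not None and quantize_fn is not None:
        quantize_fn(module)  # assumed hook, e.g. an int8 conversion pass
        quantized = True
    # Attribute names as listed in this document.
    module._arb_quantize_requested = requested
    module._arb_quantized = quantized
    module._arb_quantized_int8 = quantized and requested == "int8"
    # Freeze every parameter so the sidecar has no trainable tensors.
    for p in module.parameters():
        p.requires_grad = False
    return module


class FakeParam:
    """Stand-in for a parameter; only carries requires_grad."""
    def __init__(self):
        self.requires_grad = True


class FakeSidecar:
    """Stand-in for an imported encoder module."""
    def __init__(self, n):
        self._params = [FakeParam() for _ in range(n)]

    def parameters(self):
        return iter(self._params)


sidecar = mark_sidecar_quantized(FakeSidecar(3), "int8", quantize_fn=lambda m: None)
trainable = sum(p.requires_grad for p in sidecar.parameters())
print(sidecar._arb_quantized_int8, trainable)
```

Keeping the requested mode separate from the achieved state lets a later audit distinguish "int8 was asked for" from "int8 was actually applied".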
## Verified Int8 Sidecars
`ImageSequencer(quantize_weights='int8')`:
```text
image quant_requested= int8
image quantized_int8= True
image quantized= True
image trainable= 0
image quant_classes= {'QConv2d': 1, 'QLinear': 72}
```
`AudioSequencer(quantize_weights='int8')`:
```text
audio quant_requested= int8
audio quantized_int8= True
audio quantized= True
audio trainable= 0
audio quant_classes= {'QLinear': 128}
```
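A report like the two above could be produced by a small summary function along these lines. This is a sketch under assumptions: `summarize_sidecar` is a hypothetical name, and treating any class whose name starts with `Q` as a quantized wrapper is a heuristic chosen for illustration, not something confirmed by the source. The stand-in classes mimic the `modules()` and `parameters()` methods of a `torch.nn.Module` so the sketch runs without PyTorch.

```python
from collections import Counter


def summarize_sidecar(module, prefix):
    """Hypothetical report: metadata flags, trainable count, Q* class histogram."""
    classes = Counter(
        type(m).__name__
        for m in module.modules()
        if type(m).__name__.startswith("Q")  # heuristic, assumed for this sketch
    )
    trainable = sum(1 for p in module.parameters() if p.requires_grad)
    return {
        f"{prefix} quant_requested": getattr(module, "_arb_quantize_requested", None),
        f"{prefix} quantized_int8": getattr(module, "_arb_quantized_int8", False),
        f"{prefix} quantized": getattr(module, "_arb_quantized", False),
        f"{prefix} trainable": trainable,
        f"{prefix} quant_classes": dict(classes),
    }


class _Leaf:
    """Stand-in leaf module with no parameters."""
    def modules(self):
        return [self]

    def parameters(self):
        return []


class QLinear(_Leaf):
    pass


class QConv2d(_Leaf):
    pass


class FakeEncoder:
    """Stand-in encoder holding a flat list of submodules."""
    def __init__(self, children):
        self._children = children
        self._arb_quantize_requested = "int8"
        self._arb_quantized = True
        self._arb_quantized_int8 = True

    def modules(self):
        return [self] + self._children

    def parameters(self):
        return []


report = summarize_sidecar(FakeEncoder([QConv2d(), QLinear(), QLinear()]), "image")
for key, value in report.items():
    print(f"{key}= {value}")
```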
Focused forward smokes:
```text
image_forward_ok (1, 254, 512) True
audio_forward_ok (1, 36, 512) True
```
## Full Multimodal CUDA Smoke
`ARBModel(enable_image=True, enable_audio=True, enable_vq=True, enable_graph=True, enable_memory_modules=True, enable_moe=True)`:
```text
image_quantized_int8 True
audio_quantized_int8 True
full_multimodal_cuda_train_smoke_ok logits=(1, 8, 297), targets=(1, 7), indices=(1, 298), loss=17.4929
```
This exercised:
- int8 DINO/ViT sidecar
- int8 Moonshine sidecar
- ternary image/audio projections
- multimodal VQ bridge
- graph path with Triton aggregation/gather kernels
- MoE path with Triton dense combine kernel
- output router / byte head
- backward and `_ternary_update_memory()`
Full model audit with image/audio enabled:
```text
logical ternary weights: 42,669,632
ternary training state: 55.72 MB
trainable float params: 0 tensors, 0.00 MB
frozen float params: 433 tensors, 318.80 MB
float buffers: 406 tensors, 0.00 MB
```
The frozen float params belong to imported sidecars. Their compute modules were verified as Quanto int8 wrappers where supported by the active environment.
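As a rough sanity check on the audit figures, the average training-state cost per logical ternary weight can be computed directly from the two numbers reported above. The audit does not say whether "MB" means 10^6 or 2^20 bytes, nor what the training state contains, so the sketch below evaluates both conventions and reports only an average. The function name is hypothetical.

```python
LOGICAL_TERNARY_WEIGHTS = 42_669_632  # from the audit above
TRAINING_STATE_MB = 55.72             # from the audit above


def bytes_per_weight(state_mb, n_weights, bytes_per_mb):
    """Average bytes of training state per logical ternary weight."""
    return state_mb * bytes_per_mb / n_weights


# The unit convention is an assumption, so both are shown.
decimal = bytes_per_weight(TRAINING_STATE_MB, LOGICAL_TERNARY_WEIGHTS, 1_000_000)
binary = bytes_per_weight(TRAINING_STATE_MB, LOGICAL_TERNARY_WEIGHTS, 1_048_576)

print(f"MB = 10^6 bytes: {decimal:.2f} bytes per logical weight")
print(f"MB = 2^20 bytes: {binary:.2f} bytes per logical weight")
```

Under either convention the state averages a little over one byte per logical weight.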
## pig-vae Status
`pig-vae/model.safetensors` is present locally, and the loader now applies the same quantization metadata/freeze path as the vision/audio sidecars.
Runtime verification is blocked in this Python environment because `diffusers` is not installed:
```text
RuntimeError: pig-vae requires the optional diffusers dependency.
```
The system Python is externally managed, so I did not force-install packages into it. To verify pig-vae in this checkout, create/use a project venv and install:
```text
pip install -e ".[diffusers]"
```
Then run:
```text
python - <<'PY'
from arbitor.encoders.pig_vae import load_vae
vae = load_vae(device='cpu', quantize='int8')
print(vae.vae._arb_quantized_int8)
PY
```
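Before running that check, it may help to confirm that the optional dependency is importable in the active environment. The sketch below uses only the standard library; the function name is hypothetical, and it tests only for a top-level module without importing it.

```python
import importlib.util


def optional_dependency_available(name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


if optional_dependency_available("diffusers"):
    print("diffusers found: the pig-vae check can run in this environment")
else:
    print("diffusers missing: install the optional extra in a project venv first")
```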
## Kernel Coverage
Current Triton-backed full-system paths:
- packed ternary linear forward/backward/update
- packed ternary embedding forward/backward/update
- ternary RMSNorm
- `E` residual update
- `T_accum` update
- Graph edge weighting + target aggregation
- Graph VQ-index gather + residual add
- MoE dense route combine
- VideoHead denoise update
Remaining non-fused control loops:
- VideoHead diffusion/halting loop
- Graph hop loop
- MoE ACT iteration loop
Those loops control repeated computation and halting. The work inside each iteration is done by Triton kernels, but the loops themselves are not yet fused into persistent monolithic kernels.
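To illustrate what a non-fused control loop looks like, here is a minimal sketch of a host-side halting loop. It is not the project's code: `run_with_halting`, the accumulated-halting-mass rule, and the thresholds are assumptions made for illustration, and `toy_step` stands in for whatever kernel launches one iteration performs.

```python
def run_with_halting(step, state, max_iters=8, threshold=0.99):
    """Host-side loop: one `step` call per iteration, stop once halting mass is reached."""
    halt_mass = 0.0
    iters = 0
    while iters < max_iters and halt_mass < threshold:
        # In a real system this call would dispatch one or more GPU kernels.
        state, halt_prob = step(state)
        halt_mass += halt_prob
        iters += 1
    return state, iters


def toy_step(x):
    """Stand-in for one iteration: advance the state, emit a halting probability."""
    return x + 1, 0.4


final_state, iters = run_with_halting(toy_step, 0)
print(final_state, iters)
```

A persistent kernel would move the `while` condition onto the device, removing the per-iteration launch and the host round trip that the loop above implies.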
## Verification
- `python -m py_compile arbitor/components.py arbitor/sequencers.py arbitor/encoders/audio.py arbitor/encoders/pig_vae.py arbitor/main.py arbitor/vq.py arbitor/kernel/ternary_scale.py arbitor/kernel/ternary_audit.py`
- `python -m pytest -q testing/test_tscale.py -k "cuda_triton_correctness_update_E or cuda_triton_tscale_path"`: `2 passed`
- full multimodal CUDA smoke passed with image/audio sidecars enabled