# True Ternary Refactor 10 — Sidecar Int8 Verification And Full-System Smoke
## Scope
The sidecar model files are now available locally:
- `arbitor/encoders/models/dinov2-small`
- `arbitor/encoders/models/moonshine-base`
- `arbitor/encoders/models/pig-vae/model.safetensors`
This pass verifies int8 quantization for the available sidecars and checks the full multimodal path through the Triton-backed ternary system.
## Sidecar Quantization Metadata
Quantized imported sidecars now carry explicit metadata attributes:
```text
_arb_quantize_requested
_arb_quantized
_arb_quantized_int8
```
The quantization helper also freezes every sidecar parameter once quantization completes.
Applied to:
- `ImageSequencer.vit`
- `AudioSequencer.audio_encoder`
- `pig_vae` loader paths
## Verified Int8 Sidecars
`ImageSequencer(quantize_weights='int8')`:
```text
image quant_requested= int8
image quantized_int8= True
image quantized= True
image trainable= 0
image quant_classes= {'QConv2d': 1, 'QLinear': 72}
```
`AudioSequencer(quantize_weights='int8')`:
```text
audio quant_requested= int8
audio quantized_int8= True
audio quantized= True
audio trainable= 0
audio quant_classes= {'QLinear': 128}
```
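Reports like the two above can be produced by walking the module tree. The following is a sketch under assumptions: `quant_report` is a hypothetical name, the "class name starts with `Q`" filter is inferred from the `QConv2d`/`QLinear` entries rather than from the repository's audit code, and `trainable` is counted per tensor. The stubs stand in for `torch.nn.Module` so the example runs without PyTorch.

```python
from collections import Counter


def quant_report(module, prefix="Q"):
    """Summarize the quantization state of a module tree.

    Accepts any object exposing `modules()` and `parameters()` like
    torch.nn.Module. The class-name prefix filter is an assumption.
    """
    classes = Counter(
        type(m).__name__
        for m in module.modules()
        if type(m).__name__.startswith(prefix)
    )
    trainable = sum(1 for p in module.parameters() if p.requires_grad)
    return {
        "quantized_int8": getattr(module, "_arb_quantized_int8", False),
        "trainable": trainable,
        "quant_classes": dict(classes),
    }


class _Param:
    """Stand-in for a frozen parameter (illustration only)."""
    def __init__(self, requires_grad=False):
        self.requires_grad = requires_grad


class _Module:
    """Stand-in for torch.nn.Module with one weight per module."""
    def __init__(self, *children):
        self.children = list(children)
        self.weight = _Param()

    def modules(self):
        yield self
        for child in self.children:
            yield from child.modules()

    def parameters(self):
        for m in self.modules():
            yield m.weight


class QLinear(_Module):
    pass


class QConv2d(_Module):
    pass


encoder = _Module(QConv2d(), QLinear(), QLinear())
encoder._arb_quantized_int8 = True
report = quant_report(encoder)
print(report)
```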
Focused forward smokes:
```text
image_forward_ok (1, 254, 512) True
audio_forward_ok (1, 36, 512) True
```
## Full Multimodal CUDA Smoke
`ARBModel(enable_image=True, enable_audio=True, enable_vq=True, enable_graph=True, enable_memory_modules=True, enable_moe=True)`:
```text
image_quantized_int8 True
audio_quantized_int8 True
full_multimodal_cuda_train_smoke_ok logits=(1, 8, 297), targets=(1, 7), indices=(1, 298), loss=17.4929
```
This exercised:
- int8 DINO/ViT sidecar
- int8 Moonshine sidecar
- ternary image/audio projections
- multimodal VQ bridge
- graph path with Triton aggregation/gather kernels
- MoE path with Triton dense combine kernel
- output router / byte head
- backward and `_ternary_update_memory()`
Full model audit with image/audio enabled:
```text
logical ternary weights: 42,669,632
ternary training state: 55.72 MB
trainable float params: 0 tensors, 0.00 MB
frozen float params: 433 tensors, 318.80 MB
float buffers: 406 tensors, 0.00 MB
```
The frozen float params belong to the imported sidecars. Their compute modules were verified to be Quanto int8 wrappers wherever the active environment supports them.
## pig-vae Status
`pig-vae/model.safetensors` is present locally, and the loader now applies the same quantization metadata/freeze path as the vision/audio sidecars.
Runtime verification is blocked in this Python environment because `diffusers` is not installed:
```text
RuntimeError: pig-vae requires the optional diffusers dependency.
```
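An error of this shape typically comes from an import guard around the optional package. The sketch below shows one way to write such a guard with the standard library; `require_optional` is a hypothetical name, and the actual check inside `arbitor/encoders/pig_vae.py` may differ.

```python
import importlib.util


def require_optional(package, feature):
    """Raise a clear error when an optional dependency is missing."""
    # find_spec returns None for a top-level package that is not installed.
    if importlib.util.find_spec(package) is None:
        raise RuntimeError(
            f"{feature} requires the optional {package} dependency."
        )


try:
    require_optional("diffusers", "pig-vae")
    status = "diffusers available"
except RuntimeError as exc:
    status = str(exc)
print(status)
```

Checking with `find_spec` avoids importing the package just to test for its presence, which keeps the failure fast and the message specific.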
The system Python is externally managed, so I did not force-install packages into it. To verify pig-vae in this checkout, create or activate a project venv and install the optional extra:
```text
pip install -e '.[diffusers]'
```
Then run:
```text
python - <<'PY'
from arbitor.encoders.pig_vae import load_vae
vae = load_vae(device='cpu', quantize='int8')
print(vae.vae._arb_quantized_int8)
PY
```
## Kernel Coverage
Current Triton-backed full-system paths:
- packed ternary linear forward/backward/update
- packed ternary embedding forward/backward/update
- ternary RMSNorm
- `E` residual update
- `T_accum` update
- Graph edge weighting + target aggregation
- Graph VQ-index gather + residual add
- MoE dense route combine
- VideoHead denoise update
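For readers unfamiliar with the term "packed ternary": this note does not state the storage layout the kernels use, so the following is only an illustration of one common scheme, assumed here for the example. Each weight in {-1, 0, +1} is stored as a 2-bit code, four weights per byte. The function names are hypothetical.

```python
def pack_ternary(weights):
    """Pack values from {-1, 0, +1} into bytes, four weights per byte."""
    if any(w not in (-1, 0, 1) for w in weights):
        raise ValueError("weights must be -1, 0, or +1")
    # Pad with zeros so the length is a multiple of four.
    padded = list(weights) + [0] * (-len(weights) % 4)
    out = bytearray()
    for i in range(0, len(padded), 4):
        byte = 0
        for slot, w in enumerate(padded[i:i + 4]):
            byte |= (w + 1) << (2 * slot)  # map -1/0/+1 to codes 0/1/2
        out.append(byte)
    return bytes(out)


def unpack_ternary(packed, count):
    """Inverse of pack_ternary; `count` drops the padding."""
    values = []
    for byte in packed:
        for slot in range(4):
            values.append(((byte >> (2 * slot)) & 0b11) - 1)
    return values[:count]


weights = [1, -1, 0, 0, 1, 1, -1]
packed = pack_ternary(weights)
restored = unpack_ternary(packed, len(weights))
print(len(weights), "weights ->", len(packed), "bytes")
```

A 2-bit code wastes one of its four states; denser base-3 encodings exist but cost more to decode inside a kernel.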
Remaining non-fused control loops:
- VideoHead diffusion/halting loop
- Graph hop loop
- MoE ACT iteration loop
Those loops control repeated computation and halting. Each iteration runs on Triton kernels, but the loops themselves have not yet been fused into persistent monolithic kernels.
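The distinction can be illustrated with a toy control loop. This is a sketch under assumptions, not the repository's code: `run_until_halt`, `toy_step`, and the cumulative-probability halting rule are all hypothetical. The loop lives outside the compute step, and each call to `step` stands in for one or more kernel launches; a persistent kernel would instead move the whole `while` loop onto the device.

```python
def run_until_halt(step, state, max_iters, threshold=0.99):
    """Control loop around a per-iteration compute step.

    `step(state)` returns (new_state, halt_prob). In a real system it
    would launch GPU kernels. All names here are hypothetical.
    """
    cumulative = 0.0
    iters = 0
    # Stop when enough halting mass has accumulated or the cap is hit.
    while iters < max_iters and cumulative < threshold:
        state, halt_prob = step(state)
        cumulative += halt_prob
        iters += 1
    return state, iters


def toy_step(x):
    # Stand-in for a kernel launch: halve the value, emit 0.4 halting mass.
    return x / 2, 0.4


final, used = run_until_halt(toy_step, 8.0, max_iters=10)
print(final, used)
```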
## Verification
- `python -m py_compile arbitor/components.py arbitor/sequencers.py arbitor/encoders/audio.py arbitor/encoders/pig_vae.py arbitor/main.py arbitor/vq.py arbitor/kernel/ternary_scale.py arbitor/kernel/ternary_audit.py`
- `python -m pytest -q testing/test_tscale.py -k "cuda_triton_correctness_update_E or cuda_triton_tscale_path"`: `2 passed`
- full multimodal CUDA smoke passed with image/audio sidecars enabled