# True Ternary Refactor 9 — Platform Components And Output Bridge

## Scope

The codebase has moved into the `arbitor/` package. This pass focuses only on the newly added platform components:

- output bridge heads: `OutputRouter`, `VideoHead`, `TalkerHead`
- custom audio training encoder: `AudioVQEncoder`
- imported sidecars: `pig_vae`, Moonshine audio encoder, ViT/DINO vision encoder
- new loop-heavy output paths

## AudioVQEncoder Ternarization

`AudioVQEncoder` in `arbitor/encoders/audio.py` was still a custom, trainable float module:

```text
nn.Conv1d -> nn.Conv1d -> nn.Linear -> nn.Embedding -> nn.Linear
```

Converted it to persistent ternary state:

- Added `TernaryConv1d`, implemented as an `unfold` followed by a `TernaryScaleTensor` projection.
- Replaced all conv blocks with `TernaryConv1d`.
- Replaced `proj` and `out_proj` with `TernaryScaleTensor`.
- Replaced the VQ codebook `nn.Embedding` with `TernaryEmbeddingTable`.
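The unfold-then-project idea behind `TernaryConv1d` can be illustrated with a minimal NumPy sketch. It assumes a stride-1, unpadded convolution and a simple absmean-style quantizer, where each output row is stored as codes in {-1, 0, +1} plus one float scale. The names `ternarize` and `ternary_conv1d` are hypothetical, and the actual `TernaryScaleTensor` storage and update rule are not shown in this note.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def ternarize(w, eps=1e-8):
    """Hypothetical absmean ternarization: w[o] ~= scale[o] * t[o], t in {-1, 0, +1}."""
    scale = np.abs(w).mean(axis=1, keepdims=True) + eps   # (out_ch, 1)
    t = np.clip(np.rint(w / scale), -1, 1).astype(np.int8)
    return t, scale


def ternary_conv1d(x, t, scale, kernel_size):
    """Conv1d (stride 1, no padding) as unfold + matmul against ternary codes.

    x:     (batch, in_ch, length)
    t:     (out_ch, in_ch * kernel_size), same layout as a conv weight reshaped to 2-D
    scale: (out_ch, 1)
    returns (batch, out_ch, length - kernel_size + 1)
    """
    # Unfold: (batch, in_ch, L_out, k) -> (batch, L_out, in_ch * k)
    patches = sliding_window_view(x, kernel_size, axis=2)
    b, c, l_out, k = patches.shape
    patches = patches.transpose(0, 2, 1, 3).reshape(b, l_out, c * k)
    # Ternary matmul (adds/subtracts only), then one float scale per output channel.
    y = patches @ t.T.astype(x.dtype)          # (batch, L_out, out_ch)
    return (y * scale.T).transpose(0, 2, 1)    # (batch, out_ch, L_out)
```

Because `t` only holds -1, 0, and +1, the inner products reduce to signed accumulation. The float multiply is applied once per output channel after the matmul.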

Focused audit:

```text
AudioVQEncoder logical ternary weights: 404,864
trainable float params: 0
frozen float params: 0
float buffers: 0
```

Focused smoke:

```text
audio_vq_encoder_ok logits=(1, 4, 289), indices=(1, 4)
```

## Output Bridge Ternarization

`VideoHead.noise_embed` was a hidden float `nn.Embedding`.

Changed:

```text
nn.Embedding(max_steps, TRIGRAM_DIM)
```

to:

```text
TernaryEmbeddingTable(max_steps, TRIGRAM_DIM)
```
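A ternary step-embedding lookup like `noise_embed` can be sketched as follows. The sketch assumes one float scale per row (one per diffusion step) and a row-wise absmean quantizer. `TernaryEmbeddingSketch` is a hypothetical stand-in, not the `TernaryEmbeddingTable` API, and it omits how the ternary state is trained.

```python
import numpy as np


class TernaryEmbeddingSketch:
    """Hypothetical ternary embedding: row i ~= scales[i] * codes[i]."""

    def __init__(self, num_embeddings, dim, seed=0):
        rng = np.random.default_rng(seed)
        w = rng.standard_normal((num_embeddings, dim)).astype(np.float32)
        # One float scale per row; codes stored as int8 values in {-1, 0, +1}.
        self.scales = np.abs(w).mean(axis=1, keepdims=True)            # (N, 1)
        self.codes = np.clip(np.rint(w / self.scales), -1, 1).astype(np.int8)

    def __call__(self, idx):
        # Gather ternary rows, then dequantize with each row's own scale.
        idx = np.asarray(idx)
        return self.codes[idx].astype(np.float32) * self.scales[idx]
```

For example, `TernaryEmbeddingSketch(8, 16)(np.array([0, 3, 7]))` returns a `(3, 16)` array, one dequantized row per requested step.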

Focused audit for `VideoHead`:

```text
logical ternary weights: 17,040,896
trainable float params: 0
frozen float params: 0
float buffers: 0
```

`TalkerHead.forward()` had a nested Python loop:

```text
for token in conditioning_tokens:
    for _ in range(stride):
        logits = head(state)
        append argmax(logits)
```

Replaced it with a single ternary head call over all conditioning tokens, followed by `repeat_interleave`. The existing stride/pad/truncate behavior is unchanged.
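The vectorized replacement can be checked against the old loop with a small NumPy sketch. It assumes three things:

- the head output for a conditioning token does not change across that token's `stride` repeats, which is what makes a repeat equivalent to the loop;
- short outputs are right-padded with a hypothetical `pad_id` up to a target `out_len`;
- long outputs are truncated to `out_len`.

Both function names are hypothetical. `np.repeat(..., axis=1)` is the NumPy counterpart of `torch.repeat_interleave(..., dim=1)`.

```python
import numpy as np


def talker_tokens_vectorized(logits, stride, out_len, pad_id):
    """Hypothetical vectorized decode.

    logits: (batch, n_cond, vocab) from ONE head call over all conditioning tokens.
    Returns (batch, out_len) token ids.
    """
    ids = logits.argmax(axis=-1)                 # (batch, n_cond)
    ids = np.repeat(ids, stride, axis=1)         # like torch.repeat_interleave
    b, n = ids.shape
    if n < out_len:                              # right-pad short outputs
        pad = np.full((b, out_len - n), pad_id, dtype=ids.dtype)
        ids = np.concatenate([ids, pad], axis=1)
    return ids[:, :out_len]                      # truncate long outputs


def talker_tokens_loop(logits, stride, out_len, pad_id):
    """Reference: the old nested loop, one argmax per (token, stride) pair."""
    rows = []
    for row in logits:
        toks = []
        for tok_logits in row:
            for _ in range(stride):
                toks.append(int(tok_logits.argmax()))
        rows.append((toks + [pad_id] * out_len)[:out_len])
    return np.array(rows)
```

In the real head, the gain comes from replacing `n_cond * stride` head calls with one batched call. In this sketch, `logits` stands in for that call's output.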

Focused smoke:

```text
video_head_ok latents=(1, 16, 1, 32, 32)
talker_head_ok tokens=(1, 10)
```

## Imported Sidecars

`pig_vae` now explicitly freezes all parameters after the optional int8 quantization step:

```python
from optimum.quanto import freeze, qint8, quantize

quantize(vae, weights=qint8)  # optional int8 weight quantization
freeze(vae)                   # replace float weights with their quantized values
for p in vae.parameters():    # then explicitly disable gradients everywhere
    p.requires_grad = False
```

Moonshine audio and ViT/DINO vision already default to `quantize_weights='int8'` through `optimum.quanto`, then freeze their parameters. If `optimum.quanto` is unavailable, they fall back to frozen BF16. That fallback is not strict ternary, but it is frozen imported sidecar state rather than trainable model state.
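The int8-or-BF16 fallback decision can be sketched without importing the heavy package itself. `sidecar_weight_mode` is a hypothetical helper whose argument mirrors the `quantize_weights='int8'` default. It only probes whether `optimum.quanto` is importable. Freezing the parameters happens in both branches and is not shown.

```python
import importlib.util


def sidecar_weight_mode(quantize_weights="int8"):
    """Hypothetical helper: choose storage for an imported, frozen sidecar.

    Returns "int8" when int8 was requested and optimum.quanto is importable,
    otherwise "bf16" (the frozen fallback described above).
    """
    if quantize_weights == "int8":
        try:
            # find_spec imports the parent package "optimum" to look up "quanto".
            if importlib.util.find_spec("optimum.quanto") is not None:
                return "int8"
        except ModuleNotFoundError:  # "optimum" itself is not installed
            pass
    return "bf16"
```

Keeping the probe separate from model loading makes it easy for an audit to report which sidecars ended up in the non-ternary BF16 path.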

## New Kernel Support

Added a Triton denoise-step kernel for `VideoHead`:

```text
latent = (latent - (1 - alpha) * pred_noise) / sqrt(alpha)
```

Forward and backward are Triton-backed on CUDA. The ACT-style diffusion loop remains because it controls halting and repeated shared-weight denoising, but the per-step latent update is now a single custom kernel.

Correctness against the PyTorch reference (maximum element-wise difference):

```text
video_denoise_fwd_maxdiff: 7.15e-07
video_denoise_grad_latent_maxdiff: 4.77e-07
video_denoise_grad_pred_maxdiff: 1.79e-07
```
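The same kind of check can be reproduced on the CPU. The sketch below implements the denoise update and a hand-written backward in NumPy, then compares the analytic gradients with central finite differences using a max-abs-diff metric. It is a hypothetical reference, not the Triton kernel or the project's test harness. `alpha` is treated as a per-step scalar constant, so no gradient flows to it.

```python
import numpy as np


def denoise_fwd(latent, pred_noise, alpha):
    # latent' = (latent - (1 - alpha) * pred_noise) / sqrt(alpha)
    return (latent - (1.0 - alpha) * pred_noise) / np.sqrt(alpha)


def denoise_bwd(grad_out, alpha):
    # Elementwise: dL/dlatent = g / sqrt(alpha); dL/dpred = -(1 - alpha) * g / sqrt(alpha)
    inv = 1.0 / np.sqrt(alpha)
    return grad_out * inv, grad_out * (-(1.0 - alpha) * inv)


def max_diff(a, b):
    return float(np.max(np.abs(a - b)))


def finite_diff_grads(latent, pred_noise, alpha, grad_out, h=1e-6):
    """Central differences of L = sum(grad_out * denoise_fwd(...)).

    Assumes latent and pred_noise have the same shape.
    """
    def loss(z, e):
        return float(np.sum(grad_out * denoise_fwd(z, e, alpha)))

    gz = np.zeros_like(latent)
    ge = np.zeros_like(pred_noise)
    for i in np.ndindex(latent.shape):
        d = np.zeros_like(latent)
        d[i] = h
        gz[i] = (loss(latent + d, pred_noise) - loss(latent - d, pred_noise)) / (2 * h)
        ge[i] = (loss(latent, pred_noise + d) - loss(latent, pred_noise - d)) / (2 * h)
    return gz, ge
```

Both gradients are constant multiples of the upstream gradient. That is why the backward pass can also be a single elementwise kernel.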

## Model-Level Verification

Package compile:

```text
python -m py_compile arbitor/components.py arbitor/sequencers.py arbitor/encoders/audio.py arbitor/encoders/pig_vae.py arbitor/main.py arbitor/vq.py arbitor/kernel/ternary_scale.py arbitor/kernel/ternary_audit.py
```

Audit of `ARBModel` with the image/audio imports disabled and the VQ, Graph, Memory, MoE, and output heads enabled:

```text
logical ternary weights: 41,087,552
ternary training state: 53.65 MB
trainable float params: 0
frozen float params: 0
float buffers: 0
```
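A rough scale check can be made from only the two reported numbers, assuming MB means 10^6 bytes. The ternary training state works out to about 1.31 bytes per logical weight. A single FP32 copy of the same parameter count would take 4 bytes per weight, about 164 MB, or roughly 3x more. This note does not break down what those 1.31 bytes per weight contain.

```python
LOGICAL_TERNARY_WEIGHTS = 41_087_552
TERNARY_STATE_MB = 53.65          # as reported above; assumes 1 MB = 1e6 bytes

state_bytes = TERNARY_STATE_MB * 1e6
bytes_per_weight = state_bytes / LOGICAL_TERNARY_WEIGHTS
fp32_copy_mb = LOGICAL_TERNARY_WEIGHTS * 4 / 1e6   # one FP32 tensor, no optimizer state

print(f"{bytes_per_weight:.2f} bytes/weight")      # 1.31 bytes/weight
print(f"fp32 copy: {fp32_copy_mb:.2f} MB")         # fp32 copy: 164.35 MB
```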

Smoke tests:

```text
arb_model_cpu_forward_ok logits=(2, 8, 297), indices=(2, 8)
arb_model_cuda_train_smoke_ok logits=(2, 8, 297), targets=(2, 7), loss=12.1709
```

The CUDA smoke test completed forward, backward, and `_ternary_update_memory()`.

## Remaining Work

1. Add a strict sidecar audit mode that reports imported quantized sidecars separately from core ternary state.
2. Add tests that instantiate Moonshine/ViT only when their weights are cached locally, to avoid network-dependent CI.
3. Consider a true ternary transposed-conv replacement if `TinyNeuralCodec` is promoted from a lazy frozen sidecar to a trainable core model component.
4. The `VideoHead` diffusion control loop is still Python-level. Full fusion would require either a fixed-step, no-break kernel variant or a persistent CUDA kernel, which is a larger design change.
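The "fixed-step, no-break" option from item 4 can be pictured as follows. Instead of breaking out of the loop when the halting signal fires, run a fixed number of steps and freeze each sample's latent once it has halted. The control flow then no longer depends on the data. The sketch is hypothetical throughout:

- it reuses the denoise update from the kernel section;
- `predict_noise` is a toy stand-in for the shared-weight denoiser;
- `halt_score` with a threshold replaces the real ACT halting rule, which this note does not describe.

```python
import numpy as np


def denoise(latent, pred_noise, alpha):
    return (latent - (1.0 - alpha) * pred_noise) / np.sqrt(alpha)


def predict_noise(latent):
    # Toy stand-in for the shared-weight denoiser (hypothetical).
    return 0.9 * latent


def halt_score(latent):
    # Hypothetical per-sample halting signal: halt once mean |latent| is small.
    return np.abs(latent).reshape(latent.shape[0], -1).mean(axis=1)


def loop_with_break(latent, alphas, threshold):
    """Data-dependent control flow: each sample stops at its own step."""
    out = latent.copy()
    for b in range(latent.shape[0]):
        z = latent[b:b + 1]
        for alpha in alphas:
            if halt_score(z)[0] < threshold:
                break
            z = denoise(z, predict_noise(z), alpha)
        out[b:b + 1] = z
    return out


def loop_fixed_steps(latent, alphas, threshold):
    """Fixed trip count: every step runs for every sample; halted ones are masked."""
    z = latent.copy()
    halted = np.zeros(latent.shape[0], dtype=bool)
    for alpha in alphas:
        halted |= halt_score(z) < threshold            # halting is sticky
        step_out = denoise(z, predict_noise(z), alpha)
        keep = halted.reshape(-1, *([1] * (z.ndim - 1)))
        z = np.where(keep, z, step_out)                # halted samples keep their latent
    return z
```

The cost is wasted work on samples that have already halted. In exchange, the trip count is fixed, so a fused kernel could iterate it without host-side branching.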