	# True Ternary Refactor 10 — Sidecar Int8 Verification And Full-System Smoke

	## Scope

	The sidecar model files are now available locally:

	- `arbitor/encoders/models/dinov2-small`
	- `arbitor/encoders/models/moonshine-base`
	- `arbitor/encoders/models/pig-vae/model.safetensors`

	This pass verifies int8 quantization for the available sidecars and checks the full multimodal path through the Triton-backed ternary system.

	## Sidecar Quantization Metadata

	Added explicit metadata to quantized imported sidecars:

	```text
	_arb_quantize_requested
	_arb_quantized
	_arb_quantized_int8
	```

	The helper that sets these attributes also freezes every sidecar parameter after quantization.

	Applied to:

	- `ImageSequencer.vit`
	- `AudioSequencer.audio_encoder`
	- `pig_vae` loader paths

	## Verified Int8 Sidecars

	`ImageSequencer(quantize_weights='int8')`:

	```text
	image quant_requested= int8
	image quantized_int8= True
	image quantized= True
	image trainable= 0
	image quant_classes= {'QConv2d': 1, 'QLinear': 72}
	```

	`AudioSequencer(quantize_weights='int8')`:

	```text
	audio quant_requested= int8
	audio quantized_int8= True
	audio quantized= True
	audio trainable= 0
	audio quant_classes= {'QLinear': 128}
	```

	Focused forward smokes:

	```text
	image_forward_ok (1, 254, 512) True
	audio_forward_ok (1, 36, 512) True
	```
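
Report lines like the ones above could be produced by a small audit along the following lines. This is a sketch, not the script used in this pass: `audit_sidecar` is a hypothetical name, it assumes quantized layers can be recognised by a class name starting with `Q` (as with `QLinear` and `QConv2d`), and it assumes `trainable` counts parameter tensors. The stub classes stand in for a real module tree, where `model.modules()` and `model.parameters()` would be passed instead.

```python
from collections import Counter
from types import SimpleNamespace


def audit_sidecar(modules, parameters):
    """Count quantized wrapper classes and trainable parameter tensors."""
    quant_classes = Counter(
        type(m).__name__ for m in modules if type(m).__name__.startswith("Q")
    )
    trainable = sum(1 for p in parameters if p.requires_grad)
    return dict(quant_classes), trainable


# Stub layer classes, for illustration only.
class QLinear: ...
class QConv2d: ...
class LayerNorm: ...


modules = [QConv2d()] + [QLinear() for _ in range(4)] + [LayerNorm()]
params = [SimpleNamespace(requires_grad=False) for _ in range(10)]

quant_classes, trainable = audit_sidecar(modules, params)
print("quant_classes=", quant_classes)
print("trainable=", trainable)
```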

	## Full Multimodal CUDA Smoke

	`ARBModel(enable_image=True, enable_audio=True, enable_vq=True, enable_graph=True, enable_memory_modules=True, enable_moe=True)`:

	```text
	image_quantized_int8 True
	audio_quantized_int8 True
	full_multimodal_cuda_train_smoke_ok logits=(1, 8, 297), targets=(1, 7), indices=(1, 298), loss=17.4929
	```

	This exercised:

	- int8 DINO/ViT sidecar
	- int8 Moonshine sidecar
	- ternary image/audio projections
	- multimodal VQ bridge
	- graph path with Triton aggregation/gather kernels
	- MoE path with Triton dense combine kernel
	- output router / byte head
	- backward and `_ternary_update_memory()`

	Full model audit with image/audio enabled:

	```text
	logical ternary weights: 42,669,632
	ternary training state: 55.72 MB
	trainable float params: 0 tensors, 0.00 MB
	frozen float params: 433 tensors, 318.80 MB
	float buffers: 406 tensors, 0.00 MB
	```
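
As a rough sanity check on these figures, the arithmetic below compares the reported training state with what a bare 2-bit packing of the logical weights would occupy. Both the 2-bit packing and the reading of "MB" as 2^20 bytes are assumptions; the audit output does not state the storage layout or the unit.

```python
LOGICAL_TERNARY_WEIGHTS = 42_669_632   # from the audit above
TRAINING_STATE_MB = 55.72              # from the audit above
MIB = 1024 * 1024                      # assumption: "MB" means 2**20 bytes

# Assumption: ternary values packed four per byte (2 bits each).
packed_2bit_bytes = LOGICAL_TERNARY_WEIGHTS * 2 // 8
packed_2bit_mb = packed_2bit_bytes / MIB

# Reported training state expressed per logical weight.
state_bytes_per_weight = TRAINING_STATE_MB * MIB / LOGICAL_TERNARY_WEIGHTS

print(f"2-bit packed weights: {packed_2bit_mb:.2f} MB")
print(f"reported state per logical weight: {state_bytes_per_weight:.2f} bytes")
```

Under these assumptions the reported training state is roughly five times the size of the packed weights alone, so it presumably covers more than the packed ternary values. Which extra tensors account for the difference is not stated here.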

	The frozen float params belong to imported sidecars. Their compute modules were verified as Quanto int8 wrappers where supported by the active environment.

	## pig-vae Status

	`pig-vae/model.safetensors` is present locally, and the loader now applies the same quantization metadata/freeze path as the vision/audio sidecars.

	Runtime verification is blocked in this Python environment because `diffusers` is not installed:

	```text
	RuntimeError: pig-vae requires the optional diffusers dependency.
	```

	The system Python is externally managed, so I did not force-install packages into it. To verify pig-vae in this checkout, create/use a project venv and install:

	```text
	pip install -e ".[diffusers]"
	```

	Then run:

	```text
	python - <<'PY'
	from arbitor.encoders.pig_vae import load_vae
	vae = load_vae(device='cpu', quantize='int8')
	print(vae.vae._arb_quantized_int8)
	PY
	```
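
Before running the check in a new environment, it can help to confirm that the optional dependency is importable, so a missing package is not mistaken for a loader bug. A minimal sketch using only the standard library; `has_optional_dependency` is a hypothetical helper name, not part of `arbitor`.

```python
import importlib.util


def has_optional_dependency(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


if has_optional_dependency("diffusers"):
    print("diffusers found: the pig-vae check can run")
else:
    print("diffusers missing: install the optional extra first")
```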

	## Kernel Coverage

	Current Triton-backed full-system paths:

	- packed ternary linear forward/backward/update
	- packed ternary embedding forward/backward/update
	- ternary RMSNorm
	- `E` residual update
	- `T_accum` update
	- Graph edge weighting + target aggregation
	- Graph VQ-index gather + residual add
	- MoE dense route combine
	- VideoHead denoise update

	Remaining non-fused control loops:

	- VideoHead diffusion/halting loop
	- Graph hop loop
	- MoE ACT iteration loop

	Those loops control repeated computation and halting. Their loop bodies call Triton kernels, but the loops themselves are not yet fused into persistent monolithic kernels.
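
To illustrate what a non-fused control loop means, here is a toy host-side halting loop in which each iteration stands for one or more kernel launches. Nothing in it is ARB code: `run_halting_loop`, `toy_step`, the cumulative halting rule, and the threshold are all assumptions chosen for the example.

```python
from typing import Callable, Tuple


def run_halting_loop(
    step: Callable[[float], Tuple[float, float]],
    state: float,
    max_steps: int = 8,
    threshold: float = 0.99,
) -> Tuple[float, int]:
    """Host-side loop; `step` stands in for a kernel launch per iteration."""
    halt_total = 0.0
    steps = 0
    # The halting decision is made on the host between launches.
    while steps < max_steps and halt_total < threshold:
        state, halt_prob = step(state)
        halt_total += halt_prob
        steps += 1
    return state, steps


def toy_step(state: float) -> Tuple[float, float]:
    """Toy update: halve the state and report a fixed halting probability."""
    return state * 0.5, 0.4


final_state, steps_taken = run_halting_loop(toy_step, state=1.0)
print(final_state, steps_taken)
```

A persistent kernel would instead keep the iteration and the halting test on the device, avoiding a launch and a host round trip per step.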

	## Verification

	- `python -m py_compile arbitor/components.py arbitor/sequencers.py arbitor/encoders/audio.py arbitor/encoders/pig_vae.py arbitor/main.py arbitor/vq.py arbitor/kernel/ternary_scale.py arbitor/kernel/ternary_audit.py`
	- `python -m pytest -q testing/test_tscale.py -k "cuda_triton_correctness_update_E or cuda_triton_tscale_path"`: `2 passed`
	- full multimodal CUDA smoke passed with image/audio sidecars enabled