肉糖生 0.8B Hot-Take (ONNX Q8)

ONNX Q8 quantized model for WebGPU deployment via transformers.js.
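
For reference, a minimal browser-side loading sketch with transformers.js (v3), assuming the repo works with the standard text-generation pipeline (the card lists a vision encoder, so an image-text-to-text pipeline may also be applicable); the model id, dtype, and device come from this card, while the prompt and generation settings are illustrative:

```js
import { pipeline } from "@huggingface/transformers";

// Load the Q8 ONNX weights on WebGPU (assumes transformers.js v3+ and a WebGPU-capable browser).
const generator = await pipeline(
  "text-generation",
  "bobber/routangseng-0.8b-hottake-onnx",
  { device: "webgpu", dtype: "q8" }
);

// Illustrative prompt; adjust chat formatting to whatever the base model expects.
const output = await generator("Give me a hot take on static typing.", {
  max_new_tokens: 128,
});
console.log(output[0].generated_text);
```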

Model Info

  • Base: huihui-ai/Huihui-Qwen3.5-0.8B-abliterated
  • Training: Phase 11 distillation from 4B Phase 10 Think-SFT (949 condensed examples)
  • Eval: Heuristic score 4.60/5
  • Format: ONNX Q8 (uint8 MatMul quantization)
  • Total size: ~1.1 GB

CDN / GitHub Pages Mirror

For faster loading and CORS support, chunked model files are hosted on GitHub Pages:

🔗 GitHub repo: bobbercheng/routangseng-models
📦 CDN URL: https://bobbercheng.github.io/routangseng-models/
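
transformers.js lets you override the download host, so weights can be fetched from this mirror instead of the Hub. A sketch, assuming the mirror serves files directly under the repo root; the chunked files may additionally need custom reassembly before ONNX Runtime can load them, which is not shown here:

```js
import { env } from "@huggingface/transformers";

// remoteHost and remotePathTemplate are real transformers.js settings; by default
// they point at huggingface.co. The mirror layout assumed below (files served
// directly under the GitHub Pages repo root) is a guess, not documented by this card.
env.remoteHost = "https://bobbercheng.github.io/";
env.remotePathTemplate = "routangseng-models/";
```

With these overrides in place, the pipeline call shown above fetches model files from the GitHub Pages mirror instead of the Hub.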

Demo

Files

Component                                           Size
decoder_model_merged_quantized.onnx + .onnx_data    756 MB
embed_tokens_quantized.onnx + .onnx_data            254 MB
vision_encoder_quantized.onnx + .onnx_data          101 MB

Production Note

The 0.8B model may emit dangling </think> tags at the start of output. Strip these at inference time.
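
A minimal post-processing sketch (the function name and regex are illustrative, not part of the model API):

```js
// Drop a dangling closing </think> tag, plus surrounding whitespace, at the start of the output.
function stripDanglingThink(text) {
  return text.replace(/^\s*<\/think>\s*/, "");
}

console.log(stripDanglingThink("</think>\nHot take: tabs beat spaces."));
// "Hot take: tabs beat spaces."
```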

Related
