肉糖生 0.8B Hot-Take (ONNX Q8)

ONNX Q8 quantized model for WebGPU deployment via transformers.js.
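
For reference, a minimal browser-side loading sketch with transformers.js (v3), assuming the repo works with the standard text-generation pipeline (the card lists a vision encoder, so an image-text-to-text pipeline may also be applicable); the model id, dtype, and device come from this card, while the prompt and generation settings are illustrative:

```js
import { pipeline } from "@huggingface/transformers";

// Load the Q8 ONNX weights on WebGPU (assumes transformers.js v3+ and a WebGPU-capable browser).
const generator = await pipeline(
  "text-generation",
  "bobber/routangseng-0.8b-hottake-onnx",
  { device: "webgpu", dtype: "q8" }
);

// Illustrative prompt; adjust chat formatting to whatever the base model expects.
const output = await generator("Give me a hot take on static typing.", {
  max_new_tokens: 128,
});
console.log(output[0].generated_text);
```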

Model Info

  • Base: huihui-ai/Huihui-Qwen3.5-0.8B-abliterated
  • Training: Phase 11 distillation from 4B Phase 10 Think-SFT (949 condensed examples)
  • Eval: Heuristic score 4.60/5
  • Format: ONNX Q8 (uint8 MatMul quantization)
  • Total size: ~1.1 GB

CDN / GitHub Pages Mirror

For faster loading and CORS support, chunked model files are hosted on GitHub Pages:

🔗 GitHub repo: bobbercheng/routangseng-models
📦 CDN URL: https://bobbercheng.github.io/routangseng-models/
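
transformers.js lets you override the download host, so weights can be fetched from this mirror instead of the Hub. A sketch, assuming the mirror serves files directly under the repo root; the chunked files may additionally need custom reassembly before ONNX Runtime can load them, which is not shown here:

```js
import { env } from "@huggingface/transformers";

// remoteHost and remotePathTemplate are real transformers.js settings; by default
// they point at huggingface.co. The mirror layout assumed below (files served
// directly under the GitHub Pages repo root) is a guess, not documented by this card.
env.remoteHost = "https://bobbercheng.github.io/";
env.remotePathTemplate = "routangseng-models/";
```

With these overrides in place, the pipeline call shown above fetches model files from the GitHub Pages mirror instead of the Hub.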

Demo

Files

Component                                           Size
decoder_model_merged_quantized.onnx + .onnx_data    756 MB
embed_tokens_quantized.onnx + .onnx_data            254 MB
vision_encoder_quantized.onnx + .onnx_data          101 MB

Production Note

The 0.8B model may emit dangling </think> tags at the start of output. Strip these at inference time.
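
A minimal post-processing sketch (the function name and regex are illustrative, not part of the model API):

```js
// Drop a dangling closing </think> tag, plus surrounding whitespace, at the start of the output.
function stripDanglingThink(text) {
  return text.replace(/^\s*<\/think>\s*/, "");
}

console.log(stripDanglingThink("</think>\nHot take: tabs beat spaces."));
// "Hot take: tabs beat spaces."
```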

Related
