# routangseng 0.8B Hot-Take (ONNX Q8)
An ONNX Q8-quantized model for WebGPU deployment via transformers.js.
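For orientation, here is a minimal loading sketch using the transformers.js v3 pipeline API. The `text-generation` task, the sample prompt, and the generation options are assumptions (the repo also ships a vision encoder, so a multimodal task may be the intended entry point):

```ts
// Hypothetical loading sketch (transformers.js v3). "text-generation" is an
// assumed task; dtype "q8" selects the *_quantized.onnx weights.
import { pipeline } from "@huggingface/transformers";

const generator = await pipeline(
  "text-generation",
  "bobber/routangseng-0.8b-hottake-onnx",
  { device: "webgpu", dtype: "q8" },
);

// Illustrative prompt; real prompts should follow the model's chat template.
const out = await generator("Give me a hot take on static typing.", {
  max_new_tokens: 128,
});
console.log(out);
```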
## Model Info
- Base: huihui-ai/Huihui-Qwen3.5-0.8B-abliterated
- Training: Phase 11 distillation from the 4B Phase 10 Think-SFT model (949 condensed examples)
- Eval: Heuristic score 4.60/5
- Format: ONNX Q8 (uint8 MatMul quantization)
- Total size: ~1.1 GB
## CDN / GitHub Pages Mirror
For faster loading and CORS support, chunked model files are hosted on GitHub Pages (see the loading sketch after this list):
- GitHub repo: bobbercheng/routangseng-models
- CDN URL: https://bobbercheng.github.io/routangseng-models/
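A hypothetical sketch of loading from the mirror follows. `env.remoteHost` and `env.remotePathTemplate` are real transformers.js settings, but the flat `{model}/` template is an assumption about the Pages layout, and reassembling the chunked files may need extra glue that this sketch does not show:

```ts
// Hypothetical mirror setup: redirect transformers.js downloads from
// huggingface.co to the GitHub Pages CDN. The path template is an assumption;
// chunked .onnx_data files may additionally require custom reassembly.
import { env, pipeline } from "@huggingface/transformers";

env.remoteHost = "https://bobbercheng.github.io/routangseng-models/";
env.remotePathTemplate = "{model}/"; // assumed flat layout on the mirror

const generator = await pipeline(
  "text-generation",
  "bobber/routangseng-0.8b-hottake-onnx",
  { device: "webgpu", dtype: "q8" },
);
```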
## Demo
- WebGPU Space: bobber/routangseng-chat
- GPU Space: bobber/routangseng-chat-gpu
## Files
| Component | Size |
|---|---|
| decoder_model_merged_quantized.onnx + .onnx_data | 756 MB |
| embed_tokens_quantized.onnx + .onnx_data | 254 MB |
| vision_encoder_quantized.onnx + .onnx_data | 101 MB |
## Production Note
The 0.8B model may emit a dangling `</think>` tag at the start of its output. Strip it at inference time, as sketched below.
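A minimal sketch of that strip step; the function name and regex are illustrative, not part of this repo:

```ts
// Remove a dangling </think> tag (plus surrounding whitespace) from the
// start of the generated text before displaying it.
function stripDanglingThink(text: string): string {
  return text.replace(/^\s*<\/think>\s*/, "");
}

console.log(stripDanglingThink("</think>\nHot take: tabs beat spaces."));
// -> "Hot take: tabs beat spaces."
```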
## Related
- Torch model: bobber/routangseng-0.8b-hottake
- 4B recommended: bobber/routangseng-phase10-think-sft
- Project docs: bobber/routangseng-qwen35-4b-project