Huihui-Qwen3.5-0.8B-abliterated-onnx

ONNX export of huihui-ai/Huihui-Qwen3.5-0.8B-abliterated for browser-side inference with WebGPU via transformers.js.

This is the base abliterated model without any LoRA or fine-tuning.

Build Process

  1. Base model: huihui-ai/Huihui-Qwen3.5-0.8B-abliterated
  2. ONNX: Weights transplanted into the reference graph structure of onnx-community/Qwen3.5-0.8B-ONNX
  3. Quantization: q8 (MatMul-only for decoder, full dynamic for embed/vision)

See ONNX_CONVERSION_GUIDE.md for detailed pipeline documentation.

Demo

Try it: bobber/routangseng-chat

Usage with transformers.js

import { Qwen3_5ForConditionalGeneration, AutoProcessor } from '@huggingface/transformers';

const model = await Qwen3_5ForConditionalGeneration.from_pretrained(
  'bobber/Huihui-Qwen3.5-0.8B-abliterated-onnx',
  {
    // Per-module quantization levels; these match the shipped q8 files.
    dtype: { embed_tokens: 'q8', vision_encoder: 'q8', decoder_model_merged: 'q8' },
    device: 'webgpu', // requires a WebGPU-capable browser
  }
);
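Once the model is loaded, a chat turn can be run through the processor's chat template. This is a minimal sketch continuing from the snippet above (`model` is assumed to be loaded there); the prompt and `max_new_tokens` value are illustrative, and the exact processor/tokenizer surface for this model is assumed to follow the standard transformers.js generation API:

```javascript
// Continues from the snippet above, where `model` is already loaded.
// The processor bundles the tokenizer and (for multimodal inputs) image preprocessing.
const processor = await AutoProcessor.from_pretrained(
  'bobber/Huihui-Qwen3.5-0.8B-abliterated-onnx'
);

const messages = [{ role: 'user', content: 'Explain WebGPU in one sentence.' }];

// Tokenize via the chat template, then generate.
const inputs = processor.tokenizer.apply_chat_template(messages, {
  add_generation_prompt: true,
  return_dict: true,
});
const output = await model.generate({ ...inputs, max_new_tokens: 128 });
console.log(processor.tokenizer.batch_decode(output, { skip_special_tokens: true })[0]);
```

Note that the ~1 GB of quantized weights is downloaded and cached by the browser on first load, so the initial `from_pretrained` call can take a while.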

Files

File                                                   Size     Description
onnx/decoder_model_merged_quantized.onnx + .onnx_data  ~721 MB  Decoder (q8, MatMul-only)
onnx/embed_tokens_quantized.onnx + .onnx_data          ~243 MB  Embeddings (q8)
onnx/vision_encoder_quantized.onnx + .onnx_data        ~96 MB   Vision encoder (q8, from reference)
