Qwen3.5-0.8B Mermaid Diagram Generator - ONNX (FP16 Quantized)

ONNX version of a fine-tuned Qwen3.5-0.8B Mermaid diagram generator, quantized to FP16 for optimized browser inference.

Model Details

  • Base Model: Qwen/Qwen3.5-0.8B (0.8B parameters)
  • Format: ONNX with FP16 quantization
  • Purpose: Browser deployment via Transformers.js and WebGPU
  • Dataset: SpongeBOB9684/mermaid-text-to-diagram

Conversion

This ONNX model was converted from the fine-tuned PyTorch model:

  • Framework: Optimum
  • Quantization: FP16 (16-bit floating point)
  • Compression: ~50% smaller than FP32
  • Compatibility: Transformers.js with ONNX Runtime WebGPU

Usage (Browser)

import { pipeline } from '@xenova/transformers';

// Create pipeline
const generator = await pipeline('text-generation', 'SpongeBOB9684/qwen3.5-0.8b-mermaid-generator-onnx', {
    dtype: 'fp16',   // matches the FP16 ONNX weights in this repo
    device: 'webgpu',
});

// Generate
const prompt = 'Create a flowchart for a simple login process';
const messages = [
  { role: 'system', content: 'You are a Mermaid diagram code generator. Output ONLY valid Mermaid code.' },
  { role: 'user', content: prompt },
];

const output = await generator(messages, { max_new_tokens: 256 });
console.log(output);
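The pipeline returns the generated chat turn, which models sometimes wrap in a ```mermaid code fence even when the system prompt asks for raw code. A minimal post-processing helper (hypothetical, not part of this repo) can strip the fence before handing the result to a Mermaid renderer:

```javascript
// Strip an optional ```mermaid fence from model output and trim whitespace.
// If no fence is present, the text is returned as-is.
function extractMermaid(text) {
  const fenced = text.match(/```(?:mermaid)?\s*([\s\S]*?)```/);
  return (fenced ? fenced[1] : text).trim();
}
```

For example, `extractMermaid('```mermaid\nflowchart TD\n A-->B\n```')` yields just the diagram source, ready for `mermaid.render()`.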

Performance

  • Model Size: ~1.3 GB (FP16 quantized)
  • Load Time: < 5 seconds in typical browsers
  • Inference Speed: ~15-30 tokens/second (depends on hardware)
  • Memory: ~1.3 GB GPU memory (with quantization)
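Using the throughput range above, you can give users a rough progress estimate before generation finishes. A small sketch (the helper name and the default of 20 tokens/second are assumptions, chosen from the middle of the quoted 15-30 range):

```javascript
// Rough client-side latency estimate for generating `numTokens` tokens,
// given an observed or assumed throughput in tokens per second.
function estimateLatencySeconds(numTokens, tokensPerSecond = 20) {
  if (tokensPerSecond <= 0) throw new RangeError('tokensPerSecond must be positive');
  return numTokens / tokensPerSecond;
}
```

At 20 tokens/second, a typical 200-token flowchart takes roughly 10 seconds to generate.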

Advantages

  • FP16 Quantization:
    • ~50% smaller model size
    • Faster inference with minimal quality loss
    • Lower memory usage for browser deployment
  • ONNX Format:
    • Optimized for web inference
    • Cross-platform compatibility
    • Direct loading in Transformers.js

Limitations

  • FP16 quantization may cause slight precision differences compared to FP32
  • Best results with clear, specific prompts
  • Limited to Mermaid syntax (not general diagram description)
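Since the model only emits Mermaid syntax, a cheap sanity check before rendering is to verify that the output begins with a known diagram header. A hypothetical validator sketch (the helper name and header list are assumptions; the list is not exhaustive):

```javascript
// A subset of Mermaid diagram-type headers; extend as needed.
const MERMAID_HEADERS = [
  'flowchart', 'graph', 'sequenceDiagram', 'classDiagram',
  'stateDiagram', 'erDiagram', 'gantt', 'pie', 'mindmap',
];

// Returns true if the first non-empty line starts with a known header.
function looksLikeMermaid(code) {
  const firstLine = code.trim().split('\n')[0].trim();
  return MERMAID_HEADERS.some((h) => firstLine.startsWith(h));
}
```

This catches the common failure mode where the model prefaces its answer with prose ("Here is your diagram:") instead of raw Mermaid code.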

License

Apache 2.0
