Qwen3.5-0.8B Mermaid Diagram Generator - ONNX (FP16 Quantized)

ONNX version of a fine-tuned Qwen3.5-0.8B Mermaid diagram generator, quantized to FP16 for optimized browser inference.

Model Details

  • Base Model: Qwen/Qwen3.5-0.8B (0.8B parameters)
  • Format: ONNX with FP16 quantization
  • Purpose: Browser deployment via Transformers.js and WebGPU
  • Dataset: SpongeBOB9684/mermaid-text-to-diagram

Conversion

This ONNX model was converted from the fine-tuned PyTorch model:

  • Framework: Optimum
  • Quantization: FP16 (16-bit floating point)
  • Compression: ~50% smaller than FP32
  • Compatibility: Transformers.js with ONNX Runtime WebGPU

Usage (Browser)

import { pipeline } from '@xenova/transformers';

// Create pipeline
const generator = await pipeline('text-generation', 'SpongeBOB9684/qwen3.5-0.8b-mermaid-generator-onnx', {
    dtype: 'fp16',   // matches the FP16 ONNX weights in this repo
    device: 'webgpu',
});

// Generate
const prompt = 'Create a flowchart for a simple login process';
const messages = [
  { role: 'system', content: 'You are a Mermaid diagram code generator. Output ONLY valid Mermaid code.' },
  { role: 'user', content: prompt },
];

const output = await generator(messages, { max_new_tokens: 256 });
console.log(output);
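The pipeline returns the generated chat turn, which models sometimes wrap in a ```mermaid code fence even when the system prompt asks for raw code. A minimal post-processing helper (hypothetical, not part of this repo) can strip the fence before handing the result to a Mermaid renderer:

```javascript
// Strip an optional ```mermaid fence from model output and trim whitespace.
// If no fence is present, the text is returned as-is.
function extractMermaid(text) {
  const fenced = text.match(/```(?:mermaid)?\s*([\s\S]*?)```/);
  return (fenced ? fenced[1] : text).trim();
}
```

For example, `extractMermaid('```mermaid\nflowchart TD\n A-->B\n```')` yields just the diagram source, ready for `mermaid.render()`.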

Performance

  • Model Size: ~1.3 GB (FP16 quantized)
  • Load Time: < 5 seconds in typical browsers
  • Inference Speed: ~15-30 tokens/second (depends on hardware)
  • Memory: ~1.3 GB GPU memory (with quantization)
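Using the throughput range above, you can give users a rough progress estimate before generation finishes. A small sketch (the helper name and the default of 20 tokens/second are assumptions, chosen from the middle of the quoted 15-30 range):

```javascript
// Rough client-side latency estimate for generating `numTokens` tokens,
// given an observed or assumed throughput in tokens per second.
function estimateLatencySeconds(numTokens, tokensPerSecond = 20) {
  if (tokensPerSecond <= 0) throw new RangeError('tokensPerSecond must be positive');
  return numTokens / tokensPerSecond;
}
```

At 20 tokens/second, a typical 200-token flowchart takes roughly 10 seconds to generate.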

Advantages

  • FP16 Quantization:
    • ~50% smaller model size
    • Faster inference with minimal quality loss
    • Lower memory usage for browser deployment
  • ONNX Format:
    • Optimized for web inference
    • Cross-platform compatibility
    • Direct loading in Transformers.js

Limitations

  • FP16 quantization may cause slight precision differences compared to FP32
  • Best results with clear, specific prompts
  • Limited to Mermaid syntax (not general diagram description)
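Since the model only emits Mermaid syntax, a cheap sanity check before rendering is to verify that the output begins with a known diagram header. A hypothetical validator sketch (the helper name and header list are assumptions; the list is not exhaustive):

```javascript
// A subset of Mermaid diagram-type headers; extend as needed.
const MERMAID_HEADERS = [
  'flowchart', 'graph', 'sequenceDiagram', 'classDiagram',
  'stateDiagram', 'erDiagram', 'gantt', 'pie', 'mindmap',
];

// Returns true if the first non-empty line starts with a known header.
function looksLikeMermaid(code) {
  const firstLine = code.trim().split('\n')[0].trim();
  return MERMAID_HEADERS.some((h) => firstLine.startsWith(h));
}
```

This catches the common failure mode where the model prefaces its answer with prose ("Here is your diagram:") instead of raw Mermaid code.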

License

Apache 2.0
