Qwen3.5-0.8B-quotes-ONNX

A fine-tuned version of Qwen3.5-0.8B optimized for generating original, poetic motivational quotes. Converted to ONNX with Q4 quantization for efficient in-browser inference via Transformers.js.

Try it live: mentria.ai/tools/quote

How it works

The model runs entirely in your browser using WebGPU (or WASM fallback). No server, no API keys, no data leaves your device. Weights are cached locally after the first download.
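The WebGPU-or-WASM choice can be made at load time by feature-detecting the browser. A minimal sketch of that fallback logic (the helper name `pickDevice` is illustrative, not part of Transformers.js):

```javascript
// Hypothetical helper: pick the Transformers.js device at runtime.
// 'webgpu' requires navigator.gpu (Chrome/Edge); otherwise fall back to 'wasm'.
function pickDevice() {
  const hasWebGPU = typeof navigator !== 'undefined' && 'gpu' in navigator;
  return hasWebGPU ? 'webgpu' : 'wasm';
}
```

The result can then be passed to `from_pretrained`, e.g. `{ device: pickDevice() }`, instead of hard-coding `'webgpu'`.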

Usage with Transformers.js

import { AutoProcessor, AutoModelForImageTextToText, TextStreamer } from '@huggingface/transformers';

const MODEL_ID = 'mentriaai/Qwen3.5-0.8B-quotes-ONNX';

const processor = await AutoProcessor.from_pretrained(MODEL_ID);
const model = await AutoModelForImageTextToText.from_pretrained(MODEL_ID, {
  dtype: { embed_tokens: 'q4', vision_encoder: 'fp16', decoder_model_merged: 'q4' },
  device: 'webgpu',
});

const messages = [
  { role: 'system', content: 'You are a poetic thinker. Write one short, original motivational quote (1-2 sentences). Do not attribute it to anyone. No preamble, no quotation marks — just the quote itself.' },
  { role: 'user', content: [{ type: 'text', text: 'Give me a motivational quote.' }] },
];

const textContent = processor.apply_chat_template(messages, {
  tokenize: false,
  add_generation_prompt: true,
});

const inputs = await processor(textContent, null, { padding: true, truncation: true });

const streamer = new TextStreamer(processor.tokenizer, {
  skip_prompt: true,
  skip_special_tokens: true,
  callback_function(token) {
    // Node.js example; in the browser, append `token` to your UI instead.
    process.stdout.write(token);
  },
});

await model.generate({
  ...inputs,
  max_new_tokens: 128,
  do_sample: true,
  temperature: 0.9,
  top_p: 0.95,
  streamer,
});
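The `temperature` and `top_p` settings above control how random each sampled token is. A minimal, library-independent sketch of temperature scaling plus nucleus (top-p) filtering, to show what those two numbers do (function name and return shape are illustrative):

```javascript
// Sketch: temperature + top-p (nucleus) filtering over raw logits.
function topPFilter(logits, temperature = 0.9, topP = 0.95) {
  // Temperature-scaled softmax: lower temperature sharpens the distribution.
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled);
  const exps = scaled.map((l) => Math.exp(l - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  const probs = exps.map((e) => e / sum);

  // Keep the smallest set of tokens whose cumulative probability
  // reaches topP, then renormalize; sampling draws only from this set.
  const order = probs.map((p, i) => [p, i]).sort((a, b) => b[0] - a[0]);
  const kept = [];
  let cum = 0;
  for (const [p, i] of order) {
    kept.push([p, i]);
    cum += p;
    if (cum >= topP) break;
  }
  const keptSum = kept.reduce((a, [p]) => a + p, 0);
  return kept.map(([p, i]) => [i, p / keptSum]); // [tokenId, prob] pairs
}
```

With `temperature` near 1 and `top_p` at 0.95, most of the vocabulary's low-probability tail is cut off while the model still varies its phrasing between runs.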

Example outputs

The seeds you plant in silence become the forests that speak for you.

Fear is a compass — it always points toward the thing worth doing.

You are both the sculptor and the marble — chip away everything that is not you.

The coastline of your dreams is where the sea meets the shore — follow it.

Model details

Base model: Qwen/Qwen3.5-0.8B
Architecture: Hybrid Mamba (DeltaNet) + Transformer, 24 layers, 0.8B params
Fine-tuning: LoRA (rank 16, scale 2.0, all linear layers), 200 iterations
Training data: 229 curated motivational quotes in chat format
Training hardware: Apple M4 Pro 24GB via mlx-lm
ONNX source: Weight-swapped from onnx-community/Qwen3.5-0.8B-ONNX
Quantization: Q4 (4-bit weight-only, block size 32)
License: Apache 2.0
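"Q4 with block size 32" means each group of 32 weights is stored as 4-bit integers plus one shared scale. A minimal sketch of symmetric 4-bit block quantization, to illustrate the idea (this is a simplified scheme, not the exact ONNX Runtime format):

```javascript
// Sketch: symmetric 4-bit quantization of one 32-weight block.
function quantizeBlock(weights) {
  // One scale per block: map the largest magnitude onto the int range [-8, 7].
  const absMax = Math.max(...weights.map(Math.abs));
  const scale = absMax / 7 || 1;
  const q = weights.map((w) =>
    Math.max(-8, Math.min(7, Math.round(w / scale)))
  );
  // 32 x 4-bit ints + one scale is ~20 bytes, vs 128 bytes for fp32.
  return { q, scale };
}

function dequantizeBlock({ q, scale }) {
  return q.map((v) => v * scale);
}
```

The per-block scale keeps the rounding error bounded by half a quantization step within each block, which is why small blocks (32 here) preserve quality better than one scale for the whole tensor.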

ONNX files

onnx/
  decoder_model_merged_q4.onnx       (813 KB graph + 452 MB weights)
  embed_tokens_q4.onnx               (857 B graph + 155 MB weights)
  vision_encoder_fp16.onnx           (183 KB graph + 195 MB weights)

Total download: ~802 MB on WebGPU (cached by the browser after first load).

Training details

  • Learning rate 2e-4 (10x the standard full fine-tuning rate)
  • All linear layers targeted (attention + MLP, not just attention)
  • Prompt masking enabled (loss computed only on quote output tokens)
  • 200 iterations, chosen via early-stopping analysis to balance style learning against memorization
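Prompt masking means the loss is computed only on the quote's tokens, not on the prompt that elicited it. A minimal sketch of how such a label mask can be built (the -100 ignore index is a common convention; the exact mlx-lm internals may differ):

```javascript
// Sketch: mask prompt positions out of the training labels.
// Positions labeled IGNORE contribute nothing to the loss.
const IGNORE = -100;

function buildLabels(promptIds, quoteIds) {
  return [
    ...promptIds.map(() => IGNORE), // prompt tokens: excluded from the loss
    ...quoteIds,                    // quote tokens: normal loss targets
  ];
}
```

Without this mask, the model would also be trained to reproduce the fixed system/user prompt, wasting capacity on tokens it never needs to generate.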

About mentria.ai

Mentria is a creative studio for tools, experiments, and visual transmissions. All tools run locally in your browser with zero server dependency. The motivational quote generator is one of several AI-powered tools available at mentria.ai/tools.
