Qwen3.5-0.8B-quotes-ONNX
A fine-tuned version of Qwen3.5-0.8B optimized for generating original, poetic motivational quotes. Converted to ONNX with Q4 quantization for efficient in-browser inference via Transformers.js.
Try it live: mentria.ai/tools/quote
How it works
The model runs entirely in your browser using WebGPU, falling back to WASM where WebGPU is unavailable. There is no server and no API key, and no data leaves your device. Weights are cached locally after the first download.
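The choice between WebGPU and WASM can be made at load time. The following is a minimal sketch, not the site's actual code: `pickDevice` is a hypothetical helper, and it assumes the standard WebGPU entry point `navigator.gpu.requestAdapter()`. The returned string is meant for the `device` option of `from_pretrained`.

```javascript
// Hypothetical helper: choose an inference backend for Transformers.js.
// `nav` is a navigator-like object, passed in so the function can be tested outside a browser.
async function pickDevice(nav) {
  // WebGPU is exposed as navigator.gpu; the property is absent in unsupported browsers.
  if (!nav || !nav.gpu) return 'wasm';
  try {
    // requestAdapter() resolves to null when no suitable GPU adapter is available.
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? 'webgpu' : 'wasm';
  } catch {
    // Treat any failure while probing the GPU as "WebGPU not usable".
    return 'wasm';
  }
}
```

In a browser this would be called as `const device = await pickDevice(navigator);` before loading the model.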
Usage with Transformers.js
import { AutoProcessor, AutoModelForImageTextToText, TextStreamer } from '@huggingface/transformers';

const MODEL_ID = 'mentriaai/Qwen3.5-0.8B-quotes-ONNX';

// Load the processor and the model (Q4 text weights, fp16 vision encoder).
const processor = await AutoProcessor.from_pretrained(MODEL_ID);
const model = await AutoModelForImageTextToText.from_pretrained(MODEL_ID, {
  dtype: { embed_tokens: 'q4', vision_encoder: 'fp16', decoder_model_merged: 'q4' },
  device: 'webgpu',
});

const messages = [
  { role: 'system', content: 'You are a poetic thinker. Write one short, original motivational quote (1-2 sentences). Do not attribute it to anyone. No preamble, no quotation marks — just the quote itself.' },
  { role: 'user', content: [{ type: 'text', text: 'Give me a motivational quote.' }] },
];

// Render the chat template to a prompt string, then tokenize it (text only, no image).
const textContent = processor.apply_chat_template(messages, {
  tokenize: false,
  add_generation_prompt: true,
});
const inputs = await processor(textContent, null, { padding: true, truncation: true });

// Collect streamed text chunks as they arrive; this works in both the browser and Node.
let quote = '';
const streamer = new TextStreamer(processor.tokenizer, {
  skip_prompt: true,
  skip_special_tokens: true,
  callback_function(text) {
    quote += text;
  },
});

await model.generate({
  ...inputs,
  max_new_tokens: 128,
  do_sample: true,
  temperature: 0.9,
  top_p: 0.95,
  streamer,
});

console.log(quote.trim());
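The system prompt asks for no quotation marks, but a sampled model will not always comply. As a hedged sketch, the generated text could be tidied before display with a small helper. `cleanQuote` is a hypothetical function, not part of Transformers.js or of the Mentria tool.

```javascript
// Hypothetical post-processing helper: trim whitespace and strip
// leading/trailing straight or curly quotation marks from the generated text.
function cleanQuote(raw) {
  return raw
    .trim()
    .replace(/^["'\u201C\u2018]+/, '')
    .replace(/["'\u201D\u2019]+$/, '')
    .trim();
}
```

Apostrophes inside the text are left untouched, because only the start and end of the string are matched.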
Example outputs
The seeds you plant in silence become the forests that speak for you.
Fear is a compass — it always points toward the thing worth doing.
You are both the sculptor and the marble — chip away everything that is not you.
The coastline of your dreams is where the sea meets the shore — follow it.
Model details
| Property | Value |
|---|---|
| Base model | Qwen/Qwen3.5-0.8B |
| Architecture | Hybrid Mamba (DeltaNet) + Transformer, 24 layers, 0.8B params |
| Fine-tuning | LoRA (rank 16, scale 2.0, all linear layers), 200 iterations |
| Training data | 229 curated motivational quotes in chat format |
| Training hardware | Apple M4 Pro 24GB via mlx-lm |
| ONNX source | Weight-swapped from onnx-community/Qwen3.5-0.8B-ONNX |
| Quantization | Q4 (4-bit weight-only, block size 32) |
| License | Apache 2.0 |
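To make the LoRA row concrete, here is a minimal sketch of a LoRA forward pass for a single linear layer, assuming the common formulation y = W x + scale * B (A x) with the scale applied directly as a multiplier. The helper names and the tiny matrices in the usage note are illustrative assumptions and are not taken from the training code.

```javascript
// Multiply a matrix (array of rows) by a vector.
const matVec = (M, x) => M.map((row) => row.reduce((sum, m, j) => sum + m * x[j], 0));

// LoRA forward pass for one linear layer: y = W x + scale * B (A x).
// W: dOut x dIn (frozen base weights), A: rank x dIn, B: dOut x rank (trained adapter).
function loraForward(W, A, B, x, scale = 2.0) {
  const base = matVec(W, x);
  const update = matVec(B, matVec(A, x));
  return base.map((y, i) => y + scale * update[i]);
}

// Number of trainable parameters one adapter adds to a dOut x dIn layer.
const loraParamCount = (dIn, dOut, rank = 16) => rank * (dIn + dOut);
```

For example, `loraForward([[1, 0], [0, 1]], [[1, 1]], [[1], [0]], [2, 3])` uses a rank-1 adapter on a 2x2 identity layer. For a hypothetical 1024 x 1024 layer, `loraParamCount(1024, 1024)` gives 32,768 trainable parameters, compared with 1,048,576 in the frozen weight matrix.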
ONNX files
onnx/
  decoder_model_merged_q4.onnx (813 KB graph + 452 MB weights)
  embed_tokens_q4.onnx (857 B graph + 155 MB weights)
  vision_encoder_fp16.onnx (183 KB graph + 195 MB weights)
Total download: ~802 MB on WebGPU (cached by the browser after first load).
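The `_q4` files listed above hold weights quantized in blocks of 32 (see the Quantization row under Model details). As a rough illustration of block-wise 4-bit quantization, the sketch below maps each block to integer codes 0 to 15 with a per-block scale and minimum. It is a simplified scheme for intuition only and is not claimed to match the exact algorithm or storage layout of the ONNX export. `quantizeBlock`, `dequantizeBlock`, and `quantizeQ4` are hypothetical names.

```javascript
// Illustrative block-wise 4-bit quantization (assumed scheme, for intuition only).
// Each block keeps its own scale and minimum, so an outlier only affects its own block.
function quantizeBlock(block) {
  const min = Math.min(...block);
  const max = Math.max(...block);
  // 4 bits give 16 levels (0..15). Fall back to 1 for constant blocks to avoid dividing by zero.
  const scale = (max - min) / 15 || 1;
  const q = Uint8Array.from(block, (w) =>
    Math.min(15, Math.max(0, Math.round((w - min) / scale)))
  );
  return { q, scale, min };
}

// Reverse the mapping: each 4-bit code becomes an approximate weight.
function dequantizeBlock({ q, scale, min }) {
  return Array.from(q, (v) => v * scale + min);
}

// Split a flat weight array into blocks and quantize each block independently.
function quantizeQ4(weights, blockSize = 32) {
  const blocks = [];
  for (let i = 0; i < weights.length; i += blockSize) {
    blocks.push(quantizeBlock(weights.slice(i, i + blockSize)));
  }
  return blocks;
}
```

In this scheme the reconstruction error per weight is at most half of the block's `scale`. The sketch stores one code per byte for readability, whereas 4-bit formats normally pack two codes into each byte.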
Training details
- Learning rate 2e-4 (10x the standard full fine-tuning rate)
- All linear layers targeted (attention + MLP, not just attention)
- Prompt masking enabled (loss computed only on quote output tokens)
- 200 iterations selected from early stopping analysis to balance style learning vs. memorization
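Prompt masking, as listed above, means that tokens belonging to the system and user prompt do not contribute to the loss. A minimal sketch of the idea follows, assuming per-token losses are already available as an array. `maskedMeanLoss` and its inputs are hypothetical and do not reflect the mlx-lm implementation.

```javascript
// Mean loss over completion tokens only; prompt tokens are masked out.
// tokenLosses[i] is the per-token loss at position i (hypothetical input).
// promptLength is the number of leading tokens that belong to the prompt.
function maskedMeanLoss(tokenLosses, promptLength) {
  const completion = tokenLosses.slice(promptLength);
  // No completion tokens means there is nothing to learn from in this example.
  if (completion.length === 0) return 0;
  return completion.reduce((a, b) => a + b, 0) / completion.length;
}
```

With `maskedMeanLoss([5, 5, 1, 3], 2)`, the two prompt positions are ignored and only the last two losses are averaged.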
About mentria.ai
Mentria is a creative studio for tools, experiments, and visual transmissions. All tools run locally in your browser with zero server dependency. The motivational quote generator is one of several AI-powered tools available at mentria.ai/tools.