# DiscordLM 0.5B – MLC Quantized (q4f16_1)
A WebLLM/MLC-ready quantized build of DiscordLM-0.5B. Weights are 4-bit quantized (q4f16_1) for efficient in-browser inference via WebGPU: roughly a 280 MB download and about 400 MB of VRAM at runtime.
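WebGPU is not available in every browser, so it is worth feature-checking before creating the engine. A minimal sketch of such a check follows; the `hasWebGPU` helper is illustrative (not part of WebLLM) and takes the navigator object as a parameter so it can be exercised outside a browser:

```javascript
// Illustrative helper: returns true if the given navigator-like object
// exposes the WebGPU entry point (navigator.gpu).
function hasWebGPU(nav) {
  return typeof nav !== "undefined" && "gpu" in nav && !!nav.gpu;
}

// In a browser: if (!hasWebGPU(navigator)) { show a fallback message }
```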
## Usage with WebLLM
```javascript
import * as webllm from "@mlc-ai/web-llm";

// Download the quantized weights and compile the WebGPU kernels.
const engine = new webllm.MLCEngine();
await engine.reload("eshonindex/DiscordLM-0.5B-q4f16_1-MLC");

// OpenAI-style chat completions API.
const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "How do Discord permissions work?" }],
});
console.log(reply.choices[0].message.content);
```
## Model Details
- Base model: eshonindex/DiscordLM-0.5B
- Quantization: q4f16_1 (4-bit weights, f16 compute)
- Format: MLC/WebLLM compatible
- Download size: ~280 MB (8 shards)
- VRAM: ~400 MB
- Context window: 4096 tokens
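The engine also supports OpenAI-style streaming by passing `stream: true` to `chat.completions.create`, which yields an async iterable of chunks. A small sketch of collecting the streamed deltas into one string; `collectStream` is an illustrative helper, not part of the library, and the chunk shape assumed here is the OpenAI chat-chunk format WebLLM mirrors:

```javascript
// Illustrative helper: concatenate the content deltas from a streamed
// chat completion (an async iterable of OpenAI-style chunks).
async function collectStream(chunks) {
  let text = "";
  for await (const chunk of chunks) {
    // Each chunk carries an incremental delta; content may be absent
    // on the final chunk, hence the null-coalescing fallback.
    text += chunk.choices[0]?.delta?.content ?? "";
  }
  return text;
}

// Assumed usage with a live engine:
//   const chunks = await engine.chat.completions.create({ messages, stream: true });
//   const answer = await collectStream(chunks);
```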