# DiscordLM 0.5B – MLC Quantized (q4f16_1)
A WebLLM/MLC-ready quantized build of DiscordLM-0.5B. Weights are 4-bit quantized (q4f16_1) for efficient in-browser inference via WebGPU: roughly a 280 MB download and about 400 MB of VRAM at runtime.
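WebGPU is not available in every browser, so it is worth feature-checking before creating the engine. A minimal sketch of such a check follows; the `hasWebGPU` helper is illustrative (not part of WebLLM) and takes the navigator object as a parameter so it can be exercised outside a browser:

```javascript
// Illustrative helper: returns true if the given navigator-like object
// exposes the WebGPU entry point (navigator.gpu).
function hasWebGPU(nav) {
  return typeof nav !== "undefined" && "gpu" in nav && !!nav.gpu;
}

// In a browser: if (!hasWebGPU(navigator)) { show a fallback message }
```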
## Usage with WebLLM
```javascript
import * as webllm from "@mlc-ai/web-llm";

// Download the quantized weights and compile the WebGPU kernels.
const engine = new webllm.MLCEngine();
await engine.reload("eshonindex/DiscordLM-0.5B-q4f16_1-MLC");

// OpenAI-style chat completions API.
const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "How do Discord permissions work?" }],
});
console.log(reply.choices[0].message.content);
```
## Model Details
- Base model: eshonindex/DiscordLM-0.5B
- Quantization: q4f16_1 (4-bit weights, f16 compute)
- Format: MLC/WebLLM compatible
- Download size: ~280 MB (8 shards)
- VRAM: ~400 MB
- Context window: 4096 tokens
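The engine also supports OpenAI-style streaming by passing `stream: true` to `chat.completions.create`, which yields an async iterable of chunks. A small sketch of collecting the streamed deltas into one string; `collectStream` is an illustrative helper, not part of the library, and the chunk shape assumed here is the OpenAI chat-chunk format WebLLM mirrors:

```javascript
// Illustrative helper: concatenate the content deltas from a streamed
// chat completion (an async iterable of OpenAI-style chunks).
async function collectStream(chunks) {
  let text = "";
  for await (const chunk of chunks) {
    // Each chunk carries an incremental delta; content may be absent
    // on the final chunk, hence the null-coalescing fallback.
    text += chunk.choices[0]?.delta?.content ?? "";
  }
  return text;
}

// Assumed usage with a live engine:
//   const chunks = await engine.chat.completions.create({ messages, stream: true });
//   const answer = await collectStream(chunks);
```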