DiscordLM 0.5B – MLC Quantized (q4f16_1)

A quantized build of DiscordLM-0.5B in MLC format, ready for browser inference with WebLLM.

Weights are quantized to 4 bits (q4f16_1) with f16 compute for efficient in-browser inference via WebGPU: roughly a 280MB download and about 400MB of VRAM at runtime.
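The ~280MB figure is consistent with a back-of-envelope estimate for 4-bit group quantization. A minimal sketch, where the parameter count, group size, and per-group f16 scale are assumptions for illustration, not values stated on this card:

```javascript
// Rough size estimate for a q4 group-quantized 0.5B model.
// All constants below are illustrative assumptions.
const params = 0.5e9;      // ~0.5B parameters
const bitsPerWeight = 4;   // q4: 4-bit weights
const groupSize = 32;      // assumed quantization group size
const scaleBytes = 2;      // assumed one f16 scale per group

// Average storage cost per parameter: packed 4-bit weight
// plus the amortized share of the group's f16 scale.
const bytesPerParam = bitsPerWeight / 8 + scaleBytes / groupSize;
const totalMB = (params * bytesPerParam) / 1e6;
console.log(totalMB.toFixed(0)); // ≈ 281
```

Embeddings, metadata, and shard overhead account for the small gap between such an estimate and the actual download size.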

Usage with WebLLM

import * as webllm from "@mlc-ai/web-llm";

// Create an engine and load the quantized weights
// (downloaded and cached on first use).
const engine = new webllm.MLCEngine();
await engine.reload("eshonindex/DiscordLM-0.5B-q4f16_1-MLC");

// OpenAI-style chat completions API.
const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "How do Discord permissions work?" }],
});
console.log(reply.choices[0].message.content);
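If this model ID is not in WebLLM's built-in model list, it likely needs to be registered through a custom appConfig before `reload` can find it. A minimal sketch, assuming the base model uses a Qwen2-style 0.5B architecture; the `model_lib` WASM filename below is a guess and must match the actual architecture and context size:

```javascript
import * as webllm from "@mlc-ai/web-llm";

// Extend WebLLM's prebuilt model list with this quantized model.
const appConfig = {
  model_list: [
    ...webllm.prebuiltAppConfig.model_list,
    {
      model: "https://huggingface.co/eshonindex/DiscordLM-0.5B-q4f16_1-MLC",
      model_id: "eshonindex/DiscordLM-0.5B-q4f16_1-MLC",
      // Assumption: reuse a prebuilt model library matching the base
      // architecture; replace with the correct WASM for this model.
      model_lib:
        webllm.modelLibURLPrefix +
        webllm.modelVersion +
        "/Qwen2-0.5B-Instruct-q4f16_1-ctx4k_cs1k-webgpu.wasm",
    },
  ],
};

const engine = await webllm.CreateMLCEngine(
  "eshonindex/DiscordLM-0.5B-q4f16_1-MLC",
  { appConfig },
);
```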

Model Details

  • Base model: eshonindex/DiscordLM-0.5B
  • Quantization: q4f16_1 (4-bit weights, f16 compute)
  • Format: MLC/WebLLM compatible
  • Size: ~280MB (8 shards)
  • VRAM: ~400MB
  • Context window: 4096 tokens