Qwen3-0.6B-lk-alpha-14b-paired-MNN

Experimental Qwen3-0.6B draft model for TokForge + MNN, trained to pair more closely with a Qwen3-14B target.

Why this repo exists

Most mobile draft-model work ends up optimized around 8B targets. This repo exists for the opposite question:

what happens if a very small draft is trained explicitly toward a 14B target lane?

Training snapshot

Final logged training acceptance rate (alpha):

  • 0.7236
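
For intuition: under the standard approximation from the speculative decoding literature, where each drafted token is accepted independently with probability alpha, the expected number of tokens emitted per target verification pass with draft length d is

\[
\mathbb{E}[\text{tokens per pass}] = \frac{1 - \alpha^{d+1}}{1 - \alpha}
\]

With alpha ≈ 0.7236 and the d = 3 recommended below, that works out to (1 - 0.7236^4) / (1 - 0.7236) ≈ 2.63 tokens per target forward pass. This is a back-of-envelope estimate, not a measured on-device speedup.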

Usage

This bundle is meant for TokForge / MNN, not the standard Hugging Face inference stack.

Typical TokForge recipe:

{
  "backend_type": "cpu",
  "thread_num": 4,
  "precision": "low",
  "memory": "low",
  "sampler_type": "greedy",
  "speculative_type": "draftmodel",
  "draft_predict_length": 3,
  "draft_config_path": "/path/to/config_cpu.json"
}
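
The draft_config_path above points at this repo's config_cpu.json, which carries the draft model's own runtime settings (the thread_num in the recipe presumably applies to the target model). As a rough sketch only, assuming its contents mirror the recipe keys above and using thread_num 2 per the Best-known use section below, it might look like:

{
  "backend_type": "cpu",
  "thread_num": 2,
  "precision": "low",
  "memory": "low"
}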

Status

This is a research artifact for targeted draft-target pairing:

  • intended for 14B-leaning experiments
  • more specialized than the general 20K 8B draft lane
  • not yet the default recommendation over the simpler general-purpose drafts

Limitations and Intended Use

  • This is a target-paired experiment, not the current default draft recommendation.
  • Best treated as a research option for 14B-leaning mobile tests.
  • General-purpose users should usually start with the 20K baseline draft first.

Best-known use

  • Draft model backend: CPU
  • Draft threads: 2
  • Draft predict length: d=3
  • Intended target pairing: Qwen3-14B in TokForge
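
Putting these together, a hypothetical end-to-end target-side config for a Qwen3-14B pairing might look like the following. The llm_model / llm_weight keys and all paths are assumptions (patterned after common MNN LLM config layouts, not documented by this repo); the speculative fields come from the recipe above:

{
  "llm_model": "/path/to/Qwen3-14B-MNN/llm.mnn",
  "llm_weight": "/path/to/Qwen3-14B-MNN/llm.mnn.weight",
  "backend_type": "cpu",
  "thread_num": 4,
  "precision": "low",
  "memory": "low",
  "sampler_type": "greedy",
  "speculative_type": "draftmodel",
  "draft_predict_length": 3,
  "draft_config_path": "/path/to/Qwen3-0.6B-lk-alpha-14b-paired-MNN/config_cpu.json"
}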

Included files

  • llm.mnn
  • llm.mnn.weight
  • llm_config.json
  • config.json
  • config_cpu.json (the CPU runtime config referenced by draft_config_path in the recipe above)
  • tokenizer files
  • ONNX export artifact for reference

Feedback

If you benchmark this model on your own device, feel free to share results on Discord.
