Qwen3-0.6B-lk-alpha-40k-MNN

Experimental Qwen3-0.6B draft bundle for TokForge + MNN, trained with LK Alpha on a larger 40K draft-training set.

Why this repo exists

This repo is for people exploring whether larger draft datasets materially improve mobile speculative decoding:

  • Qwen3-0.6B student
  • Qwen3-8B teacher lane
  • 40K training set
  • LK Alpha objective
  • exported as a mobile-ready MNN bundle

Training snapshot

Final logged training acceptance (alpha):

  • 0.7314
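As a rough sanity check, the logged acceptance rate can be converted into an expected number of tokens generated per target-model forward pass using the standard speculative-decoding expectation. This is a sketch, not a device measurement: it assumes an i.i.d. per-token acceptance rate, which real prompts will not exactly satisfy.

```python
def expected_tokens_per_step(alpha: float, d: int) -> float:
    """Expected tokens emitted per target forward pass with draft length d.

    Geometric sum 1 + alpha + ... + alpha^d: each extra draft token only
    counts if all previous draft tokens were accepted (i.i.d. assumption).
    """
    return (1 - alpha ** (d + 1)) / (1 - alpha)

# alpha = 0.7314 (logged above), d = 3 (draft_predict_length in the recipe)
print(expected_tokens_per_step(0.7314, 3))  # roughly 2.66 tokens per step
```

Under these assumptions, each target forward pass yields about 2.66 tokens instead of 1, which is where the mobile speedup would come from if on-device acceptance tracks the training figure.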

Usage

This bundle is meant for TokForge / MNN, not standard HF Inference.

Typical TokForge recipe:

{
  "backend_type": "opencl",
  "thread_num": 4,
  "precision": "low",
  "memory": "low",
  "sampler_type": "greedy",
  "speculative_type": "draftmodel",
  "draft_predict_length": 3,
  "draft_config_path": "/path/to/config_cpu.json"
}
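Since the recipe is plain JSON, it can be generated and round-trip checked programmatically before copying it to the device. A minimal sketch; the key names come from the recipe above, and the `draft_config_path` value is the same placeholder, not a real path.

```python
import json
import tempfile

# TokForge recipe from above, built as a Python dict.
recipe = {
    "backend_type": "opencl",
    "thread_num": 4,
    "precision": "low",
    "memory": "low",
    "sampler_type": "greedy",
    "speculative_type": "draftmodel",
    "draft_predict_length": 3,
    "draft_config_path": "/path/to/config_cpu.json",  # placeholder path
}

# Write it out, then reload to confirm it parses and keeps the
# speculative-decoding settings intact.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(recipe, f, indent=2)
    path = f.name

with open(path) as f:
    loaded = json.load(f)

assert loaded["speculative_type"] == "draftmodel"
assert loaded["draft_predict_length"] == 3
```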

Status

This is currently best treated as a research / comparison artifact:

  • useful if you want to compare 20K vs 40K draft-training sets under the same objective
  • not yet a clear device-side winner over the simpler 20K drafts

Limitations and Intended Use

  • This is a research comparison artifact first.
  • We do not currently have stronger preserved on-device evidence for this variant than for the simpler 20K drafts.
  • Mobile performance still depends more on target pairing and backend routing than on a small training-objective delta alone.


Best-known use

  • Draft model backend: CPU
  • Draft threads: 2
  • Draft predict length: d=3
  • Typical target pairing: Qwen3-8B
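The best-known settings above would land in the draft-side config referenced by `draft_config_path`. A hedged sketch of what `config_cpu.json` might look like, assuming it reuses the same key names as the main recipe; the exact keys for the draft file are an assumption, not confirmed by this repo:

```json
{
  "backend_type": "cpu",
  "thread_num": 2,
  "precision": "low",
  "memory": "low"
}
```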

Included files

  • llm.mnn
  • llm.mnn.weight
  • llm_config.json
  • config.json
  • config_cpu.json
  • tokenizer files
  • ONNX export artifact for reference

TokForge

If you benchmark this on your own device, feel free to share results in Discord.
