TokForge Mobile Draft Models
Collection
Small MNN draft models and speculative-decoding bundles for TokForge on Android. Includes practical Qwen3 0.6B drafts plus experimental variants.
Experimental Qwen3-0.6B draft bundle for TokForge + MNN, trained with LK Alpha on a larger 40K draft-training set.
This repo is for people exploring whether larger draft datasets materially improve mobile speculative decoding:
- Qwen3-0.6B student
- Qwen3-8B teacher lane
- 40K training set
- LK Alpha objective
- MNN bundle

Final logged training acceptance (alpha): 0.7314

This bundle is meant for TokForge / MNN, not standard HF Inference.
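As a rough way to read the logged alpha next to a draft length, the sketch below estimates how many tokens one target-model verification pass would emit. It assumes each drafted token is accepted independently with probability alpha, which is an idealization: the logged value is a training-time figure and on-device acceptance may differ. The function name `expected_tokens_per_step` is illustrative and not part of TokForge or MNN.

```python
def expected_tokens_per_step(alpha: float, draft_len: int) -> float:
    """Expected tokens emitted per target-model verification pass.

    Assumption (idealized): every drafted token is accepted independently
    with probability `alpha`. The expectation is then the geometric sum
    1 + alpha + alpha**2 + ... + alpha**draft_len.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    return sum(alpha ** k for k in range(draft_len + 1))


ALPHA = 0.7314  # final logged training acceptance from this card

for d in (1, 2, 3, 4, 5):
    tokens = expected_tokens_per_step(ALPHA, d)
    print(f"d={d}: ~{tokens:.2f} tokens per verify step")
```

Under that assumption, alpha = 0.7314 with a draft length of 3 works out to about 2.66 tokens per verification pass. This is not a speedup figure: it ignores the cost of running the draft model itself, so real gains have to be measured on the device.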
Typical TokForge recipe:
{
  "backend_type": "opencl",
  "thread_num": 4,
  "precision": "low",
  "memory": "low",
  "sampler_type": "greedy",
  "speculative_type": "draftmodel",
  "draft_predict_length": 3,
  "draft_config_path": "/path/to/config_cpu.json"
}
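If you want to sweep draft lengths or point the recipe at different bundle locations, a small generator can save hand-editing. The sketch below only reproduces the keys and values shown in the recipe above; `build_recipe` and `write_recipe` are hypothetical helper names, and where TokForge expects the resulting file to live is not covered here.

```python
import json
from pathlib import Path


def build_recipe(draft_config_path: str, draft_predict_length: int = 3) -> dict:
    """Return the recipe from this card as a dict.

    Only `draft_config_path` and `draft_predict_length` are parameterized;
    every other value is copied unchanged from the recipe above.
    """
    if draft_predict_length < 1:
        raise ValueError("draft_predict_length must be at least 1")
    return {
        "backend_type": "opencl",
        "thread_num": 4,
        "precision": "low",
        "memory": "low",
        "sampler_type": "greedy",
        "speculative_type": "draftmodel",
        "draft_predict_length": draft_predict_length,
        "draft_config_path": draft_config_path,
    }


def write_recipe(recipe: dict, out_path: str) -> None:
    """Write the recipe as indented JSON (the output path is up to the caller)."""
    Path(out_path).write_text(json.dumps(recipe, indent=2) + "\n", encoding="utf-8")


# Placeholder path, as in the recipe above; replace with the real location.
recipe = build_recipe("/path/to/config_cpu.json")
print(json.dumps(recipe, indent=2))
```

Calling `build_recipe(path, draft_predict_length=2)` and so on gives one config per draft length for a side-by-side benchmark.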
This is currently best treated as a research / comparison artifact:
- 20K vs 40K
- 20K drafts
- 20K drafts.

- CPU2
- d=3
- Qwen3-8B

Bundle files:
- llm.mnn
- llm.mnn.weight
- llm_config.json
- config.json
- config_cpu.json

If you benchmark this on your own device, feel free to share results in Discord.
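Before benchmarking, it can help to confirm that a downloaded bundle is complete. The sketch below checks a directory for the five file names this card lists. It assumes all of them sit in one flat directory, which this card does not state explicitly, and `missing_bundle_files` is a hypothetical helper name.

```python
from pathlib import Path

# File names as listed on this card.
EXPECTED_FILES = (
    "llm.mnn",
    "llm.mnn.weight",
    "llm_config.json",
    "config.json",
    "config_cpu.json",
)


def missing_bundle_files(bundle_dir: str) -> list[str]:
    """Return the expected file names that are not present in `bundle_dir`.

    Assumption: the bundle is a flat directory containing every listed file.
    """
    root = Path(bundle_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# Placeholder path; replace with the directory holding the bundle.
missing = missing_bundle_files("/path/to/bundle")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("Bundle looks complete.")
```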