TokForge Mobile Draft Models (collection)

Small MNN draft models and speculative-decoding bundles for TokForge on Android. Includes practical Qwen3 0.6B drafts plus experimental variants.
Experimental Qwen3-0.6B draft model for TokForge + MNN, trained as a draft paired more explicitly with Qwen3-14B-style targets.
Most mobile draft work ends up optimized around 8B targets. This repo exists for the opposite question: what happens if a very small draft is trained more explicitly toward a 14B target lane?
Final logged training acceptance (alpha): 0.7236

This bundle is meant for TokForge / MNN, not standard HF Inference.
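As a rough sense of what that acceptance rate buys you: under the standard speculative-decoding analysis (Leviathan et al.), with a per-token acceptance rate alpha and draft length gamma, the expected number of tokens produced per target-model verification pass is (1 - alpha^(gamma+1)) / (1 - alpha). A minimal sketch, assuming the logged alpha holds at inference time and acceptances are i.i.d.:

```python
def expected_tokens_per_cycle(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target forward pass in speculative
    decoding, assuming an i.i.d. per-token acceptance rate `alpha`
    and `gamma` drafted tokens per cycle."""
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)

# With the logged alpha and the d=3 draft length from the recipe below:
print(expected_tokens_per_cycle(0.7236, 3))  # ~2.63 tokens per target pass
```

Real on-device acceptance will differ from the training-time figure, so treat this as an upper-bound estimate, not a benchmark.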
Typical TokForge recipe:

```json
{
  "backend_type": "cpu",
  "thread_num": 4,
  "precision": "low",
  "memory": "low",
  "sampler_type": "greedy",
  "speculative_type": "draftmodel",
  "draft_predict_length": 3,
  "draft_config_path": "/path/to/config_cpu.json"
}
```
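If you edit the recipe by hand, a quick sanity check can catch typos before TokForge loads it. This is a hypothetical helper, not part of TokForge; the field names simply mirror the example above:

```python
import json

def check_config(path: str) -> dict:
    """Load a speculative-decoding recipe and sanity-check the fields
    used in the example above. Raises AssertionError on a bad config."""
    with open(path) as f:
        cfg = json.load(f)
    assert cfg.get("speculative_type") == "draftmodel", "draft model mode expected"
    # Very long draft lengths rarely help small drafts; 1..8 is a loose bound.
    assert 1 <= cfg.get("draft_predict_length", 0) <= 8
    assert cfg.get("draft_config_path", "").endswith(".json")
    return cfg
```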
This is a research / targeted pairing artifact:
- Use it for 14B-leaning experiments and 14B-leaning mobile tests, with Qwen3-14B as the target in TokForge.
- If you want a general-purpose mobile draft, start with the 20K baseline draft (the 8B draft lane) first.
- Suggested setup: CPU backend, d=3 (draft_predict_length = 3).

Bundle files: llm.mnn, llm.mnn.weight, llm_config.json, config.json, config_cpu.json

If you benchmark this on your own device, feel free to share results in Discord.
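Before pointing TokForge at a downloaded bundle, it can save a debugging round to confirm all five files made it onto the device. A small hypothetical helper (the file list comes from this bundle; the directory path is yours):

```python
from pathlib import Path

# The five files shipped in this bundle.
BUNDLE_FILES = [
    "llm.mnn",
    "llm.mnn.weight",
    "llm_config.json",
    "config.json",
    "config_cpu.json",
]

def missing_files(bundle_dir: str) -> list[str]:
    """Return the names of any expected bundle files absent from bundle_dir."""
    root = Path(bundle_dir)
    return [name for name in BUNDLE_FILES if not (root / name).exists()]
```

An empty return value means the bundle is complete; anything else names what to re-download.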