TokForge Mobile Draft Models (Collection)

Small MNN draft models and speculative-decoding bundles for TokForge on Android. Includes practical Qwen3 0.6B drafts plus experimental variants. 5 items.
Small Qwen3-0.6B draft model exported for TokForge + MNN speculative decoding on Android.
This is a practical, mobile-oriented draft bundle rather than a standard Transformers checkpoint. It is intended to be paired with larger Qwen3 targets inside TokForge.
This model is one of the strongest lightweight draft candidates we have for mobile speculative decoding:
At a glance:

- Training data: 20K teacher samples (Qwen3-8B teacher)
- Format: MNN bundle
- Deployment: CPU draft + GPU/CPU target in the TokForge flow
- Draft settings: CPU, 2 threads, d=3 (draft_predict_length)
- Target: Qwen3-8B in TokForge

On RedMagic SM8850 with Qwen3-8B target:

- Baseline decode: 13.9 tok/s
- With speculative decoding: 18.1 tok/s (+30%)

Training acceptance (alpha) at the final logged epoch: 0.7178

Bundle contents:

- llm.mnn
- llm.mnn.weight
- llm_config.json
- config.json
- config_cpu.json

This bundle is meant for TokForge / MNN, not standard HF Inference.
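As a rough sanity check (not part of the original card), the logged acceptance rate alpha and the draft length gamma can be plugged into the standard speculative-decoding estimate E[tokens per target pass] = (1 - alpha^(gamma+1)) / (1 - alpha), which assumes i.i.d. per-token acceptance:

```python
def expected_tokens_per_target_pass(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target forward pass under speculative
    decoding, assuming each drafted token is accepted independently with
    probability alpha and gamma tokens are drafted per pass."""
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)

# With the logged acceptance alpha = 0.7178 and draft_predict_length = 3:
print(expected_tokens_per_target_pass(0.7178, 3))  # ≈ 2.6 tokens per pass
```

The theoretical ~2.6 tokens per target pass is an upper bound on throughput gain; the draft model's own forward passes cost time, which is consistent with the measured +30% end-to-end speedup being well below 2.6x.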
Typical TokForge recipe:
```json
{
  "backend_type": "opencl",
  "thread_num": 4,
  "precision": "low",
  "memory": "low",
  "sampler_type": "greedy",
  "speculative_type": "draftmodel",
  "draft_predict_length": 3,
  "draft_config_path": "/path/to/config_cpu.json"
}
```
Known-good draft-side config:
```json
{
  "backend_type": "cpu",
  "thread_num": 2,
  "precision": "low",
  "memory": "low",
  "sampler_type": "greedy"
}
```
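The two configs above can be generated programmatically, which helps when scripting deployments to multiple devices. A minimal sketch (the bundle directory name is hypothetical; on device, draft_config_path must be the absolute path where config_cpu.json actually lands, e.g. after an adb push):

```python
import json
from pathlib import Path

# Hypothetical staging directory; push its contents to the device afterwards.
bundle_dir = Path("qwen3-0.6b-draft")
bundle_dir.mkdir(parents=True, exist_ok=True)

# Target-side recipe: OpenCL (GPU) target with a CPU draft model.
target_cfg = {
    "backend_type": "opencl",
    "thread_num": 4,
    "precision": "low",
    "memory": "low",
    "sampler_type": "greedy",
    "speculative_type": "draftmodel",
    "draft_predict_length": 3,
    # Replace with the on-device absolute path to config_cpu.json.
    "draft_config_path": str(bundle_dir / "config_cpu.json"),
}

# Draft-side config: small CPU footprint so it doesn't starve the target.
draft_cfg = {
    "backend_type": "cpu",
    "thread_num": 2,
    "precision": "low",
    "memory": "low",
    "sampler_type": "greedy",
}

(bundle_dir / "config.json").write_text(json.dumps(target_cfg, indent=2))
(bundle_dir / "config_cpu.json").write_text(json.dumps(draft_cfg, indent=2))
```

Keeping the draft on CPU with only 2 threads is deliberate: the target occupies the GPU (or the remaining CPU cores), so an oversized draft thread pool would contend with it and erase the speculative-decoding gain.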
Pair this draft with Qwen3 targets inside TokForge. The numbers above were measured against Qwen3-8B; smaller or differently-paired targets may behave differently. If you benchmark this on your own device, feel free to share results in Discord.