Performance question
Hello! I use MiniMax-M2.5-AWQ-4bit (cyankiwi) with Aurora-Spec-Minimax-M2.5 draft model, but I'm experiencing very low speculative decoding performance:
- Spec Accept Rate: ~34-35%
- Avg Accept Length: ~1.6 tokens
- Expected: ~2.62 tokens (per model card)
My setup:
SGLang: dev (latest)
Model: MiniMax-M2.5-AWQ-4bit (200K vocab)
Draft: Aurora-Spec-Minimax-M2.5 (32K draft vocab)
Algorithm: EAGLE3
speculative-num-steps: 4
speculative-eagle-topk: 1
speculative-num-draft-tokens: 6
Draft attention: flashinfer
TP/EP: 8
dtype: bfloat16
kv-cache-dtype: fp8_e4m3
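For reference, the setup above corresponds roughly to a launch command like the following. The model paths and the exact backend flag are assumptions on my side; the speculative flags match what I listed:

```shell
python -m sglang.launch_server \
  --model-path cyankiwi/MiniMax-M2.5-AWQ-4bit \
  --speculative-algorithm EAGLE3 \
  --speculative-draft-model-path Aurora-Spec-Minimax-M2.5 \
  --speculative-num-steps 4 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 6 \
  --tp 8 --ep 8 \
  --dtype bfloat16 \
  --kv-cache-dtype fp8_e4m3 \
  --attention-backend flashinfer
```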
Suspected root cause: vocab size mismatch
The Aurora draft model has draft_vocab_size: 32000 (32K tokens), while the target MiniMax model has vocab_size: 200064
(200K tokens). This means:
- The draft can only propose tokens from its 32K vocabulary
- Any token generated by the target that's NOT in the draft's vocab is auto-rejected
- With Cyrillic text (which uses many rare tokens), many tokens fall outside the 32K draft vocab, so low acceptance, perhaps?
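As a rough sanity check: if each drafted token is accepted independently with probability p ≈ 0.35, the expected accept length (one bonus token plus the accepted prefix of the draft) is 1 + p + p² + …, which lands near the observed ~1.6. This is a simplified i.i.d. model, not SGLang's exact accounting:

```python
def expected_accept_length(p: float, num_draft_tokens: int) -> float:
    """Expected tokens per verify step under a simple i.i.d. acceptance
    model: 1 bonus token plus a geometric accepted prefix of the draft."""
    return sum(p**i for i in range(num_draft_tokens + 1))

# With the observed ~35% accept rate and a 4-token draft:
print(round(expected_accept_length(0.35, 4), 2))  # 1.53, close to the reported ~1.6
```

So the reported accept rate and accept length are at least internally consistent; the question is why the per-token rate itself is so low.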
Questions:
- Is there a version of Aurora draft trained on full 200K vocabulary?
- Or a way to map tokens between vocabularies?
- Any other parameters that could improve acceptance?
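On the mapping question: EAGLE3-style reduced-vocab drafts typically ship a draft-to-target index table (often named d2t), so a draft logit index maps to a target token id without retraining on the full vocabulary. A toy sketch of the idea; the name, layout, and offset convention here are assumptions, not Aurora's actual tensors:

```python
# Toy draft->target vocab map. In EAGLE3-style implementations a 'd2t'
# table stores per-index offsets, so target_id = draft_id + d2t[draft_id].
# Sizes and values below are illustrative only.
draft_vocab_size = 8                  # stands in for the 32K draft vocab
d2t = [0, 0, 3, 5, 5, 9, 12, 12]      # hypothetical offset table

def draft_to_target(draft_id: int) -> int:
    """Map a draft-vocab token id to its target-vocab token id."""
    return draft_id + d2t[draft_id]

print(draft_to_target(2))  # draft index 2 -> target token id 5
```

Note that such a map only covers tokens the draft can propose; target tokens outside the draft vocab are still only reachable as the bonus token on each verify step.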
Thanks for any help!
This speculator is mainly released as a demo for the paper, where the key idea is that it adapts to online traffic rather than relying purely on a fixed offline match.
You can use the Aurora codebase to run serving and online training together so the draft model gradually adapts to your workload: https://github.com/togethercomputer/aurora
So while vocabulary mismatch may contribute to lower initial acceptance, the intended usage is to let the speculator continue training on your real traffic and improve there, rather than expecting the demo checkpoint to be fully optimized out of the box for every deployment.
Thanks! This sounds interesting, will do.