Throughput does not seem to be as good as Eagle3?

#1
by Jing17 - opened

Hello, I trained the Qwen3-32B-Eagle3 model on the Eagle Chat training data and tested on gsm8k with H20 GPUs + sglang. With the 3-1-4 strategy, the acceptance rate is higher than Eagle3's, but the throughput does not seem to be as good as Eagle3's?

It might be that DFlash's verification stage consumes too much unnecessary compute. You could try a more capable GPU, or reduce the number of tokens processed in the verification stage.
By the way, what concurrency level did you use for the evaluation?
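The trade-off above can be sketched with a back-of-envelope model (my own simplification, not DFlash's or sglang's actual accounting): a higher per-token acceptance rate can still yield lower end-to-end throughput if each verification step costs proportionally more compute. The function names, timings, and the i.i.d.-acceptance assumption below are all illustrative.

```python
def spec_throughput(draft_len, accept_rate, t_draft, t_verify):
    """Rough expected tokens/second for one speculative-decoding step.

    draft_len   : draft tokens proposed per step
    accept_rate : per-token acceptance probability (assumed i.i.d. -- a
                  simplification; real acceptance is position-dependent)
    t_draft     : seconds to draft one token (hypothetical timing)
    t_verify    : seconds for the target model to verify the whole draft
    """
    # Expected accepted tokens: a truncated geometric series, plus the one
    # token the target model always emits itself on rejection/completion.
    expected_tokens = sum(accept_rate ** k for k in range(1, draft_len + 1)) + 1
    step_time = draft_len * t_draft + t_verify
    return expected_tokens / step_time

# Illustrative numbers only: the second config accepts more tokens per step
# (higher rate, longer draft) but pays much more for verification, so its
# throughput can come out lower.
baseline = spec_throughput(draft_len=4, accept_rate=0.8,
                           t_draft=0.001, t_verify=0.02)
heavier = spec_throughput(draft_len=8, accept_rate=0.9,
                          t_draft=0.001, t_verify=0.06)
```

This is why reducing the number of tokens sent to verification can recover throughput even at the cost of a slightly lower acceptance rate.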

I used 4× H20 GPUs, with concurrency = 8.
