Throughput does not seem to be as good as Eagle3?

by Jing17 - opened Mar 6

Mar 6

Hello, I tried training the Qwen3-32B-Eagle3 model on the Eagle Chat training data and tested it on GSM8K with H20 + SGLang. The acceptance rate is higher than that of Eagle3 with the 3-1-4 strategy, but the throughput does not seem to be as good as Eagle3's?

gqzs

Mar 10

It might be because the verification stage of DFlash consumes too much unnecessary compute. You could try using a better GPU or reducing the number of tokens in the verification stage.
By the way, what concurrency level did you use for the evaluation?
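
To illustrate why a higher acceptance rate does not automatically mean higher throughput, here is a rough back-of-the-envelope sketch. It assumes each speculative step costs one draft pass plus one verification pass, and that throughput is simply tokens emitted per step divided by step time. The function name and all numbers are hypothetical placeholders, not measurements of DFlash, Eagle3, or SGLang.

```python
def tokens_per_second(tokens_per_step: float,
                      draft_ms: float,
                      verify_ms: float) -> float:
    """Rough per-request throughput for speculative decoding.

    tokens_per_step: mean tokens emitted per draft+verify cycle
                     (accepted draft tokens plus the bonus token).
    draft_ms:        time spent drafting per cycle, in milliseconds.
    verify_ms:       time spent in target-model verification per cycle.
    """
    step_ms = draft_ms + verify_ms
    if step_ms <= 0:
        raise ValueError("step time must be positive")
    return tokens_per_step / step_ms * 1000.0


# Hypothetical numbers, for illustration only.
# Config A: more tokens accepted per step, but a costlier verification pass.
high_accept = tokens_per_second(tokens_per_step=4.0, draft_ms=5.0, verify_ms=40.0)
# Config B: fewer tokens accepted per step, but a cheaper verification pass.
cheap_verify = tokens_per_second(tokens_per_step=3.2, draft_ms=4.0, verify_ms=28.0)

print(f"A (higher acceptance): {high_accept:.1f} tok/s")
print(f"B (cheaper verify):    {cheap_verify:.1f} tok/s")
```

With these made-up inputs, config B comes out ahead even though it accepts fewer tokens per step, because its step time is shorter. Verification cost generally grows with the number of tokens verified and with batch size, so measuring the actual per-step draft and verify times at your concurrency level would show whether this is what is happening in your setup.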

Jing17

Mar 11

I used 4 × H20 GPUs with concurrency=8.
