Throughput does not seem to be as good as Eagle3?
#1
by Jing17 - opened
It might be because DFlash's verification stage spends more compute than necessary. You could try a more powerful GPU, or reduce the number of tokens processed in the verification stage.
By the way, what concurrency level did you use for the evaluation?
I used 4x H20 GPUs with concurrency=8.
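For a rough sense of the verification cost discussed in this thread, here is a minimal sketch in Python. It assumes that each request's verification pass covers its drafted tokens plus the one committed token they follow. The helper `verify_tokens_per_step` and the draft lengths 4, 8, and 16 are hypothetical illustrations, not the actual settings of DFlash or Eagle3; only concurrency=8 comes from this thread.

```python
def verify_tokens_per_step(concurrency: int, draft_tokens: int) -> int:
    """Estimate how many token positions the target model processes
    in one batched verification forward pass.

    Assumption: every request in the batch verifies `draft_tokens`
    drafted tokens plus the last committed token they follow.
    """
    if concurrency < 1 or draft_tokens < 0:
        raise ValueError("concurrency must be >= 1 and draft_tokens >= 0")
    return concurrency * (draft_tokens + 1)


if __name__ == "__main__":
    concurrency = 8  # value reported in this thread
    for draft_tokens in (4, 8, 16):  # hypothetical draft lengths
        total = verify_tokens_per_step(concurrency, draft_tokens)
        print(f"draft_tokens={draft_tokens:>2} -> {total} positions per verification step")
```

Under these assumptions, the per-step verification workload grows linearly with the draft length at a fixed concurrency, so on a compute-limited GPU a shorter verified block may help. This is an estimate, not a measurement.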
