About token generation speed

by budivoy - opened Oct 21, 2025

Summary

I am testing the "fla-version" of RWKV-7-2.9B-g1 on tasks from the GAIA benchmark.

I noticed something odd: generation with the fla-version on an NVIDIA T4 is quite slow compared to my smartphone...

The strange thing is that the official demo runs the same model faster on a single T4...

PS. I appreciate your work; using RWKV via the transformers API is quite convenient.
