About token generation speed
#1
by budivoy - opened
Summary
I am testing the "fla-version" of RWKV-7-2.9B-g1 on tasks from the GAIA benchmark.
I noticed something odd: generation with the fla version on an NVIDIA T4 is quite slow compared to my smartphone.
| Model | Device | Decode speed (tokens/s) |
|---|---|---|
| fla (RWKV-7-2.9B-g1) | Kaggle, NVIDIA T4 x2 | 8.33 |
| official demo (rwkv7-g1a-2.9b-20250924-ctx4096) | NVIDIA T4 | 26.90 |
| android (RWKV-7-G1a 2.9 250924, W4A16) | Snapdragon 8 Elite, NPU | 27.69 |
The strange thing is that the official demo runs the same 2.9B model roughly 3x faster (26.90 vs 8.33 tokens/s) on a single T4.
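For anyone who wants to reproduce a decode figure like these, here is a minimal timing sketch. It is not necessarily how the numbers above were obtained. The helper name `decode_tokens_per_second` and the stand-in `fake_generate` are hypothetical; in practice `generate` would wrap a real call such as `model.generate(...)` and return the full list of token ids (prompt plus new tokens). Note that this simple version also counts prompt prefill time, and on a GPU it may be worth calling `torch.cuda.synchronize()` before reading the clock.

```python
import time
from typing import Callable, Sequence


def decode_tokens_per_second(
    generate: Callable[[], Sequence[int]],
    prompt_len: int,
    warmup: int = 1,
    runs: int = 3,
) -> float:
    """Average number of newly generated tokens per second over several runs.

    `generate` is assumed to return prompt ids followed by generated ids,
    so the number of new tokens is len(output) - prompt_len.
    """
    # Warm-up runs are excluded so one-time costs (kernel compilation,
    # cache allocation) do not distort the measurement.
    for _ in range(warmup):
        generate()

    total_new_tokens = 0
    total_seconds = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        output_ids = generate()
        total_seconds += time.perf_counter() - start
        total_new_tokens += len(output_ids) - prompt_len

    return total_new_tokens / total_seconds


def fake_generate() -> list:
    """Hypothetical stand-in for a model: 5 prompt ids plus 10 new ids."""
    time.sleep(0.05)  # pretend decoding takes about 50 ms
    return list(range(5)) + list(range(10))


rate = decode_tokens_per_second(fake_generate, prompt_len=5)
print(f"{rate:.2f} tokens/s")
```

Averaging over a few runs after a warm-up makes comparisons between backends a bit fairer, since the first call often includes setup work that later calls do not.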
Questions
- Is this speed normal?
- Is there a way to speed it up?
Code for reference
https://github.com/budivoy/Hugging-Face-Courses/blob/devel/agents-course/rwkv-fla-hub.ipynb
PS. I appreciate your work; using RWKV via the transformers API is quite convenient.