One flaw in the architecture - has to reprocess the prompt every time
#38
by Dampfinchen - opened
Hello Qwen team,
While using your model extensively, I have noticed that it has RNN qualities: once the context size is exceeded, it reprocesses the entire prompt on every query, making it very slow to respond. Please rethink this approach. Some people don't like having to start new chats often; they would rather rely on the UI truncating the top of the prompt, which is a big problem for recurrent neural networks, since the recurrent state summarizes the whole prefix and cannot simply drop the truncated tokens.
Yes, it is more memory efficient, so you can set a higher context, but even that fills up fast and still requires powerful hardware.
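To illustrate the complaint, here is a toy sketch (all names hypothetical, position-embedding details ignored) contrasting a transformer-style per-token KV cache with an RNN-style fixed-size state when the UI truncates the top of the prompt:

```python
# Toy illustration (hypothetical names) of why top-truncation is cheap for a
# transformer KV cache but forces an RNN-style model to reprocess everything.

old_prompt = list(range(100))            # tokens of the previous request
new_prompt = old_prompt[10:] + [900]     # UI dropped the first 10 tokens, user added one

# Transformer: per-token KV entries can be evicted individually, so after
# truncation only the genuinely new token needs a forward pass.
kv_cache = [("kv", t) for t in old_prompt]
kv_cache = kv_cache[10:]                 # drop entries for the truncated tokens
tokens_to_process_transformer = len(new_prompt) - len(kv_cache)

# RNN / linear attention: one fixed-size state summarizes the whole prefix.
# The truncated tokens cannot be "subtracted" out of it, so the state must be
# rebuilt by re-running the entire remaining prompt.
tokens_to_process_rnn = len(new_prompt)

print(tokens_to_process_transformer, tokens_to_process_rnn)  # 1 vs 91
```

In practice a real transformer server also has to re-encode positions after truncation, but the asymmetry in how much work each architecture can reuse is the point.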
Other than that, the architecture is very good.
Best regards
Dampfinchen changed discussion title from One fatal flaw in the architecture - has to reprocess the prompt every time to One flaw in the architecture - has to reprocess the prompt every time