Prefill speed degradation

#2
by daibuzizai - opened

After testing, ik_llama.cpp shows low running efficiency: prefill speed is seriously degraded, down to only about 25% of the original.

That's because most of these quants keep everything except the conditional (routed) experts in Q8. So components like attention are a bit heavier to compute, but should degrade less over longer contexts. That's the theory, at least.
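To make the "attention is a bit heavier" point concrete, here is a back-of-the-envelope sketch comparing per-token weight traffic for attention tensors kept at Q8_0 versus a hypothetical Q4_0 version. The bits-per-weight values follow from the block layouts (32 weights plus an fp16 scale per block); the layer dimensions are made-up placeholders, not taken from any particular model.

```python
# Illustrative arithmetic only: how much more data must be read per token
# when attention projections are stored in Q8_0 instead of Q4_0.

Q8_0_BPW = 8.5  # 32 * 8-bit weights + 16-bit scale per block of 32
Q4_0_BPW = 4.5  # 32 * 4-bit weights + 16-bit scale per block of 32

def attn_weight_bytes(hidden_dim: int, n_layers: int, bpw: float) -> float:
    """Bytes of attention weights (Q, K, V, O projections) read per token."""
    params_per_layer = 4 * hidden_dim * hidden_dim  # four d x d projections
    return params_per_layer * n_layers * bpw / 8    # bits -> bytes

# Placeholder sizes, chosen only for illustration.
hidden_dim, n_layers = 4096, 32

q8 = attn_weight_bytes(hidden_dim, n_layers, Q8_0_BPW)
q4 = attn_weight_bytes(hidden_dim, n_layers, Q4_0_BPW)

print(f"Q8_0 attention weights: {q8 / 2**30:.2f} GiB read per token")
print(f"Q4_0 attention weights: {q4 / 2**30:.2f} GiB read per token")
print(f"Q8_0 incurs {q8 / q4:.2f}x the memory traffic of Q4_0")
```

The ratio is just 8.5/4.5, roughly 1.9x, regardless of model size; the trade-off is that Q8 loses far less precision, which is why these mixes accept heavier attention in exchange for quality that holds up over long contexts.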
