Testing REAM on Kimi-Linear and Nemotron's hybrid attention models

#2
by TomLucidor - opened

REAM seems to be a promising alternative to REAP, which some users have described as "lobotomized". It would be interesting to know whether that effect is size-sensitive.

Samsung AI Lab (SAIL) Montreal org

Can you elaborate on "lobotomized"? Does it make models very bad on some tasks?
We are considering releasing the REAM code so that the community can run it on Kimi and other architectures.

"Lobotomized" implying tasks of the same domain as the calibration set used for REAP, BUT outside of the explicit topics encapsulated by the calibration dataset, CAN lose performance. This is reported by a lot of users of REAP models from Cerebras, making them seek Q3/Q2 quantized models instead for equivalent memory usage.