Testing REAM on Kimi-Linear and Nemotron's hybrid attention models

#2
by TomLucidor - opened

REAM seems to be a promising alternative to REAP, which some users have described as "lobotomized". It would be interesting to know whether that effect is size-sensitive.

Samsung AI Lab (SAIL) Montreal org

Can you elaborate on "lobotomized"? Does it make models very bad on some tasks?
We are considering releasing the REAM code so that the community can run it on Kimi and other architectures.

"Lobotomized" implying tasks of the same domain as the calibration set used for REAP, BUT outside of the explicit topics encapsulated by the calibration dataset, CAN lose performance. This is reported by a lot of users of REAP models from Cerebras, making them seek Q3/Q2 quantized models instead for equivalent memory usage.