Why is there a 4k ctx limit?

#7
by crazyi - opened

I am using omlx, but the maximum context (ctx) size is only 4096. I don't know why this limitation exists. Is it due to omlx or DFlash?
https://github.com/jundot/omlx/blob/main/docs/experimental/dflash_mlx_integration.md

It shouldn't. The latest DFlash draft model with SWA layers supports much longer context lengths (in my test it still works with a 100K context when using Claude Code). So this is an omlx issue; they should remove this limit.

You can start the regular oMLX application from a terminal with whatever DFlash context you want by setting its environment variable. For example, for contexts up to 8192, use the following:

DFLASH_MAX_CTX=8193 /Applications/oMLX.app/Contents/MacOS/omlx-cli serve
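If you change the context size often, the command above could be wrapped in a small helper. This is only a sketch: `DFLASH_MAX_CTX` and the binary path come from the reply above, while the function names, the `OMLX_CLI` override, and the "+1" margin (copied from the 8192 -> 8193 example; whether it is actually required is an assumption) are hypothetical.

```shell
#!/bin/sh
# Sketch of a launcher for oMLX with a custom DFlash context limit.
# Path taken from the reply above; OMLX_CLI is a hypothetical override.
OMLX_CLI="${OMLX_CLI:-/Applications/oMLX.app/Contents/MacOS/omlx-cli}"

# Print the value to pass in DFLASH_MAX_CTX for a desired context length.
# Assumption: the limit needs one extra token, as in the 8192 -> 8193 example.
dflash_ctx_value() {
    echo $(( $1 + 1 ))
}

# Start the server with the environment variable set for this process only.
launch_omlx() {
    DFLASH_MAX_CTX="$(dflash_ctx_value "$1")" "$OMLX_CLI" serve
}
```

With this sourced into your shell, `launch_omlx 32768` would start the server with `DFLASH_MAX_CTX=32769`. Setting the variable inline keeps it scoped to that one process rather than exporting it for the whole session.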

I think they'll soon add an option for the DFlash context in the UI.
