supported eagle context size?
#5
by Jannik2099 - opened
Thanks for the model, it's a superb step up from Mistral 3 all around.
On the model card, you recommend serving speculative decoding with
```json
--speculative_config {
  ...
  "max_model_len": 16384
}
```
Does the EAGLE head only support 16k context, or was it trained for 256k context like the base model, with 16k merely the recommended setting because you see diminishing returns above that?
If VRAM allows it, do you recommend serving with 256k EAGLE context?
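For context, a serve invocation along these lines might look like the sketch below. The model names, draft method, and token count are placeholders for illustration, not values from this thread or the model card; only `max_model_len` inside `--speculative_config` is the setting being asked about.

```shell
# Hypothetical sketch: serve a base model with an EAGLE draft head in vLLM,
# raising the draft model's context window to match the base model's 256k.
# All model names and the "eagle3" method are placeholders, not from the thread.
vllm serve some-org/base-model \
  --max-model-len 262144 \
  --speculative_config '{
    "method": "eagle3",
    "model": "some-org/eagle-head",
    "num_speculative_tokens": 3,
    "max_model_len": 262144
  }'
```

If the head was only trained on shorter sequences, the question is whether raising `max_model_len` here degrades acceptance rates rather than whether it runs at all.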
Hey, you can actually remove it (I did in the model card); the EAGLE head should work properly :)
Edit: I actually got confused by some remapping; I added it back while I investigate a bit more.