Wrong context length in context?

#14
by wimmmm - opened

Hi,

I was trying to run some benchmarks on llama.cpp, but llama-bench crashed claiming it was unable to create a context.

Running GGUF quantizations with llama-server works, but the reported context is 1M tokens.

At first I thought that this must be an error in the earlier GGUF conversions, but nope, it's in the config for the safetensors as well:

    "max_position_embeddings": 1048576,

The main model card claims a 256k context length (see the README).

Which of these is correct?

Mistral AI org
edited Mar 17

The model could theoretically go up to 1M, but we strongly suggest keeping it at 256k, as behavior above that is not what we targeted and is very likely subject to degraded performance.
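One practical way to follow that recommendation is to cap the context explicitly when serving, rather than letting it default to the 1M from `max_position_embeddings`. A minimal sketch using llama.cpp's context-size flag (the model filename is a placeholder; check `llama-server --help` on your build for the exact option names):

```shell
# Cap the context window at 256k tokens (256 * 1024 = 262144)
# instead of the 1M advertised by max_position_embeddings.
# ./model.gguf is a placeholder path.
llama-server -m ./model.gguf --ctx-size 262144
```

Capping the context also reduces the KV-cache allocation, which may help if the 1M default was what caused the context-creation failure in the first place.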
