Wrong context length in context?

#14
by wimmmm - opened

Hi,

I was trying to run some benchmarks on llama.cpp, but llama-bench crashed claiming it was unable to create a context.

Running GGUF quantizations with llama-server works, but the reported context is 1M tokens.

At first I thought that this must be an error in the earlier GGUF conversions, but nope, it's in the config for the safetensors as well:

    "max_position_embeddings": 1048576,

The main model card claims a 256k context length (see the README).

Which of these is correct?

Mistral AI org
edited Mar 17

The model could theoretically go up to 1M, but we strongly suggest keeping it at 256k, as behavior above that is not what we targeted and is very likely subject to degraded performance.
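One practical way to follow that recommendation is to cap the context explicitly when serving, rather than letting it default to the 1M from `max_position_embeddings`. A minimal sketch using llama.cpp's context-size flag (the model filename is a placeholder; check `llama-server --help` on your build for the exact option names):

```shell
# Cap the context window at 256k tokens (256 * 1024 = 262144)
# instead of the 1M advertised by max_position_embeddings.
# ./model.gguf is a placeholder path.
llama-server -m ./model.gguf --ctx-size 262144
```

Capping the context also reduces the KV-cache allocation, which may help if the 1M default was what caused the context-creation failure in the first place.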
