Support for finetuning and self-hosted inference?
#23
by anttip - opened
Could the Mistral team invest more in supporting finetuning and self-hosted inference for these models?
The instruct finetunes are released only in FP8, so they start out with dequantization noise. Support from finetuning toolkits is also limited: Unsloth can finetune the model, but exporting an HF-compatible checkpoint afterwards appears to be broken.
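To give a feel for the dequantization noise mentioned above, here is a minimal sketch that round-trips random weights through a reduced-precision mantissa, roughly mimicking FP8 E4M3 rounding. This is an illustration only, not Mistral's actual quantization recipe: real E4M3 also clips the exponent range and handles denormals and saturation, which this toy version ignores.

```python
import numpy as np

def fake_fp8_e4m3(x, mantissa_bits=3):
    # Round-trip floats through a mantissa truncated to `mantissa_bits`
    # fractional bits, approximating FP8 E4M3 rounding error.
    # (Toy model: ignores exponent clipping, denormals, and saturation.)
    m, e = np.frexp(x)                  # x = m * 2**e, with 0.5 <= |m| < 1
    scale = 2.0 ** (mantissa_bits + 1)  # quantization step for the mantissa
    m_q = np.round(m * scale) / scale
    return np.ldexp(m_q, e)

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=10_000).astype(np.float32)  # toy weight tensor
err = np.abs(fake_fp8_e4m3(w) - w)
rel = err[w != 0] / np.abs(w[w != 0])
print(f"max relative round-trip error: {rel.max():.4f}")
```

With 3 mantissa bits the worst-case relative error is about 2^-4 (roughly 6%), which is the kind of baseline noise a finetune inherits when only an FP8 checkpoint is available.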
Inference support in vLLM and SGLang is patchy as well. A version of SGLang supporting the model was merged, but it has since broken with updates to Transformers.
Quality-wise, the models are good, among the best in their size class.