Support for finetuning and self-hosted inference?
#23
by anttip - opened
Could the Mistral team invest more in supporting finetuning and self-hosted inference for these models?
The instruct finetunes are released only in FP8, so they start out with dequantization noise. Support from finetuning toolkits is also limited: Unsloth can finetune the model, but exporting an HF-compatible checkpoint afterwards appears to be broken.
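To give a feel for the dequantization noise mentioned above, here is a minimal sketch that round-trips random weights through a reduced-precision mantissa, roughly mimicking FP8 E4M3 rounding. This is an illustration only, not Mistral's actual quantization recipe: real E4M3 also clips the exponent range and handles denormals and saturation, which this toy version ignores.

```python
import numpy as np

def fake_fp8_e4m3(x, mantissa_bits=3):
    # Round-trip floats through a mantissa truncated to `mantissa_bits`
    # fractional bits, approximating FP8 E4M3 rounding error.
    # (Toy model: ignores exponent clipping, denormals, and saturation.)
    m, e = np.frexp(x)                  # x = m * 2**e, with 0.5 <= |m| < 1
    scale = 2.0 ** (mantissa_bits + 1)  # quantization step for the mantissa
    m_q = np.round(m * scale) / scale
    return np.ldexp(m_q, e)

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=10_000).astype(np.float32)  # toy weight tensor
err = np.abs(fake_fp8_e4m3(w) - w)
rel = err[w != 0] / np.abs(w[w != 0])
print(f"max relative round-trip error: {rel.max():.4f}")
```

With 3 mantissa bits the worst-case relative error is about 2^-4 (roughly 6%), which is the kind of baseline noise a finetune inherits when only an FP8 checkpoint is available.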
Inference support in vLLM and SGLang is patchy as well. A version of SGLang supporting the model was merged, but it has since broken with updates to Transformers.
Quality-wise, the models are good, among the best in their size class.