Mistral-Medium-3.5-128B-BF16-Text-Only

📜 Technical Architecture Note

This model has been converted from Mistral3ForConditionalGeneration (Multimodal) to MistralForCausalLM (Standard Text-Only). This change ensures maximum compatibility with standard fine-tuning libraries like Axolotl, Unsloth, and Hugging Face Transformers without requiring custom vision-encoder handling.

Help me feed the data beast! Taking commissions for universe-specific models.

Support on Ko-fi

Model Description

This is a processed version of Mistral-Medium-3.5-128B designed for users who prioritize text-only performance and ease of fine-tuning.

Modification Details:

  • Precision Upscale: Converted from FP8 weights to BF16 (16-bit brain-float). Note that upcasting cannot recover information already lost in the FP8 quantization, but it restores the numerical format needed for stable gradient updates during training.
  • Vision Layer Stripping: All vision encoders and multimodal projection layers have been removed, significantly reducing memory overhead during inference and training for text-only tasks.
  • Architecture Re-mapping: The configuration has been modified to use MistralForCausalLM, allowing it to be treated as a standard dense language model.
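The stripping and re-mapping steps above can be sketched as follows. This is a minimal illustration, not the actual conversion script used for this checkpoint; the key prefixes (`vision_tower.`, `multi_modal_projector.`, `language_model.`) and config fields are assumptions based on typical Mistral3ForConditionalGeneration layouts and should be verified against the real state dict.

```python
def strip_vision_weights(state_dict):
    """Drop vision-encoder and multimodal-projector tensors, and strip the
    `language_model.` prefix so keys match a plain MistralForCausalLM."""
    # Assumed prefixes; check the actual checkpoint's key names.
    vision_prefixes = ("vision_tower.", "multi_modal_projector.")
    out = {}
    for name, tensor in state_dict.items():
        if name.startswith(vision_prefixes):
            continue  # vision layer: removed entirely
        out[name.removeprefix("language_model.")] = tensor
    return out

def remap_config(config):
    """Point the config at the standard text-only architecture."""
    cfg = dict(config.get("text_config", config))  # flatten nested text config if present
    cfg["architectures"] = ["MistralForCausalLM"]
    cfg["model_type"] = "mistral"
    return cfg

# Example with placeholder values (real entries would be torch tensors):
sd = {
    "language_model.model.layers.0.self_attn.q_proj.weight": "w0",
    "vision_tower.patch_embed.weight": "v0",
    "multi_modal_projector.linear.weight": "p0",
}
print(strip_vision_weights(sd))
# only the text weight survives, renamed "model.layers.0.self_attn.q_proj.weight"
```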

Purpose & Usage

This model is intended to serve as a clean base for fine-tuning. By removing the vision components, you can allocate more VRAM to sequence length or batch size. It is fully functional for text-only chat and reasoning out of the box.
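As a rough back-of-the-envelope check of the memory budget (a sketch only: real training also needs optimizer state, gradients, activations, and a KV cache on top of the weights):

```python
params = 125e9          # parameter count reported for this checkpoint
bytes_per_param = 2     # BF16 = 16 bits = 2 bytes

weight_gb = params * bytes_per_param / 1e9
print(f"BF16 weights alone: ~{weight_gb:.0f} GB")
```

The weights alone land around 250 GB in BF16, so the VRAM freed by dropping the vision tower goes toward activations (longer sequences or larger batches) rather than making single-GPU loading feasible.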

Acknowledgements

  • Credit to Mistral AI for the original Mistral-Medium-3.5-128B architecture.
Format: Safetensors · Model size: 125B params · Tensor type: BF16