# Mistral-Medium-3.5-128B-BF16-Text-Only

## 📜 Technical Architecture Note
This model has been converted from Mistral3ForConditionalGeneration (Multimodal) to MistralForCausalLM (Standard Text-Only). This change ensures maximum compatibility with standard fine-tuning libraries like Axolotl, Unsloth, and Hugging Face Transformers without requiring custom vision-encoder handling.
Help me feed the data beast! Taking commissions for universe-specific models. Support on Ko-fi.

## Model Description
This is a processed version of Mistral-Medium-3.5-128B designed for users who prioritize text-only performance and ease of fine-tuning.
Modification Details:
- Precision Upscale: Converted from FP8 weights to BF16 (bfloat16) so the weights are stored at full 16-bit brain-float precision, which supports stable gradient updates during training. Note that upcasting cannot recover precision already lost in the FP8 quantization; it makes the weights directly trainable in standard BF16 pipelines.
- Vision Layer Stripping: All vision encoders and multimodal projection layers have been removed, significantly reducing memory overhead during inference and training for text-only tasks.
- Architecture Re-mapping: The configuration has been modified to use MistralForCausalLM, allowing the model to be treated as a standard dense language model.
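The re-mapping step above can be sketched as a rewrite of the model's `config.json`. This is a minimal, hypothetical illustration: the exact key names used by the multimodal config (`text_config`, `vision_config`, `image_token_index`, `spatial_merge_size`) are assumptions based on common Transformers multimodal layouts, and the real conversion also removes the corresponding vision-tower weight shards, which is omitted here.

```python
# Hypothetical sketch of the config re-mapping; real multimodal configs may
# use different key names, and the weight files must be filtered separately.
import json


def remap_to_causal_lm(config: dict) -> dict:
    """Rewrite a multimodal-style config for MistralForCausalLM."""
    cfg = dict(config)
    # Multimodal configs often nest the language model under "text_config";
    # promote those fields to the top level.
    cfg.update(cfg.pop("text_config", {}))
    # Drop vision-tower and projector settings (key names are assumptions).
    for key in ("vision_config", "image_token_index", "spatial_merge_size"):
        cfg.pop(key, None)
    cfg["architectures"] = ["MistralForCausalLM"]
    cfg["model_type"] = "mistral"
    cfg["torch_dtype"] = "bfloat16"  # record the restored training precision
    return cfg


if __name__ == "__main__":
    # Toy input with made-up dimensions, purely for illustration.
    original = {
        "architectures": ["Mistral3ForConditionalGeneration"],
        "model_type": "mistral3",
        "text_config": {"hidden_size": 12288, "num_hidden_layers": 88},
        "vision_config": {"hidden_size": 1024},
    }
    print(json.dumps(remap_to_causal_lm(original), indent=2))
```

After this rewrite, standard loaders see an ordinary dense Mistral checkpoint and never attempt to instantiate a vision encoder.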
## Purpose & Usage
This model is intended to serve as a clean base for fine-tuning. By removing the vision components, you can allocate more VRAM to sequence length or batch size. It is fully functional for text-only chat and reasoning out of the box.
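The memory trade-off described above can be made concrete with rough arithmetic. This is a back-of-the-envelope sketch: the 128B parameter count is taken from the model name, not from a published figure, and it covers only the weights themselves (activations, optimizer states, and KV cache come on top).

```python
# Rough weight-memory estimate; the ~128e9 parameter count is an assumption
# inferred from the model name.
def weight_memory_gib(params: float, bytes_per_param: int) -> float:
    """Memory needed just to hold the weights, in GiB."""
    return params * bytes_per_param / 1024**3


params = 128e9
fp8_gib = weight_memory_gib(params, 1)   # original FP8 checkpoint
bf16_gib = weight_memory_gib(params, 2)  # this BF16 conversion

print(f"FP8 weights:  ~{fp8_gib:.0f} GiB")
print(f"BF16 weights: ~{bf16_gib:.0f} GiB")
```

Under these assumptions the BF16 weights alone occupy roughly twice the footprint of the FP8 original, so every gigabyte freed by dropping the vision tower goes directly to longer sequences or larger batches.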
## Acknowledgements
- Credit to Mistral AI for the original Mistral-Medium-3.5-128B architecture.
Base model: mistralai/Mistral-Medium-3.5-128B