Mistral-Medium-3.5-128B-BF16-Text-Only

📜 Technical Architecture Note

This model has been converted from Mistral3ForConditionalGeneration (Multimodal) to MistralForCausalLM (Standard Text-Only). This change ensures maximum compatibility with standard fine-tuning libraries like Axolotl, Unsloth, and Hugging Face Transformers without requiring custom vision-encoder handling.

Help me feed the data beast! Taking commissions for universe-specific models.

Support on Ko-fi

Model Description

This is a processed version of Mistral-Medium-3.5-128B designed for users who prioritize text-only performance and ease of fine-tuning.

Modification Details:

  • Precision Upscale: Converted from FP8 weights to BF16 (16-bit brain-float). Note that upcasting cannot recover information already lost in the FP8 quantization, but it restores the numerical format needed for stable gradient updates during training.
  • Vision Layer Stripping: All vision encoders and multimodal projection layers have been removed, significantly reducing memory overhead during inference and training for text-only tasks.
  • Architecture Re-mapping: The configuration has been modified to use MistralForCausalLM, allowing it to be treated as a standard dense language model.
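The stripping and re-mapping steps above can be sketched as follows. This is a minimal illustration, not the actual conversion script used for this checkpoint; the key prefixes (`vision_tower.`, `multi_modal_projector.`, `language_model.`) and config fields are assumptions based on typical Mistral3ForConditionalGeneration layouts and should be verified against the real state dict.

```python
def strip_vision_weights(state_dict):
    """Drop vision-encoder and multimodal-projector tensors, and strip the
    `language_model.` prefix so keys match a plain MistralForCausalLM."""
    # Assumed prefixes; check the actual checkpoint's key names.
    vision_prefixes = ("vision_tower.", "multi_modal_projector.")
    out = {}
    for name, tensor in state_dict.items():
        if name.startswith(vision_prefixes):
            continue  # vision layer: removed entirely
        out[name.removeprefix("language_model.")] = tensor
    return out

def remap_config(config):
    """Point the config at the standard text-only architecture."""
    cfg = dict(config.get("text_config", config))  # flatten nested text config if present
    cfg["architectures"] = ["MistralForCausalLM"]
    cfg["model_type"] = "mistral"
    return cfg

# Example with placeholder values (real entries would be torch tensors):
sd = {
    "language_model.model.layers.0.self_attn.q_proj.weight": "w0",
    "vision_tower.patch_embed.weight": "v0",
    "multi_modal_projector.linear.weight": "p0",
}
print(strip_vision_weights(sd))
# only the text weight survives, renamed "model.layers.0.self_attn.q_proj.weight"
```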

Purpose & Usage

This model is intended to serve as a clean base for fine-tuning. By removing the vision components, you can allocate more VRAM to sequence length or batch size. It is fully functional for text-only chat and reasoning out of the box.
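As a rough back-of-the-envelope check of the memory budget (a sketch only: real training also needs optimizer state, gradients, activations, and a KV cache on top of the weights):

```python
params = 125e9          # parameter count reported for this checkpoint
bytes_per_param = 2     # BF16 = 16 bits = 2 bytes

weight_gb = params * bytes_per_param / 1e9
print(f"BF16 weights alone: ~{weight_gb:.0f} GB")
```

The weights alone land around 250 GB in BF16, so the VRAM freed by dropping the vision tower goes toward activations (longer sequences or larger batches) rather than making single-GPU loading feasible.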

Acknowledgements

  • Credit to Mistral AI for the original Mistral-Medium-3.5-128B architecture.
Format: Safetensors · Model size: 125B params · Tensor type: BF16