Gave it a try, here are my findings
@Firworks I gave this a try and (with the help of Claude Code) managed to load and run the model, but it is not reliable.
Issue Report: Firworks/Ministral-3-14B-Reasoning-2512-nvfp4 Compatibility Issues
Environment
- vLLM Version: 0.14.0rc1.dev127+g2d6001f49
- PyTorch Version: 2.9.1+cu130
- Transformers Version: 4.57.3
Summary
The Firworks/Ministral-3-14B-Reasoning-2512-nvfp4 model fails to load in vLLM out of the box and requires multiple configuration workarounds. Even after applying the fixes below, the model produces degenerate output on longer generations.
Problem 1: Unknown model_type ministral3
Error:
KeyError: 'ministral3'
Cause: text_config.model_type is set to ministral3, which doesn't exist in transformers' CONFIG_MAPPING. The valid types are ministral or mistral.
Workaround: Changed text_config.model_type from ministral3 to mistral in config.json.
Problem 2: Missing architectures in text_config
Error:
TypeError: 'NoneType' object is not iterable
Cause: text_config lacks an architectures field, causing vLLM to fail when resolving the language model class.
Workaround: Added "architectures": ["MistralForCausalLM"] to text_config.
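For reference, the two config.json edits above (Problems 1 and 2) can be scripted. This is a minimal sketch assuming the stock config.json layout; `patch_text_config` is an illustrative name, not part of any library:

```python
import json

def patch_text_config(cfg: dict) -> dict:
    # Force a model_type that exists in transformers' CONFIG_MAPPING (Problem 1)
    # and add the missing architectures list (Problem 2).
    text_cfg = cfg.setdefault("text_config", {})
    text_cfg["model_type"] = "mistral"                  # was "ministral3"
    text_cfg["architectures"] = ["MistralForCausalLM"]  # field was absent
    return cfg

# Against a file this would be:
#   with open("config.json") as f: cfg = json.load(f)
#   with open("config.json", "w") as f: json.dump(patch_text_config(cfg), f, indent=2)
example = {"model_type": "mistral3", "text_config": {"model_type": "ministral3"}}
print(patch_text_config(example)["text_config"]["model_type"])  # mistral
```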
Problem 3: Missing preprocessor/processor config files
Error:
OSError: Can't load image processor... preprocessor_config.json
Cause: The quantized model is missing preprocessor_config.json and processor_config.json required for the Pixtral vision encoder.
Workaround: Copied these files from the original mistralai/Ministral-3-14B-Reasoning-2512 model.
Problem 4: Missing tokenizer files
Error:
AssertionError: Expected to decode 1 token, got 3
Cause: The model is missing proper tokenizer files (tokenizer.json, tokenizer_config.json, special_tokens_map.json) with the [IMG], [THINK], [/THINK] special tokens properly defined.
Workaround: Downloaded tokenizer files from the original model.
Problem 5: MistralTokenizer incompatibility with HuggingFace tokenizer mode
Error:
AttributeError: 'MistralTokenizer' object has no attribute 'convert_tokens_to_ids'
Cause: When using native Mistral tokenizer mode, it lacks methods expected by vLLM's processing pipeline.
Workaround: Used --tokenizer-mode hf to force HuggingFace tokenizer.
Problem 6: Reasoning parser requires MistralTokenizer
Error:
ValueError: The tokenizer must be an instance of MistralTokenizer.
Cause: --reasoning-parser mistral requires the native MistralTokenizer, but we're forced to use HF tokenizer (see Problem 5).
Workaround: Created a custom ministral reasoning parser that uses [THINK]/[/THINK] tokens but works with HF tokenizer.
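The core of such a parser is just splitting on the literal markers. A minimal, tokenizer-agnostic sketch (the real vLLM parser must subclass vLLM's ReasoningParser and also handle streaming deltas; `split_reasoning` is an illustrative name):

```python
def split_reasoning(text: str,
                    start: str = "[THINK]",
                    end: str = "[/THINK]") -> tuple[str, str]:
    """Split model output into (reasoning, content) on the literal
    [THINK]/[/THINK] markers, so no MistralTokenizer is needed."""
    s = text.find(start)
    if s == -1:
        return "", text                    # no reasoning block at all
    e = text.find(end, s + len(start))
    if e == -1:
        return text[s + len(start):], ""   # unterminated: reasoning still streaming
    return text[s + len(start):e], text[e + len(end):]

print(split_reasoning("[THINK]2+2=4[/THINK]The answer is 4."))
# ('2+2=4', 'The answer is 4.')
```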
Problem 7: Missing chat_template
Error:
ChatTemplateResolutionError: default chat template is no longer allowed
Cause: Neither the quantized model nor the tokenizer_config.json includes a chat_template.
Workaround: Downloaded chat_template.jinja from the original model and added --chat-template flag.
Problem 8: Tokenizer regex warning
Warning:
The tokenizer you are loading with an incorrect regex pattern...
You should set fix_mistral_regex=True
Cause: Known issue with Mistral tokenizer regex patterns for contraction handling.
Status: Warning only; tokenizer.json appears to contain the correct pattern, but the warning persists.
Problem 9: Degenerate output on longer generations
Symptom: Model produces coherent output for short responses but degenerates into:
- Random Unicode characters from multiple languages (漢, Bาง, пре历, دياس, Україна, რომელიც)
- Nonsense word fragments mixed with English
- Extreme repetition loops ("harry harry harry..." or "above it, above it, above it...")
Cause (suspected):
- Architecture mismatch: Forcing MistralForCausalLM instead of native MinistralForCausalLM (which vLLM doesn't support yet)
- RoPE configuration: The original uses rope_parameters with Ministral-specific fields; we converted to standard rope_scaling format, potentially losing important parameters
- Possible quantization corruption
Status: UNRESOLVED. Model is unusable for production.
Files Missing from Quantized Model
The following files present in the original model are missing from the Firworks quantization:
- chat_template.jinja
- preprocessor_config.json
- processor_config.json
- params.json (required for --config_format mistral --load_format mistral)
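As a convenience, the missing files can be pulled from the original repo with huggingface_hub. A sketch assuming you have access to the (gated) mistralai repo; `fetch_missing` and the local_dir default are illustrative:

```python
MISSING_FILES = [
    "chat_template.jinja",
    "preprocessor_config.json",
    "processor_config.json",
    "params.json",
    # the tokenizer files from Problem 4 (tokenizer.json, tokenizer_config.json,
    # special_tokens_map.json) can be appended here as well
]

def fetch_missing(local_dir: str = "./Ministral-3-14B-Reasoning-2512-nvfp4") -> None:
    """Download each missing file from the original repo into the local
    quantized checkpoint. Requires `pip install huggingface_hub` and
    access to the gated mistralai repo."""
    from huggingface_hub import hf_hub_download
    for name in MISSING_FILES:
        hf_hub_download("mistralai/Ministral-3-14B-Reasoning-2512",
                        name, local_dir=local_dir)
```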
Recommended Fixes
- Fix text_config.model_type: Change from ministral3 to ministral (or mistral for broader transformers compatibility)
- Add architectures to text_config: Include "architectures": ["MinistralForCausalLM"] or appropriate class
- Include all required files: Bundle chat_template.jinja, preprocessor_config.json, processor_config.json, and tokenizer files
- Consider including params.json: This would allow native Mistral format loading which might work better
- Validate quantization quality: The degenerate output suggests possible weight corruption or incompatibility in the nvfp4 quantization process
- Test with vLLM: Verify the model loads and generates coherently on vLLM before publishing
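On the last point, even a minimal offline smoke test would catch the degeneration from Problem 9. A sketch using vLLM's Python API, assuming the patched checkpoint sits at a local path (path and prompt are illustrative; requires a GPU):

```python
# Smoke test: generate long enough to trigger the degeneration seen in Problem 9.
from vllm import LLM, SamplingParams

llm = LLM(
    model="./Ministral-3-14B-Reasoning-2512-nvfp4",  # local patched checkpoint (illustrative path)
    tokenizer_mode="hf",
    trust_remote_code=True,
)
params = SamplingParams(max_tokens=1024, temperature=0.7)
out = llm.generate(["Explain YaRN RoPE scaling in detail."], params)
print(out[0].outputs[0].text)  # inspect manually for repetition loops / random multilingual tokens
```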
Config Changes Applied (for reference)
text_config modifications:
```json
{
  "model_type": "mistral",                 // changed from "ministral3"
  "architectures": ["MistralForCausalLM"], // added
  "rope_theta": 1000000000.0,
  "rope_scaling": {                        // converted from "rope_parameters"
    "type": "yarn",
    "factor": 16.0,
    "original_max_position_embeddings": 16384,
    "beta_fast": 32.0,
    "beta_slow": 1.0,
    "mscale": 1.0,
    "mscale_all_dim": 1.0
  }
}
```
vLLM launch flags:
```shell
--tokenizer-mode hf \
--reasoning-parser ministral \
--chat-template /path/to/chat_template.jinja \
--trust-remote-code
# "ministral" is the custom reasoning parser from Problem 6
```
Conclusion
The Firworks/Ministral-3-14B-Reasoning-2512-nvfp4 model requires significant patching to load on vLLM, and even after all workarounds, produces unusable output for longer generations. The quantization appears to be incompatible with current vLLM/transformers infrastructure and may have quality issues in the quantized weights themselves.