Muse-12B-NVFP4-FP8
Quantized weights of the Muse-12B model for use with NVIDIA Blackwell GPUs, in a hybrid format: NVFP4 with Four Over Six adaptive block scaling for the MLP layers, and FP8_DYNAMIC for the self-attention layers. More information about the hybrid format is available here; the short version is that FP8 attention has minimal impact on speed and VRAM usage while making a marked difference in output quality, especially at longer context lengths.
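For readers unfamiliar with Four Over Six, here is a simplified Python sketch of the core idea as I understand it. Plain NVFP4 scales each block so its largest magnitude maps to 6, the largest FP4 (E2M1) value. The FP4 grid has nothing between 4 and 6, so near-maximal values can lose a lot of precision. Four Over Six also tries mapping the block maximum to 4 and keeps whichever choice gives lower error. This is an illustration only, not the code used to quantize this checkpoint. It uses exact float scales and ignores the FP8 (E4M3) encoding of the block scales and the per-tensor scale. Using mean squared error as the selection criterion is my assumption, and rounding ties go to the smaller magnitude rather than to even. Real NVFP4 blocks have 16 elements, while the example uses 4 for readability. All function names are hypothetical.

```python
# FP4 (E2M1) representable magnitudes; the sign is handled separately.
FP4_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def round_to_fp4(x: float) -> float:
    """Round x to the nearest FP4 value; ties go to the smaller magnitude."""
    mag = min(FP4_GRID, key=lambda g: abs(abs(x) - g))
    return mag if x >= 0 else -mag


def quantize_block(block: list[float], target_max: float) -> list[float]:
    """Scale so the block's largest magnitude maps to target_max, round to FP4, then dequantize."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return [0.0] * len(block)
    scale = amax / target_max
    return [round_to_fp4(v / scale) * scale for v in block]


def mse(a: list[float], b: list[float]) -> float:
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def four_over_six(block: list[float]) -> tuple[list[float], float]:
    """Quantize with the block max mapped to 6 and to 4; keep the lower-error result."""
    best = None
    for target_max in (6.0, 4.0):
        deq = quantize_block(block, target_max)
        err = mse(block, deq)
        if best is None or err < best[0]:
            best = (err, deq, target_max)
    return best[1], best[2]


block = [1.0, 0.9, 0.8, 0.1]
deq, chosen = four_over_six(block)
print(chosen, [round(v, 4) for v in deq])  # prints: 4.0 [1.0, 1.0, 0.75, 0.125]
```

In this block, scaling to 6 turns 0.8 into 4.8, which rounds down to 4 and dequantizes to about 0.667. Scaling to 4 turns it into 3.2, which rounds to 3 and dequantizes to 0.75, so the "four" option wins.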
Inference
Tested on an RTX 5060 Ti 16GB with Aphrodite Engine and vLLM. The checkpoint requires compressed-tensors 0.14.0 or later, so if you use Aphrodite Engine or an older version of vLLM, you'll have to upgrade that package in your venv. On my system, Aphrodite Engine could run the checkpoint with a 32k context window using the --single-user-mode flag, while vLLM fell just short of the VRAM needed to do the same. It still works fine in vLLM at shorter context lengths or with the KV cache quantized, however.
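To see why a 32k context is tight on a 16 GB card, here is a back-of-the-envelope Python estimate of the KV cache size. The attention shape constants (40 layers, 8 KV heads, head dimension 128) are my assumption, taken from the published Mistral-Nemo-Base-2407 config, and should be checked against the checkpoint's config.json. The estimate ignores the inference engine's own overhead, activations, and CUDA graph memory, so real usage will be higher.

```python
# Assumed Mistral-Nemo attention shape; verify against the checkpoint's config.json.
NUM_LAYERS = 40
NUM_KV_HEADS = 8
HEAD_DIM = 128

GIB = 1024 ** 3


def kv_cache_bytes(num_tokens: int, bytes_per_elem: int) -> int:
    """Bytes needed to cache keys and values for num_tokens tokens."""
    # Factor of 2: each layer stores one key vector and one value vector per KV head per token.
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * bytes_per_elem * num_tokens


for label, nbytes in (("FP16/BF16", 2), ("FP8", 1)):
    size_gib = kv_cache_bytes(32768, nbytes) / GIB
    print(f"{label} KV cache at 32768 tokens: {size_gib:.2f} GiB")
# prints:
# FP16/BF16 KV cache at 32768 tokens: 5.00 GiB
# FP8 KV cache at 32768 tokens: 2.50 GiB
```

Halving the bytes per element, for example with vLLM's `--kv-cache-dtype fp8` option, halves the estimate. This is consistent with the quantized KV cache fitting where the unquantized one did not.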
Recommended generation settings (a blend of the recommendations on the Muse-12B model card and in the AI Dungeon Model Guide):
- Temperature: 1.0
- Top K: 250
- Top P: 1
- Min P: 0.025
- Repetition Penalty: 1.05
- Presence Penalty: 0.25
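As a rough sketch of applying these settings, the Python snippet below builds a request for an OpenAI-compatible completions endpoint, such as the one served by vLLM or Aphrodite Engine. It passes top_k, min_p, and repetition_penalty as extra fields in the request body. The server URL and served model name are assumptions: vLLM defaults to port 8000, Aphrodite Engine may use a different default, and the served name depends on how you launched the server. The `max_tokens` default is illustrative, and the helper names are hypothetical.

```python
import json
import urllib.request

# Recommended sampler settings from the list above.
SAMPLER_SETTINGS = {
    "temperature": 1.0,
    "top_k": 250,
    "top_p": 1.0,
    "min_p": 0.025,
    "repetition_penalty": 1.05,
    "presence_penalty": 0.25,
}

# Assumed endpoint and served model name; change these to match your server.
API_URL = "http://localhost:8000/v1/completions"
MODEL_NAME = "DataSnake/Muse-12B-NVFP4-FP8"


def build_payload(prompt: str, max_tokens: int = 300) -> dict:
    """Combine the prompt with the recommended sampler settings."""
    return {"model": MODEL_NAME, "prompt": prompt, "max_tokens": max_tokens, **SAMPLER_SETTINGS}


def complete(prompt: str) -> str:
    """POST the request and return the generated text (requires a running server)."""
    body = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        API_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["choices"][0]["text"]
```

Keeping the settings in one dictionary makes it easy to reuse them if you switch between backends that share the OpenAI-compatible request format.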
If your inference backend supports the DRY and XTC samplers (at the time of writing, Aphrodite Engine supports both and vLLM supports neither), you can also try them to cut down on repetition if necessary.
Prompt Format
The calibration data was formatted with the same ChatML tags that were used to fine-tune Latitude's 12B models:
<|im_start|>system
You're a masterful storyteller and gamemaster. Write in second person present tense (You are), crafting vivid, engaging narratives with authority and confidence.<|im_end|>
<|im_start|>user
> You peer into the darkness.<|im_end|>
<|im_start|>assistant
You have been eaten by a grue.<|im_end|>
I therefore recommend using the same format for inference.
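If you build prompts by hand rather than through a chat template, a small helper like the hypothetical `chatml_prompt` below reproduces the layout shown above. It leaves the assistant turn open for the model to complete. It is a sketch; your frontend or backend may already apply an equivalent template.

```python
def chatml_prompt(system: str, turns: list[tuple[str, str]]) -> str:
    """Format a system message and (role, text) turns as ChatML, ending with an open assistant turn."""
    parts = [f"<|im_start|>system\n{system}<|im_end|>"]
    for role, text in turns:
        if role not in ("user", "assistant"):
            raise ValueError(f"unexpected role: {role!r}")
        parts.append(f"<|im_start|>{role}\n{text}<|im_end|>")
    # Leave the assistant turn open so the model writes the next reply.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)


prompt = chatml_prompt(
    "You're a masterful storyteller and gamemaster. Write in second person present tense "
    "(You are), crafting vivid, engaging narratives with authority and confidence.",
    [("user", "> You peer into the darkness.")],
)
print(prompt)
```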
Credits
Muse-12B was made by Latitude Games with help from Gryphe Padar
Four Over Six was discovered by Jack Cook, Junxian Guo, Guangxuan Xiao, Yujun Lin, and Song Han
Model tree for DataSnake/Muse-12B-NVFP4-FP8
Base model: mistralai/Mistral-Nemo-Base-2407