MMAudio NSFW - FP16 Optimized

This repository contains an FP16 safetensors version of the fine-tuned MMAudio model from cloud19/NSFW_MMaudio, optimized for improved memory efficiency and faster loading times.

Base Model: cloud19/NSFW_MMaudio
Original Project: hkchengrex/MMAudio

Model Details

Base Architecture: large_44k (from the original MMAudio)
Fine-tuning: Fine-tuned on NSFW content (see base model for details)
Optimization: Converted from FP32 PyTorch checkpoint to FP16 safetensors
Capabilities: Video-to-Audio, Image-to-Audio, Text-to-Audio
Format: Safetensors (.safetensors)
Precision: 16-bit floating point

Improvements Over Base Model

✅ ~50% smaller file size (FP32 → FP16 conversion)
✅ Faster loading with safetensors format
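✅ Lower GPU memory usage during inference when the model is run in FP16
✅ Near-identical output quality (FP16 introduces only minimal precision loss)
✅ Safer, more portable loading: safetensors files do not execute pickled code and are supported across common ML frameworks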
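
The ~50% figure follows from element width alone: FP32 stores 4 bytes per value and FP16 stores 2. A rough sketch of that arithmetic, assuming every tensor in the checkpoint is floating point and ignoring the small safetensors header (the helper name is illustrative, not part of any library):

```python
# Bytes used per stored value for each precision.
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2}


def estimate_converted_size_gb(source_size_gb, source="fp32", target="fp16"):
    """Scale a checkpoint size by the ratio of element widths."""
    return source_size_gb * BYTES_PER_ELEMENT[target] / BYTES_PER_ELEMENT[source]


# Using the approximate FP32 size quoted for the original checkpoint.
print(estimate_converted_size_gb(2.5))  # 1.25
```

Integer or boolean buffers, if the checkpoint has any, keep their width, so the real ratio can be slightly above one half.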

How to Use

This model can be used as a drop-in replacement for the original model. Load the safetensors file instead of the original PyTorch checkpoint:

from safetensors.torch import load_file

# Load the FP16 model weights
model_weights = load_file("model_fp16.safetensors")

# Load into your MMAudio model architecture
# (follow the same usage pattern as the base model)

System Requirements:

GPU: 8-12 GB VRAM (reduced from 12-16 GB due to FP16 optimization)
Python 3.10+
PyTorch with CUDA support

Installation

For installation and usage instructions, refer to the base model repository, then replace its checkpoint loading step with the FP16 safetensors file from this repository.

Technical Details

Original Format: FP32 PyTorch (.pth) - ~2.5 GB
Optimized Format: FP16 Safetensors (.safetensors) - ~1.25 GB
Conversion Method: Direct FP32 → FP16 tensor conversion
Quality Impact: Negligible quality loss in practice
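
A direct FP32 → FP16 conversion of this kind can be sketched as below. The actual script used for this repository is not documented here, so treat this as one plausible approach under stated assumptions: the helper name `to_fp16_state_dict` is illustrative, and the small dictionary stands in for a real checkpoint.

```python
import torch


def to_fp16_state_dict(state_dict):
    """Cast floating-point tensors to FP16; leave integer/bool tensors as-is."""
    converted = {}
    for name, tensor in state_dict.items():
        if tensor.is_floating_point():
            tensor = tensor.to(torch.float16)
        # safetensors requires contiguous tensors.
        converted[name] = tensor.contiguous()
    return converted


# Tiny stand-in for a real FP32 checkpoint dictionary.
example = {
    "weight": torch.ones(2, 3, dtype=torch.float32),
    "step": torch.tensor(7, dtype=torch.int64),
}
fp16_example = to_fp16_state_dict(example)
print(fp16_example["weight"].dtype, fp16_example["step"].dtype)
# torch.float16 torch.int64
```

For a real checkpoint, the input dictionary would typically come from `torch.load(..., map_location="cpu")` and the result would be written with `safetensors.torch.save_file(converted, "model_fp16.safetensors")`. Whether the original `.pth` file holds a flat state dict or nests it under another key is an assumption to verify before converting.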

Limitations

Same limitations as the base model apply
Content Warning: Due to the NSFW nature of the fine-tuning dataset, the model may generate explicit or mature audio content. User discretion is advised.
FP16 precision may introduce minimal numerical differences compared to FP32
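
To give a sense of scale for those numerical differences, the sketch below rounds a single value to IEEE 754 half precision using only the Python standard library (`struct` format `e`). It illustrates per-value rounding error, not the effect on generated audio. FP16 keeps 10 fraction bits, so in its normal range the relative rounding error of one value is at most 2^-11, and finite values are limited to ±65504.

```python
import struct


def round_trip_fp16(value):
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


original = 0.1
rounded = round_trip_fp16(original)
relative_error = abs(rounded - original) / abs(original)

print(rounded)                    # 0.0999755859375
print(relative_error < 2 ** -11)  # True
```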

Credits & Citation

Base Model: cloud19/NSFW_MMaudio
Original MMAudio: hkchengrex/MMAudio
Optimization: FP16 conversion for improved efficiency

All credit for the original architecture, fine-tuning, and model development goes to the respective authors. This repository only provides format optimization.

@inproceedings{cheng2025taming,
  title={{MMAudio}: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis},
  author={Cheng, Ho Kei and Ishii, Masato and Hayakawa, Akio and Shibuya, Takashi and Schwing, Alexander and Mitsufuji, Yuki},
  booktitle={CVPR},
  year={2025}
}
