LongCat-AudioDiT-1B-Diffusers
Meituan's LongCat-AudioDiT-1B, converted to the Diffusers format.
Model Description
A DiT (Diffusion Transformer) based audio generation model for text-to-audio synthesis.
Directory Structure
├── model_index.json   # Diffusers config file
├── text_encoder/      # Text encoder (UMT5)
├── tokenizer/         # Tokenizer (T5)
├── transformer/       # Main DiT model
└── vae/               # VAE encoder/decoder
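For orientation, `model_index.json` in a Diffusers repository maps each component subfolder to the library and class used to load it. The following is a minimal sketch for inspecting that mapping in a local copy of the repository. The helper name `list_components` is hypothetical, and the actual class names stored in this repository are not shown here.

```python
import json
from pathlib import Path


def list_components(repo_dir):
    """Return {component_name: (library, class_name)} from model_index.json."""
    index = json.loads((Path(repo_dir) / "model_index.json").read_text())
    components = {}
    for name, value in index.items():
        # Keys starting with "_" are metadata (e.g. _class_name), not components.
        if name.startswith("_"):
            continue
        # Component entries are stored as [library, class_name] pairs.
        if isinstance(value, list) and len(value) == 2:
            components[name] = (value[0], value[1])
    return components
```

Calling `list_components("path/to/local/repo")` should then return one entry per subfolder listed above, assuming the repository follows the usual Diffusers layout.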
Usage
import torch
from diffusers import LongCatAudioDiTPipeline

# Load all pipeline components from the Hub in bfloat16.
pipe = LongCatAudioDiTPipeline.from_pretrained(
    "ruixiangma/LongCat-AudioDiT-1B-Diffusers",
    torch_dtype=torch.bfloat16,
)

# Generate 5 seconds of audio from a text prompt.
audio = pipe(
    prompt="A cheerful piano melody",
    audio_duration_s=5.0,
    num_inference_steps=50,
    guidance_scale=4.0,
).audio
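To listen to the result, the waveform needs to be written to a file. The following is a minimal sketch using only the Python standard library. It assumes a single mono clip with float samples in [-1.0, 1.0]. The helper name `save_wav` is hypothetical. The sample rate is not stated in this card and must be supplied by the caller.

```python
import struct
import wave


def save_wav(samples, path, sample_rate):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip out-of-range values
        frames += struct.pack("<h", int(round(s * 32767)))  # little-endian int16
    with wave.open(str(path), "wb") as f:
        f.setnchannels(1)   # mono (assumption)
        f.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM
        f.setframerate(sample_rate)
        f.writeframes(bytes(frames))
```

With the pipeline output above, a call might look like `save_wav(audio.flatten().tolist(), "out.wav", sample_rate)`. This assumes `audio` is a single-clip array or tensor, which this card does not confirm. Check the layout of `audio` and the correct `sample_rate` against the pipeline or VAE configuration.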
Original Model
- HuggingFace: meituan-longcat/LongCat-AudioDiT-1B