LongCat-AudioDiT-1B-Diffusers

Meituan's LongCat-AudioDiT-1B, converted to the Diffusers format.

Model Description

A DiT (Diffusion Transformer)-based model for text-to-audio generation.

Directory Structure

├── model_index.json      # Diffusers config file
├── text_encoder/         # Text encoder (UMT5)
├── tokenizer/            # Tokenizer (T5)
├── transformer/          # Main DiT model
└── vae/                  # VAE encoder/decoder
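Each subfolder above is one pipeline component. Diffusers uses `model_index.json` to decide which class to load for each one. The sketch below is a hypothetical helper, `list_components`, that reads a parsed `model_index.json` and lists those components. It assumes the usual Diffusers layout, where keys starting with `_` are pipeline metadata and each component maps to a `[library, class_name]` pair. The class names and version string in the example are placeholders and were not read from this repository. To inspect the real file, you could download it with `huggingface_hub.hf_hub_download("ruixiangma/LongCat-AudioDiT-1B-Diffusers", "model_index.json")` and pass the result of `json.load` to the helper.

```python
import json
from typing import Dict, Tuple


def list_components(model_index: Dict) -> Dict[str, Tuple[str, str]]:
    """Return {subfolder: (library, class_name)} from a parsed model_index.json.

    Keys starting with "_" (e.g. "_class_name", "_diffusers_version") are
    pipeline-level metadata rather than components, so they are skipped.
    """
    components = {}
    for key, value in model_index.items():
        if key.startswith("_"):
            continue
        # Component entries are assumed to be stored as [library_name, class_name]
        if isinstance(value, list) and len(value) == 2:
            components[key] = (value[0], value[1])
    return components


# Hypothetical example: the class names and version are placeholders,
# not the actual contents of this repository's model_index.json.
example = json.loads("""
{
  "_class_name": "LongCatAudioDiTPipeline",
  "_diffusers_version": "0.0.0",
  "text_encoder": ["transformers", "UMT5EncoderModel"],
  "tokenizer": ["transformers", "T5TokenizerFast"],
  "transformer": ["diffusers", "ExampleAudioDiTTransformer"],
  "vae": ["diffusers", "ExampleAudioVAE"]
}
""")

for name, (library, cls) in list_components(example).items():
    print(f"{name:13s} -> {library}.{cls}")
```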

Usage

from diffusers import LongCatAudioDiTPipeline
import torch

# Load the converted pipeline from the Hub in bfloat16
pipe = LongCatAudioDiTPipeline.from_pretrained(
    "ruixiangma/LongCat-AudioDiT-1B-Diffusers",
    torch_dtype=torch.bfloat16
)

# Generate 5 seconds of audio from a text prompt
audio = pipe(
    prompt="A cheerful piano melody",
    audio_duration_s=5.0,
    num_inference_steps=50,
    guidance_scale=4.0
).audio
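This card does not say how to save the generated audio, and it does not state the output sample rate. The sketch below shows one way to write the result to a 16-bit WAV file using only Python's standard `wave` module. The helper `save_mono_wav` and the `SAMPLE_RATE` name are hypothetical. The sketch assumes that `audio` holds a single mono clip with float samples in [-1, 1], and that you supply the model's true sample rate yourself. The demo uses an assumed 16 kHz rate, which is for illustration only.

```python
import math
import os
import sys
import tempfile
import wave
from array import array
from typing import Iterable


def save_mono_wav(samples: Iterable[float], path: str, sample_rate: int) -> int:
    """Write mono float samples in [-1, 1] to a 16-bit PCM WAV file.

    Returns the number of frames written.
    """
    pcm = array("h")  # signed 16-bit integers
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip so scaling cannot overflow int16
        pcm.append(int(round(s * 32767)))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV stores PCM samples little-endian
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())
    return len(pcm)


# Demo: one second of a 440 Hz tone at an assumed 16 kHz rate.
DEMO_RATE = 16_000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / DEMO_RATE) for n in range(DEMO_RATE)]
demo_path = os.path.join(tempfile.mkdtemp(), "tone.wav")
frames_written = save_mono_wav(tone, demo_path, DEMO_RATE)

# With the pipeline (hypothetical): if `audio` is a bfloat16 torch tensor, convert
# it to float32 before calling .numpy(). SAMPLE_RATE must match the model's output
# rate, which this card does not state.
# waveform = audio[0].float().cpu().numpy().reshape(-1)
# save_mono_wav(waveform, "piano.wav", SAMPLE_RATE)
```

The per-sample Python loop keeps the helper dependency-free. For long clips, a vectorized NumPy conversion or a library such as `soundfile` would be faster.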

Original Model
