Woosh: Sound Effect Generative Models

Inference code and open weights for sound effect generative models developed at Sony AI.



Models

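| Model        | Task           | Steps | CFG | Description                          |
|--------------|----------------|-------|-----|--------------------------------------|
| Woosh-Flow   | Text-to-Audio  | 50    | 4.5 | Base model, best quality             |
| Woosh-DFlow  | Text-to-Audio  | 4     | 1.0 | Distilled Flow, fast generation      |
| Woosh-VFlow  | Video-to-Audio | 50    | 4.5 | Base video-to-audio model            |
| Woosh-DVFlow | Video-to-Audio | 4     | 1.0 | Distilled VFlow, fast video-to-audio |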
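
The step counts and CFG values above can be kept as presets in code. The sketch below is illustrative only: `SamplingPreset`, `PRESETS`, and `apply_cfg` are hypothetical names, not part of the Woosh code base, and the guidance formula is the standard classifier-free guidance combination, which Woosh is assumed (not confirmed here) to follow.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class SamplingPreset:
    """Hypothetical container for the per-model defaults listed in the table."""
    task: str
    steps: int
    cfg: float


# Values copied from the table above; the dictionary itself is illustrative.
PRESETS = {
    "Woosh-Flow": SamplingPreset("text-to-audio", 50, 4.5),
    "Woosh-DFlow": SamplingPreset("text-to-audio", 4, 1.0),
    "Woosh-VFlow": SamplingPreset("video-to-audio", 50, 4.5),
    "Woosh-DVFlow": SamplingPreset("video-to-audio", 4, 1.0),
}


def apply_cfg(cond: np.ndarray, uncond: np.ndarray, scale: float) -> np.ndarray:
    """Standard classifier-free guidance: uncond + scale * (cond - uncond).

    Assumption: Woosh combines conditional and unconditional predictions
    this way; check the official repository for the actual formulation.
    """
    return uncond + scale * (cond - uncond)
```

Under this formulation a CFG scale of 1.0 returns the conditional prediction unchanged, so a separate unconditional pass adds nothing at that setting.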

Components

  • Woosh-AE: High-quality latent encoder/decoder. Provides latents for generative modeling and decodes audio from generated latents.
  • Woosh-CLAP (TextConditionerA/V): Multimodal text-audio alignment model. Provides token latents for diffusion model conditioning. TextConditionerA for T2A, TextConditionerV for V2A.
  • Woosh-Flow / Woosh-DFlow: Original and distilled LDMs for text-to-audio generation.
  • Woosh-VFlow: Multimodal LDM generating audio from video with optional text prompts.

ComfyUI Nodes

Use these models in ComfyUI with ComfyUI-Woosh:

# Via ComfyUI Manager: search "Woosh" and click Install
# Or manually:
cd ComfyUI/custom_nodes
git clone https://github.com/Saganaki22/ComfyUI-Woosh.git
pip install -r ComfyUI-Woosh/requirements.txt

Place downloaded model folders in ComfyUI/models/woosh/. See the ComfyUI-Woosh README for full setup and workflow examples.
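
To confirm that the model folders ended up where the nodes expect them, a small check like the following may help. It is a minimal sketch: `list_woosh_models` is a hypothetical helper, and it only lists subfolders of `models/woosh/` without validating their contents or names.

```python
from pathlib import Path


def list_woosh_models(comfy_root: str) -> list[str]:
    """Return the sorted names of subfolders in <comfy_root>/models/woosh.

    Returns an empty list if the directory does not exist.
    """
    woosh_dir = Path(comfy_root) / "models" / "woosh"
    if not woosh_dir.is_dir():
        return []
    return sorted(p.name for p in woosh_dir.iterdir() if p.is_dir())


if __name__ == "__main__":
    # "ComfyUI" is assumed to be the installation root; adjust as needed.
    found = list_woosh_models("ComfyUI")
    print("Model folders found:", found if found else "none")
```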

Note: Set the Woosh TextConditioning node to T2A for Flow/DFlow models and V2A for VFlow/DVFlow models.
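
The rule in the note can be written as a small lookup. This is a sketch with a hypothetical helper name, `conditioning_mode`; it encodes only the mapping stated above and does not call any ComfyUI-Woosh API.

```python
def conditioning_mode(model_name: str) -> str:
    """Map a Woosh model name to the TextConditioning mode named in the note.

    Accepts names with or without the "Woosh-" prefix, case-insensitively.
    """
    name = model_name.strip().lower().removeprefix("woosh-")
    if name in {"flow", "dflow"}:
        return "T2A"  # text-to-audio models
    if name in {"vflow", "dvflow"}:
        return "V2A"  # video-to-audio models
    raise ValueError(f"Unknown Woosh model: {model_name!r}")
```

For example, `conditioning_mode("Woosh-DVFlow")` returns `"V2A"`.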

Inference

See the official Woosh repository for standalone inference code and training details.

VRAM Requirements

| Model            | VRAM (approx.) |
|------------------|----------------|
| Flow / VFlow     | ~8-12 GB       |
| DFlow / DVFlow   | ~4-6 GB        |
| With CPU offload | ~2-4 GB        |
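
As a rough guide, the figures above can be turned into a conservative selection rule. The sketch below is an assumption-laden illustration: `suggest_setup` is a hypothetical helper, and it uses the upper end of each approximate range as its threshold, which is a design choice rather than a documented requirement.

```python
from typing import Optional

# Upper ends of the approximate ranges in the table, used as conservative
# thresholds (an assumption; actual usage depends on clip length and settings).
BASE_GB = 12.0       # Flow / VFlow
DISTILLED_GB = 6.0   # DFlow / DVFlow
OFFLOAD_GB = 4.0     # with CPU offload


def suggest_setup(available_vram_gb: float) -> Optional[str]:
    """Suggest a setup for the given free VRAM, or None if below all thresholds."""
    if available_vram_gb >= BASE_GB:
        return "base"
    if available_vram_gb >= DISTILLED_GB:
        return "distilled"
    if available_vram_gb >= OFFLOAD_GB:
        return "cpu-offload"
    return None
```

Because the table gives ranges, a card just under a threshold may still work; the helper simply errs toward the lighter option.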

Citation

@article{saghibakshi2025woosh,
      title={Woosh: Enhancing Text-to-Audio Generation with Flow Matching and FlowMap Distillation},
      author={Saghibakshi, Ali and Bakshi, Soroosh and Tagliasacchi, Antonio and Wang, Shaojie and Choi, Jongmin and Kawakami, Kazuhiro and Gu, Yuxuan},
      journal={arXiv preprint arXiv:2502.07359},
      year={2025}
}

License
