Paper • 2502.07359 • Published
Inference code and open weights for sound effect generative models developed at Sony AI.
| Model | Task | Steps | CFG | Description |
|---|---|---|---|---|
| Woosh-Flow | Text-to-Audio | 50 | 4.5 | Base model, best quality |
| Woosh-DFlow | Text-to-Audio | 4 | 1.0 | Distilled Flow, fast generation |
| Woosh-VFlow | Video-to-Audio | 50 | 4.5 | Base video-to-audio model |
| Woosh-DVFlow | Video-to-Audio | 4 | 1.0 | Distilled VFlow, fast video-to-audio |
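The CFG column above is the classifier-free guidance scale. As a rough illustration of what that number controls (this is the standard CFG formula, not Woosh's actual sampler code), the model is evaluated with and without the text condition at each step and the two predictions are blended:

```python
# Illustrative only: the standard classifier-free guidance (CFG) combination.
# guided = uncond + scale * (cond - uncond), element-wise.
# A scale of 1.0 (as in the distilled DFlow/DVFlow rows) reduces to the
# conditional prediction alone; larger scales (e.g. 4.5 for the base models)
# push the output further toward the text condition.

def apply_cfg(uncond: list[float], cond: list[float], scale: float) -> list[float]:
    """Blend unconditional and conditional predictions by the guidance scale."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]

# scale 1.0 returns the conditional prediction unchanged
print(apply_cfg([0.0, 0.0], [1.0, 2.0], 1.0))   # -> [1.0, 2.0]
# scale 4.5 amplifies the difference between the two predictions
print(apply_cfg([0.0, 0.0], [1.0, 2.0], 4.5))   # -> [4.5, 9.0]
```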
Use these models in ComfyUI with ComfyUI-Woosh:

```shell
# Via ComfyUI Manager: search "Woosh" and click Install
# Or manually:
cd ComfyUI/custom_nodes
git clone https://github.com/Saganaki22/ComfyUI-Woosh.git
pip install -r ComfyUI-Woosh/requirements.txt
```
Place downloaded model folders in ComfyUI/models/woosh/. See the ComfyUI-Woosh
README for full setup and workflow examples.
Note: Set the Woosh TextConditioning node to T2A for Flow/DFlow models and V2A for VFlow/DVFlow models.
See the official Woosh repository for standalone inference code and training details.
| Model | VRAM (Approx) |
|---|---|
| Flow / VFlow | ~8-12 GB |
| DFlow / DVFlow | ~4-6 GB |
| With CPU offload | ~2-4 GB |
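The "With CPU offload" row reflects a common memory-saving technique: keep weights in system RAM and move each submodule to the accelerator only for its forward pass. The actual offload option is exposed by ComfyUI-Woosh (see its README); the sketch below only illustrates the general mechanism in plain PyTorch, using forward hooks:

```python
# Minimal sketch of sequential CPU offload in PyTorch, illustrating the idea
# behind the "With CPU offload" row. Not Woosh's implementation: ComfyUI-Woosh
# handles offloading internally; this just shows why it trades speed for VRAM.
import torch
import torch.nn as nn

def enable_sequential_offload(model: nn.Module, device: torch.device) -> None:
    """Keep weights on CPU; move each child module to `device` only for its forward."""
    for module in model.children():
        def pre_hook(mod, args):
            mod.to(device)      # load this block onto the accelerator just in time
        def post_hook(mod, args, output):
            mod.to("cpu")       # release accelerator memory immediately after use
            return output
        module.register_forward_pre_hook(pre_hook)
        module.register_forward_hook(post_hook)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
enable_sequential_offload(net, device)
out = net(torch.randn(2, 8).to(device))
```

Only one block resides on the GPU at a time, which is why peak VRAM drops well below the full-model numbers at the cost of repeated host-device transfers per step.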
```bibtex
@article{saghibakshi2025woosh,
  title={Woosh: Enhancing Text-to-Audio Generation with Flow Matching and FlowMap Distillation},
  author={Saghibakshi, Ali and Bakshi, Soroosh and Tagliasacchi, Antonio and Wang, Shaojie and Choi, Jongmin and Kawakami, Kazuhiro and Gu, Yuxuan},
  journal={arXiv preprint arXiv:2502.07359},
  year={2025}
}
```