Paper • 2502.07359 • Published
Inference code and open weights for sound effect generative models developed at Sony AI.
| Model | Task | Steps | CFG | Description |
|---|---|---|---|---|
| Woosh-Flow | Text-to-Audio | 50 | 4.5 | Base model, best quality |
| Woosh-DFlow | Text-to-Audio | 4 | 1.0 | Distilled Flow, fast generation |
| Woosh-VFlow | Video-to-Audio | 50 | 4.5 | Base video-to-audio model |
| Woosh-DVFlow | Video-to-Audio | 4 | 1.0 | Distilled VFlow, fast video-to-audio |
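The CFG column above is the classifier-free guidance scale. As a rough illustration of what that number controls (this is the standard CFG formula, not Woosh's actual sampler code), the model is evaluated with and without the text condition at each step and the two predictions are blended:

```python
# Illustrative only: the standard classifier-free guidance (CFG) combination.
# guided = uncond + scale * (cond - uncond), element-wise.
# A scale of 1.0 (as in the distilled DFlow/DVFlow rows) reduces to the
# conditional prediction alone; larger scales (e.g. 4.5 for the base models)
# push the output further toward the text condition.

def apply_cfg(uncond: list[float], cond: list[float], scale: float) -> list[float]:
    """Blend unconditional and conditional predictions by the guidance scale."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]

# scale 1.0 returns the conditional prediction unchanged
print(apply_cfg([0.0, 0.0], [1.0, 2.0], 1.0))   # -> [1.0, 2.0]
# scale 4.5 amplifies the difference between the two predictions
print(apply_cfg([0.0, 0.0], [1.0, 2.0], 4.5))   # -> [4.5, 9.0]
```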
Use these models in ComfyUI with ComfyUI-Woosh:

```shell
# Via ComfyUI Manager: search "Woosh" and click Install
# Or manually:
cd ComfyUI/custom_nodes
git clone https://github.com/Saganaki22/ComfyUI-Woosh.git
pip install -r ComfyUI-Woosh/requirements.txt
```
Place downloaded model folders in ComfyUI/models/woosh/. See the ComfyUI-Woosh
README for full setup and workflow examples.
Note: Set the Woosh TextConditioning node to T2A for Flow/DFlow models and V2A for VFlow/DVFlow models.
See the official Woosh repository for standalone inference code and training details.
| Model | VRAM (Approx) |
|---|---|
| Flow / VFlow | ~8-12 GB |
| DFlow / DVFlow | ~4-6 GB |
| With CPU offload | ~2-4 GB |
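The "With CPU offload" row reflects a common memory-saving technique: keep weights in system RAM and move each submodule to the accelerator only for its forward pass. The actual offload option is exposed by ComfyUI-Woosh (see its README); the sketch below only illustrates the general mechanism in plain PyTorch, using forward hooks:

```python
# Minimal sketch of sequential CPU offload in PyTorch, illustrating the idea
# behind the "With CPU offload" row. Not Woosh's implementation: ComfyUI-Woosh
# handles offloading internally; this just shows why it trades speed for VRAM.
import torch
import torch.nn as nn

def enable_sequential_offload(model: nn.Module, device: torch.device) -> None:
    """Keep weights on CPU; move each child module to `device` only for its forward."""
    for module in model.children():
        def pre_hook(mod, args):
            mod.to(device)      # load this block onto the accelerator just in time
        def post_hook(mod, args, output):
            mod.to("cpu")       # release accelerator memory immediately after use
            return output
        module.register_forward_pre_hook(pre_hook)
        module.register_forward_hook(post_hook)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
enable_sequential_offload(net, device)
out = net(torch.randn(2, 8).to(device))
```

Only one block resides on the GPU at a time, which is why peak VRAM drops well below the full-model numbers at the cost of repeated host-device transfers per step.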
```bibtex
@article{saghibakshi2025woosh,
  title={Woosh: Enhancing Text-to-Audio Generation with Flow Matching and FlowMap Distillation},
  author={Saghibakshi, Ali and Bakshi, Soroosh and Tagliasacchi, Antonio and Wang, Shaojie and Choi, Jongmin and Kawakami, Kazuhiro and Gu, Yuxuan},
  journal={arXiv preprint arXiv:2502.07359},
  year={2025}
}
```