@@ -18,14 +18,21 @@ base_model_relation: finetune
 ---
 <p align="center">
-  <img src="https://huggingface.co/ResembleAI/Dramabox/resolve/main/assets/Dramabox.png" alt="DramaBox" width="720"/>
 </p>
 # Dramabox — Expressive TTS with Voice Cloning
 > **Built on [LTX-2](https://github.com/Lightricks/LTX-2) by Lightricks.**
 > Dramabox is **Resemble AI's** expressive TTS, trained on top of the LTX-2.3 audio branch under the LTX-2 Community License. Huge thanks to the Lightricks team for open-sourcing the base.
 Dramabox is a prompt-driven TTS where **the prompt itself controls everything** — speaker identity, emotion, delivery, laughs, sighs, breaths, pauses, transitions. An optional 10-second voice reference clones the target timbre. It is an IC-LoRA fine-tune of the **LTX-2.3 3.3B audio-only** model (Diffusion Transformer + flow matching), conditioned on Gemma 3 12B text embeddings.
 | | |

 ---
 <p align="center">
+  <a href="https://www.resemble.ai/learn/models/dramabox">
+    <img src="https://huggingface.co/ResembleAI/Dramabox/resolve/main/assets/Dramabox.png" alt="DramaBox" width="720"/>
+  </a>
 </p>
 # Dramabox — Expressive TTS with Voice Cloning
+[![Open in HF Spaces](https://huggingface.co/datasets/huggingface/badges/resolve/main/open-in-hf-spaces-sm.svg)](https://huggingface.co/spaces/ResembleAI/Dramabox)
+[![Discord](https://img.shields.io/discord/1377773249798344776?label=join%20discord&logo=discord&style=flat)](https://discord.gg/rJq9cRJBJ6)
 > **Built on [LTX-2](https://github.com/Lightricks/LTX-2) by Lightricks.**
 > Dramabox is **Resemble AI's** expressive TTS, trained on top of the LTX-2.3 audio branch under the LTX-2 Community License. Huge thanks to the Lightricks team for open-sourcing the base.
+*Made with ♥️ by* <a href="https://www.resemble.ai/learn/models/dramabox" target="_blank"><img width="100" alt="resemble-logo-horizontal" src="https://github.com/user-attachments/assets/35cf756b-3506-4943-9c72-c05ddfa4e525" /></a>
 Dramabox is a prompt-driven TTS where **the prompt itself controls everything** — speaker identity, emotion, delivery, laughs, sighs, breaths, pauses, transitions. An optional 10-second voice reference clones the target timbre. It is an IC-LoRA fine-tune of the **LTX-2.3 3.3B audio-only** model (Diffusion Transformer + flow matching), conditioned on Gemma 3 12B text embeddings.
 | | |