RagaLoRA: Indian Music LoRA Adapter for ACE-Step 1.5
A LoRA adapter that tunes ACE-Step 1.5's Diffusion Transformer decoder to generate Indian music across ten genres: Hindustani classical, Carnatic classical, Bollywood ballad, qawwali, ghazal, bhajan, Sufi rock, filmi dance, indie Hindi, and Hinglish pop.
What It Does
The base ACE-Step 1.5 model was trained mostly on Western music and produces generic output for Indian genres. This adapter nudges the model toward Indian musical conventions:
- Classical/devotional genres get warmer and slower: the Carnatic spectral centroid drops 16% and bhajan tempo drops 19%
- Dance/rock genres get louder: filmi dance energy rises 38% and Sufi rock energy rises 19%
- Five genres with no dedicated training data still shift in the same directions, which suggests transfer across related Indian styles
Specs
| Parameter | Value |
|---|---|
| Base model | ACE-Step 1.5 (2.4B params, 24-layer DiT decoder) |
| Adapter rank | 16 |
| Alpha | 32 |
| Dropout | 0.1 |
| Target modules | q_proj, k_proj, v_proj, o_proj (all 24 layers) |
| Trainable params | ~11M (0.46% of total) |
| File size | 44 MB |
| Training data | 250 balanced samples from 2,787 Indian music segments |
| Training | 10 epochs, 600 optimizer steps, AdamW lr=1e-4, BFloat16, Apple Silicon MPS |
| PEFT version | 0.18.1 |
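The figures in the table can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check, not part of the training code. The 4-bytes-per-parameter (float32) storage format is an assumption inferred from the file size; the card does not state how the adapter weights were saved.

```python
# Back-of-the-envelope check of the numbers in the Specs table.
# Assumption (not stated in the card): adapter weights are stored as float32.
TOTAL_PARAMS = 2.4e9       # base model size, from the table
TRAINABLE_PARAMS = 11e6    # approximate LoRA parameter count, from the table
RANK, ALPHA = 16, 32       # adapter rank and alpha, from the table
BYTES_PER_PARAM = 4        # float32 (assumed)

# Standard LoRA scales the low-rank update by alpha / rank.
lora_scaling = ALPHA / RANK

# Share of the full model that is actually trained.
trainable_pct = 100 * TRAINABLE_PARAMS / TOTAL_PARAMS

# Expected size of the saved adapter in megabytes.
file_size_mb = TRAINABLE_PARAMS * BYTES_PER_PARAM / 1e6

print(f"LoRA scaling (alpha / rank): {lora_scaling}")
print(f"Trainable share: {trainable_pct:.2f}%")
print(f"Estimated file size: {file_size_mb:.0f} MB")
```

Under that assumption the estimate lines up with the listed 44 MB and 0.46%.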
Training Data
Five genre categories, 50 samples each, drawn from openly licensed sources:
- Hindustani Classical (1,063 total segments from Saraga dataset)
- Bollywood (728 segments from Internet Archive + YouTube CC)
- Qawwali / Sufi (455 segments)
- Ghazal (269 segments)
- Bhajan (272 segments)
All audio was resampled to 48 kHz stereo WAV and sliced into segments of 30 to 120 seconds.
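The card does not describe how the 50 samples per category were chosen. As a minimal sketch, assuming uniform random sampling without replacement, a balanced subset could be drawn as follows. The segment IDs and the `balanced_sample` helper are hypothetical placeholders; only the per-category counts come from the list above.

```python
import random

# Segment counts per category, taken from the list above.
SEGMENT_COUNTS = {
    "hindustani_classical": 1063,
    "bollywood": 728,
    "qawwali_sufi": 455,
    "ghazal": 269,
    "bhajan": 272,
}

def balanced_sample(pools, per_category, seed=0):
    """Draw the same number of items from every category, without replacement."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return {name: rng.sample(items, per_category) for name, items in pools.items()}

# Placeholder segment IDs standing in for real audio file paths.
pools = {
    name: [f"{name}_{i:04d}" for i in range(count)]
    for name, count in SEGMENT_COUNTS.items()
}

subset = balanced_sample(pools, per_category=50)
total_selected = sum(len(items) for items in subset.values())
print(f"{total_selected} of {sum(SEGMENT_COUNTS.values())} segments selected")
```

Sampling an equal number per category keeps the largest pool (Hindustani classical) from dominating the fine-tuning set.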
Usage
With ACE-Step 1.5
from acestep.handler import AceStepHandler
handler = AceStepHandler()
handler.initialize_service(
    project_root="path/to/ace-step-checkpoints",
    config_path="acestep-v15-turbo",
    device="cuda",  # or "mps" for Apple Silicon
)
# Load the adapter
handler.add_lora("path/to/RagaLoRA/adapter")
handler.set_use_lora(True)
handler.set_lora_scale(0.8)  # 0.8 works well; adjust to taste
# Generate
result = handler.generate_music(
    captions="Hindustani classical vocal, raag Yaman, sitar and tabla, teentaal, meditative alap",
    lyrics="[Alap]\nSa re ga ma pa dha ni\n[Gat]\nYaman ke sur mein",
    audio_duration=60,
    inference_steps=8,
    guidance_scale=7.0,
    vocal_language="hi",
)
Disable for A/B comparison
handler.set_use_lora(False) # base model output
# generate...
handler.set_use_lora(True) # adapter output
# generate...
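The toggle pattern above can be wrapped in a small helper so that both runs receive identical arguments. This is a sketch, not part of ACE-Step: `ab_compare` and `_StubHandler` are hypothetical names, and the helper relies only on the `set_use_lora` and `generate_music` calls shown above. The stub exists solely so the helper can be dry-run without loading a model.

```python
def ab_compare(handler, **generate_kwargs):
    """Generate once with the adapter off and once with it on, using the same arguments."""
    handler.set_use_lora(False)
    base = handler.generate_music(**generate_kwargs)
    handler.set_use_lora(True)  # leaves the adapter enabled afterwards
    adapted = handler.generate_music(**generate_kwargs)
    return base, adapted


class _StubHandler:
    """Hypothetical stand-in for AceStepHandler, used only for a dry run."""

    def __init__(self):
        self.use_lora = False

    def set_use_lora(self, enabled):
        self.use_lora = enabled

    def generate_music(self, **kwargs):
        # Echo the arguments plus the adapter state instead of producing audio.
        return {"lora": self.use_lora, **kwargs}


base, adapted = ab_compare(
    _StubHandler(),
    captions="bhajan, harmonium and tabla, devotional",
    audio_duration=30,
)
print(base["lora"], adapted["lora"])
```

With a real handler, pass the object initialized in the previous section in place of the stub. The helper does not control random seeds, so some difference between the two outputs may come from sampling noise rather than from the adapter.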
Evaluation Results (A/B vs Base Model)
| Genre | Centroid Change | Energy Change | Tempo Change |
|---|---|---|---|
| Hindustani Classical | -0.5% | -1% | -32% |
| Bollywood Ballad | -6.3% | +17% | +6% |
| Qawwali | -12% | -12% | 0% |
| Ghazal | +5% | +11% | +30% |
| Bhajan | -7% | -15% | -19% |
| Carnatic Classical* | -16% | -17% | 0% |
| Indie Hindi* | +6% | +9% | 0% |
| Sufi Rock* | -5% | +19% | -35% |
| Filmi Dance* | -7% | +38% | -32% |
| Hinglish Pop* | +5% | +16% | 0% |
*Zero-shot (no dedicated training data for these genres)
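The card does not say how the percentages were computed. A plausible reading is that each entry is the relative change of the adapter output against the base output for the same prompt. The sketch below illustrates that reading for RMS energy and spectral centroid on synthetic sine tones. It uses a naive pure-Python DFT for clarity, which is far too slow for real audio, and it omits tempo because that requires a beat tracker. All function names are illustrative and not taken from the evaluation code.

```python
import cmath
import math

def rms(samples):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def spectral_centroid(samples, sample_rate):
    """Magnitude-weighted mean frequency in Hz, using a naive O(n^2) DFT."""
    n = len(samples)
    weighted_sum = 0.0
    magnitude_sum = 0.0
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        magnitude = abs(coeff)
        weighted_sum += (k * sample_rate / n) * magnitude
        magnitude_sum += magnitude
    return weighted_sum / magnitude_sum if magnitude_sum else 0.0

def percent_change(base_value, adapted_value):
    """Relative change of the adapted value against the base value, in percent."""
    return 100.0 * (adapted_value - base_value) / base_value

def sine(bin_index, amplitude, n):
    """Sine tone with exactly bin_index cycles in n samples."""
    return [amplitude * math.sin(2 * math.pi * bin_index * t / n) for t in range(n)]

# Synthetic stand-ins for a base output and an adapter output.
N, SAMPLE_RATE = 256, 8000
base_audio = sine(bin_index=16, amplitude=1.0, n=N)      # 500 Hz tone
adapted_audio = sine(bin_index=13, amplitude=1.38, n=N)  # lower and louder tone

energy_change = percent_change(rms(base_audio), rms(adapted_audio))
centroid_change = percent_change(
    spectral_centroid(base_audio, SAMPLE_RATE),
    spectral_centroid(adapted_audio, SAMPLE_RATE),
)
print(f"Energy change: {energy_change:+.1f}%  Centroid change: {centroid_change:+.2f}%")
```

A negative centroid change corresponds to the "warmer" description used earlier, and a positive energy change to "louder".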
Limitations
- No perceptual evaluation has been done. The metric shifts (spectral centroid, RMS energy, tempo) are signal-level proxies, not proof that the output sounds authentically Indian to trained musicians.
- compIAM raga detection returned null on all generated classical outputs. Whether the model follows actual raga grammar is unknown.
- The training set is small (250 samples), so there is some risk of memorization.
- Tonic detection caveat: Indian music uses a movable-do system. The chroma-based "tonic" is just the loudest pitch class, not a musically meaningful key.
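The tonic caveat can be made concrete with a toy example. The sketch below picks the pitch class with the most chroma energy, as described above. The chroma values are made up to show a case where a sustained fifth (Pa) outweighs the actual tonic (Sa); they are not measurements from the model's output.

```python
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def loudest_pitch_class(chroma):
    """Return the pitch class with the highest chroma energy."""
    if len(chroma) != 12:
        raise ValueError("expected one energy value per pitch class")
    return PITCH_CLASSES[max(range(12), key=lambda i: chroma[i])]

# Made-up chroma energies for a performance whose Sa is D,
# but where Pa (A, a fifth above) is held longer and louder.
chroma = [0.1, 0.0, 0.8, 0.0, 0.3, 0.2, 0.1, 0.2, 0.0, 0.9, 0.0, 0.3]

estimate = loudest_pitch_class(chroma)
print(f"Chroma-based 'tonic': {estimate}")
```

Here the estimate is A even though the performance is anchored on D, which is why the chroma-based value should not be read as a key.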
Paper
Citation
@article{chawla_2026,
  title={RagaLoRA: LoRA-Tuning a Diffusion Music Model for Indian Genres},
  author={Chawla, Varun},
  year={2026},
  month={Feb},
  publisher={Zenodo},
  doi={10.5281/zenodo.18811689},
  url={https://doi.org/10.5281/zenodo.18811689}
}
Author
Varun Chawla - varunc.7633@gmail.com
Framework versions
- PEFT 0.18.1