RagaLoRA: Indian Music LoRA Adapter for ACE-Step 1.5

A LoRA adapter that fine-tunes the Diffusion Transformer (DiT) decoder of ACE-Step 1.5 to generate Indian music across ten genres: Hindustani classical, Carnatic classical, Bollywood ballad, qawwali, ghazal, bhajan, Sufi rock, filmi dance, indie Hindi, and Hinglish pop.

What It Does

The base ACE-Step 1.5 model was trained mostly on Western music and produces generic output for Indian genres. This adapter nudges the model toward Indian musical conventions:

  • Classical/devotional genres get warmer (lower spectral centroid) and slower: Carnatic centroid drops 16%, bhajan tempo drops 19%
  • Dance/rock genres get louder: filmi dance energy rises 38%, Sufi rock energy rises 19%
  • Five genres with no training data still shift in directions consistent with related trained genres, suggesting transfer across Indian styles

Specs

Base model          ACE-Step 1.5 (2.4B params, 24-layer DiT decoder)
Adapter rank        16
Alpha               32
Dropout             0.1
Target modules      q_proj, k_proj, v_proj, o_proj (all 24 layers)
Trainable params    ~11M (0.46% of total)
File size           44 MB
Training data       250 balanced samples from 2,787 Indian music segments
Training            10 epochs, 600 optimizer steps, AdamW (lr = 1e-4), bfloat16, Apple Silicon (MPS)
PEFT version        0.18.1
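Here is a quick arithmetic check of the figures above. The 44 MB file size is consistent with ~11M parameters stored as float32 (4 bytes each). The card does not state the storage dtype, so treat that as an inference. The per-layer count uses the standard LoRA matrix shapes. The 2048-wide projection is a hypothetical size, because the DiT hidden dimension is not listed here.

```python
# Back-of-envelope check of the Specs figures (illustrative only).
TRAINABLE_PARAMS = 11_000_000    # "~11M" (rounded in the card)
BASE_PARAMS = 2_400_000_000      # "2.4B"
RANK = 16


def lora_params_per_linear(d_in: int, d_out: int, rank: int = RANK) -> int:
    """LoRA adds A (rank x d_in) and B (d_out x rank) to one frozen linear layer."""
    return rank * (d_in + d_out)


def size_mb(n_params: int, bytes_per_param: int) -> float:
    """Raw tensor payload in MB (ignores file headers and metadata)."""
    return n_params * bytes_per_param / 1e6


print(f"trainable fraction: {TRAINABLE_PARAMS / BASE_PARAMS:.2%}")   # 0.46%
print(f"float32:  {size_mb(TRAINABLE_PARAMS, 4):.0f} MB")            # 44 MB
print(f"bfloat16: {size_mb(TRAINABLE_PARAMS, 2):.0f} MB")            # 22 MB
# Hypothetical 2048 -> 2048 projection (hidden size assumed, not from the card):
print(lora_params_per_linear(2048, 2048))                            # 65536
```

Training ran in bfloat16, but a bfloat16 copy of the adapter would be about 22 MB. The published 44 MB therefore points to full-precision storage.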

Training Data

Five genre categories, 50 samples each, drawn from openly licensed sources:

  • Hindustani Classical (1,063 segments, from the Saraga dataset)
  • Bollywood (728 segments, from the Internet Archive and Creative Commons-licensed YouTube uploads)
  • Qawwali / Sufi (455 segments)
  • Ghazal (269 segments)
  • Bhajan (272 segments)
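The card says the 250 training samples are balanced at 50 per genre, but it does not say how they were chosen. The sketch below assumes uniform random selection without replacement within each genre. The function name, seed, and file names are hypothetical.

```python
import random
from collections import Counter

# Segment counts per genre, taken from the Training Data list.
SEGMENT_COUNTS = {
    "hindustani_classical": 1063,
    "bollywood": 728,
    "qawwali_sufi": 455,
    "ghazal": 269,
    "bhajan": 272,
}
PER_GENRE = 50


def balanced_sample(pool: dict[str, list[str]], per_genre: int,
                    seed: int = 0) -> list[tuple[str, str]]:
    """Draw `per_genre` segments uniformly without replacement from each genre."""
    rng = random.Random(seed)
    picked = []
    for genre in sorted(pool):
        segments = pool[genre]
        if len(segments) < per_genre:
            raise ValueError(f"{genre} has only {len(segments)} segments")
        picked.extend((genre, seg) for seg in rng.sample(segments, per_genre))
    return picked


# Hypothetical segment file names standing in for real paths.
pool = {g: [f"{g}_{i:04d}.wav" for i in range(n)] for g, n in SEGMENT_COUNTS.items()}
subset = balanced_sample(pool, PER_GENRE)
print(sum(SEGMENT_COUNTS.values()), len(subset))  # 2787 250
print(Counter(genre for genre, _ in subset))
```

A fixed seed makes the subset reproducible. The first print also confirms that the per-genre counts above sum to 2,787.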

All audio was resampled to 48 kHz stereo WAV and sliced into 30-120 second segments.
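The slicing rule is only specified as a 30-120 second range. The sketch below assumes one plausible policy, which is hypothetical and not the author's code. Consecutive windows are at most 120 s long, and any trailing remainder shorter than 30 s is dropped. It works on sample indices at 48 kHz. Resampling itself would normally be done with an audio library and is not shown.

```python
SAMPLE_RATE = 48_000        # 48 kHz, as stated in the card
MIN_SEC, MAX_SEC = 30, 120  # segment length bounds from the card


def segment_bounds(n_samples: int, sr: int = SAMPLE_RATE,
                   min_sec: float = MIN_SEC,
                   max_sec: float = MAX_SEC) -> list[tuple[int, int]]:
    """Split [0, n_samples) into consecutive windows of at most max_sec,
    dropping any final remainder shorter than min_sec (hypothetical policy)."""
    max_len = int(max_sec * sr)
    min_len = int(min_sec * sr)
    bounds = []
    start = 0
    while start < n_samples:
        end = min(start + max_len, n_samples)
        if end - start >= min_len:
            bounds.append((start, end))
        start = end
    return bounds


# A 5-minute track yields 120 s + 120 s + 60 s windows.
track = 300 * SAMPLE_RATE
print([(s / SAMPLE_RATE, e / SAMPLE_RATE) for s, e in segment_bounds(track)])
# [(0.0, 120.0), (120.0, 240.0), (240.0, 300.0)]
```

Under this policy, a 250 s track keeps two 120 s windows and discards the final 10 s. A track shorter than 30 s yields no segments.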

Usage

With ACE-Step 1.5

from acestep.handler import AceStepHandler

handler = AceStepHandler()
handler.initialize_service(
    project_root="path/to/ace-step-checkpoints",
    config_path="acestep-v15-turbo",
    device="cuda",  # or "mps" for Apple Silicon
)

# Load the adapter
handler.add_lora("path/to/RagaLoRA/adapter")
handler.set_use_lora(True)
handler.set_lora_scale(0.8)  # 0.8 works well; adjust to taste

# Generate
result = handler.generate_music(
    captions="Hindustani classical vocal, raag Yaman, sitar and tabla, teentaal, meditative alap",
    lyrics="[Alap]\nSa re ga ma pa dha ni\n[Gat]\nYaman ke sur mein",
    audio_duration=60,
    inference_steps=8,
    guidance_scale=7.0,
    vocal_language="hi",
)

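Disable the adapter for A/B comparison

handler.set_use_lora(False)   # base model output
# ...call handler.generate_music(...) with the same arguments as above...
handler.set_use_lora(True)    # adapter output
# ...repeat the same call and compare the two results...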
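For intuition about the scale knob, here is a minimal sketch of the standard LoRA update W' = W + s·(α/r)·BA, with α = 32 and r = 16 from the Specs. It is an assumption that ACE-Step's set_lora_scale multiplies exactly this factor. PEFT computes the same quantity as a parallel branch instead of merging it into W, which is equivalent at inference time.

```python
ALPHA, RANK = 32, 16  # from the Specs table


def matmul(X, Y):
    """Plain-Python matrix product, X (m x k) times Y (k x n)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def effective_weight(W, A, B, scale=1.0, alpha=ALPHA, rank=RANK):
    """W + scale * (alpha / rank) * (B @ A); scale=0 recovers the base weight."""
    delta = matmul(B, A)
    factor = scale * alpha / rank
    return [[w + factor * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


# Toy 2x2 frozen weight with a rank-16 update (A: 16x2, B: 2x16).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.1, 0.0] for _ in range(RANK)]
B = [[0.0] * RANK, [0.1] * RANK]

for s in (0.0, 0.8, 1.0):
    print(s, effective_weight(W, A, B, scale=s))
```

Under this assumption, scale 0.8 gives an effective multiplier of 1.6 on BA. Scale 0 reproduces the base weight, which has the same effect as set_use_lora(False).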

Evaluation Results (A/B vs Base Model)

Genre                   Centroid change   Energy change   Tempo change
Hindustani Classical    -0.5%             -1%             -32%
Bollywood Ballad        -6.3%             +17%            +6%
Qawwali                 -12%              -12%            0%
Ghazal                  +5%               +11%            +30%
Bhajan                  -7%               -15%            -19%
Carnatic Classical*     -16%              -17%            0%
Indie Hindi*            +6%               +9%             0%
Sufi Rock*              -5%               +19%            -35%
Filmi Dance*            -7%               +38%            -32%
Hinglish Pop*           +5%               +16%            0%

*Zero-shot (no dedicated training data for these genres)
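The card does not say which tools produced these numbers. As a rough illustration, not the author's pipeline, the two signal-level metrics and the percentage change can be computed as follows. Here the spectral centroid comes from a single whole-signal DFT, whereas real pipelines usually compute it per short frame and average. Tempo needs a beat tracker (for example librosa's beat_track) and is omitted. The two sine tones stand in for base and adapter outputs.

```python
import cmath
import math


def magnitude_spectrum(x: list[float]) -> list[float]:
    """One-sided magnitude spectrum via a naive O(N^2) DFT (fine for tiny demos)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def spectral_centroid(x: list[float], sr: int) -> float:
    """Magnitude-weighted mean frequency in Hz."""
    mags = magnitude_spectrum(x)
    freqs = [k * sr / len(x) for k in range(len(mags))]
    total = sum(mags)
    return sum(f * m for f, m in zip(freqs, mags)) / total if total else 0.0


def rms(x: list[float]) -> float:
    """Root-mean-square amplitude, the 'energy' proxy."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def pct_change(base: float, adapted: float) -> float:
    """Relative change of the adapter output vs the base output, in percent."""
    return (adapted - base) / base * 100.0


SR, N = 8000, 256


def tone(freq_hz: float, amp: float) -> list[float]:
    return [amp * math.sin(2 * math.pi * freq_hz * t / SR) for t in range(N)]


base, adapted = tone(1000, 0.5), tone(750, 0.6)
c = pct_change(spectral_centroid(base, SR), spectral_centroid(adapted, SR))
print(f"centroid change: {c:+.1f}%")                                # -25.0%
print(f"energy change:   {pct_change(rms(base), rms(adapted)):+.1f}%")  # +20.0%
```

A lower centroid corresponds to the "warmer" sound described for the classical and devotional genres. A higher RMS corresponds to "louder".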

Limitations

  • No perceptual evaluation has been done. The metric shifts (spectral centroid, RMS energy, tempo) are signal-level proxies, not proof that the output sounds authentically Indian to trained musicians.
  • compIAM raga detection returned null on all generated classical outputs, so whether the model follows actual raga grammar is unknown.
  • Small training set (250 samples), so some memorization is possible.
  • Tonic detection caveat: Indian music uses a movable-do system in which Sa (the tonic) can sit on any absolute pitch. A chroma-based "tonic" is just the loudest pitch class, not a musically meaningful key, and it need not coincide with Sa.
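To make the tonic caveat concrete: a chroma-based estimate takes the argmax over 12 pitch classes. The example below is hypothetical, with chroma values invented for illustration. Sa is tuned to C#, and the drone's Pa plus a long-held Ga dominate the chroma. The estimated "tonic" therefore comes out as G#, which is Pa, not Sa.

```python
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def loudest_pitch_class(chroma: list[float]) -> str:
    """Return the pitch class with the most chroma energy (the naive 'tonic')."""
    if len(chroma) != 12:
        raise ValueError("expected a 12-bin chroma vector")
    return PITCH_CLASSES[max(range(12), key=lambda i: chroma[i])]


def swara_offset(pitch_class: str, sa: str) -> int:
    """Semitones above the performer's Sa (0 = Sa, 7 = Pa)."""
    return (PITCH_CLASSES.index(pitch_class) - PITCH_CLASSES.index(sa)) % 12


# Hypothetical chroma for a Yaman rendition with Sa on C#:
# Pa (G#) from the drone and a sustained Ga (F) outweigh Sa (C#).
chroma = [0.3, 0.6, 0.1, 0.3, 0.1, 0.8, 0.1, 0.2, 0.9, 0.1, 0.4, 0.1]
loudest = loudest_pitch_class(chroma)
print(loudest, swara_offset(loudest, "C#"))  # G# 7  (Pa, not Sa)
```

Recovering Sa reliably requires a dedicated tonic-identification method rather than an argmax over chroma.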

Paper

DOI: 10.5281/zenodo.18811689

Citation

@article{chawla_2026,
  title={RagaLoRA: LoRA-Tuning a Diffusion Music Model for Indian Genres},
  author={Chawla, Varun},
  year={2026},
  month={Feb},
  publisher={Zenodo},
  doi={10.5281/zenodo.18811689},
  url={https://doi.org/10.5281/zenodo.18811689}
}

Author

Varun Chawla - varunc.7633@gmail.com

Framework versions

  • PEFT 0.18.1