RagaLoRA: Indian Music LoRA Adapter for ACE-Step 1.5

A LoRA adapter that tunes ACE-Step 1.5's Diffusion Transformer decoder to generate Indian music across ten genres: Hindustani classical, Carnatic classical, Bollywood ballad, qawwali, ghazal, bhajan, Sufi rock, filmi dance, indie Hindi, and Hinglish pop.

What It Does

The base ACE-Step 1.5 model was trained mostly on Western music and produces generic output for Indian genres. This adapter nudges the model toward Indian musical conventions:

Classical and devotional genres get warmer and slower: the Carnatic spectral centroid drops 16% and bhajan tempo drops 19%
Dance and rock genres get louder: filmi dance RMS energy rises 38% and Sufi rock RMS energy rises 19%
Five genres with no dedicated training data still shift coherently, which suggests transfer across related Indian styles

Specs


Base model	ACE-Step 1.5 (2.4B params, 24-layer DiT decoder)
Adapter rank	16
Alpha	32
Dropout	0.1
Target modules	q_proj, k_proj, v_proj, o_proj (all 24 layers)
Trainable params	~11M (0.46% of total)
File size	44 MB
Training data	250 balanced samples from 2,787 Indian music segments
Training	10 epochs, 600 optimizer steps, AdamW lr=1e-4, BFloat16, Apple Silicon MPS
PEFT version	0.18.1
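
The figures in the table can be cross-checked with a little arithmetic. The sketch below is only a sanity check, not part of the training code. It assumes the adapter weights are stored at 4 bytes per parameter (float32), which the table does not state. It also relies on two general LoRA facts: a LoRA pair on a weight matrix of shape d_out x d_in adds r * (d_in + d_out) parameters, and the update is scaled by alpha / r. The layer widths are not listed here, so the last value is just the average d_in + d_out implied by the rounded ~11M figure.

```python
# Sanity-check arithmetic for the spec table (values copied from the table).
RANK = 16
ALPHA = 32
NUM_LAYERS = 24
MODULES_PER_LAYER = 4          # q_proj, k_proj, v_proj, o_proj
TOTAL_PARAMS = 2.4e9
TRAINABLE_PARAMS = 11e6        # "~11M" is a rounded figure
BYTES_PER_PARAM = 4            # assumption: float32 storage, not stated in the table


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


scaling = ALPHA / RANK
trainable_pct = 100 * TRAINABLE_PARAMS / TOTAL_PARAMS
file_size_mb = TRAINABLE_PARAMS * BYTES_PER_PARAM / 1e6
adapted_matrices = NUM_LAYERS * MODULES_PER_LAYER
implied_dims_sum = TRAINABLE_PARAMS / (adapted_matrices * RANK)

print(f"LoRA scaling alpha/r: {scaling}")
print(f"trainable share: {trainable_pct:.2f}%")
print(f"adapter size at 4 bytes/param: {file_size_mb:.0f} MB")
print(f"adapted matrices: {adapted_matrices}")
print(f"implied average d_in + d_out: {implied_dims_sum:.0f}")
```

Under these assumptions the trainable share and file size come out at 0.46% and 44 MB, in line with the table.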

Training Data

Five genre categories, 50 samples each, drawn from openly licensed sources:

Hindustani Classical (1,063 segments from the Saraga dataset)
Bollywood (728 segments from the Internet Archive and Creative Commons-licensed YouTube uploads)
Qawwali / Sufi (455 segments)
Ghazal (269 segments)
Bhajan (272 segments)

All audio was resampled to 48 kHz stereo WAV and sliced into 30-120 second segments.
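
The balancing step (50 samples from each of the five pools) can be sketched as follows. The selection procedure actually used is not described here, so uniform random sampling without replacement is an assumption, and the genre keys, the `balanced_sample` helper, and the seed are hypothetical names chosen for illustration.

```python
import random

# Segment counts per category, copied from the list above.
POOL_SIZES = {
    "hindustani_classical": 1063,
    "bollywood": 728,
    "qawwali_sufi": 455,
    "ghazal": 269,
    "bhajan": 272,
}
SAMPLES_PER_GENRE = 50


def balanced_sample(pool_sizes, per_genre, seed=0):
    """Pick `per_genre` distinct segment indices from every genre pool."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return {
        genre: sorted(rng.sample(range(size), per_genre))
        for genre, size in pool_sizes.items()
    }


subset = balanced_sample(POOL_SIZES, SAMPLES_PER_GENRE)
total_pool = sum(POOL_SIZES.values())
total_picked = sum(len(ids) for ids in subset.values())
print(f"picked {total_picked} of {total_pool} segments")
```

Sampling a fixed count per genre keeps the largest pool (Hindustani classical) from dominating the small training set.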

Usage

With ACE-Step 1.5

from acestep.handler import AceStepHandler

handler = AceStepHandler()
handler.initialize_service(
    project_root="path/to/ace-step-checkpoints",
    config_path="acestep-v15-turbo",
    device="cuda",  # or "mps" for Apple Silicon
)

# Load the adapter
handler.add_lora("path/to/RagaLoRA/adapter")
handler.set_use_lora(True)
handler.set_lora_scale(0.8)  # 0.8 works well; adjust to taste

# Generate
result = handler.generate_music(
    captions="Hindustani classical vocal, raag Yaman, sitar and tabla, teentaal, meditative alap",
    lyrics="[Alap]\nSa re ga ma pa dha ni\n[Gat]\nYaman ke sur mein",
    audio_duration=60,
    inference_steps=8,
    guidance_scale=7.0,
    vocal_language="hi",
)

Disable for A/B comparison

handler.set_use_lora(False)   # base model output
# generate...
handler.set_use_lora(True)    # adapter output
# generate...
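
The toggle pattern above can be wrapped in a small helper so both variants are always generated with identical arguments. This is a minimal sketch: `ab_generate` is a hypothetical name, it relies only on the `set_use_lora` and `generate_music` calls shown above, and it makes no assumption about what `generate_music` returns.

```python
def ab_generate(handler, **generate_kwargs):
    """Generate once with the adapter off and once with it on.

    `handler` is expected to expose set_use_lora() and generate_music(),
    as in the usage example; the adapter is left enabled afterwards.
    """
    outputs = {}
    for label, use_lora in (("base", False), ("adapter", True)):
        handler.set_use_lora(use_lora)
        outputs[label] = handler.generate_music(**generate_kwargs)
    return outputs
```

Calling `ab_generate(handler, captions=..., lyrics=..., audio_duration=60)` returns a dict with `"base"` and `"adapter"` entries. Unless the random seed is controlled, part of the difference between the two outputs is likely to be sampling noise rather than the adapter's effect.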

Evaluation Results (A/B vs Base Model)

Genre	Spectral Centroid Change	RMS Energy Change	Tempo Change
Hindustani Classical	-0.5%	-1%	-32%
Bollywood Ballad	-6.3%	+17%	+6%
Qawwali	-12%	-12%	0%
Ghazal	+5%	+11%	+30%
Bhajan	-7%	-15%	-19%
Carnatic Classical*	-16%	-17%	0%
Indie Hindi*	+6%	+9%	0%
Sufi Rock*	-5%	+19%	-35%
Filmi Dance*	-7%	+38%	-32%
Hinglish Pop*	+5%	+16%	0%

*Zero-shot (no dedicated training data for these genres)
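
Each cell in the table is a relative change of the adapter output against the base output. The tooling used for the evaluation is not described here, so the sketch below only illustrates the textbook definitions of two of the metrics (RMS energy and spectral centroid) plus the percent-change formula. Tempo estimation is omitted, and the naive DFT is only practical for short frames. The function names and the synthetic test signal are assumptions made for illustration.

```python
import cmath
import math


def rms(samples):
    """Root-mean-square level of a frame of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def spectral_centroid(samples, sample_rate):
    """Magnitude-weighted mean frequency (Hz) of one frame, via a naive DFT."""
    n = len(samples)
    weighted = 0.0
    total = 0.0
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples)
        )
        magnitude = abs(coeff)
        weighted += (k * sample_rate / n) * magnitude
        total += magnitude
    return weighted / total if total else 0.0


def percent_change(base_value, adapter_value):
    """Relative change of the adapter output against the base output, in percent."""
    return 100.0 * (adapter_value - base_value) / base_value


# Synthetic check: a 1 kHz sine sampled at 8 kHz (64 samples = 8 full cycles).
SAMPLE_RATE = 8000
frame = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(64)]
print(round(spectral_centroid(frame, SAMPLE_RATE)))
print(round(rms(frame), 3))

# Hypothetical base and adapter values, chosen only to show the formula.
print(percent_change(100.0, 84.0))
```

A lower centroid means the spectral energy sits at lower frequencies, which is the sense in which an output is called "warmer".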

Limitations

No perceptual evaluation was done. The metric shifts (spectral centroid, RMS energy, tempo) are signal-level proxies, not proof that the output sounds authentically Indian to trained musicians.
compIAM raga detection returned null on all generated classical outputs, so whether the model follows actual raga grammar is unknown.
The training set is small (250 samples), so there is some risk of memorization.
Tonic detection caveat: Indian music uses a movable-do system, where the tonic (Sa) can sit on any pitch. The chroma-based "tonic" is just the loudest pitch class, not a musically meaningful key.
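
The tonic caveat is easier to see in code. The evaluation script is not shown here, so this is a minimal sketch of what "loudest pitch class" amounts to. The chroma energies are invented for illustration, and `loudest_pitch_class` is a hypothetical helper.

```python
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def loudest_pitch_class(chroma):
    """Return the pitch class with the most energy in a 12-bin chroma vector."""
    if len(chroma) != 12:
        raise ValueError("expected 12 chroma bins")
    index = max(range(12), key=lambda i: chroma[i])
    return PITCH_CLASSES[index]


# Hypothetical chroma energies for a performance whose Sa is C but whose
# sustained Pa (G) happens to carry more energy.
chroma = [0.80, 0.05, 0.30, 0.02, 0.40, 0.10, 0.03, 0.95, 0.04, 0.35, 0.02, 0.20]
print(loudest_pitch_class(chroma))
```

In this made-up case the estimate is G even though the performance is anchored on C, which is why such a value should not be read as a key.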

Paper

DOI: 10.5281/zenodo.18811689

Citation

@article{chawla_2026,
  title={RagaLoRA: LoRA-Tuning a Diffusion Music Model for Indian Genres},
  author={Chawla, Varun},
  year={2026},
  month={Feb},
  publisher={Zenodo},
  doi={10.5281/zenodo.18811689},
  url={https://doi.org/10.5281/zenodo.18811689}
}

Author

Varun Chawla - varunc.7633@gmail.com

