
Whisper Large V3 - Indian English LoRA Adapter (Psychiatric Domain)

This is a LoRA (Low-Rank Adaptation) adapter for OpenAI's Whisper Large V3 model, fine-tuned on real-world psychiatric interviews and therapy sessions conducted in Indian English.

⚠️ Content Warning & Gated Repository

This model is behind a gated repository for important safety reasons:

Because this model was trained on real-world clinical psychiatric sessions and therapy interviews, there is a risk that model hallucinations may reproduce or generate disturbing or sensitive content related to mental health topics. This model is intended solely for research purposes and clinical applications by qualified teams.

To request access: Please contact the repository owner with:

  • Your research affiliation or clinical organization
  • Intended use case
  • Confirmation of ethical approval for your project (if applicable)

Model Description

  • Base Model: openai/whisper-large-v3
  • Adapter Type: LoRA (Low-Rank Adaptation)
  • Language: English (Indian English dialect)
  • Task: Automatic Speech Recognition (Transcription)
  • Domain: Clinical/Psychiatric interviews and therapy sessions
  • PEFT Version: 0.18.0

LoRA Configuration

  • Rank (r): 32
  • Alpha: 64
  • Dropout: 0.05
  • Target Modules: q_proj, k_proj, v_proj, out_proj, fc1, fc2
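For reference, the settings above correspond to a PEFT LoraConfig along these lines (a sketch; any parameters not listed on this card are assumed to be left at their PEFT defaults):

```python
from peft import LoraConfig

# The adapter settings listed on this card, expressed as a PEFT LoraConfig.
# Parameters not listed above are assumed to remain at PEFT defaults.
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj", "fc1", "fc2"],
)
```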

Performance

The fine-tuned model shows substantial improvements over the base Whisper Large V3 model on Indian English clinical data:

Test Set Results (718 files, ~5.5 hours)

| Metric           | Base Model | Fine-tuned Model | Relative Improvement |
|------------------|------------|------------------|----------------------|
| WER (normalized) | 30.36%     | 23.89%           | 21.31%               |
| CER              | 22.37%     | 14.16%           | 36.70%               |

Detailed WER Breakdown:

  • Substitutions: 5,690 (reduced from 6,646)
  • Insertions: 3,924 (increased from 2,324)
  • Deletions: 1,415 (reduced from 5,045)
  • Correct predictions: 39,058 (increased from 34,472)
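These counts are consistent with the reported WER: with N = substitutions + deletions + correct reference words, WER = (S + I + D) / N. A quick sanity check in Python, using the counts from the breakdown above:

```python
# Recompute test-set WER from the error counts above:
# WER = (S + I + D) / N, where N = S + D + C is the number of reference words.
S, I, D, C = 5690, 3924, 1415, 39058           # fine-tuned model
N = S + D + C                                  # reference word count
wer = (S + I + D) / N
print(f"{100 * wer:.2f}%")                     # → 23.89%

S0, I0, D0, C0 = 6646, 2324, 5045, 34472       # base model (same N)
wer0 = (S0 + I0 + D0) / (S0 + D0 + C0)
print(f"{100 * wer0:.2f}%")                    # → 30.36%
```

Note that insertions rose even as overall WER fell; the large drop in deletions (5,045 → 1,415) dominates the improvement.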

Dev Set Results (635 files, ~4.8 hours)

| Metric           | Base Model | Fine-tuned Model | Relative Improvement |
|------------------|------------|------------------|----------------------|
| WER (normalized) | 36.10%     | 28.22%           | 21.83%               |
| CER              | 28.56%     | 17.13%           | 40.04%               |

Detailed WER Breakdown:

  • Substitutions: 5,828 (reduced from 6,768)
  • Insertions: 4,647 (increased from 2,895)
  • Deletions: 1,756 (reduced from 5,984)
  • Correct predictions: 35,762 (increased from 30,594)

Usage

To use this adapter, you'll need to install the required libraries (transformers, peft, torch, and librosa) and load both the base model and the adapter:

import torch
import librosa
from transformers import WhisperForConditionalGeneration, WhisperProcessor
from peft import PeftModel

# Load base model and processor
base_model_name = "openai/whisper-large-v3"
processor = WhisperProcessor.from_pretrained(base_model_name)
model = WhisperForConditionalGeneration.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Load LoRA adapter
model = PeftModel.from_pretrained(
    model,
    "Lekhansh/whisper-large-v3-indian-english-lora",
)
model.eval()

# Load audio at Whisper's expected 16 kHz sampling rate
audio_path = "path/to/your/audio.wav"
audio, sr = librosa.load(audio_path, sr=16000)

# Extract log-mel features and match the model's device and dtype
# (the model is loaded in float16, so the features must be cast too)
input_features = processor(
    audio,
    sampling_rate=16000,
    return_tensors="pt",
).input_features.to(model.device, dtype=model.dtype)

# Generate transcription
predicted_ids = model.generate(
    input_features,
    language="en",
    task="transcribe",
)

transcription = processor.batch_decode(
    predicted_ids,
    skip_special_tokens=True,
)[0]

print(transcription)

Merging Adapter with Base Model (Optional)

If you want to merge the adapter weights with the base model for faster inference:

import torch
from transformers import WhisperForConditionalGeneration
from peft import PeftModel

# Load base model and adapter
base_model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v3",
    torch_dtype=torch.float16,
)
model = PeftModel.from_pretrained(
    base_model,
    "Lekhansh/whisper-large-v3-indian-english-lora",
)

# Fold the LoRA weights into the base weights and drop the PEFT wrapper
merged_model = model.merge_and_unload()

# Save merged model for standalone use
merged_model.save_pretrained("./merged_model")

Training Details

Training Data

The model was trained on psychiatric interviews and therapy sessions conducted in Indian English. The training data consists of:

  • Real-world clinical conversations
  • Indian English dialect variations
  • Mental health and psychiatric domain terminology

Evaluation Data

  • Test Set: 718 audio files, total duration of 5 hours 32 minutes 46 seconds
  • Dev Set: 635 audio files, total duration of 4 hours 50 minutes 50 seconds
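For context, both evaluation splits average just under 28 seconds per file, comfortably within Whisper's 30-second input window (arithmetic from the figures above):

```python
# Average clip duration per split, computed from the totals above
test_secs = 5 * 3600 + 32 * 60 + 46    # 19,966 s over 718 files
dev_secs = 4 * 3600 + 50 * 60 + 50     # 17,450 s over 635 files

print(f"test: {test_secs / 718:.1f} s/file")  # → test: 27.8 s/file
print(f"dev:  {dev_secs / 635:.1f} s/file")   # → dev:  27.5 s/file
```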

Training Procedure

The model was fine-tuned using LoRA (Low-Rank Adaptation) on the encoder and decoder attention layers and feed-forward networks of Whisper Large V3. This approach allows efficient adaptation while maintaining most of the original model's capabilities.

Intended Use

Primary Use Cases

  • Clinical Research: Transcription of psychiatric interviews and therapy sessions in Indian English
  • Mental Health Applications: Automated documentation of clinical sessions
  • Dialect-Specific ASR: Improved recognition of Indian English speech patterns

Out-of-Scope Use

  • Diagnostic Tool: This model should not be used as a standalone diagnostic tool
  • Replacement for Human Judgment: Transcriptions should be reviewed by qualified professionals
  • Non-Clinical General Purpose ASR: While it may work on general Indian English, it is optimized for the clinical domain

Limitations and Biases

  • Domain-Specific: Optimized for clinical/psychiatric domain; may not generalize well to other domains
  • Dialect: Specifically tuned for Indian English; performance on other English dialects may vary
  • Hallucination Risk: As with all autoregressive speech recognition models, the model may hallucinate content, which in this clinical context could include disturbing mental health-related content
  • Data Privacy: Trained on real clinical data; users must ensure compliance with data protection regulations (HIPAA, GDPR, etc.)

Ethical Considerations

  • This model was trained on real psychiatric sessions. Users must ensure appropriate ethical approvals and patient consent for any clinical use
  • Transcriptions should always be reviewed by qualified professionals
  • The model should not be used for surveillance or unauthorized recording of clinical sessions
  • Proper data security and patient confidentiality must be maintained

License

This adapter is released under the Apache 2.0 license. However, users must also comply with OpenAI's Whisper license and any applicable regulations regarding clinical data.

Citation

If you use this model in your research, please cite:

@misc{whisper-indian-english-lora,
  title={Whisper Large V3 Indian English LoRA Adapter for Psychiatric Interviews},
  author={Shukla, Lekhansh and Shivaprakash, Prakrithi},
  year={2025},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/lekhansh/whisper-large-v3-indian-english-lora}}
}

Contact

For access requests or questions about this model, please contact Dr. Lekhansh Shukla at drlekhansh@gmail.com.


Disclaimer: This model is provided for research purposes only. Users are responsible for ensuring compliance with all applicable laws, regulations, and ethical guidelines when using this model, particularly regarding patient privacy and clinical data handling.
