Multi-Label Audio Expressiveness Classifier
This model is a fine-tuned version of openai/whisper-large-v3 for multi-label audio classification. It is designed to detect eighteen distinct expressiveness traits in speech audio.
Labels
"enthusiastic", "happy", "angry", "saddened", "awed", "calm", "anxious", "disgusted", "scared", "confused", "bored", "sleepy", "pained", "guilt","sarcastic", "sympathetic", "admiring", "desirous"
Usage: Input & Output
1. Input Specifications
The model requires audio formatted specifically for the Whisper architecture:
- Sampling Rate: Must be 16,000 Hz. Resample your audio if necessary.
- Duration: Best performance is achieved on clips between 0.5s and 30s.
- Pre-processing: Use the `WhisperProcessor` to convert raw waveforms into the log-Mel-spectrogram format expected by the model.
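If your source audio is not already at 16,000 Hz, resample it first. Below is a minimal sketch of the idea using linear interpolation with NumPy; in practice, prefer `librosa.load(path, sr=16000)` or torchaudio's resampler, which apply proper anti-aliasing. The 8 kHz input rate here is purely illustrative.

```python
import numpy as np

def resample_linear(audio: np.ndarray, orig_sr: int, target_sr: int = 16000) -> np.ndarray:
    """Naive linear-interpolation resampler (use librosa/torchaudio for real work)."""
    duration = len(audio) / orig_sr
    n_target = int(round(duration * target_sr))
    old_times = np.linspace(0.0, duration, num=len(audio), endpoint=False)
    new_times = np.linspace(0.0, duration, num=n_target, endpoint=False)
    return np.interp(new_times, old_times, audio)

# One second of a 440 Hz tone at 8 kHz becomes 16,000 samples at 16 kHz
audio_8k = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000).astype(np.float32)
audio_16k = resample_linear(audio_8k, orig_sr=8000)
print(len(audio_16k))  # 16000
```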
2. Output Format (Multi-Label)
Unlike standard classifiers, this model uses a Multi-Label approach. This means:
- Independence: Each label is calculated independently. The probabilities do not sum to 1.0.
- Vector Output: The model outputs a vector of 18 probabilities, one per label (via Sigmoid activation).
- Thresholding: A label is considered "Active" if its probability is above a certain threshold (default is 0.5).
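The thresholding step amounts to applying a sigmoid to each logit independently and keeping the labels whose probability clears the threshold. A small illustration with made-up logits for three of the labels (these values are not real model outputs):

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical logits for three of the labels
logits = {"happy": 2.0, "angry": -1.0, "calm": 0.3}

# Each probability is computed independently; they do not sum to 1.0
probs = {label: sigmoid(z) for label, z in logits.items()}

# A label is "active" when its probability exceeds the threshold (0.5)
active = [label for label, p in probs.items() if p > 0.5]
print(active)  # ['happy', 'calm']
```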
Quick Start (Inference)
```python
import torch
import librosa
from transformers import WhisperProcessor, WhisperForAudioClassification

model_id = "Kang-Chieh/whisper-large-v3-mlb-emotion"
device = "cuda" if torch.cuda.is_available() else "cpu"

feature_extractor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForAudioClassification.from_pretrained(model_id).to(device)

def predict(audio_path, threshold=0.5):
    # Load and resample audio to the 16 kHz rate Whisper expects
    audio, _ = librosa.load(audio_path, sr=16000)
    inputs = feature_extractor(audio, sampling_rate=16000, return_tensors="pt").to(device)

    with torch.no_grad():
        logits = model(**inputs).logits

    # Multi-label classification uses an independent sigmoid per label
    probs = torch.sigmoid(logits).squeeze().cpu().numpy()

    id2label = model.config.id2label
    results = {id2label[i]: float(probs[i]) for i in range(len(probs))}

    # Filter for active tags above the threshold
    active_tags = [tag for tag, score in results.items() if score > threshold]
    return active_tags, results

tags, scores = predict("path_to_your_audio.wav")
print(f"Detected Tags: {tags}")
```
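Beyond the binary tags, the full score dictionary returned by `predict` can be ranked to see which traits dominate a clip. A small usage sketch with placeholder scores standing in for real model output:

```python
# Placeholder scores standing in for the `results` dict returned by predict()
results = {"happy": 0.91, "enthusiastic": 0.74, "calm": 0.12, "angry": 0.03}

# Sort labels by descending probability
ranked = sorted(results.items(), key=lambda kv: kv[1], reverse=True)
for label, score in ranked:
    print(f"{label:>14}: {score:.2f}")

top_label = ranked[0][0]
print(f"Most prominent trait: {top_label}")  # happy
```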