# Audio Gender Classifier (Single-Label)
This model is a fine-tuned version of openai/whisper-small for single-label classification. It predicts the perceived speaker gender of an audio clip as one of two mutually exclusive categories.
## Gender Labels
- `male`: the audio is classified as a male voice.
- `female`: the audio is classified as a female voice.
## Usage: Input & Output
### 1. Input Specifications
- Processor: `WhisperProcessor` handles resampling and log-Mel feature extraction.
- Sampling Rate: 16,000 Hz.
- Audio Format: mono raw waveform.
- Recommended Processor: `openai/whisper-small`
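To illustrate the mono requirement, here is a minimal sketch with a hypothetical stereo buffer. In practice `librosa.load(path, sr=16000, mono=True)` performs both the downmix and the resampling in one call:

```python
import numpy as np

# Hypothetical stereo buffer: 2 channels, 3 seconds at 48 kHz
stereo = np.random.randn(2, 48000 * 3).astype(np.float32)

# Downmix to mono by averaging channels (what librosa's mono=True does);
# the waveform would still need resampling from 48 kHz to 16 kHz.
mono = stereo.mean(axis=0)
print(mono.shape)  # one-dimensional waveform
```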
### 2. Output (Single-Label Logic)
Because this is a Single-Label task, the categories are mutually exclusive.
- Activation: Softmax. The output probabilities sum to 1.0.
- Decision: The model selects the label with the highest probability score.
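The two bullet points above amount to a softmax followed by an argmax. A dependency-free sketch with made-up logits:

```python
import math

# Toy logits for one clip: index 0 = "male", index 1 = "female"
logits = [2.0, 0.5]

# Softmax: exponentiate and normalize so the scores sum to 1.0
exps = [math.exp(x) for x in logits]
probs = [e / sum(exps) for e in exps]

# Single-label decision: pick the index with the highest probability
predicted_id = max(range(len(probs)), key=probs.__getitem__)
print(predicted_id)  # → 0
```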
## Label Mapping
```python
{
    0: "male",
    1: "female",
}
```
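One practical detail: when this mapping round-trips through the model's `config.json`, the integer keys come back as strings, which is why the inference code below casts them with `int(k)`. A small sketch of that round-trip:

```python
import json

# Simulate the JSON round-trip that config.json performs
mapping = json.loads(json.dumps({0: "male", 1: "female"}))
print(list(mapping))  # keys are now strings: ['0', '1']

# Cast back to integers before indexing with argmax results
id2label = {int(k): v for k, v in mapping.items()}
print(id2label[0])  # → male
```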
## Inference Code
```python
import torch
import librosa
import numpy as np
from transformers import WhisperProcessor, WhisperForAudioClassification

model_id = "Kang-Chieh/whisper-small-gender"
processor_id = "openai/whisper-small"
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the processor and model
processor = WhisperProcessor.from_pretrained(processor_id)
model = WhisperForAudioClassification.from_pretrained(model_id).to(device)
model.eval()

def predict_gender(audio_path):
    # 1. Load audio and ensure 16 kHz mono
    audio, _ = librosa.load(audio_path, sr=16000, mono=True)

    # 2. Preprocess into log-Mel input features
    inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
    input_features = inputs.input_features.to(device)

    # 3. Inference
    with torch.no_grad():
        logits = model(input_features=input_features).logits

    # 4. Single-label logic (softmax)
    probs = torch.softmax(logits, dim=-1).squeeze().cpu().numpy()

    # 5. Get the highest-scoring label
    id2label = {int(k): v for k, v in model.config.id2label.items()}
    predicted_id = int(np.argmax(probs))
    return {
        "label": id2label[predicted_id],
        "confidence": float(probs[predicted_id]),
        "all_scores": {id2label[i]: float(probs[i]) for i in range(len(probs))},
    }

# Run example
result = predict_gender("audio_clip.wav")
print(f"Detected Gender: {result['label']} ({result['confidence']:.2%})")
```
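If you need to reject ambiguous clips rather than always emit a label, a simple confidence gate over the returned `all_scores` dictionary works. The `gate` helper and the 0.75 threshold below are illustrative choices, not part of the model:

```python
def gate(all_scores, threshold=0.75):
    """Return the top label, or 'uncertain' when confidence is below threshold."""
    label, confidence = max(all_scores.items(), key=lambda kv: kv[1])
    return label if confidence >= threshold else "uncertain"

print(gate({"male": 0.90, "female": 0.10}))  # → male
print(gate({"male": 0.55, "female": 0.45}))  # → uncertain
```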
## Reported Performance
From the saved evaluation results:
- Accuracy: 0.95
- Macro F1: 0.95
Class-wise summary:

| Class  | Precision | Recall | F1-score |
|--------|-----------|--------|----------|
| male   | 0.95      | 0.96   | 0.95     |
| female | 0.96      | 0.94   | 0.95     |
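For reference, metrics like these can be reproduced from predictions with scikit-learn. The toy label arrays below are illustrative only, not the model's evaluation data:

```python
from sklearn.metrics import accuracy_score, f1_score, classification_report

# Hypothetical ground-truth and predicted labels (0 = male, 1 = female)
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]

accuracy = accuracy_score(y_true, y_pred)
macro_f1 = f1_score(y_true, y_pred, average="macro")
print(classification_report(y_true, y_pred, target_names=["male", "female"]))
```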