# Audio Gender Classifier (Single-Label)
This model is a fine-tuned version of openai/whisper-small for single-label classification. It predicts the perceived speaker gender of an audio clip as one of two mutually exclusive categories.
## Gender Labels
- `male`: the audio is classified as a male voice.
- `female`: the audio is classified as a female voice.
## Usage: Input & Output
### 1. Input Specifications
- Processor: `WhisperProcessor` handles resampling and log-Mel feature extraction.
- Sampling Rate: 16,000 Hz.
- Audio Format: mono raw waveform.
- Recommended Processor: `openai/whisper-small`
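To illustrate the mono requirement, here is a minimal sketch with a hypothetical stereo buffer. In practice `librosa.load(path, sr=16000, mono=True)` performs both the downmix and the resampling in one call:

```python
import numpy as np

# Hypothetical stereo buffer: 2 channels, 3 seconds at 48 kHz
stereo = np.random.randn(2, 48000 * 3).astype(np.float32)

# Downmix to mono by averaging channels (what librosa's mono=True does);
# the waveform would still need resampling from 48 kHz to 16 kHz.
mono = stereo.mean(axis=0)
print(mono.shape)  # one-dimensional waveform
```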
### 2. Output (Single-Label Logic)
Because this is a Single-Label task, the categories are mutually exclusive.
- Activation: Softmax. The output probabilities sum to 1.0.
- Decision: The model selects the label with the highest probability score.
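The two bullet points above amount to a softmax followed by an argmax. A dependency-free sketch with made-up logits:

```python
import math

# Toy logits for one clip: index 0 = "male", index 1 = "female"
logits = [2.0, 0.5]

# Softmax: exponentiate and normalize so the scores sum to 1.0
exps = [math.exp(x) for x in logits]
probs = [e / sum(exps) for e in exps]

# Single-label decision: pick the index with the highest probability
predicted_id = max(range(len(probs)), key=probs.__getitem__)
print(predicted_id)  # → 0
```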
## Label Mapping
```python
{
    0: "male",
    1: "female",
}
```
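One practical detail: when this mapping round-trips through the model's `config.json`, the integer keys come back as strings, which is why the inference code below casts them with `int(k)`. A small sketch of that round-trip:

```python
import json

# Simulate the JSON round-trip that config.json performs
mapping = json.loads(json.dumps({0: "male", 1: "female"}))
print(list(mapping))  # keys are now strings: ['0', '1']

# Cast back to integers before indexing with argmax results
id2label = {int(k): v for k, v in mapping.items()}
print(id2label[0])  # → male
```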
## Inference Code
```python
import torch
import librosa
import numpy as np
from transformers import WhisperProcessor, WhisperForAudioClassification

model_id = "Kang-Chieh/whisper-small-gender"
processor_id = "openai/whisper-small"
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the processor and model
processor = WhisperProcessor.from_pretrained(processor_id)
model = WhisperForAudioClassification.from_pretrained(model_id).to(device)
model.eval()

def predict_gender(audio_path):
    # 1. Load audio and ensure 16 kHz mono
    audio, _ = librosa.load(audio_path, sr=16000, mono=True)

    # 2. Preprocess into log-Mel input features
    inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
    input_features = inputs.input_features.to(device)

    # 3. Inference
    with torch.no_grad():
        logits = model(input_features=input_features).logits

    # 4. Single-label logic (softmax)
    probs = torch.softmax(logits, dim=-1).squeeze().cpu().numpy()

    # 5. Get the highest-scoring label
    id2label = {int(k): v for k, v in model.config.id2label.items()}
    predicted_id = int(np.argmax(probs))
    return {
        "label": id2label[predicted_id],
        "confidence": float(probs[predicted_id]),
        "all_scores": {id2label[i]: float(probs[i]) for i in range(len(probs))},
    }

# Run example
result = predict_gender("audio_clip.wav")
print(f"Detected Gender: {result['label']} ({result['confidence']:.2%})")
```
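If you need to reject ambiguous clips rather than always emit a label, a simple confidence gate over the returned `all_scores` dictionary works. The `gate` helper and the 0.75 threshold below are illustrative choices, not part of the model:

```python
def gate(all_scores, threshold=0.75):
    """Return the top label, or 'uncertain' when confidence is below threshold."""
    label, confidence = max(all_scores.items(), key=lambda kv: kv[1])
    return label if confidence >= threshold else "uncertain"

print(gate({"male": 0.90, "female": 0.10}))  # → male
print(gate({"male": 0.55, "female": 0.45}))  # → uncertain
```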
## Reported Performance
From the saved evaluation results:
- Accuracy: 0.95
- Macro F1: 0.95
Class-wise summary:

| Class  | Precision | Recall | F1-score |
|--------|-----------|--------|----------|
| male   | 0.95      | 0.96   | 0.95     |
| female | 0.96      | 0.94   | 0.95     |
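For reference, metrics like these can be reproduced from predictions with scikit-learn. The toy label arrays below are illustrative only, not the model's evaluation data:

```python
from sklearn.metrics import accuracy_score, f1_score, classification_report

# Hypothetical ground-truth and predicted labels (0 = male, 1 = female)
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]

accuracy = accuracy_score(y_true, y_pred)
macro_f1 = f1_score(y_true, y_pred, average="macro")
print(classification_report(y_true, y_pred, target_names=["male", "female"]))
```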