
Audio Gender Classifier (Single-Label)

This model is a fine-tuned version of openai/whisper-small for single-label classification. It predicts the perceived speaker gender of an audio clip as one of two mutually exclusive categories.

🏷️ Gender Labels

  • male: Audio is classified as a male voice.
  • female: Audio is classified as a female voice.

🛠️ Usage: Input & Output

1. Input Specifications

  • Processor: Uses WhisperProcessor for log-Mel feature extraction. Note that the processor does not resample; audio must already be at 16 kHz when passed in.
  • Sampling Rate: 16,000 Hz.
  • Audio Format: Mono raw waveform.
  • Recommended Processor: openai/whisper-small

2. Output (Single-Label Logic)

Because this is a Single-Label task, the categories are mutually exclusive.

  • Activation: Softmax. The output probabilities sum to 1.0.
  • Decision: The model selects the label with the highest probability score.
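A tiny numeric sketch of this decision rule (the logits are made up for illustration; index 0 is male and 1 is female, per the label mapping):

```python
import torch

# Hypothetical logits for one clip: index 0 = male, 1 = female
logits = torch.tensor([2.0, 0.5])

probs = torch.softmax(logits, dim=-1)    # probabilities sum to 1.0
predicted_id = int(torch.argmax(probs))  # 0 -> "male"
```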

📊 Label Mapping

{
    0: "male",
    1: "female",
}

🚀 Inference Code

import torch
import librosa
import numpy as np
from transformers import WhisperProcessor, WhisperForAudioClassification

model_id = "Kang-Chieh/whisper-small-gender"
processor_id = "openai/whisper-small"
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the processor and model
processor = WhisperProcessor.from_pretrained(processor_id)
model = WhisperForAudioClassification.from_pretrained(model_id).to(device)

def predict_gender(audio_path):
    # 1. Load audio and resample to 16 kHz mono
    audio, _ = librosa.load(audio_path, sr=16000, mono=True)

    # 2. Preprocess
    inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
    input_features = inputs.input_features.to(device)

    # 3. Inference
    with torch.no_grad():
        logits = model(input_features=input_features).logits

    # 4. Single-label logic (Softmax)
    probs = torch.softmax(logits, dim=-1).squeeze().cpu().numpy()

    # 5. Get the highest scoring label
    id2label = {int(k): v for k, v in model.config.id2label.items()}
    predicted_id = int(np.argmax(probs))

    return {
        "label": id2label[predicted_id],
        "confidence": float(probs[predicted_id]),
        "all_scores": {id2label[i]: float(probs[i]) for i in range(len(probs))},
    }

# Run example
result = predict_gender("audio_clip.wav")
print(f"Detected Gender: {result['label']} ({result['confidence']:.2%})")
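The softmax/argmax step in predict_gender extends directly to a batch of clips. A sketch with made-up logits for three clips:

```python
import torch

# Hypothetical logits for three clips; columns = [male, female]
batch_logits = torch.tensor([[ 2.1, -0.3],
                             [-1.0,  1.5],
                             [ 0.2,  0.1]])

# Softmax is applied per row, so each clip's probabilities sum to 1.0
batch_probs = torch.softmax(batch_logits, dim=-1)
predicted_ids = batch_probs.argmax(dim=-1)  # one label index per clip
```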

📈 Reported Performance

From the saved evaluation results:

  • Accuracy: 0.95
  • Macro F1: 0.95

Class-wise summary:

  • male: precision 0.95, recall 0.96, f1-score 0.95
  • female: precision 0.96, recall 0.94, f1-score 0.95
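As a sanity check, the per-class F1 scores follow from the reported precision and recall via F1 = 2PR / (P + R), and the macro F1 is their unweighted mean:

```python
def f1(precision, recall):
    # Harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

f1_male = f1(0.95, 0.96)    # ~0.955, rounds to 0.95
f1_female = f1(0.96, 0.94)  # ~0.950, rounds to 0.95
macro_f1 = (f1_male + f1_female) / 2
```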
Model size: 88.4M parameters (F32 tensors, Safetensors format).