Audio Speed-Level Classifier (Single-Label)

This model is a fine-tuned version of openai/whisper-small for single-label classification. It categorizes the speaking rate of an audio clip into one of three distinct speed levels.

🏷️ Speed Labels

  • fast speed: Rapid speech, often with elided syllables.
  • measured speed: Standard, professional, or moderate speaking pace.
  • slow speed: Deliberate, calm, or hesitant speech.

🛠 Usage: Input & Output

1. Input Specifications

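  • Processor: Uses WhisperProcessor for log-Mel spectrogram generation. It does not resample, so audio must already be at 16 kHz before it is passed in (the inference code resamples with librosa).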
  • Sampling Rate: 16,000 Hz (Standard Whisper requirement).
  • Audio Format: Mono raw waveform.
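
The input requirements above (mono, 16 kHz) can be sketched as a small preparation step. This is a minimal illustration and not the author's pipeline: `to_mono_16k` is a hypothetical helper, the `(channels, samples)` layout is an assumption, and linear interpolation is a crude stand-in for a real resampler such as the one behind `librosa.load`.

```python
import numpy as np

TARGET_SR = 16_000  # sampling rate stated in the input specification


def to_mono_16k(waveform, sr):
    """Downmix to mono and resample to 16 kHz (hypothetical helper).

    Linear interpolation has no anti-aliasing filter, so it is only a rough
    illustration; prefer librosa.load(path, sr=16000) in practice.
    """
    x = np.asarray(waveform, dtype=np.float32)
    if x.ndim == 2:
        # Assumption: shape is (channels, samples); average the channels.
        x = x.mean(axis=0)
    if sr == TARGET_SR or x.size == 0:
        return x
    n_out = int(round(x.size * TARGET_SR / sr))
    # Positions of the output samples on the input sample grid.
    positions = np.linspace(0, x.size - 1, num=n_out)
    return np.interp(positions, np.arange(x.size), x).astype(np.float32)


# One second of made-up stereo audio at 44.1 kHz.
stereo = np.ones((2, 44_100), dtype=np.float32)
mono = to_mono_16k(stereo, sr=44_100)
print(mono.shape)
```

The resulting one-dimensional float array is the kind of input the processor expects, together with `sampling_rate=16000`.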

2. Output (Single-Label Logic)

Because this is a Single-Label task, the categories are mutually exclusive.

  • Activation: Softmax, which normalizes the logits into probabilities that sum to 1.0 (100%).
  • Decision: The model selects the label with the highest probability score.
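
The softmax-then-argmax decision described above can be sketched in plain Python. The logits and the label order below are made up for illustration; the real index-to-label mapping comes from the model's `id2label` config.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Assumption: this label order is illustrative, not the model's actual mapping.
labels = ["fast speed", "measured speed", "slow speed"]
logits = [2.0, 0.5, -1.0]  # made-up scores for one clip

probs = softmax(logits)
# Single-label decision: pick the index with the highest probability.
best = max(range(len(probs)), key=probs.__getitem__)
print(labels[best], f"{probs[best]:.2%}")
```

Because the probabilities share one normalization, raising the score of one label necessarily lowers the others, which is what makes the categories mutually exclusive.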

🚀 Inference Code

import torch
import librosa
import numpy as np
from transformers import WhisperProcessor, WhisperForAudioClassification

model_id = "Kang-Chieh/whisper-small-speed_level"
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the processor and model
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForAudioClassification.from_pretrained(model_id).to(device)

def predict_speed(audio_path):
    # 1. Load audio and ensure 16kHz
    audio, _ = librosa.load(audio_path, sr=16000)
    
    # 2. Preprocess
    inputs = processor(audio, sampling_rate=16000, return_tensors="pt").to(device)

    # 3. Inference
    with torch.no_grad():
        logits = model(**inputs).logits
    
    # 4. Single-label logic (Softmax)
    probs = torch.softmax(logits, dim=-1).squeeze().cpu().numpy()
    
    # 5. Get the highest scoring label
    id2label = model.config.id2label
    predicted_id = int(np.argmax(probs))  # plain int for the id2label lookup
    
    return {
        "label": id2label[predicted_id],
        "confidence": float(probs[predicted_id]),
        "all_scores": {id2label[i]: float(probs[i]) for i in range(len(probs))}
    }

# Run example
result = predict_speed("audio_clip.wav")
print(f"Detected Speed: {result['label']} ({result['confidence']:.2%})")

Model size: 88.4M params (Safetensors, tensor type F32)