Audio Speed-Level Classifier (Single-Label)

This model is a fine-tuned version of openai/whisper-small for single-label classification. It categorizes the speaking rate of an audio clip into one of three distinct speed levels.

🏷️ Speed Labels

fast speed: Rapid speech, often with elided syllables.
measured speed: Standard, professional, or moderate speaking pace.
slow speed: Deliberate, calm, or hesitant speech.

🛠 Usage: Input & Output

1. Input Specifications

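Processor: WhisperProcessor converts the waveform into log-Mel spectrogram features. It does not resample, so the audio must already be at 16 kHz; the inference code resamples with librosa.load.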
Sampling Rate: 16,000 Hz (Standard Whisper requirement).
Audio Format: Mono raw waveform.
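
The input requirements above can be checked before calling the processor. The following is a minimal, dependency-free sketch, not part of the model's own code: the helper names `downmix_to_mono` and `validate_input` are hypothetical, and it assumes the waveform is held as a plain Python list of float samples (multi-channel audio as one tuple per frame).

```python
EXPECTED_RATE = 16_000  # Hz, as stated in the input specification


def downmix_to_mono(frames):
    """Average the channels of each frame into a single mono sample."""
    return [sum(frame) / len(frame) for frame in frames]


def validate_input(samples, sample_rate):
    """Check the 16 kHz mono requirement and return the duration in seconds."""
    if sample_rate != EXPECTED_RATE:
        raise ValueError(
            f"expected {EXPECTED_RATE} Hz, got {sample_rate} Hz; resample first"
        )
    if samples and not isinstance(samples[0], (int, float)):
        raise ValueError("expected a flat mono waveform; downmix first")
    return len(samples) / sample_rate


# Hypothetical three-frame stereo clip, for illustration only.
stereo = [(0.2, 0.4), (-0.5, 0.5), (1.0, 0.0)]
mono = downmix_to_mono(stereo)
duration = validate_input(mono, 16_000)
```

In practice `librosa.load(path, sr=16000)` already returns a mono, 16 kHz waveform by default, so a check like this mainly matters when audio comes from another loader.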

2. Output (Single-Label Logic)

Because this is a Single-Label task, the categories are mutually exclusive.

Activation: Softmax over the three logits, so the output probabilities sum to 1.0 (up to floating-point rounding).
Decision: The predicted label is the one with the highest probability (argmax).
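
To make the softmax-then-argmax step concrete, here is a small worked sketch using only the standard library. The logits and the label order are made up for illustration; the real label order comes from `model.config.id2label`.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits and label order, for illustration only.
labels = ["fast speed", "measured speed", "slow speed"]
logits = [2.0, 0.5, -1.0]

probs = softmax(logits)

# Argmax: index of the largest probability.
best = max(range(len(probs)), key=probs.__getitem__)

print(labels[best], f"{probs[best]:.2%}")
```

Because softmax is monotonic, the argmax of the probabilities is the same as the argmax of the raw logits. The softmax step only matters when the probabilities themselves are reported, as in the confidence score returned by the inference code.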

🚀 Inference Code

import torch
import librosa
import numpy as np
from transformers import WhisperProcessor, WhisperForAudioClassification

model_id = "Kang-Chieh/whisper-small-speed_level"
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the processor and model
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForAudioClassification.from_pretrained(model_id).to(device)

def predict_speed(audio_path):
    # 1. Load audio as mono and resample to 16 kHz (librosa.load defaults to mono=True)
    audio, _ = librosa.load(audio_path, sr=16000)
    
    # 2. Preprocess
    inputs = processor(audio, sampling_rate=16000, return_tensors="pt").to(device)

    # 3. Inference
    with torch.no_grad():
        logits = model(**inputs).logits
    
    # 4. Single-label logic (Softmax)
    probs = torch.softmax(logits, dim=-1).squeeze().cpu().numpy()
    
    # 5. Get the highest scoring label
    id2label = model.config.id2label
    predicted_id = int(np.argmax(probs))  # plain int so it matches the id2label keys
    
    return {
        "label": id2label[predicted_id],
        "confidence": float(probs[predicted_id]),
        "all_scores": {id2label[i]: float(probs[i]) for i in range(len(probs))}
    }

# Run example
result = predict_speed("audio_clip.wav")
print(f"Detected Speed: {result['label']} ({result['confidence']:.2%})")

Model size: 88.4M params (Safetensors, tensor type F32)