Audio Speed-Level Classifier (Single-Label)
This model is a fine-tuned version of openai/whisper-small for single-label classification. It categorizes the speaking rate of an audio clip into one of three distinct speed levels.
Speed Labels
- fast speed: Rapid speech, often with elided syllables.
- measured speed: Standard, professional, or moderate speaking pace.
- slow speed: Deliberate, calm, or hesitant speech.
Usage: Input & Output
1. Input Specifications
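- Processor: Uses WhisperProcessor for log-Mel spectrogram generation. The processor does not resample, so audio must be converted to 16 kHz beforehand (the inference code does this with librosa.load).
- Sampling Rate: 16,000 Hz (standard Whisper requirement).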
- Audio Format: Mono raw waveform.
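The input requirements above can be checked before audio reaches the processor. The following is a minimal sketch, not part of the model's own code: the helper name `to_mono_float32` is hypothetical, and it assumes multi-channel audio arrives as a NumPy array shaped (channels, samples).

```python
import numpy as np

TARGET_SR = 16_000  # sampling rate stated in the input specifications


def to_mono_float32(waveform, sr):
    """Hypothetical helper: verify the sampling rate and downmix to mono."""
    if sr != TARGET_SR:
        raise ValueError(f"expected {TARGET_SR} Hz audio, got {sr} Hz; resample first")
    waveform = np.asarray(waveform, dtype=np.float32)
    if waveform.ndim == 2:
        # Assumed layout: (channels, samples); average the channels into one
        waveform = waveform.mean(axis=0)
    if waveform.ndim != 1:
        raise ValueError("expected a 1-D waveform or a 2-D (channels, samples) array")
    return waveform


# One second of dummy stereo audio: one channel of ones, one of zeros
stereo = np.stack([np.ones(TARGET_SR), np.zeros(TARGET_SR)])
mono = to_mono_float32(stereo, sr=TARGET_SR)
print(mono.shape, mono.dtype)
```

Loading with `librosa.load(path, sr=16000)`, as the inference code does, already returns a mono 16 kHz waveform, so a check like this mainly matters when audio comes from another source.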
2. Output (Single-Label Logic)
Because this is a Single-Label task, the categories are mutually exclusive.
- Activation: Softmax. The output probabilities across the three labels sum to 1.0 (100%).
- Decision: The model selects the label with the highest probability score.
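The softmax-then-argmax decision can be illustrated on its own, without loading the model. This is a sketch under stated assumptions: the logits are made-up values, and the label order in `id2label` is hypothetical (the real mapping is stored in `model.config.id2label`).

```python
import numpy as np

# Hypothetical label order; the real mapping lives in model.config.id2label
id2label = {0: "fast speed", 1: "measured speed", 2: "slow speed"}


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)  # subtract the max to avoid overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum()


logits = np.array([2.0, 0.5, -1.0])  # made-up values for illustration
probs = softmax(logits)

# Single-label decision: pick the class with the highest probability
predicted_id = int(np.argmax(probs))
print(id2label[predicted_id], float(probs[predicted_id]))
```

Because softmax is monotonic, the argmax of the probabilities is the same as the argmax of the raw logits. The softmax step is only needed when a confidence score is reported.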
Inference Code
import torch
import librosa
import numpy as np
from transformers import WhisperProcessor, WhisperForAudioClassification

model_id = "Kang-Chieh/whisper-small-speed_level"
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the processor and model
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForAudioClassification.from_pretrained(model_id).to(device)
model.eval()


def predict_speed(audio_path):
    # 1. Load audio as mono and resample to 16 kHz
    audio, _ = librosa.load(audio_path, sr=16000)

    # 2. Preprocess into log-Mel input features
    inputs = processor(audio, sampling_rate=16000, return_tensors="pt").to(device)

    # 3. Inference
    with torch.no_grad():
        logits = model(**inputs).logits

    # 4. Single-label logic (Softmax)
    probs = torch.softmax(logits, dim=-1).squeeze().cpu().numpy()

    # 5. Get the highest scoring label (cast to int to match the id2label keys)
    id2label = model.config.id2label
    predicted_id = int(np.argmax(probs))

    return {
        "label": id2label[predicted_id],
        "confidence": float(probs[predicted_id]),
        "all_scores": {id2label[i]: float(probs[i]) for i in range(len(probs))},
    }


# Run example
result = predict_speed("audio_clip.wav")
print(f"Detected Speed: {result['label']} ({result['confidence']:.2%})")