Silero Language Classifier 95 — GGUF

GGUF conversion of Silero's 95-language classifier for use with CrispASR.

Model Details

Architecture: Learned STFT frontend + 8-stage MobileNet-style depthwise-separable conv encoder with interleaved post-norm transformers + attention-weighted pooling + 95-language / 58-group classifiers
Parameters: ~4.2M parameters in 507 tensors
Input: Raw 16 kHz mono PCM audio (best results on clips under 20 seconds)
Output: 95-language log-probabilities + 58-language-group log-probabilities
License: MIT (same as upstream Silero)
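Both output heads are plain score vectors: 95 entries for languages and 58 for language groups. The sketch below shows one way to list the k most likely candidates from either head. It assumes you have the raw log-probabilities as a `float` array, which the C API under Usage does not expose directly (it returns only the top label and a confidence). `top_k_log_probs` is a hypothetical helper, not part of CrispASR.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of CrispASR): writes the indices of the k
 * highest log-probabilities into out_idx, best first. Works for either
 * head (n = 95 languages or n = 58 groups). Ties keep the lower index.
 * Runs in O(n * k * k), which is negligible at these sizes. */
void top_k_log_probs(const float *log_probs, size_t n, size_t k, size_t *out_idx) {
    assert(log_probs != NULL && out_idx != NULL && k <= n);
    for (size_t j = 0; j < k; ++j) {
        size_t best = n; /* sentinel: nothing chosen yet in this pass */
        for (size_t i = 0; i < n; ++i) {
            /* skip indices already selected in earlier passes */
            int taken = 0;
            for (size_t t = 0; t < j; ++t) {
                if (out_idx[t] == i) { taken = 1; break; }
            }
            if (!taken && (best == n || log_probs[i] > log_probs[best])) {
                best = i;
            }
        }
        out_idx[j] = best;
    }
}
```

If the values are normalized log-softmax outputs (an assumption), `expf(log_probs[i])` converts an entry back to a probability.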

Files

| File | Type | Size | Notes |
|------|------|------|-------|
| `silero-lid-lang95-f32.gguf` | F32 | 16 MB | Full precision, recommended |

Quantized versions (Q8_0, Q5_0) were tested but degraded accuracy: most of the model's weights sit in small Conv1d kernels, where block quantization is destructive. At 16 MB in F32, the model is already very small, so there is little to gain from quantizing it.
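The size argument can be checked with a little arithmetic. This is a minimal sketch: `est_bytes` is a hypothetical helper, the parameter count of roughly 4.22M is an assumption, and the per-block byte counts follow ggml's Q8_0 and Q5_0 block layouts. It ignores GGUF metadata and pretends every weight is quantized, so it gives a best case for the savings.

```c
#include <assert.h>
#include <stdint.h>

/* Rough weight-storage estimate for n_params weights in one GGUF type.
 * Block layouts as defined by ggml:
 *   F32 : 1 weight per 4 bytes          -> block_size 1,  bytes_per_block 4
 *   Q8_0: 32 weights per 34 bytes       -> fp16 scale + 32 x int8
 *   Q5_0: 32 weights per 22 bytes       -> fp16 scale + 4 B high bits + 16 B low nibbles
 * Metadata is ignored; a partial final block is rounded up. */
uint64_t est_bytes(uint64_t n_params, uint64_t block_size, uint64_t bytes_per_block) {
    assert(block_size > 0);
    uint64_t n_blocks = (n_params + block_size - 1) / block_size; /* round up */
    return n_blocks * bytes_per_block;
}
```

With 4,220,000 weights this gives 16,880,000 bytes for F32 (about 16.1 MiB, consistent with the 16 MB file), 4,483,750 bytes for Q8_0 and 2,901,250 bytes for Q5_0. Even in the best case, quantization would save only about 12 to 14 MB.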

Usage with CrispASR

```shell
# Language detection pre-step for backends without native LID
crispasr --backend cohere -m cohere-transcribe-q5_0.gguf \
         -f audio.wav -l auto \
         --lid-backend silero --lid-model silero-lid-lang95-f32.gguf
```

```c
// Standalone detection (via the C API)
silero_lid_context * ctx = silero_lid_init("silero-lid-lang95-f32.gguf", 4);
float conf;  // receives the detection confidence
// samples / n_samples: 16 kHz mono PCM buffer and its length
const char * lang = silero_lid_detect(ctx, samples, n_samples, &conf);
// e.g. lang == "en, English"
silero_lid_free(ctx);
```
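The detected label combines a language code and a display name, as in the `"en, English"` example above. If you need the two parts separately, a minimal sketch follows. It assumes every label uses the same `"<code>, <Name>"` format; `split_lid_label` is a hypothetical helper, not part of the CrispASR API.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: splits "en, English" into code = "en" and
 * name = "English". Returns 0 on success, or -1 if the ", " separator is
 * missing or either output buffer is too small. */
int split_lid_label(const char *label, char *code, size_t code_len,
                    char *name, size_t name_len) {
    assert(label != NULL && code != NULL && name != NULL);
    const char *sep = strstr(label, ", ");
    if (sep == NULL) return -1;
    size_t n_code = (size_t)(sep - label);
    const char *rest = sep + 2;          /* skip ", " */
    size_t n_name = strlen(rest);
    if (n_code + 1 > code_len || n_name + 1 > name_len) return -1;
    memcpy(code, label, n_code);
    code[n_code] = '\0';
    memcpy(name, rest, n_name + 1);      /* copies the terminator too */
    return 0;
}
```

Copy the parts out before calling `silero_lid_free`, since the lifetime of the returned string is not documented here.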

Supported Languages (95)

The model supports 95 languages across 58 language groups. Detection works best on clips containing 3 to 20 seconds of speech.
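At 16 kHz, the 3 to 20 second guidance corresponds to 48,000 to 320,000 samples. The sketch below picks a window in that range from a longer recording. `lid_window` is a hypothetical helper. Centring the window is an arbitrary choice, and the check counts total duration rather than speech, so a clip with long silences may still fall short of 3 seconds of actual speech.

```c
#include <assert.h>
#include <stddef.h>

#define LID_SAMPLE_RATE 16000u                  /* model input rate */
#define LID_MIN_SAMPLES (3u * LID_SAMPLE_RATE)  /*  3 s ->  48000 samples */
#define LID_MAX_SAMPLES (20u * LID_SAMPLE_RATE) /* 20 s -> 320000 samples */

/* Hypothetical helper: returns the length of a window of at most 20 s,
 * centred in the clip, and writes its start offset to *offset.
 * Returns 0 (and offset 0) if the clip is shorter than 3 s. */
size_t lid_window(size_t n_samples, size_t *offset) {
    assert(offset != NULL);
    *offset = 0;
    if (n_samples < LID_MIN_SAMPLES) return 0;           /* too short */
    if (n_samples <= LID_MAX_SAMPLES) return n_samples;  /* use it all */
    *offset = (n_samples - LID_MAX_SAMPLES) / 2;         /* centre crop */
    return LID_MAX_SAMPLES;
}
```

The resulting `samples + offset` and window length can then be passed as the buffer and sample count to detection.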

Conversion

Converted from the ONNX export (`lang_classifier_95.onnx`) using `models/convert-silero-lid-to-gguf.py` from the CrispASR repo.

Acknowledgements

Silero Team for the original model
CrispASR for the native GGUF runtime
