Silero Language Classifier 95 - GGUF

GGUF conversion of Silero's 95-language classifier for use with CrispASR.

Model Details

  • Architecture: Learned STFT frontend + 8-stage MobileNet-style depthwise-separable conv encoder with interleaved post-norm transformers + attention-weighted pooling + 95-language / 58-group classifiers
  • Parameters: 507 tensors, ~4M parameters
  • Input: Raw 16 kHz mono PCM audio (best results on clips < 20 seconds)
  • Output: 95-language log-probabilities + 58-language-group log-probabilities
  • License: MIT (same as upstream Silero)
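
Since both heads emit log-probabilities, picking a winner is an argmax, and the gap to the runner-up gives a rough sense of how decisive the result is. The sketch below assumes the caller already holds one head's scores as a plain float array; `lid_top` and `lid_pick_top` are hypothetical names, not part of CrispASR, and the source does not show how to obtain the raw arrays.

```c
#include <stddef.h>

typedef struct {
    size_t index;   /* position of the best-scoring class */
    float  margin;  /* best log-prob minus runner-up log-prob (>= 0) */
} lid_top;

/* Hypothetical helper: scan one output head (n >= 2) for the top class
   and its lead over the runner-up, both measured in log space. */
static lid_top lid_pick_top(const float *logp, size_t n) {
    size_t best = logp[0] >= logp[1] ? 0 : 1;
    size_t second = 1 - best;
    for (size_t i = 2; i < n; i++) {
        if (logp[i] > logp[best]) {
            second = best;
            best = i;
        } else if (logp[i] > logp[second]) {
            second = i;
        }
    }
    lid_top t = { best, logp[best] - logp[second] };
    return t;
}
```

The same function would apply to either the 95-language head or the 58-group head; a small margin suggests the two leading candidates are close and the result should be treated with caution.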

Files

File                          Type  Size   Notes
silero-lid-lang95-f32.gguf    F32   16 MB  Full precision, recommended

Quantized versions (Q8_0, Q5_0) were tested but break accuracy: the model is dominated by small Conv1d kernels where block quantization is destructive. At 16 MB F32, the model is already very small.
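
To illustrate one way a shared per-block scale can hurt small kernels, the sketch below round-trips a single 32-value block through a simplified 8-bit scheme modeled on ggml's Q8_0 (one scale per block, set by the largest magnitude). It is an illustration under stated assumptions, not a measurement of this model: the real format stores the scale as fp16, and `q8_block_roundtrip` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK 32  /* values per block, as in ggml's Q8_0 layout */

/* Quantize one block to int8 with a single shared scale, then dequantize.
   Simplified: the scale stays in float instead of being stored as fp16. */
static void q8_block_roundtrip(const float *in, float *out) {
    float amax = 0.0f;
    for (size_t i = 0; i < BLOCK; i++) {
        float a = in[i] < 0.0f ? -in[i] : in[i];
        if (a > amax) amax = a;
    }
    float d = amax / 127.0f;  /* shared scale for the whole block */
    for (size_t i = 0; i < BLOCK; i++) {
        float v = d > 0.0f ? in[i] / d : 0.0f;
        /* round to nearest integer, ties away from zero */
        int8_t q = (int8_t)(v >= 0.0f ? v + 0.5f : v - 0.5f);
        out[i] = (float)q * d;
    }
}
```

If a block holds one weight of 1.0 next to weights around 0.003, the scale becomes 1/127 and every small weight rounds to zero. When several tiny kernels share a block, one outlier can erase the rest in this way; Q5_0 has even fewer levels per block.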

Usage with CrispASR

# Language detection pre-step for backends without native LID
crispasr --backend cohere -m cohere-transcribe-q5_0.gguf \
         -f audio.wav -l auto \
         --lid-backend silero --lid-model silero-lid-lang95-f32.gguf

// Standalone detection (via the C API)
silero_lid_context * ctx = silero_lid_init("silero-lid-lang95-f32.gguf", 4);
float conf;
const char * lang = silero_lid_detect(ctx, samples, n_samples, &conf);
// lang = "en, English"
silero_lid_free(ctx);
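
The returned label in the example above combines a language code and a name ("en, English"). If only the code is needed, for instance to pass to another tool, a small helper can split it off. This is a sketch that assumes every label follows the "code, Name" pattern shown in the comment; `lid_label_code` is a hypothetical name, not part of the C API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the part of a label such as "en, English"
   that precedes the first comma into `code`. If the label has no comma,
   the whole label is copied. Returns 0 on success, -1 if `code` is too
   small to hold the result plus its terminator. */
static int lid_label_code(const char *label, char *code, size_t cap) {
    size_t n = strcspn(label, ",");
    if (n + 1 > cap) return -1;
    memcpy(code, label, n);
    code[n] = '\0';
    return 0;
}
```

Copying into a caller-owned buffer avoids depending on how long the string returned by the detector stays valid, which the source does not specify.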

Supported Languages (95)

The model supports 95 languages across 58 language groups. Language detection works best on audio clips containing between 3 and 20 seconds of speech.
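
At 16 kHz, the recommended range corresponds to 48,000 to 320,000 samples. The sketch below shows one possible way to prepare a clip before detection: cap it at the first 20 seconds and flag clips shorter than 3 seconds. Truncating to the start of the clip is an assumed policy chosen for illustration, and `lid_clip_len` is a hypothetical name.

```c
#include <stddef.h>

#define LID_SAMPLE_RATE 16000                    /* Hz, from the input spec */
#define LID_MIN_SAMPLES (3  * LID_SAMPLE_RATE)   /* 3 s  = 48000 samples  */
#define LID_MAX_SAMPLES (20 * LID_SAMPLE_RATE)   /* 20 s = 320000 samples */

/* Returns how many leading samples to pass to the detector (at most 20 s).
   Sets *too_short to 1 when the clip is under 3 s, otherwise to 0. */
static size_t lid_clip_len(size_t n_samples, int *too_short) {
    *too_short = n_samples < (size_t)LID_MIN_SAMPLES;
    return n_samples > (size_t)LID_MAX_SAMPLES ? (size_t)LID_MAX_SAMPLES
                                               : n_samples;
}
```

A caller could skip detection or fall back to a default language when the flag is set, rather than trusting a result from very little audio.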

Conversion

Converted from the ONNX export (lang_classifier_95.onnx) using models/convert-silero-lid-to-gguf.py from the CrispASR repo.
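
A quick sanity check after conversion is to read the GGUF header and compare the tensor count against the expected 507. The sketch below assumes the GGUF v2+ header layout (4-byte magic "GGUF", 32-bit version, 64-bit tensor count, 64-bit metadata key/value count, all little-endian); `gguf_header` and `gguf_read_header` are hypothetical names, not part of ggml or CrispASR.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    uint32_t version;    /* GGUF format version */
    uint64_t n_tensors;  /* number of tensors in the file */
    uint64_t n_kv;       /* number of metadata key/value pairs */
} gguf_header;

/* Decode an unsigned little-endian integer of nbytes bytes. */
static uint64_t read_le(const unsigned char *p, int nbytes) {
    uint64_t v = 0;
    for (int i = nbytes - 1; i >= 0; i--) v = (v << 8) | p[i];
    return v;
}

/* Read the fixed 24-byte header. Returns 0 on success, -1 if the file
   cannot be opened, is too short, or does not start with "GGUF". */
static int gguf_read_header(const char *path, gguf_header *h) {
    unsigned char buf[24];
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    size_t got = fread(buf, 1, sizeof buf, f);
    fclose(f);
    if (got != sizeof buf || memcmp(buf, "GGUF", 4) != 0) return -1;
    h->version   = (uint32_t)read_le(buf + 4, 4);
    h->n_tensors = read_le(buf + 8, 8);
    h->n_kv      = read_le(buf + 16, 8);
    return 0;
}
```

Decoding byte by byte keeps the reader independent of the host's endianness. For this model, `n_tensors` would be expected to equal 507.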

Acknowledgements
