Silero Language Classifier 95 - GGUF
GGUF conversion of Silero's 95-language classifier for use with CrispASR.
Model Details
- Architecture: Learned STFT frontend + 8-stage MobileNet-style depthwise-separable conv encoder with interleaved post-norm transformers + attention-weighted pooling + 95-language / 58-group classifiers
- Parameters: 507 tensors, ~4M parameters
- Input: Raw 16 kHz mono PCM audio (best results on clips < 20 seconds)
- Output: 95-language log-probabilities + 58-language-group log-probabilities
- License: MIT (same as upstream Silero)
Files
| File | Type | Size | Notes |
|---|---|---|---|
| silero-lid-lang95-f32.gguf | F32 | 16 MB | Full precision, recommended |
Quantized versions (Q8_0, Q5_0) were tested, but they break accuracy: the model is dominated by small Conv1d kernels, for which block quantization is destructive. At 16 MB in F32, the model is already very small.
Usage with CrispASR
```bash
# Language detection pre-step for backends without native LID
crispasr --backend cohere -m cohere-transcribe-q5_0.gguf \
    -f audio.wav -l auto \
    --lid-backend silero --lid-model silero-lid-lang95-f32.gguf
```
Standalone detection (via the C API):

```c
silero_lid_context * ctx = silero_lid_init("silero-lid-lang95-f32.gguf", 4);
float conf;
// samples: raw 16 kHz mono PCM audio; n_samples: number of samples
const char * lang = silero_lid_detect(ctx, samples, n_samples, &conf);
// lang = "en, English"
silero_lid_free(ctx);
```
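The card does not state the C type of `samples`. This sketch assumes that `silero_lid_detect` takes `const float *` samples normalized to [-1, 1), which is common in ggml-based speech runtimes. Under that assumption, 16-bit PCM read from a WAV file would need converting first. `pcm16_to_f32` is a hypothetical helper, not part of the CrispASR API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert 16-bit signed PCM (16 kHz mono) into floats
 * in [-1.0, 1.0), the sample format assumed here for silero_lid_detect(). */
static void pcm16_to_f32(const int16_t *in, size_t n, float *out) {
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = (float)in[i] / 32768.0f; /* -32768 maps to -1.0 exactly */
}
```

The converted buffer would then be passed as `samples`, with `n_samples` equal to the number of 16 kHz samples.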
Supported Languages (95)
The model supports 95 languages across 58 language groups. Language detection works best on audio clips containing 3 to 20 seconds of speech.
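Because detection works best on 3 to 20 seconds of speech, a caller might trim longer recordings before calling the detector. The sketch below center-crops to at most 20 s at 16 kHz and flags clips shorter than 3 s. The centering choice and the `lid_window` name are assumptions. Picking a speech-dense segment (for example, with a VAD) may work better.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LID_SAMPLE_RATE 16000
#define LID_MIN_SEC 3
#define LID_MAX_SEC 20

/* Hypothetical helper: choose a window of at most LID_MAX_SEC seconds from a
 * 16 kHz clip of n_samples. Longer clips are center-cropped (an arbitrary
 * choice). Returns false when the clip is shorter than LID_MIN_SEC, i.e.
 * below the range the model card recommends. */
static bool lid_window(size_t n_samples, size_t *offset, size_t *length) {
    assert(offset != NULL && length != NULL);
    const size_t max_len = (size_t)LID_MAX_SEC * LID_SAMPLE_RATE;
    const size_t min_len = (size_t)LID_MIN_SEC * LID_SAMPLE_RATE;

    if (n_samples > max_len) {
        *offset = (n_samples - max_len) / 2; /* keep the middle max_len samples */
        *length = max_len;
    } else {
        *offset = 0;
        *length = n_samples;
    }
    return n_samples >= min_len;
}
```

For a 60 s clip, the helper returns an offset and a length of 320000 samples each, covering seconds 20 to 40. The detector would then be called on `samples + offset` with `length` samples.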
Conversion
Converted from the ONNX export (lang_classifier_95.onnx) using models/convert-silero-lid-to-gguf.py from the CrispASR repo.
Acknowledgements
- Silero Team for the original model
- CrispASR for the native GGUF runtime