Wav2Vec2 Large XLSR-53 English โ€” GGUF

GGUF conversion of jonatasgrosman/wav2vec2-large-xlsr-53-english for English speech recognition.

Model details

Property Value
Architecture Wav2Vec2 (CNN feature extractor + 24 Transformer layers)
Hidden size 1024
Attention heads 16
CTC vocabulary 33 tokens
Format GGUF (F16 weights)
Size 627 MB

The model was pre-trained on 53 languages (XLSR-53) and fine-tuned on English Common Voice data with a CTC head. It accepts 16 kHz mono audio and outputs character-level transcriptions.

Usage with CrispASR

crispasr \
  --backend wav2vec2 \
  -m wav2vec2-xlsr-en.gguf \
  audio.wav

Provenance

Weights were converted from the original Hugging Face PyTorch checkpoint into GGUF format with F16 precision for all transformer and feature-extractor parameters.

License

Apache-2.0 โ€” same as the original model.

Downloads last month
212
GGUF
Model size
0.3B params
Architecture
wav2vec2
Hardware compatibility
Log In to add your hardware

5-bit

8-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for cstr/wav2vec2-large-xlsr-53-english-GGUF

Quantized
(2)
this model