christopherthompson81/granite-speech-4-1-2b-onnx

Automatic Speech Recognition


christopherthompson81 committed 15 days ago

Commit 8a532b0 (verified) · 1 Parent(s): 6d585c2

Cross-reference BF16 sibling bundle

Files changed (1)

README.md +1 -0


@@ -31,6 +31,7 @@ ONNX Runtime.
 - **Conversion script:** [`scripts/granite_export/`](https://github.com/christopherthompson81/vernacula/tree/main/scripts/granite_export)
 - **Vernacula:** [github.com/christopherthompson81/vernacula](https://github.com/christopherthompson81/vernacula)
 - **Upstream model:** [`ibm-granite/granite-speech-4.1-2b`](https://huggingface.co/ibm-granite/granite-speech-4.1-2b)
+- **BF16 sibling bundle (Ampere+ GPUs):** [`christopherthompson81/granite-speech-4-1-2b-onnx-bf16`](https://huggingface.co/christopherthompson81/granite-speech-4-1-2b-onnx-bf16) — same content with the LM decoder in BF16, ~39% smaller and ~27% faster end-to-end on hardware with BF16 tensor cores. CPU and pre-Ampere GPU users should stay on this FP32 bundle.
 ## Highlights