christopherthompson81 committed on
Commit 8a532b0 · verified · 1 Parent(s): 6d585c2

Cross-reference BF16 sibling bundle

Files changed (1)
  1. README.md +1 -0
README.md CHANGED
@@ -31,6 +31,7 @@ ONNX Runtime.
  - **Conversion script:** [`scripts/granite_export/`](https://github.com/christopherthompson81/vernacula/tree/main/scripts/granite_export)
  - **Vernacula:** [github.com/christopherthompson81/vernacula](https://github.com/christopherthompson81/vernacula)
  - **Upstream model:** [`ibm-granite/granite-speech-4.1-2b`](https://huggingface.co/ibm-granite/granite-speech-4.1-2b)
+ - **BF16 sibling bundle (Ampere+ GPUs):** [`christopherthompson81/granite-speech-4-1-2b-onnx-bf16`](https://huggingface.co/christopherthompson81/granite-speech-4-1-2b-onnx-bf16) — same content with the LM decoder in BF16, ~39% smaller and ~27% faster end-to-end on hardware with BF16 tensor cores. CPU and pre-Ampere GPU users should stay on this FP32 bundle.
 
  ## Highlights
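
The added README line tells Ampere+ GPU users to pick the BF16 bundle and everyone else to stay on FP32. A minimal sketch of that decision follows. It assumes "Ampere+" corresponds to an NVIDIA CUDA compute capability of 8.0 or higher, which is an interpretation and not something the commit states. The function name `prefers_bf16` is hypothetical, and only the BF16 repository id is taken from the README.

```python
from typing import Optional, Tuple

# Repository id copied from the README line added in this commit.
BF16_REPO = "christopherthompson81/granite-speech-4-1-2b-onnx-bf16"

# Assumption: "Ampere+" is read as CUDA compute capability >= 8.0.
AMPERE_CAPABILITY = (8, 0)


def prefers_bf16(compute_capability: Optional[Tuple[int, int]]) -> bool:
    """Return True when the BF16 bundle is the suggested choice.

    compute_capability is a (major, minor) pair for the CUDA device,
    or None when running on CPU or without a CUDA device.
    """
    if compute_capability is None:
        # CPU users should stay on the FP32 bundle.
        return False
    # Tuples compare element by element, so (8, 6) >= (8, 0) holds
    # and (7, 5) >= (8, 0) does not.
    return compute_capability >= AMPERE_CAPABILITY


if __name__ == "__main__":
    for cap in [(8, 6), (7, 5), None]:
        choice = BF16_REPO if prefers_bf16(cap) else "FP32 bundle (this repo)"
        print(cap, "->", choice)
```

If PyTorch is available, `torch.cuda.get_device_capability()` returns a `(major, minor)` pair that can be passed in directly. Pass `None` when `torch.cuda.is_available()` is false.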