| .. _export_lstm_transducer_models_to_ncnn: |
|
|
| Export LSTM transducer models to ncnn |
| ------------------------------------- |
|
|
| We use the pre-trained model from the following repository as an example: |
|
|
| `<https://huggingface.co/csukuangfj/icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03>`_ |
|
|
| We will show you step by step how to export it to `ncnn`_ and run it with `sherpa-ncnn`_. |
|
|
| .. hint:: |
|
|
| We use ``Ubuntu 18.04``, ``torch 1.13``, and ``Python 3.8`` for testing. |
|
|
| .. caution:: |
|
|
| ``torch > 2.0`` may not work. If you get errors while building pnnx, please switch |
| to ``torch < 2.0``. |
|
|
| 1. Download the pre-trained model |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
|
|
| .. hint:: |
|
|
| You have to install `git-lfs`_ before you continue. |
|
|
|
|
| .. code-block:: bash |
|
|
| cd egs/librispeech/ASR |
| GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/csukuangfj/icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03 |
| cd icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03 |
|
|
| git lfs pull --include "exp/pretrained-iter-468000-avg-16.pt" |
| git lfs pull --include "data/lang_bpe_500/bpe.model" |
|
|
| cd .. |
|
|
| .. note:: |
|
|
| We downloaded ``exp/pretrained-xxx.pt``, not ``exp/cpu-jit_xxx.pt``. |
|
|
| In the above code, we downloaded the pre-trained model into the directory |
| ``egs/librispeech/ASR/icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03``. |
|
|
| 2. Install ncnn and pnnx |
| ^^^^^^^^^^^^^^^^^^^^^^^^ |
|
|
| Please refer to :ref:`export_for_ncnn_install_ncnn_and_pnnx` . |
|
|
|
|
| 3. Export the model via torch.jit.trace() |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
|
|
| First, let us rename our pre-trained model: |
|
|
| .. code-block:: |
|
|
| cd egs/librispeech/ASR |
|
|
| cd icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp |
|
|
| ln -s pretrained-iter-468000-avg-16.pt epoch-99.pt |
|
|
| cd ../.. |
|
|
| Next, we use the following code to export our model: |
|
|
| .. code-block:: bash |
|
|
| dir=./icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03 |
|
|
| ./lstm_transducer_stateless2/export-for-ncnn.py \ |
| --exp-dir $dir/exp \ |
| --tokens $dir/data/lang_bpe_500/tokens.txt \ |
| --epoch 99 \ |
| --avg 1 \ |
| --use-averaged-model 0 \ |
| --num-encoder-layers 12 \ |
| --encoder-dim 512 \ |
| --rnn-hidden-size 1024 |
|
|
| .. hint:: |
|
|
| We have renamed our model to ``epoch-99.pt`` so that we can use ``--epoch 99``. |
| There is only one pre-trained model, so we use ``--avg 1 --use-averaged-model 0``. |
|
|
| If you have trained a model by yourself and if you have all checkpoints |
| available, please first use ``decode.py`` to tune ``--epoch --avg`` |
| and select the best combination with with ``--use-averaged-model 1``. |
|
|
| .. note:: |
|
|
| You will see the following log output: |
|
|
| .. literalinclude:: ./code/export-lstm-transducer-for-ncnn-output.txt |
|
|
| The log shows the model has ``84176356`` parameters, i.e., ``~84 M``. |
|
|
| .. code-block:: |
| |
| ls -lh icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/pretrained-iter-468000-avg-16.pt |
|
|
| -rw-r--r-- 1 kuangfangjun root 324M Feb 17 10:34 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/pretrained-iter-468000-avg-16.pt |
|
|
| You can see that the file size of the pre-trained model is ``324 MB``, which |
| is roughly equal to ``84176356*4/1024/1024 = 321.107 MB``. |
|
|
| After running ``lstm_transducer_stateless2/export-for-ncnn.py``, |
| we will get the following files: |
|
|
| .. code-block:: bash |
|
|
| ls -lh icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/*pnnx.pt |
|
|
| -rw-r--r-- 1 kuangfangjun root 1010K Feb 17 11:22 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/decoder_jit_trace-pnnx.pt |
| -rw-r--r-- 1 kuangfangjun root 318M Feb 17 11:22 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/encoder_jit_trace-pnnx.pt |
| -rw-r--r-- 1 kuangfangjun root 3.0M Feb 17 11:22 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/joiner_jit_trace-pnnx.pt |
|
|
|
|
| .. _lstm-transducer-step-4-export-torchscript-model-via-pnnx: |
|
|
| 4. Export torchscript model via pnnx |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
|
|
| .. hint:: |
|
|
| Make sure you have set up the ``PATH`` environment variable |
| in :ref:`export_for_ncnn_install_ncnn_and_pnnx`. Otherwise, |
| it will throw an error saying that ``pnnx`` could not be found. |
|
|
| Now, it's time to export our models to `ncnn`_ via ``pnnx``. |
|
|
| .. code-block:: |
|
|
| cd icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/ |
|
|
| pnnx ./encoder_jit_trace-pnnx.pt |
| pnnx ./decoder_jit_trace-pnnx.pt |
| pnnx ./joiner_jit_trace-pnnx.pt |
|
|
| It will generate the following files: |
|
|
| .. code-block:: bash |
|
|
| ls -lh icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/*ncnn*{bin,param} |
|
|
| -rw-r--r-- 1 kuangfangjun root 503K Feb 17 11:32 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/decoder_jit_trace-pnnx.ncnn.bin |
| -rw-r--r-- 1 kuangfangjun root 437 Feb 17 11:32 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/decoder_jit_trace-pnnx.ncnn.param |
| -rw-r--r-- 1 kuangfangjun root 159M Feb 17 11:32 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/encoder_jit_trace-pnnx.ncnn.bin |
| -rw-r--r-- 1 kuangfangjun root 21K Feb 17 11:32 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/encoder_jit_trace-pnnx.ncnn.param |
| -rw-r--r-- 1 kuangfangjun root 1.5M Feb 17 11:33 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/joiner_jit_trace-pnnx.ncnn.bin |
| -rw-r--r-- 1 kuangfangjun root 488 Feb 17 11:33 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/joiner_jit_trace-pnnx.ncnn.param |
|
|
|
|
| There are two types of files: |
|
|
| - ``param``: It is a text file containing the model architectures. You can |
| use a text editor to view its content. |
| - ``bin``: It is a binary file containing the model parameters. |
|
|
| We compare the file sizes of the models below before and after converting via ``pnnx``: |
|
|
| .. see https://tableconvert.com/restructuredtext-generator |
|
|
| +----------------------------------+------------+ |
| | File name | File size | |
| +==================================+============+ |
| | encoder_jit_trace-pnnx.pt | 318 MB | |
| +----------------------------------+------------+ |
| | decoder_jit_trace-pnnx.pt | 1010 KB | |
| +----------------------------------+------------+ |
| | joiner_jit_trace-pnnx.pt | 3.0 MB | |
| +----------------------------------+------------+ |
| | encoder_jit_trace-pnnx.ncnn.bin | 159 MB | |
| +----------------------------------+------------+ |
| | decoder_jit_trace-pnnx.ncnn.bin | 503 KB | |
| +----------------------------------+------------+ |
| | joiner_jit_trace-pnnx.ncnn.bin | 1.5 MB | |
| +----------------------------------+------------+ |
|
|
| You can see that the file sizes of the models after conversion are about one half |
| of the models before conversion: |
|
|
| - encoder: 318 MB vs 159 MB |
| - decoder: 1010 KB vs 503 KB |
| - joiner: 3.0 MB vs 1.5 MB |
|
|
| The reason is that by default ``pnnx`` converts ``float32`` parameters |
| to ``float16``. A ``float32`` parameter occupies 4 bytes, while it is 2 bytes |
| for ``float16``. Thus, it is ``twice smaller`` after conversion. |
|
|
| .. hint:: |
|
|
| If you use ``pnnx ./encoder_jit_trace-pnnx.pt fp16=0``, then ``pnnx`` |
| won't convert ``float32`` to ``float16``. |
|
|
| 5. Test the exported models in icefall |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
|
|
| .. note:: |
|
|
| We assume you have set up the environment variable ``PYTHONPATH`` when |
| building `ncnn`_. |
|
|
| Now we have successfully converted our pre-trained model to `ncnn`_ format. |
| The generated 6 files are what we need. You can use the following code to |
| test the converted models: |
|
|
| .. code-block:: bash |
|
|
| python3 ./lstm_transducer_stateless2/streaming-ncnn-decode.py \ |
| --tokens ./icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/data/lang_bpe_500/tokens.txt \ |
| --encoder-param-filename ./icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/encoder_jit_trace-pnnx.ncnn.param \ |
| --encoder-bin-filename ./icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/encoder_jit_trace-pnnx.ncnn.bin \ |
| --decoder-param-filename ./icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/decoder_jit_trace-pnnx.ncnn.param \ |
| --decoder-bin-filename ./icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/decoder_jit_trace-pnnx.ncnn.bin \ |
| --joiner-param-filename ./icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/joiner_jit_trace-pnnx.ncnn.param \ |
| --joiner-bin-filename ./icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/joiner_jit_trace-pnnx.ncnn.bin \ |
| ./icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/test_wavs/1089-134686-0001.wav |
|
|
| .. hint:: |
|
|
| `ncnn`_ supports only ``batch size == 1``, so ``streaming-ncnn-decode.py`` accepts |
| only 1 wave file as input. |
|
|
| The output is given below: |
|
|
| .. literalinclude:: ./code/test-streaming-ncnn-decode-lstm-transducer-libri.txt |
|
|
| Congratulations! You have successfully exported a model from PyTorch to `ncnn`_! |
|
|
| .. _lstm-modify-the-exported-encoder-for-sherpa-ncnn: |
|
|
| 6. Modify the exported encoder for sherpa-ncnn |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
|
|
| In order to use the exported models in `sherpa-ncnn`_, we have to modify |
| ``encoder_jit_trace-pnnx.ncnn.param``. |
| |
| Let us have a look at the first few lines of ``encoder_jit_trace-pnnx.ncnn.param``: |
|
|
| .. code-block:: |
|
|
| 7767517 |
| 267 379 |
| Input in0 0 1 in0 |
|
|
| **Explanation** of the above three lines: |
|
|
| 1. ``7767517``, it is a magic number and should not be changed. |
| 2. ``267 379``, the first number ``267`` specifies the number of layers |
| in this file, while ``379`` specifies the number of intermediate outputs |
| of this file |
| 3. ``Input in0 0 1 in0``, ``Input`` is the layer type of this layer; ``in0`` |
| is the layer name of this layer; ``0`` means this layer has no input; |
| ``1`` means this layer has one output; ``in0`` is the output name of |
| this layer. |
|
|
| We need to add 1 extra line and also increment the number of layers. |
| The result looks like below: |
|
|
| .. code-block:: bash |
|
|
| 7767517 |
| 268 379 |
| SherpaMetaData sherpa_meta_data1 0 0 0=3 1=12 2=512 3=1024 |
| Input in0 0 1 in0 |
|
|
| **Explanation** |
|
|
| 1. ``7767517``, it is still the same |
| 2. ``268 379``, we have added an extra layer, so we need to update ``267`` to ``268``. |
| We don't need to change ``379`` since the newly added layer has no inputs or outputs. |
| 3. ``SherpaMetaData sherpa_meta_data1 0 0 0=3 1=12 2=512 3=1024`` |
| This line is newly added. Its explanation is given below: |
|
|
| - ``SherpaMetaData`` is the type of this layer. Must be ``SherpaMetaData``. |
| - ``sherpa_meta_data1`` is the name of this layer. Must be ``sherpa_meta_data1``. |
| - ``0 0`` means this layer has no inputs or output. Must be ``0 0`` |
| - ``0=3``, 0 is the key and 3 is the value. MUST be ``0=3`` |
| - ``1=12``, 1 is the key and 12 is the value of the |
| parameter ``--num-encoder-layers`` that you provided when running |
| ``./lstm_transducer_stateless2/export-for-ncnn.py``. |
| - ``2=512``, 2 is the key and 512 is the value of the |
| parameter ``--encoder-dim`` that you provided when running |
| ``./lstm_transducer_stateless2/export-for-ncnn.py``. |
| - ``3=1024``, 3 is the key and 1024 is the value of the |
| parameter ``--rnn-hidden-size`` that you provided when running |
| ``./lstm_transducer_stateless2/export-for-ncnn.py``. |
|
|
| For ease of reference, we list the key-value pairs that you need to add |
| in the following table. If your model has a different setting, please |
| change the values for ``SherpaMetaData`` accordingly. Otherwise, you |
| will be ``SAD``. |
|
|
| +------+-----------------------------+ |
| | key | value | |
| +======+=============================+ |
| | 0 | 3 (fixed) | |
| +------+-----------------------------+ |
| | 1 | ``--num-encoder-layers`` | |
| +------+-----------------------------+ |
| | 2 | ``--encoder-dim`` | |
| +------+-----------------------------+ |
| | 3 | ``--rnn-hidden-size`` | |
| +------+-----------------------------+ |
|
|
| 4. ``Input in0 0 1 in0``. No need to change it. |
|
|
| .. caution:: |
|
|
| When you add a new layer ``SherpaMetaData``, please remember to update the |
| number of layers. In our case, update ``267`` to ``268``. Otherwise, |
| you will be SAD later. |
|
|
| .. hint:: |
|
|
| After adding the new layer ``SherpaMetaData``, you cannot use this model |
| with ``streaming-ncnn-decode.py`` anymore since ``SherpaMetaData`` is |
| supported only in `sherpa-ncnn`_. |
|
|
| .. hint:: |
|
|
| `ncnn`_ is very flexible. You can add new layers to it just by text-editing |
| the ``param`` file! You don't need to change the ``bin`` file. |
|
|
| Now you can use this model in `sherpa-ncnn`_. |
| Please refer to the following documentation: |
|
|
| - Linux/macOS/Windows/arm/aarch64: `<https://k2-fsa.github.io/sherpa/ncnn/install/index.html>`_ |
| - ``Android``: `<https://k2-fsa.github.io/sherpa/ncnn/android/index.html>`_ |
| - ``iOS``: `<https://k2-fsa.github.io/sherpa/ncnn/ios/index.html>`_ |
| - Python: `<https://k2-fsa.github.io/sherpa/ncnn/python/index.html>`_ |
|
|
| We have a list of pre-trained models that have been exported for `sherpa-ncnn`_: |
|
|
| - `<https://k2-fsa.github.io/sherpa/ncnn/pretrained_models/index.html>`_ |
|
|
| You can find more usages there. |
|
|
| 7. (Optional) int8 quantization with sherpa-ncnn |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
|
|
| This step is optional. |
|
|
| In this step, we describe how to quantize our model with ``int8``. |
|
|
| Change :ref:`lstm-transducer-step-4-export-torchscript-model-via-pnnx` to |
| disable ``fp16`` when using ``pnnx``: |
|
|
| .. code-block:: |
|
|
| cd icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/ |
|
|
| pnnx ./encoder_jit_trace-pnnx.pt fp16=0 |
| pnnx ./decoder_jit_trace-pnnx.pt |
| pnnx ./joiner_jit_trace-pnnx.pt fp16=0 |
|
|
| .. note:: |
|
|
| We add ``fp16=0`` when exporting the encoder and joiner. `ncnn`_ does not |
| support quantizing the decoder model yet. We will update this documentation |
| once `ncnn`_ supports it. (Maybe in this year, 2023). |
|
|
| .. code-block:: bash |
|
|
| ls -lh icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/*_jit_trace-pnnx.ncnn.{param,bin} |
|
|
| -rw-r--r-- 1 kuangfangjun root 503K Feb 17 11:32 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/decoder_jit_trace-pnnx.ncnn.bin |
| -rw-r--r-- 1 kuangfangjun root 437 Feb 17 11:32 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/decoder_jit_trace-pnnx.ncnn.param |
| -rw-r--r-- 1 kuangfangjun root 317M Feb 17 11:54 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/encoder_jit_trace-pnnx.ncnn.bin |
| -rw-r--r-- 1 kuangfangjun root 21K Feb 17 11:54 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/encoder_jit_trace-pnnx.ncnn.param |
| -rw-r--r-- 1 kuangfangjun root 3.0M Feb 17 11:54 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/joiner_jit_trace-pnnx.ncnn.bin |
| -rw-r--r-- 1 kuangfangjun root 488 Feb 17 11:54 icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/joiner_jit_trace-pnnx.ncnn.param |
|
|
|
|
| Let us compare again the file sizes: |
|
|
| +----------------------------------------+------------+ |
| | File name | File size | |
| +----------------------------------------+------------+ |
| | encoder_jit_trace-pnnx.pt | 318 MB | |
| +----------------------------------------+------------+ |
| | decoder_jit_trace-pnnx.pt | 1010 KB | |
| +----------------------------------------+------------+ |
| | joiner_jit_trace-pnnx.pt | 3.0 MB | |
| +----------------------------------------+------------+ |
| | encoder_jit_trace-pnnx.ncnn.bin (fp16) | 159 MB | |
| +----------------------------------------+------------+ |
| | decoder_jit_trace-pnnx.ncnn.bin (fp16) | 503 KB | |
| +----------------------------------------+------------+ |
| | joiner_jit_trace-pnnx.ncnn.bin (fp16) | 1.5 MB | |
| +----------------------------------------+------------+ |
| | encoder_jit_trace-pnnx.ncnn.bin (fp32) | 317 MB | |
| +----------------------------------------+------------+ |
| | joiner_jit_trace-pnnx.ncnn.bin (fp32) | 3.0 MB | |
| +----------------------------------------+------------+ |
|
|
| You can see that the file sizes are doubled when we disable ``fp16``. |
|
|
| .. note:: |
|
|
| You can again use ``streaming-ncnn-decode.py`` to test the exported models. |
|
|
| Next, follow :ref:`lstm-modify-the-exported-encoder-for-sherpa-ncnn` |
| to modify ``encoder_jit_trace-pnnx.ncnn.param``. |
|
|
| Change |
|
|
| .. code-block:: bash |
|
|
| 7767517 |
| 267 379 |
| Input in0 0 1 in0 |
|
|
| to |
|
|
| .. code-block:: bash |
|
|
| 7767517 |
| 268 379 |
| SherpaMetaData sherpa_meta_data1 0 0 0=3 1=12 2=512 3=1024 |
| Input in0 0 1 in0 |
|
|
| .. caution:: |
|
|
| Please follow :ref:`lstm-modify-the-exported-encoder-for-sherpa-ncnn` |
| to change the values for ``SherpaMetaData`` if your model uses a different setting. |
|
|
| Next, let us compile `sherpa-ncnn`_ since we will quantize our models within |
| `sherpa-ncnn`_. |
|
|
| .. code-block:: bash |
|
|
| # We will download sherpa-ncnn to $HOME/open-source/ |
| # You can change it to anywhere you like. |
| cd $HOME |
| mkdir -p open-source |
|
|
| cd open-source |
| git clone https://github.com/k2-fsa/sherpa-ncnn |
| cd sherpa-ncnn |
| mkdir build |
| cd build |
| cmake .. |
| make -j 4 |
|
|
| ./bin/generate-int8-scale-table |
|
|
| export PATH=$HOME/open-source/sherpa-ncnn/build/bin:$PATH |
|
|
| The output of the above commands are: |
|
|
| .. code-block:: bash |
|
|
| (py38) kuangfangjun:build$ generate-int8-scale-table |
| Please provide 10 arg. Currently given: 1 |
| Usage: |
| generate-int8-scale-table encoder.param encoder.bin decoder.param decoder.bin joiner.param joiner.bin encoder-scale-table.txt joiner-scale-table.txt wave_filenames.txt |
|
|
| Each line in wave_filenames.txt is a path to some 16k Hz mono wave file. |
|
|
| We need to create a file ``wave_filenames.txt``, in which we need to put |
| some calibration wave files. For testing purpose, we put the ``test_wavs`` |
| from the pre-trained model repository |
| `<https://huggingface.co/csukuangfj/icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03>`_ |
|
|
| .. code-block:: bash |
|
|
| cd egs/librispeech/ASR |
| cd icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/ |
|
|
| cat <<EOF > wave_filenames.txt |
| ../test_wavs/1089-134686-0001.wav |
| ../test_wavs/1221-135766-0001.wav |
| ../test_wavs/1221-135766-0002.wav |
| EOF |
|
|
| Now we can calculate the scales needed for quantization with the calibration data: |
|
|
| .. code-block:: bash |
|
|
| cd egs/librispeech/ASR |
| cd icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/ |
|
|
| generate-int8-scale-table \ |
| ./encoder_jit_trace-pnnx.ncnn.param \ |
| ./encoder_jit_trace-pnnx.ncnn.bin \ |
| ./decoder_jit_trace-pnnx.ncnn.param \ |
| ./decoder_jit_trace-pnnx.ncnn.bin \ |
| ./joiner_jit_trace-pnnx.ncnn.param \ |
| ./joiner_jit_trace-pnnx.ncnn.bin \ |
| ./encoder-scale-table.txt \ |
| ./joiner-scale-table.txt \ |
| ./wave_filenames.txt |
|
|
| The output logs are in the following: |
|
|
| .. literalinclude:: ./code/generate-int-8-scale-table-for-lstm.txt |
|
|
| It generates the following two files: |
|
|
| .. code-block:: bash |
|
|
| ls -lh encoder-scale-table.txt joiner-scale-table.txt |
|
|
| -rw-r--r-- 1 kuangfangjun root 345K Feb 17 12:13 encoder-scale-table.txt |
| -rw-r--r-- 1 kuangfangjun root 17K Feb 17 12:13 joiner-scale-table.txt |
|
|
| .. caution:: |
|
|
| Definitely, you need more calibration data to compute the scale table. |
|
|
| Finally, let us use the scale table to quantize our models into ``int8``. |
|
|
| .. code-block:: bash |
|
|
| ncnn2int8 |
|
|
| usage: ncnn2int8 [inparam] [inbin] [outparam] [outbin] [calibration table] |
|
|
| First, we quantize the encoder model: |
|
|
| .. code-block:: bash |
|
|
| cd egs/librispeech/ASR |
| cd icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/ |
|
|
| ncnn2int8 \ |
| ./encoder_jit_trace-pnnx.ncnn.param \ |
| ./encoder_jit_trace-pnnx.ncnn.bin \ |
| ./encoder_jit_trace-pnnx.ncnn.int8.param \ |
| ./encoder_jit_trace-pnnx.ncnn.int8.bin \ |
| ./encoder-scale-table.txt |
|
|
| Next, we quantize the joiner model: |
|
|
| .. code-block:: bash |
|
|
| ncnn2int8 \ |
| ./joiner_jit_trace-pnnx.ncnn.param \ |
| ./joiner_jit_trace-pnnx.ncnn.bin \ |
| ./joiner_jit_trace-pnnx.ncnn.int8.param \ |
| ./joiner_jit_trace-pnnx.ncnn.int8.bin \ |
| ./joiner-scale-table.txt |
|
|
| The above two commands generate the following 4 files: |
|
|
| .. code-block:: |
|
|
| -rw-r--r-- 1 kuangfangjun root 218M Feb 17 12:19 encoder_jit_trace-pnnx.ncnn.int8.bin |
| -rw-r--r-- 1 kuangfangjun root 21K Feb 17 12:19 encoder_jit_trace-pnnx.ncnn.int8.param |
| -rw-r--r-- 1 kuangfangjun root 774K Feb 17 12:19 joiner_jit_trace-pnnx.ncnn.int8.bin |
| -rw-r--r-- 1 kuangfangjun root 496 Feb 17 12:19 joiner_jit_trace-pnnx.ncnn.int8.param |
|
|
| Congratulations! You have successfully quantized your model from ``float32`` to ``int8``. |
|
|
| .. caution:: |
|
|
| ``ncnn.int8.param`` and ``ncnn.int8.bin`` must be used in pairs. |
|
|
| You can replace ``ncnn.param`` and ``ncnn.bin`` with ``ncnn.int8.param`` |
| and ``ncnn.int8.bin`` in `sherpa-ncnn`_ if you like. |
|
|
| For instance, to use only the ``int8`` encoder in ``sherpa-ncnn``, you can |
| replace the following invocation: |
|
|
| .. code-block:: |
| |
| cd egs/librispeech/ASR |
| cd icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/ |
|
|
| sherpa-ncnn \ |
| ../data/lang_bpe_500/tokens.txt \ |
| ./encoder_jit_trace-pnnx.ncnn.param \ |
| ./encoder_jit_trace-pnnx.ncnn.bin \ |
| ./decoder_jit_trace-pnnx.ncnn.param \ |
| ./decoder_jit_trace-pnnx.ncnn.bin \ |
| ./joiner_jit_trace-pnnx.ncnn.param \ |
| ./joiner_jit_trace-pnnx.ncnn.bin \ |
| ../test_wavs/1089-134686-0001.wav |
|
|
| with |
|
|
| .. code-block:: bash |
|
|
| cd egs/librispeech/ASR |
| cd icefall-asr-librispeech-conv-emformer-transducer-stateless2-2022-07-05/exp/ |
|
|
| sherpa-ncnn \ |
| ../data/lang_bpe_500/tokens.txt \ |
| ./encoder_jit_trace-pnnx.ncnn.int8.param \ |
| ./encoder_jit_trace-pnnx.ncnn.int8.bin \ |
| ./decoder_jit_trace-pnnx.ncnn.param \ |
| ./decoder_jit_trace-pnnx.ncnn.bin \ |
| ./joiner_jit_trace-pnnx.ncnn.param \ |
| ./joiner_jit_trace-pnnx.ncnn.bin \ |
| ../test_wavs/1089-134686-0001.wav |
|
|
| The following table compares again the file sizes: |
|
|
| +----------------------------------------+------------+ |
| | File name | File size | |
| +----------------------------------------+------------+ |
| | encoder_jit_trace-pnnx.pt | 318 MB | |
| +----------------------------------------+------------+ |
| | decoder_jit_trace-pnnx.pt | 1010 KB | |
| +----------------------------------------+------------+ |
| | joiner_jit_trace-pnnx.pt | 3.0 MB | |
| +----------------------------------------+------------+ |
| | encoder_jit_trace-pnnx.ncnn.bin (fp16) | 159 MB | |
| +----------------------------------------+------------+ |
| | decoder_jit_trace-pnnx.ncnn.bin (fp16) | 503 KB | |
| +----------------------------------------+------------+ |
| | joiner_jit_trace-pnnx.ncnn.bin (fp16) | 1.5 MB | |
| +----------------------------------------+------------+ |
| | encoder_jit_trace-pnnx.ncnn.bin (fp32) | 317 MB | |
| +----------------------------------------+------------+ |
| | joiner_jit_trace-pnnx.ncnn.bin (fp32) | 3.0 MB | |
| +----------------------------------------+------------+ |
| | encoder_jit_trace-pnnx.ncnn.int8.bin | 218 MB | |
| +----------------------------------------+------------+ |
| | joiner_jit_trace-pnnx.ncnn.int8.bin | 774 KB | |
| +----------------------------------------+------------+ |
|
|
| You can see that the file size of the joiner model after ``int8`` quantization |
| is much smaller. However, the size of the encoder model is even larger than |
| the ``fp16`` counterpart. The reason is that `ncnn`_ currently does not support |
| quantizing ``LSTM`` layers into ``8-bit``. Please see |
| `<https://github.com/Tencent/ncnn/issues/4532>`_ |
|
|
| .. hint:: |
|
|
| Currently, only linear layers and convolutional layers are quantized |
| with ``int8``, so you don't see an exact ``4x`` reduction in file sizes. |
|
|
| .. note:: |
|
|
| You need to test the recognition accuracy after ``int8`` quantization. |
|
|
|
|
| That's it! Have fun with `sherpa-ncnn`_! |
|
|