dann-od committed on
Commit
30d278b
·
verified ·
1 Parent(s): ed1810b

Add QAT note

Files changed (1)
  1. README.md +3 -0
README.md CHANGED
@@ -49,6 +49,9 @@ for low-latency NVIDIA TensorRT inference on edge GPUs.
  semantics.
  - **Validated accuracy** within 3.30 pp of the FP32
  baseline on ImageNet (see Accuracy table below).
+ - **Quantization-aware training (QAT)** further recovers accuracy
+ lost in INT8 conversion by fine-tuning the model with simulated
+ quantization in the forward pass.
  - **Matches the latency of `trtexec --best`** on supported NVIDIA
  hardware while preserving INT8 accuracy (see Performance table
  below).
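
For reviewers unfamiliar with the "simulated quantization in the forward pass" that the added bullet mentions, here is a minimal sketch of the idea in plain Python. It is not taken from this repository: the `fake_quantize` name, the symmetric per-tensor scale, and the max-abs calibration are all assumptions made for illustration. QAT frameworks typically apply a quantize-then-dequantize step like this to weights and activations during fine-tuning, and pass gradients through the rounding unchanged (a straight-through estimator), which is omitted here.

```python
def fake_quantize(values, num_bits=8):
    """Quantize-dequantize a list of floats (hypothetical helper).

    Assumes a symmetric, per-tensor scale derived from the largest
    absolute value. Returns the dequantized values and the scale used.
    """
    qmax = 2 ** (num_bits - 1) - 1          # 127 for INT8
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        # Nothing to scale; all inputs are zero (or the list is empty).
        return list(values), 1.0

    scale = max_abs / qmax                  # float value of one integer step
    dequantized = []
    for v in values:
        q = round(v / scale)                # snap to the integer grid
        q = max(-qmax, min(qmax, q))        # clamp to the symmetric INT8 range
        dequantized.append(q * scale)       # map back to float
    return dequantized, scale


if __name__ == "__main__":
    weights = [1.27, -0.635, 0.004]
    simulated, scale = fake_quantize(weights)
    # Each simulated value differs from the original by at most scale / 2.
    for w, s in zip(weights, simulated):
        print(f"{w:+.4f} -> {s:+.4f} (error {abs(w - s):.4f})")
```

Because the model sees these rounded values during fine-tuning, its weights can adapt to the rounding error before the actual INT8 conversion.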