Add QAT note
Browse files
README.md
CHANGED
|
@@ -49,6 +49,9 @@ for low-latency NVIDIA TensorRT inference on edge GPUs.
|
|
| 49 |
semantics.
|
| 50 |
- **Validated accuracy** within 3.30 pp of the FP32
|
| 51 |
baseline on ImageNet (see Accuracy table below).
|
|
|
|
|
|
|
|
|
|
| 52 |
- **Matches the latency of `trtexec --best`** on supported NVIDIA
|
| 53 |
hardware while preserving INT8 accuracy (see Performance table
|
| 54 |
below).
|
|
|
|
| 49 |
semantics.
|
| 50 |
- **Validated accuracy** within 3.30 pp of the FP32
|
| 51 |
baseline on ImageNet (see Accuracy table below).
|
| 52 |
+
- **Quantization-aware training (QAT)** further recovers accuracy
|
| 53 |
+
lost in INT8 conversion by fine-tuning the model with simulated
|
| 54 |
+
quantization in the forward pass.
|
| 55 |
- **Matches the latency of `trtexec --best`** on supported NVIDIA
|
| 56 |
hardware while preserving INT8 accuracy (see Performance table
|
| 57 |
below).
|