<aside>
<img src="/icons/table_gray.svg" alt="/icons/table_gray.svg" width="40px" />
Table of Contents
</aside>
Lecture 1: Quantize & De-Quantize a Tensor
Neural Network Quantization:
You can quantize:
- The Weights: NN Parameters
- The Activations
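Quantizing and de-quantizing a tensor can be sketched with linear (affine) quantization to int8, where a `scale` and `zero_point` map floats to integers and back. The function names below are illustrative, not from any particular library:

```python
import numpy as np

def linear_quantize(x, num_bits=8):
    """Affine (asymmetric) quantization of a float tensor to int8.

    Maps the range [x.min(), x.max()] onto [qmin, qmax] via a
    scale and an integer zero point.
    """
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def linear_dequantize(q, scale, zero_point):
    """Map int8 values back to (approximate) float values."""
    return scale * (q.astype(np.float32) - zero_point)
```

The round trip `linear_dequantize(*linear_quantize(x))` recovers `x` only approximately; each de-quantized value is off by at most one quantization step (the scale), which is the quantization error discussed below.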
<aside>
<img src="/icons/info-alternate_red.svg" alt="/icons/info-alternate_red.svg" width="40px" />
Quantizing a NN after it has been trained is called Post-Training Quantization (PTQ)
</aside>
Advantages of Quantization:
- Smaller Models
- Speed Gain:
    - Lower Memory Bandwidth
    - Faster Operations:
        - GEMM: General Matrix Multiply
        - GEMV: General Matrix-Vector Multiply
Challenges of Quantization:
- Quantization Error (the difference between the original values and the de-quantized results)
- Retraining
- Limited Hardware Support
- Calibration Dataset needed
- Packing/Unpacking
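The quantization error listed above can be measured directly by comparing a tensor with its quantize-then-de-quantize round trip. A minimal sketch, assuming symmetric per-tensor quantization (the helper name `quant_error` is hypothetical):

```python
import numpy as np

def quant_error(x, num_bits=8):
    """Mean squared error between x and its quantized round trip,
    using symmetric per-tensor quantization."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax        # symmetric: zero point is 0
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    x_hat = q * scale                        # de-quantize
    return np.mean((x - x_hat) ** 2)
```

Comparing the error at different bit widths (e.g. 8-bit vs. 4-bit) shows why lower-precision formats need more care, such as calibration or retraining, to stay accurate.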