Quantization shrinks a model so that anyone can run it on their own computer with little to no performance degradation.
Pruning: remove connections, nodes, or weights that are not important to the model.
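One common way to decide which weights are unimportant is magnitude pruning: zero out the weights with the smallest absolute values. A minimal NumPy sketch (the function name and sparsity level are illustrative, not from the source):

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    # Zero out the fraction `sparsity` of weights with the smallest magnitude.
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) >= threshold
    return weights * mask

w = np.array([0.01, -0.9, 0.3, -0.02, 0.7, 0.05])
pruned = magnitude_prune(w, sparsity=0.5)
# The three smallest-magnitude weights are set to zero; the rest are kept.
```

In practice, pruning is usually followed by a short fine-tuning pass so the remaining weights can compensate for the removed ones.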
Knowledge distillation: train a smaller model (the student) to mimic the original model (the teacher).
Idea: store the model's parameters in lower precision (for example, fp32 → int8).
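A simple way to do this is symmetric per-tensor quantization: pick a scale from the tensor's largest absolute value, round each fp32 value to the nearest int8, and multiply by the scale to recover an approximation. A minimal NumPy sketch (function names and the int8 range choice are illustrative):

```python
import numpy as np

def quantize_int8(x):
    # Symmetric quantization: map the fp32 range [-max|x|, max|x|]
    # onto the int8 range [-127, 127] with a single scale factor.
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate fp32 values; each entry is off by at most
    # half a quantization step (scale / 2).
    return q.astype(np.float32) * scale

x = np.array([0.5, -1.2, 0.03, 2.0], dtype=np.float32)
q, s = quantize_int8(x)
x_hat = dequantize(q, s)
```

Storing int8 instead of fp32 cuts memory 4x; the trade-off is the small rounding error bounded by half a quantization step.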