
Understanding vision transformer quantization robustness through the lens of out-of-distribution detection

Abstract

Vision transformers have shown remarkable performance in computer vision tasks. Deploying these powerful models for accessible, real-time use typically requires quantization to compress the model, which risks a loss in performance. Prior work usually studies model behaviour at lower precision with respect to classification, but the attention mechanism suggests that further insight can be gained by also examining behaviour in out-of-distribution (OOD) situations. We investigate the behaviour of quantized small variants of popular vision transformers (DeiT, DeiT3, and ViT) using common OOD datasets such as OpenImage-O and iNaturalist. In-distribution (ID) analyses reveal early instabilities in 4-bit models, particularly those pretrained on the larger ImageNet-22k: DeiT3, the strongest FP32 model in our experiments, drops sharply by 17% under 4-bit quantization to become one of the weakest 4-bit models. While ViT shows reasonable quantization robustness for ID calibration, OOD detection reveals more: ViT and DeiT3 pretrained on ImageNet-22k experienced average quantization deltas in AUPR-out of 15.0% and 19.2%, respectively, between full precision and 4-bit, while the same models trained only on ImageNet-1k experienced deltas of 9.5% and 12.0%. Overall, our results suggest that pretraining on large-scale datasets may hinder low-bit quantization robustness in OOD detection and that data augmentation may be a more beneficial alternative.
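
To make the headline metric concrete, the following is a minimal illustrative sketch (not the authors' code) of how an AUPR-out quantization delta could be computed from OOD detector scores. It assumes higher scores indicate "more OOD", treats OOD samples as the positive class, and uses synthetic scores in place of real model outputs.

```python
# Minimal sketch: AUPR-out quantization delta from detector scores.
# Assumptions (not from the paper): higher score = more likely OOD,
# OOD is the positive class, and scores here are synthetic stand-ins.
import numpy as np
from sklearn.metrics import average_precision_score

def aupr_out(id_scores: np.ndarray, ood_scores: np.ndarray) -> float:
    """Area under the precision-recall curve with OOD as the positive class."""
    labels = np.concatenate([np.zeros_like(id_scores), np.ones_like(ood_scores)])
    scores = np.concatenate([id_scores, ood_scores])
    return average_precision_score(labels, scores)

# Hypothetical scores for an FP32 model and its 4-bit quantized counterpart;
# the 4-bit model separates ID from OOD less cleanly.
rng = np.random.default_rng(0)
id_fp32, ood_fp32 = rng.normal(0.0, 1.0, 1000), rng.normal(2.0, 1.0, 1000)
id_4bit, ood_4bit = rng.normal(0.0, 1.0, 1000), rng.normal(1.0, 1.0, 1000)

delta = aupr_out(id_fp32, ood_fp32) - aupr_out(id_4bit, ood_4bit)
print(f"AUPR-out quantization delta: {delta:.3f}")
```

Under this convention, a larger positive delta means the quantized model lost more OOD detection ability relative to its full-precision counterpart.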