
Understanding vision transformer quantization robustness through the lens of out-of-distribution detection

Abstract

Vision transformers have shown remarkable performance in computer vision tasks. Deploying these powerful models for accessible, real-time use typically requires quantization to compress the model, which risks a loss in performance. Prior work usually studies model behaviour at lower precision with respect to classification, but the attention mechanism suggests that further insight can be gained by also examining behaviour in out-of-distribution (OOD) situations. We investigate the behaviour of quantized small variants of popular vision transformers (DeiT, DeiT3, and ViT) using common OOD datasets such as OpenImage-O and iNaturalist. In-distribution (ID) analyses reveal early instabilities in 4-bit models, particularly those pretrained on the larger ImageNet-22k: DeiT3, the strongest FP32 model in our experiments, drops sharply by 17% under 4-bit quantization to become one of the weakest 4-bit models. While ViT shows reasonable quantization robustness for ID calibration, OOD detection reveals more: ViT and DeiT3 pretrained on ImageNet-22k experienced average quantization deltas in AUPR-out of 15.0% and 19.2%, respectively, between full precision and 4-bit, while the same models trained only on ImageNet-1k experienced deltas of 9.5% and 12.0%. Overall, our results suggest that pretraining on large-scale datasets may hinder low-bit quantization robustness in OOD detection and that data augmentation may be a more beneficial alternative.
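
To make the headline metric concrete, the following is a minimal illustrative sketch (not the authors' code) of how an AUPR-out quantization delta could be computed from OOD detector scores. It assumes higher scores indicate "more OOD", treats OOD samples as the positive class, and uses synthetic scores in place of real model outputs.

```python
# Minimal sketch: AUPR-out quantization delta from detector scores.
# Assumptions (not from the paper): higher score = more likely OOD,
# OOD is the positive class, and scores here are synthetic stand-ins.
import numpy as np
from sklearn.metrics import average_precision_score

def aupr_out(id_scores: np.ndarray, ood_scores: np.ndarray) -> float:
    """Area under the precision-recall curve with OOD as the positive class."""
    labels = np.concatenate([np.zeros_like(id_scores), np.ones_like(ood_scores)])
    scores = np.concatenate([id_scores, ood_scores])
    return average_precision_score(labels, scores)

# Hypothetical scores for an FP32 model and its 4-bit quantized counterpart;
# the 4-bit model separates ID from OOD less cleanly.
rng = np.random.default_rng(0)
id_fp32, ood_fp32 = rng.normal(0.0, 1.0, 1000), rng.normal(2.0, 1.0, 1000)
id_4bit, ood_4bit = rng.normal(0.0, 1.0, 1000), rng.normal(1.0, 1.0, 1000)

delta = aupr_out(id_fp32, ood_fp32) - aupr_out(id_4bit, ood_4bit)
print(f"AUPR-out quantization delta: {delta:.3f}")
```

Under this convention, a larger positive delta means the quantized model lost more OOD detection ability relative to its full-precision counterpart.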