Vision Transformers for Age Prediction from Gait Energy Image Data


Gait age estimation aims to predict a person's age from visual surveillance data. One popular approach uses Gait Energy Images (GEIs), which summarize an individual's gait in a single image for analysis. However, training a model from scratch demands considerable computational resources and extensive data. In contrast to traditional approaches, we used pre-trained vision transformer (ViT) models to improve performance. We froze the backbones of the pre-trained transformers and assessed their capabilities in zero-shot tasks by training regression heads on a compact dataset. Our best model achieved a Mean Absolute Error (MAE) of 10. The findings suggest that advanced ViT models can effectively carry out zero-shot predictions in gait recognition tasks while keeping computational demands low and requiring only small datasets. We expect these findings to provide insight into vision transformer-based gait recognition for future research and applications.
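The frozen-backbone setup described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: `frozen_backbone` is a hypothetical stand-in for a pre-trained ViT encoder, the "GEIs" are random 4x4 grids, and the ages are synthetic. Only the linear regression head is trained, and the result is scored with MAE.

```python
import random


def frozen_backbone(gei):
    """Hypothetical stand-in for a frozen pre-trained encoder.

    It has no trainable parameters, so training can never change it.
    A real pipeline would return ViT embeddings here instead.
    """
    half = len(gei) // 2

    def mean(rows):
        pixels = [p for row in rows for p in row]
        return sum(pixels) / len(pixels)

    return [mean(gei[:half]), mean(gei[half:])]


def predict(head, features):
    """Apply the linear regression head to one feature vector."""
    weights, bias = head
    return sum(w * f for w, f in zip(weights, features)) + bias


def mean_absolute_error(predictions, targets):
    """MAE: the average of |prediction - target|."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)


def train_head(features, ages, epochs=500, lr=0.1):
    """Fit the head by full-batch gradient descent on squared error."""
    weights, bias = [0.0] * len(features[0]), 0.0
    n = len(ages)
    for _ in range(epochs):
        # Errors are computed once per epoch, before any parameter is updated.
        errors = [predict((weights, bias), f) - a for f, a in zip(features, ages)]
        for j in range(len(weights)):
            weights[j] -= lr * 2 * sum(e * f[j] for e, f in zip(errors, features)) / n
        bias -= lr * 2 * sum(errors) / n
    return weights, bias


random.seed(0)
# Synthetic data (assumption): random 4x4 "images" and ages derived from them.
geis = [[[random.random() for _ in range(4)] for _ in range(4)] for _ in range(64)]
features = [frozen_backbone(g) for g in geis]  # extracted once; backbone is frozen
ages = [20 + 30 * f[0] + 10 * f[1] + random.gauss(0, 2) for f in features]

untrained_head = ([0.0, 0.0], 0.0)
head = train_head(features, ages)

mae_before = mean_absolute_error([predict(untrained_head, f) for f in features], ages)
mae_after = mean_absolute_error([predict(head, f) for f in features], ages)
print(f"MAE before training: {mae_before:.2f}, after: {mae_after:.2f}")
```

Because the backbone is fixed, its features can be extracted once and reused for every epoch, which is where the low computational cost comes from. For brevity the sketch scores the head on its own training data; a real evaluation would report MAE on a held-out split.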