Human action recognition and facial-expression analysis in cinematic footage remain challenging because most systems ignore camera-dependent changes in visible detail: close-ups, medium shots, and long shots carry fundamentally different expressive cues, yet models typically treat scale as irrelevant. We address this gap with ZoomGate, a unified, scale-aware pipeline for human behaviour understanding across mixed zoom levels. Using a movie-trailer dataset with frame-level zoom annotations, we train image backbones to classify camera scale and route video segments to view-specific recognition modules tailored to facial emotions, micro-gestures, upper-body actions, full-body motion, or hand-only gestures. For close-up segments, a multimodal Gemini-based analysis produces structured, temporally aligned descriptions of emotion dynamics and articulatory behaviour. Experiments show that scale-conditioned processing yields more coherent and interpretable predictions than scale-agnostic baselines. ZoomGate thus provides a principled foundation for computer-vision systems and AI characters that adjust behaviour naturally with camera distance.
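To make the routing idea concrete, the sketch below shows one way a scale-conditioned pipeline of this shape could be wired up: a per-frame zoom classifier feeding a dispatch table of per-scale recognition modules. This is a minimal illustration under stated assumptions, not the paper's actual implementation; `classify_scale`, the analyzer functions, and the majority-vote segment labeling are all hypothetical stand-ins.

```python
# Minimal sketch of ZoomGate-style scale-conditioned routing. All names are
# illustrative assumptions, not the paper's API.
from collections import Counter
from typing import Callable, Dict, List

def classify_scale(frames: List) -> str:
    """Stand-in for the zoom-classifying image backbone: predict a scale
    label per frame, then majority-vote over the segment (an assumption;
    the paper specifies only frame-level zoom annotations)."""
    per_frame = ["close_up" for _ in frames]  # dummy per-frame predictions
    return Counter(per_frame).most_common(1)[0][0]

def analyze_close_up(frames: List) -> dict:
    # Close-ups: facial emotion dynamics and micro-gestures (the paper
    # pairs these segments with a multimodal Gemini-based analysis).
    return {"scale": "close_up", "emotions": []}

def analyze_medium(frames: List) -> dict:
    # Medium shots: upper-body actions and hand-only gestures.
    return {"scale": "medium", "actions": []}

def analyze_long(frames: List) -> dict:
    # Long shots: full-body motion.
    return {"scale": "long", "motion": []}

# One recognition module per camera scale, tuned to the cues visible there.
ROUTERS: Dict[str, Callable[[List], dict]] = {
    "close_up": analyze_close_up,
    "medium": analyze_medium,
    "long": analyze_long,
}

def zoomgate(frames: List) -> dict:
    """Route a video segment to the module matching its predicted scale."""
    return ROUTERS[classify_scale(frames)](frames)

# Usage: zoomgate(decoded_frames) -> a scale-conditioned prediction dict.
```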