Action Recognition using Deep Convolutional Neural Networks and Compressed Spatio-Temporal Pose Encodings
Convolutional neural networks have recently shown proficiency at recognizing actions in RGB video. Existing models are generally very deep and require large amounts of data to train effectively. Moreover, they rely mainly on global appearance and may underperform in single-environment applications, such as a sporting event. To overcome these limitations, we propose to shortcut spatial learning by leveraging the activations within a human pose estimation network. The proposed framework integrates a human pose estimation network with a convolutional classifier via compressed encodings of pose activations. Evaluated on UTD-MHAD, a 27-class multimodal dataset, the pose-based RGB action recognition model achieves a classification accuracy of 98.4% in a subject-specific experiment, outperforming a baseline method that fuses depth and inertial sensor data.
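To make the "compressed encodings of pose activations" idea concrete, the following is a minimal illustrative sketch, not the paper's actual method: it assumes per-frame joint heatmaps from a pose estimation network, compresses each frame by coarse max-pooling, and stacks the per-frame vectors over time into a fixed-size spatio-temporal encoding that a 2-D convolutional classifier could consume. All function names, shapes, and the pooling scheme are assumptions for illustration.

```python
import numpy as np

def compress_frame(heatmaps, grid=4):
    """Compress one frame's pose activations (hypothetical scheme).

    heatmaps: (J, H, W) joint activation maps from a pose estimation network.
    Max-pools each joint's map onto a coarse grid x grid layout and flattens,
    returning a (J * grid * grid,) vector.
    """
    J, H, W = heatmaps.shape
    # Crop so the spatial dimensions divide evenly by the pooling grid.
    h, w = (H // grid) * grid, (W // grid) * grid
    x = heatmaps[:, :h, :w].reshape(J, grid, h // grid, grid, w // grid)
    # Max over each spatial block -> (J, grid, grid), then flatten.
    return x.max(axis=(2, 4)).reshape(-1)

def encode_clip(clip, grid=4):
    """Stack per-frame encodings into a (T, J*grid*grid) spatio-temporal map.

    clip: (T, J, H, W) pose activations over T video frames. The result can
    be treated as a single-channel image input to a small CNN classifier.
    """
    return np.stack([compress_frame(frame, grid) for frame in clip])

# Example: 16 frames, 14 joints, 64x64 heatmaps -> (16, 224) encoding.
clip = np.random.rand(16, 14, 64, 64)
encoding = encode_clip(clip)
print(encoding.shape)  # (16, 224)
```

Because the temporal axis is preserved while the spatial axis is heavily compressed, the resulting encoding is small enough to train a shallow classifier on modest datasets such as UTD-MHAD.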