Action Recognition using Deep Convolutional Neural Networks and Compressed Spatio-Temporal Pose Encodings

  • William McNally
  • Alexander Wong
  • John McPhee

Abstract

Convolutional neural networks have recently shown proficiency at
recognizing actions in RGB video. Existing models are generally very
deep, requiring large amounts of data to train effectively. Moreover,
they rely mainly on global appearance and could underperform in
single-environment applications such as a sports event. To overcome
these limitations, we propose to shortcut spatial learning by
leveraging the activations within a human pose estimation network.
The proposed framework integrates a human pose estimation network
with a convolutional classifier via compressed encodings of pose
activations. When evaluated on UTD-MHAD, a 27-class multimodal
dataset, the pose-based RGB action recognition model achieves a
classification accuracy of 98.4% in a subject-specific experiment and
outperforms a baseline method that fuses depth and inertial sensor
data.
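The pipeline the abstract describes, compressing per-frame pose-network activations and stacking them over time for a convolutional classifier, can be sketched in numpy. All shapes, the random projection, and the simulated activations below are illustrative assumptions, not the paper's actual architecture; in the real model the compression would be learned, not random.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): a clip of T frames, with the
# pose network emitting J joint heatmaps of size H x W per frame.
T, J, H, W = 16, 14, 32, 32
K = 8  # size of the compressed per-frame encoding (illustrative)

def compress_frame(heatmaps, proj):
    """Compress one frame's pose heatmaps (J, H, W) to a K-dim vector with a
    linear projection: a stand-in for the compressed encoding of pose
    activations described in the abstract."""
    return proj @ heatmaps.reshape(-1)

# Fixed random projection; the actual framework would learn this mapping.
proj = rng.standard_normal((K, J * H * W)) / np.sqrt(J * H * W)

# Simulated pose-network activations for one clip.
clip = rng.random((T, J, H, W))

# Stack the T per-frame encodings into a (T, K) spatio-temporal encoding that
# a compact convolutional classifier can consume instead of raw RGB frames.
encoding = np.stack([compress_frame(frame, proj) for frame in clip])
print(encoding.shape)  # (16, 8)
```

The key point of the design is that spatial learning happens once, inside the pose network, so the downstream classifier only has to model a small (T, K) encoding rather than full-resolution video.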

Published
2018-12-24
How to Cite
McNally, W., Wong, A., & McPhee, J. (2018). Action Recognition using Deep Convolutional Neural Networks and Compressed Spatio-Temporal Pose Encodings. Journal of Computational Vision and Imaging Systems, 4(1), 3. Retrieved from https://openjournals.uwaterloo.ca/index.php/vsl/article/view/339