Human pose estimation is a fundamental task in vision-driven hockey analytics and supports downstream tasks such as action recognition and player assessment. Estimating human pose keypoints from monocular video is challenging, especially in agile environments: the video feeds of fast-paced games such as ice hockey and lacrosse often contain substantial motion blur and occlusion. However, most previous work uses high-resolution inputs curated in isolated environments. As a result, existing benchmarks do not capture model performance in real-world agile settings. In this work, we therefore evaluate several state-of-the-art (SOTA) 2D pose estimators on our custom ice hockey dataset, created from broadcast hockey videos. We conduct extensive quantitative and qualitative comparisons of four SOTA pose estimators and empirically demonstrate that the Multi-Stage Pose Network (MSPN) produces the best results on our dataset.