Tags: Action Recognition, Hierarchical Bidirectional Long Short-Term Memory Network, Part-Based Fusion
Abstract:
The human body can be represented as an articulated system of rigid and hinged joints, which can be grouped to form the parts of the body. Human actions can be thought of as the collective action of these parts. Hence, learning an effective spatio-temporal representation of the collective motion of these parts is key to action recognition. In this work, we propose an end-to-end pipeline for human action recognition in video sequences using 2D joint trajectories estimated by a pose estimation framework. We use a Hierarchical Bidirectional Long Short-Term Memory Network (HBLSTM) to model the spatio-temporal dependencies of the motion by fusing the pose-based joint trajectories in a part-based, hierarchical fashion. To demonstrate the effectiveness of the proposed approach, we compare its performance with six comparative architectures derived from our model and with several other methods on the widely used KTH and Weizmann action recognition datasets. Experimental results show that the proposed method outperforms existing state-of-the-art methods on these datasets.
An Approach Towards Action Recognition Using Part Based Hierarchical Fusion
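
The part-based hierarchical fusion described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the joint names, the five-part grouping in `PARTS`, the two-region grouping in `REGIONS`, and the helper names `fuse`, `encode`, and `hierarchical_fusion` are all assumptions made for illustration. In particular, `encode` is an identity placeholder standing in for a learned bidirectional LSTM layer, so the sketch only shows how joint trajectories might be combined level by level.

```python
from typing import Dict, List

# One sequence = one feature vector per video frame.
Sequence = List[List[float]]

# Hypothetical grouping of 2D joints into body parts (an assumption,
# not taken from the paper).
PARTS: Dict[str, List[str]] = {
    "left_arm": ["l_shoulder", "l_elbow", "l_wrist"],
    "right_arm": ["r_shoulder", "r_elbow", "r_wrist"],
    "trunk": ["head", "neck"],
    "left_leg": ["l_hip", "l_knee", "l_ankle"],
    "right_leg": ["r_hip", "r_knee", "r_ankle"],
}

# Hypothetical second level: body parts grouped into larger regions.
REGIONS: Dict[str, List[str]] = {
    "upper_body": ["left_arm", "right_arm", "trunk"],
    "lower_body": ["left_leg", "right_leg"],
}


def fuse(streams: List[Sequence]) -> Sequence:
    """Concatenate equally long sequences frame by frame."""
    length = len(streams[0])
    if any(len(s) != length for s in streams):
        raise ValueError("all sequences must have the same number of frames")
    return [[v for s in streams for v in s[t]] for t in range(length)]


def encode(seq: Sequence) -> Sequence:
    """Placeholder for a bidirectional LSTM layer (identity in this sketch)."""
    return seq


def hierarchical_fusion(joints: Dict[str, Sequence]) -> Sequence:
    """Fuse joint trajectories into parts, parts into regions, regions into a body."""
    parts = {
        name: encode(fuse([joints[j] for j in joint_names]))
        for name, joint_names in PARTS.items()
    }
    regions = {
        name: encode(fuse([parts[p] for p in part_names]))
        for name, part_names in REGIONS.items()
    }
    return encode(fuse([regions["upper_body"], regions["lower_body"]]))
```

With 14 joints and an (x, y) pair per joint per frame, the fused whole-body sequence has 28 values per frame. In a trained model each `encode` call would be replaced by a recurrent layer whose output size is a design choice, and the final sequence would feed a classifier over the action labels.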