Human Action Recognition and Localization using Spatio-temporal Descriptors and Tracking

Ballan, L.; Bertini, M.; Del Bimbo, A.; Seidenari, L.; Serra, Giuseppe

In this paper we propose a system for human action tracking and recognition that combines a robust visual tracker based on particle filtering with a novel descriptor for spatio-temporal interest points. The descriptor is an effective combination of a new 3D gradient descriptor and an optic flow descriptor. Following the successful results achieved in object and scene classification, these points are used to represent video sequences as bags of spatio-temporal visual words. The tracker assigns the points to each individual in a scene, which allows the system to classify the action performed by each person. The system has been extensively tested on the standard KTH and Weizmann action datasets, as well as on real-world surveillance videos.
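The pipeline outlined in the abstract (fuse two descriptors per interest point, assign points to tracked people, build a per-person bag-of-words histogram) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the fusion rule (L2-normalize each part, then concatenate), the precomputed codebook (e.g. from k-means), the hard nearest-codeword assignment, and the single-frame bounding boxes standing in for the particle-filter tracker output are all hypothetical choices, as are the names `combine_descriptors`, `assign_points`, and `bow_histogram`.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Box = Tuple[float, float, float, float]  # hypothetical tracker box: x_min, y_min, x_max, y_max


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def combine_descriptors(grad: Vector, flow: Vector) -> List[float]:
    """Assumed fusion: normalize the 3D gradient and optic flow parts, then concatenate."""
    return l2_normalize(grad) + l2_normalize(flow)


def nearest_word(desc: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codeword with the smallest squared Euclidean distance."""
    best, best_d = 0, math.inf
    for i, word in enumerate(codebook):
        d = sum((a - b) ** 2 for a, b in zip(desc, word))
        if d < best_d:
            best, best_d = i, d
    return best


def bow_histogram(descs: Sequence[Vector], codebook: Sequence[Vector]) -> List[float]:
    """Normalized histogram of visual-word occurrences (hard assignment)."""
    counts = [0.0] * len(codebook)
    for d in descs:
        counts[nearest_word(d, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def assign_points(points: Sequence[Tuple[float, float, Vector]],
                  boxes: Dict[str, Box]) -> Dict[str, List[Vector]]:
    """Give each (x, y, descriptor) point to the first tracked box containing it.

    Points that fall outside every box are discarded in this sketch.
    """
    per_person: Dict[str, List[Vector]] = {pid: [] for pid in boxes}
    for x, y, desc in points:
        for pid, (x0, y0, x1, y1) in boxes.items():
            if x0 <= x <= x1 and y0 <= y <= y1:
                per_person[pid].append(desc)
                break
    return per_person


if __name__ == "__main__":
    codebook = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]  # toy 2-word vocabulary
    d1 = combine_descriptors([2.0, 0.0], [3.0, 0.0])
    d2 = combine_descriptors([0.0, 5.0], [0.0, 1.0])
    points = [(10, 10, d1), (12, 15, d1), (50, 50, d2)]
    boxes = {"person_a": (0, 0, 20, 20), "person_b": (40, 40, 60, 60)}
    per_person = assign_points(points, boxes)
    for pid, descs in per_person.items():
        print(pid, bow_histogram(descs, codebook))
```

With the toy data above, `person_a` receives the two points near the origin and gets the histogram `[1.0, 0.0]`, while `person_b` gets `[0.0, 1.0]`. In a full system each per-person histogram would then be passed to an action classifier (not shown), and the boxes would come from the tracker frame by frame rather than being fixed.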