Model-based global and local motion estimation for videoconference sequences
Rinaldo, Roberto
2004-01-01
Abstract
In this work, we present an algorithm for 3-D face motion estimation in videoconference sequences. The algorithm estimates both the position of the face as an object in 3-D space (global motion) and the movements of portions of the face, such as the mouth or the eyebrows (local motion). The algorithm uses a modified version of the standard 3-D face model CANDIDE. We present several techniques to increase the robustness of global motion estimation, which is based on feature tracking and an extended Kalman filter. Global motion estimation is used as a starting point for local motion detection in the mouth and eyebrow areas. To this end, synthetic images of these areas (templates) are generated with texture-mapping techniques and then compared to the corresponding regions in the current frame. A set of parameters, called action unit vectors (AUVs), influences the shape of the synthetic mouth and eyebrows. The optimal AUV values are determined via gradient-based minimization of the error energy between the templates and the actual face areas. The proposed scheme is robust and was tested successfully on sequences of many hundreds of frames.
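As a supplementary illustration (not taken from the paper itself), the sketch below shows the general shape an extended-Kalman-filter pose tracker of this kind can take: the state is a six-parameter head pose, the measurements are the 2-D image positions of tracked model features, and the measurement Jacobian is obtained numerically. The focal length, the random-walk motion model, and all function names are assumptions made for illustration.

```python
# Minimal EKF sketch for global head-pose tracking, assuming a
# pinhole camera and a random-walk pose model. Illustrative only;
# the paper's actual filter design may differ.
import numpy as np

F_LEN = 500.0  # assumed focal length in pixels (illustrative)

def rot(rx, ry, rz):
    """Rotation matrix from Euler angles (XYZ order)."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def project(state, pts3d):
    """Perspective projection of 3-D model points under the pose
    state [tx, ty, tz, rx, ry, rz]; returns an (N, 2) array."""
    t, r = state[:3], state[3:]
    cam = (rot(*r) @ pts3d.T).T + t          # points in camera frame
    return F_LEN * cam[:, :2] / cam[:, 2:3]  # pinhole projection

def num_jacobian(state, pts3d, eps=1e-6):
    """Finite-difference Jacobian of the projection (2N x 6)."""
    z0 = project(state, pts3d).ravel()
    J = np.zeros((z0.size, 6))
    for i in range(6):
        d = np.zeros(6); d[i] = eps
        J[:, i] = (project(state + d, pts3d).ravel() - z0) / eps
    return J

def ekf_step(x, P, z, pts3d, q=1e-4, r=1.0):
    """One EKF predict/update with tracked feature positions z (N x 2)."""
    # Predict: random-walk pose model leaves x unchanged, inflates P.
    P = P + q * np.eye(6)
    # Update: linearize the projection around the predicted pose.
    H = num_jacobian(x, pts3d)
    innov = z.ravel() - project(x, pts3d).ravel()
    S = H @ P @ H.T + r * np.eye(H.shape[0])
    K = np.linalg.solve(S, H @ P).T  # K = P H^T S^-1 (S, P symmetric)
    x = x + K @ innov
    P = (np.eye(6) - K @ H) @ P
    return x, P
```

A tracker of this form would call `ekf_step` once per frame, feeding in the current 2-D positions of the tracked features together with the corresponding 3-D model vertices.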
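The local-motion step can likewise be pictured as gradient descent on the error energy between the synthetic template and the actual face area. The sketch below assumes a hypothetical `render_template(auv)` function standing in for the paper's texture-mapped template synthesis of the mouth or eyebrow region; the finite-difference gradient, step size, and stopping rule are illustrative choices, not the paper's.

```python
# Sketch of the gradient-based fit of the action unit vector (AUV)
# parameters. render_template is a placeholder for the model's
# texture-mapped synthesis step, not a real API.
import numpy as np

def error_energy(auv, frame_region, render_template):
    """Sum of squared differences between synthetic and actual area."""
    diff = render_template(auv) - frame_region
    return float(np.sum(diff ** 2))

def fit_auv(frame_region, render_template, auv0,
            step=1e-2, eps=1e-3, iters=50):
    """Descend the error energy over the AUV values using a
    finite-difference gradient and a fixed-length step."""
    auv = np.asarray(auv0, dtype=float)
    for _ in range(iters):
        e0 = error_energy(auv, frame_region, render_template)
        grad = np.zeros_like(auv)
        for i in range(auv.size):
            d = np.zeros_like(auv); d[i] = eps
            grad[i] = (error_energy(auv + d, frame_region,
                                    render_template) - e0) / eps
        auv = auv - step * grad / (np.linalg.norm(grad) + 1e-12)
        # Stop when the energy no longer decreases appreciably.
        if error_energy(auv, frame_region, render_template) > e0 - 1e-6:
            break
    return auv
```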