Dynamic Headpose Classification and Video Retargeting with Human Attention
Over the years, extensive research has been devoted to the study of people's head pose due to its relevance in security, human-computer interaction, advertising as well as cognitive, neuro and behavioural psychology. One of the main goals of this thesis is to estimate people's 3D head orientation as they freely move around in naturalistic settings such as parties, supermarkets etc. Head pose classification from surveillance images acquired with distant, large field-of-view cameras is difficult as faces captured are at low-resolution with a blurred appearance. Also labelling sufficient training data for headpose estimation in such settings is difficult due to the motion of targets and the large possible range of head orientations. Domain adaptation approaches are useful for transferring knowledge from the training source to the test target data having different attributes, minimizing target data labelling efforts in the process. This thesis examines the use of transfer learning for efficient multi-view head pose classification. Relationship between head pose and facial appearance from many labelled examples corresponding to the source data is learned initially. Domain adaptation techniques are then employed to transfer this knowledge to the target data. The following three challenging situations is addressed (I) ranges of head poses in the source and target images is different, (II) where source images capture a stationary person while target images capture a moving person with varying facial appearance due to changing perspective, scale and (III) a combination of (I) and (II). All proposed transfer learning methods are sufficiently tested and benchmarked on a new compiled dataset DPOSE for headpose classification. This thesis also looks at a novel signature representation for describing object sets for covariance descriptors, Covariance Profiles (CPs). CP is well suited for representing a set of similarly related objects. CPs posit that the covariance matrices, pertaining to a specific entity, share the same eigen-structure. Such a representation is not only compact but also eliminates the need to store all the training data. Experiments on images as well as videos for applications such as object-track clustering and headpose estimation is shown using CP. In the second part, Human-gaze for interest point detection for video retargeting is explored. Regions in video streams attracting human interest contribute significantly to human understanding of the video. Being able to predict salient and informative Regions of Interest (ROIs) through a sequence of eye movements is a challenging problem. This thesis proposes an interactive human-in-loop framework to model eye-movements and predicts visual saliency in yet-unseen frames. Eye-tracking and video content is used to model visual attention in a manner that accounts for temporal discontinuities due to sudden eye movements, noise and behavioural artefacts. Gaze buffering, for eye-gaze analysis and its fusion with content based features is proposed. The method uses eye-gaze information along with bottom-up and top-down saliency to boost the importance of image pixels. Our robust visual saliency prediction is instantiated for content aware Video Retargeting.