Statistical Motion Atlas

Currently, security professionals must perform a time-consuming manual review of recorded video in order to reconstruct events from it. Video Summary reduces the time required to analyze archived video by removing unnecessary video data. In most scenes, the majority of the archived video contains only a static background, without any people, vehicles, or objects of interest. Video Summary quickly analyzes the video and removes the static background data while retaining the relevant data. This gives security professionals the advantage of being able to review hours of archived video in minutes.
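As a toy illustration of the idea (not the project's actual method; all names and the frame layout are hypothetical), static background data can be dropped by scoring each frame against its predecessor and keeping only frames whose motion energy exceeds a threshold:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <vector>

// One grayscale frame as a flat row-major pixel buffer (hypothetical layout).
using Frame = std::vector<std::uint8_t>;

// Mean absolute per-pixel difference between consecutive frames:
// a crude motion-energy score; a static background yields values near 0.
double motionEnergy(const Frame& prev, const Frame& cur) {
    assert(prev.size() == cur.size() && !prev.empty());
    double sum = 0.0;
    for (std::size_t i = 0; i < cur.size(); ++i)
        sum += std::abs(int(cur[i]) - int(prev[i]));
    return sum / double(cur.size());
}

// Indices of frames whose motion energy exceeds the threshold;
// the complement is the static background data that can be removed.
std::vector<std::size_t> activeFrames(const std::vector<Frame>& video,
                                      double threshold) {
    std::vector<std::size_t> kept;
    for (std::size_t t = 1; t < video.size(); ++t)
        if (motionEnergy(video[t - 1], video[t]) > threshold)
            kept.push_back(t);
    return kept;
}
```

In practice the decoded frames and the threshold would come from the surveillance pipeline; the point is only that a per-frame score separates static from active intervals.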

The current state of the art is to use motion descriptors such as optic flow to distinguish the frames in which motion of cars or humans has occurred. Recently, images filtered with such descriptors have been used to build deep learning networks for motion classification [1], [3]. These classes of solutions are mostly suitable for videos in which the motion energy of the background and the foreground is discriminable [2]. Moreover, proper network training requires a sufficient number of samples covering the variations of the same objects. This is undesirable in real-world applications, since it excludes the detection of unpredictable occurrences in the videos.
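The discriminability assumption can be made concrete with a small sketch (hypothetical, not drawn from [1]-[3]): given a precomputed dense flow field, a frame is flagged as containing foreground motion when enough pixels move faster than a magnitude threshold; this only works when background flow stays below that threshold.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// A dense optical-flow field: one (dx, dy) displacement per pixel,
// assumed to be precomputed by some flow estimator.
struct FlowVec { double dx, dy; };

// Fraction of pixels moving faster than magThreshold pixels/frame.
double movingFraction(const std::vector<FlowVec>& flow, double magThreshold) {
    assert(!flow.empty());
    std::size_t moving = 0;
    for (const auto& v : flow)
        if (std::hypot(v.dx, v.dy) > magThreshold) ++moving;
    return double(moving) / double(flow.size());
}

// Flag a frame as containing motion when a sufficient area moves.
// This implicitly assumes background and foreground motion energy
// are discriminable: background flow must stay below magThreshold.
bool hasMotion(const std::vector<FlowVec>& flow,
               double magThreshold, double areaFraction) {
    return movingFraction(flow, magThreshold) > areaFraction;
}
```

When camera shake or swaying vegetation pushes background flow above the threshold, this rule breaks down, which is exactly the limitation noted above.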

The purpose of this project is to explore the usefulness of visual analytics for creating a fast motion-summary framework that archives videos of the same scene into a manageable and compact motion atlas. Creating the motion atlas will, first, help us understand the nature of the motion variation in the video. Second, it can be used to fetch the time intervals in which no (meaningful) motion has occurred, so that those can be neglected. Third, it opens the possibility of learning scene classification algorithms that classify meaningful motions independently of any presumptions.


In order to build the motion atlas, spectral descriptors will be used [4]. So far, these types of descriptors have only been used for three-dimensional volume images and not for three-dimensional image stacks. The first step is to develop a heuristic way to translate a spectral descriptor to video volumes. The second step is to incorporate the motion occurring in different time intervals at the corresponding locations, in order to build co-occurrence statistics of motion for those regions and for later classification purposes.
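The co-occurrence bookkeeping of the second step could look roughly like the following sketch (a hypothetical illustration, not the project's design): the scene is divided into spatial regions, each time interval records which regions were active, and a matrix counts how often pairs of regions move together.

```cpp
#include <cassert>
#include <vector>

// Per-interval activity of N spatial regions (true = motion observed there).
using Activity = std::vector<bool>;

// Co-occurrence matrix c[i][j] = number of time intervals in which
// regions i and j were active together; the diagonal counts each
// region's own activity. Such statistics could later feed a classifier.
std::vector<std::vector<int>>
coOccurrence(const std::vector<Activity>& intervals, std::size_t nRegions) {
    std::vector<std::vector<int>> c(nRegions, std::vector<int>(nRegions, 0));
    for (const auto& a : intervals) {
        assert(a.size() == nRegions);
        for (std::size_t i = 0; i < nRegions; ++i)
            if (a[i])
                for (std::size_t j = 0; j < nRegions; ++j)
                    if (a[j]) ++c[i][j];
    }
    return c;
}
```

Regions that frequently co-occur (e.g. consecutive cells along a road) would then characterize typical motion patterns of the scene, while rarely co-occurring combinations hint at atypical events.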


Interested candidates should be following a Master's program in, preferably, electrical engineering, computer science, mathematics, or a related field. Good knowledge of C++ and basic knowledge of image processing and OpenCV are required. The project will be carried out in collaboration with Siqura, a surveillance security solutions company in Gouda, The Netherlands. The selected candidate will receive a monthly allowance.


[1] J. Yue-Hei Ng, M. Hausknecht, S. Vijayanarasimhan, O. Vinyals, R. Monga and G. Toderici, "Beyond short snippets: Deep networks for video classification," 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, 2015, pp. 4694-4702.
[2] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar and L. Fei-Fei, "Large-scale Video Classification with Convolutional Neural Networks," 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, 2014, pp. 1725-1732.
[3] P. Weinzaepfel, J. Revaud, Z. Harchaoui and C. Schmid, "Learning to Detect Motion Boundaries," 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, 2015.
[4] M. Aubry, U. Schlickewei and D. Cremers, "The wave kernel signature: A quantum mechanical approach to shape analysis," 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), Barcelona, 2011, pp. 1626-1633.


Type master project
Place external
Supervisors Neda Sepasian, Huub van de Wetering
Date 05/2017