Producing 2D Canonical Images Using Depth Images

There is a growing interest around the world today in surveillance and detection of people, especially in the face recognition department.

Abstract
There is a growing interest around the world today in surveillance and detection of people, especially in the face recognition department. The complexity of the human face makes face recognition in real time a challenging problem. This problem gets even more complex if we consider the wide variety of lighting and angles that a person’s picture can be taken at. One of the most common face recognition techniques is the EigenFace Algorithm.
This face recognition algorithm is mostly known for its computational efficiency. However, it has a major drawback. The algorithm ‘learned’ to recognize faces using a training set of images that were all taken in almost identical conditions. If the input face image is slightly different in its pose, distance or lighting condition, the EigenFaces algorithm will suffer from a dramatic degradation in its recognition quality.
In this project we will use the information given by a 3D camera in order to overcome the above problem. We will automatically ‘normalize’ the image and adjust it to the face recognition requirements.

Background: The Eigenfaces approach to face recognition
This approach transforms face images into a small set of characteristic feature images, called “eigenfaces”, which are the principal components of an initial training set of face images. We can now define the “face space” as the sub-space spanned by those eigenfaces. Face recognition is performed by projecting a new image onto the face-space and then classifying the face by comparing its position in face-space with the position of known individuals.
In mathematical terms, we present each eigenface as a column-stack vector and use those vectors as the columns of the eigenfaces matrix . A new face image is transformed into its eigenface components by the operation , where is the average face constructed from the training set.

Figure 1 – For simplicity, let the 3D space represent the whole image space, then the face sub space is a 2D space spanned by the eigenfaces

The problem
As mentioned before, face recognition using eigenfaces takes advantage of the fact that faces are normally upright and thus may be described by a small set of 2D characteristic views(the eigenfaces). A problem appears if the photo of the face being recognized was not taken in a straight upright position. Such cases can be seen in a surveillance camera located at an upper corner of a room, taking pictures of people looking at different directions under different lighting conditions.

The solution
In this project, an automatic 3D face normalization was implemented. We use the 3D information provided by a 3D camera to normalize the face in a manner that will fit best to a specific face recognition mechanism.
The main idea is that any image can be projected to the face-space. An image of a person looking forward will be closer to its projection than an image of a person looking sideways. Therefore we have to find an algorithm that will minimize the distance between the image and its projection.

Figure 2 – A normalized image of a person looking straight forward will be closer to its projection on the face-space than an un-normalized image

Let be the original image and be its projection. Our objective will be to minimize , where represents the projection error. We can use the 3D information in the original image to rotate the face (and get a new image, ). Rotating in the right direction will reduce .

Were A is a rotation and scaling matrix, and t is a translation vector. Now . Comparing the derivation of the last equation to zero will give us the correct rotation, scaling and translation.

Results and conclusions
This project showed that the suggested algorithm can achieve automatic normalization and improve face recognition. We have created a system that allows any user, through a user friendly interface, to load a 3D image, select the face area, and run the Normalizing Algorithm.

The above clips are outputs of our system. The first frame of each clip is the original image. All other frames are images we produce using the algorithm.

Acknowledgment
We would like the thank the Vision and Image Sciences Laboratory staff for their support, and express our gratitude to our supervisor, Tomer Michaeli.
We would also like to thank the Ollendorff Minerva Center Fund for supporting this project.