Face Detection & Recognition


This project won the Distinguished Project Award of VISL 2005.

Abstract

As one of the most successful areas in computer vision, the automatic detection and recognition of faces has attracted growing attention. This is evident from the large number of conferences, performance-evaluation databases and commercial systems that have appeared during the last several years. The growing interest can be attributed to two central factors. The first is the increasing demand for security and tracking systems, civilian and military alike. The second is that, after 30 years of ongoing research, feasible technologies for real-world problems are now available. While visual object detection and recognition are effortless for humans, creating an automated system that imitates this ability is still largely an unsolved problem and an open research area.

In order to recognize faces in a complex scene, one must first isolate (detect) the face from the rest of the scene. From the isolated face image, one then extracts facial features such as the eyes or mouth and tries to match them to the facial features of a known individual, or otherwise declares the detection a false alarm. The detection process should be invariant to face position and pose in the scene, lighting conditions, various noise sources, facial expressions, head and facial hair, glasses, and so forth.

The project will show that the novel algorithm by Viola and Jones, together with the improvements made by Lienhart et al., provides a viable solution to the detection problem, both for faces in a scene and for eyes within a face image. This solution not only achieves results comparable to those of the best known systems, but surpasses those systems by orders of magnitude in terms of frame rate and false-alarm rate.

A complete system was implemented, consisting of a Viola & Jones face detector, two Viola & Jones eye detectors, an SVM-based eye detector, and a multiclass face recognizer that is also based on SVM. These components are combined into a working access-control mechanism for the VISL door, built from very rudimentary (and cheap!) components, with impressive timing and accuracy.

The Problem

The goal is to implement a system that grants or denies access to the VISL lab using automatic, computer-based face recognition. The face detection stage was required to achieve a high detection rate and a low false-alarm rate while maintaining a high scanning rate. In addition, a low error rate was required in the face recognition stage.

The system has to deal with varying face size and location in the frame, changing facial expressions, and non-constant lighting. It was built on a regular PC using a simple video camera.

The difficulty of face detection stems mainly from the large number of potential face candidates per frame (about 2×10^5). Classifying each candidate is not computationally negligible, and reaching the desired detection and false-alarm rates calls for even more complex processing. Even after subsampling the frame and applying motion detection, about 5000 candidates remain.
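To see where a candidate count of this order comes from, the sketch below counts the square sub-windows an exhaustive multi-scale scan would have to classify. The 24×24 base window, the 1.25 scale step, the one-pixel base stride and the frame size in the usage line are illustrative assumptions, not parameters taken from the project, and `count_windows` is a hypothetical helper.

```python
def count_windows(width, height, base=24, scale_step=1.25, stride=1):
    """Count square sub-windows visited by an exhaustive multi-scale scan.

    All defaults are illustrative assumptions, not the project's settings.
    The stride grows with the scale, as is common in sliding-window detectors.
    """
    total = 0
    scale = 1.0
    while True:
        size = int(base * scale)              # window side at this scale
        if size > width or size > height:     # window no longer fits the frame
            break
        step = max(1, round(stride * scale))  # scaled stride, at least 1 pixel
        nx = (width - size) // step + 1       # horizontal positions
        ny = (height - size) // step + 1      # vertical positions
        total += nx * ny
        scale *= scale_step
    return total


if __name__ == "__main__":
    # A hypothetical 320x240 frame; prints the number of candidate windows.
    print(count_windows(320, 240))
```

Changing the frame size, stride or scale step changes the count considerably, which is why subsampling and motion detection are used to prune candidates before classification.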

Previous systems used classifiers with a fixed input size and scanned the frame at every location. To find faces at different distances from the camera, smaller (downscaled) versions of the image were created and scanned by the classifier as well. This process requires a great deal of time and resources; in fact, it is the bottleneck of the scanning process and prevents it from running in real time. We call this problem “The Pyramid Problem”.

To overcome these difficulties, we use a novel approach developed by Viola and Jones and extended by Lienhart et al. For the remaining stages, SVM-based implementations are provided.
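The part of the Viola & Jones approach that removes the image pyramid is the integral image: it is computed once per frame, after which the sum over any rectangle costs four look-ups regardless of its size, so the detector's rectangle (Haar-like) features can be scaled instead of the image. A minimal plain-Python sketch follows; `integral_image`, `rect_sum` and `two_rect_feature` are illustrative names, and the left-minus-right two-rectangle feature is just one of the feature types such detectors use.

```python
def integral_image(img):
    """Return ii with ii[y][x] = sum of img over rows < y and columns < x.

    The extra zero row and column remove boundary special cases.
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle with top-left corner (x, y): four look-ups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h, scale=1.0):
    """Left-half minus right-half feature, scaled without resizing the image.

    (w, h) is the feature size at the base scale; the scaled feature keeps
    the same top-left corner (x, y).
    """
    half = max(1, int(w * scale) // 2)   # width of each half at this scale
    sh = max(1, int(h * scale))          # height at this scale
    return rect_sum(ii, x, y, half, sh) - rect_sum(ii, x + half, y, half, sh)
```

Each call to `rect_sum` costs the same four look-ups whatever `scale` is, which is what allows the detector to be enlarged instead of building a pyramid of resized images.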

The Solution

The system is initially given a full-scene frame capture showing the hall outside the door on the 6th floor of the Meyer building. During online operation, a tracking algorithm based on alpha filtering (an infinite, exponentially diminishing memory) tracks the slowly changing background. This compensates for lighting changes, the transition from night to day, and other slow changes in the scene.
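The alpha filter amounts to an exponentially weighted running average, B ← (1 − α)·B + α·F, where B is the background estimate and F is the current frame. Every past frame keeps contributing with a weight that decays geometrically but never reaches zero, which is the infinite, diminishing memory the text refers to. The sketch assumes grayscale frames stored as lists of rows; the value of α is a hypothetical choice (a small α tracks only slow changes).

```python
def update_background(background, frame, alpha=0.05):
    """One alpha-filter step: B <- (1 - alpha) * B + alpha * F, per pixel.

    alpha = 0.05 is an illustrative value, not the project's setting.
    Frames are grayscale images given as lists of rows of numbers.
    """
    return [
        [(1.0 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


if __name__ == "__main__":
    bg = [[0.0, 0.0]]
    # A sudden, persistent brightening is absorbed gradually.
    for _ in range(3):
        bg = update_background(bg, [[100.0, 100.0]], alpha=0.5)
    print(bg)  # [[87.5, 87.5]]
```

A person walking through the frame changes each pixel only briefly, so with a small α the background barely moves, while a gradual change such as daylight is absorbed over many frames.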

By subtracting the tracked background from the current frame, the areas of the frame in which motion occurs are isolated; these areas serve as candidates for the detection process. Because of video noise and inaccuracies in the image acquisition process, the remaining objects are perforated and discontinuous, so morphological image processing is applied to close the gaps and smooth the edges.
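The motion mask and the gap-closing step can be sketched as thresholding the absolute difference between the frame and the tracked background, followed by a binary closing (dilation, then erosion) with a 3×3 square structuring element. The threshold and the element size are assumptions; the text does not state which morphological operations or parameters were used.

```python
def motion_mask(frame, background, threshold=25):
    """1 where |frame - background| exceeds the (assumed) threshold, else 0."""
    return [
        [1 if abs(f - b) > threshold else 0 for f, b in zip(f_row, b_row)]
        for f_row, b_row in zip(frame, background)
    ]


def _morph(mask, dilate):
    """3x3 binary dilation (dilate=True) or erosion; outside pixels count as 0."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [
                mask[y + dy][x + dx] if 0 <= y + dy < h and 0 <= x + dx < w else 0
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            out[y][x] = max(vals) if dilate else min(vals)
    return out


def close_mask(mask):
    """Binary closing: dilation fills holes and gaps, erosion restores the size."""
    return _morph(_morph(mask, dilate=True), dilate=False)
```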

Next, statistical models are used to select only those connected components whose dimensions suggest that they may actually contain a face. The selected regions are passed to the Viola & Jones face detector.
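Selecting plausible regions can be sketched as labelling the 4-connected components of the cleaned mask and keeping those whose bounding boxes fall within a size range. The fixed bounds below are a hypothetical stand-in for the project's statistical models, whose form the text does not describe.

```python
from collections import deque


def components(mask):
    """Return bounding boxes (x, y, w, h) of 4-connected components of 1-pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] != 1 or seen[sy][sx]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(sx, sy)])
            seen[sy][sx] = True
            x0 = x1 = sx
            y0 = y1 = sy
            while queue:
                x, y = queue.popleft()
                x0, x1 = min(x0, x), max(x1, x)
                y0, y1 = min(y0, y), max(y1, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1 - x0 + 1, y1 - y0 + 1))
    return boxes


def face_candidates(mask, min_side=20, max_side=200):
    """Keep components whose box size could contain a face (hypothetical bounds)."""
    return [
        (x, y, w, h)
        for x, y, w, h in components(mask)
        if min_side <= min(w, h) and max(w, h) <= max_side
    ]
```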

The detector is a cascade of AdaBoost classifiers, each of which filters out at least 50% of the non-face candidates while passing more than 99% of the true faces. A candidate that passes the final stage of the cascade is reported as a face detection.
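A cascade rejects a candidate as soon as any stage's boosted score falls below that stage's threshold, so most non-faces are discarded after only a few features have been evaluated. The sketch represents each weak classifier as a function returning 0 or 1, weighted by its AdaBoost coefficient; the data layout and names are assumptions. The per-stage figures quoted above also bound the whole cascade: assuming each stage meets them on the candidates that reach it, K stages give a false-alarm rate of at most 0.5^K and a detection rate of at least 0.99^K.

```python
def stage_passes(window, weak_classifiers, threshold):
    """One boosted stage: sum of alpha_t * h_t(window) compared to the threshold."""
    score = sum(alpha * h(window) for h, alpha in weak_classifiers)
    return score >= threshold


def cascade_detect(window, stages):
    """True only if the window passes every stage; stops at the first rejection."""
    for weak_classifiers, threshold in stages:
        if not stage_passes(window, weak_classifiers, threshold):
            return False
    return True


def cascade_bounds(num_stages, stage_fa=0.5, stage_det=0.99):
    """Overall (false-alarm upper bound, detection lower bound) for K stages."""
    return stage_fa ** num_stages, stage_det ** num_stages


if __name__ == "__main__":
    fa, det = cascade_bounds(20)   # 20 stages is a hypothetical depth
    print(f"false alarms <= {fa:.2e}, detection >= {det:.3f}")
    # prints: false alarms <= 9.54e-07, detection >= 0.818
```

The arithmetic shows why each stage must pass well over 99% of faces: the detection losses multiply just as the rejections do.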

Since the scan is performed on a dense grid, a true face in the image is detected multiple times at slightly different positions and scales. A DFS-based algorithm is therefore used to fuse these multiple detections into a single detection, which is then passed to the eye detectors.
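One way to realise the fusion step is to treat detections as nodes of a graph, connect two detections when their rectangles overlap strongly, find the connected components with an iterative depth-first search, and average each component into one rectangle. The overlap measure (intersection over union) and its 0.3 threshold are assumptions; the text only states that the fusion is DFS-based.

```python
def iou(a, b):
    """Intersection over union of two rectangles given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def fuse_detections(rects, min_overlap=0.3):
    """Merge overlapping detections: DFS over the overlap graph, average each group."""
    n = len(rects)
    neighbours = [
        [j for j in range(n) if j != i and iou(rects[i], rects[j]) >= min_overlap]
        for i in range(n)
    ]
    visited = [False] * n
    fused = []
    for start in range(n):
        if visited[start]:
            continue
        stack, group = [start], []
        visited[start] = True
        while stack:                      # iterative depth-first search
            i = stack.pop()
            group.append(rects[i])
            for j in neighbours[i]:
                if not visited[j]:
                    visited[j] = True
                    stack.append(j)
        # Average the coordinates of all detections in this component.
        k = len(group)
        fused.append(tuple(round(sum(r[c] for r in group) / k) for c in range(4)))
    return fused
```

Because grouping follows graph connectivity, a chain of overlapping detections merges even when its two ends do not overlap each other directly.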

We implemented both an adapted version of the Viola & Jones algorithm for detecting eyes and an SVM-based system for the same purpose. These systems again produce multiple detections around each true eye location, which are again fused by the DFS-based algorithm.
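For the SVM-based eye detector, each candidate patch, flattened into a feature vector, is classified by the sign of the SVM decision function f(x) = Σ_i α_i·y_i·K(s_i, x) + b over the support vectors s_i. The sketch assumes a Gaussian (RBF) kernel and an already-trained model; the kernel choice, γ, and the toy model in the test are illustrative assumptions, and training itself (done with OSU_SVM in the project) is not shown.

```python
import math


def rbf_kernel(u, v, gamma=0.5):
    """Gaussian kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def svm_decision(x, support_vectors, dual_coefs, bias, gamma=0.5):
    """f(x) = sum_i (alpha_i * y_i) * K(s_i, x) + b; dual_coefs hold alpha_i * y_i."""
    return sum(
        c * rbf_kernel(s, x, gamma) for s, c in zip(support_vectors, dual_coefs)
    ) + bias


def is_eye(patch, model):
    """Eye (True) or non-eye (False) by the sign of f; model = (svs, coefs, bias)."""
    return svm_decision(patch, *model) > 0
```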

Using the detected eye locations, the face image is cropped and rotated to support more accurate recognition in the final stage.
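Given the two eye centres, the in-plane head tilt is the angle of the line joining them; rotating about the midpoint between the eyes by the opposite angle levels the eyes, after which a crop sized relative to the inter-eye distance can be taken. The sketch computes the angle, maps coordinates through the rotation and returns a crop box; the crop proportions are hypothetical, and pixel resampling is omitted.

```python
import math


def eye_alignment(left_eye, right_eye):
    """Return (centre, angle) for levelling the eyes; angle in radians.

    Eyes are (x, y) in image coordinates with y pointing down.
    """
    (lx, ly), (rx, ry) = left_eye, right_eye
    centre = ((lx + rx) / 2.0, (ly + ry) / 2.0)
    angle = math.atan2(ry - ly, rx - lx)   # tilt of the inter-eye line
    return centre, angle


def rotate_point(p, centre, angle):
    """Rotate point p about centre by -angle, i.e. undo the measured tilt."""
    x, y = p[0] - centre[0], p[1] - centre[1]
    c, s = math.cos(-angle), math.sin(-angle)
    return (centre[0] + c * x - s * y, centre[1] + s * x + c * y)


def crop_box(left_eye, right_eye, width_factor=2.0, up=0.6, down=1.6):
    """Face crop (x, y, w, h) around the levelled eyes; factors are hypothetical."""
    centre, _ = eye_alignment(left_eye, right_eye)
    d = math.dist(left_eye, right_eye)     # inter-eye distance (Python 3.8+)
    w = width_factor * d
    return (centre[0] - w / 2.0, centre[1] - up * d, w, (up + down) * d)
```

Tying the crop to the inter-eye distance also normalises scale, so faces detected at different distances from the camera yield comparable recognition inputs.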

The recognition system first preprocesses the image with histogram equalization and an oval mask, so that only the relevant facial area is considered. It then uses a multiclass SVM classifier to produce the final recognition output and the access-control signal for the door.
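The preprocessing and classification steps can be sketched as follows: histogram equalization remaps 8-bit grey levels through the normalised cumulative histogram, the oval mask replaces pixels outside the ellipse inscribed in the face crop, and the multiclass decision is made by one-against-one voting over pairwise binary classifiers. One-against-one is the scheme LIBSVM uses, but whether the project relied on it is an assumption, as are the ellipse proportions, the `pairwise` dictionary of decision functions, and all names below.

```python
from itertools import combinations


def equalize(img, levels=256):
    """Histogram equalization for integer grey levels in [0, levels)."""
    flat = [p for row in img for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:                    # cumulative histogram
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(flat)
    if n == cdf_min:                      # constant image: nothing to spread
        return [row[:] for row in img]
    # Look-up table mapping each grey level through the normalised CDF.
    lut = [max(0, round((c - cdf_min) / (n - cdf_min) * (levels - 1))) for c in cdf]
    return [[lut[p] for p in row] for row in img]


def oval_mask(img, fill=0):
    """Replace pixels outside the ellipse inscribed in the image with `fill`."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0   # ellipse centre
    ry, rx = h / 2.0, w / 2.0               # semi-axes
    return [
        [img[y][x] if ((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1.0 else fill
         for x in range(w)]
        for y in range(h)
    ]


def ovo_predict(x, classes, pairwise):
    """One-against-one voting over pairwise binary classifiers.

    pairwise[(a, b)](x) > 0 is a vote for a, otherwise for b.
    Ties go to the class listed first in `classes`.
    """
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        votes[a if pairwise[(a, b)](x) > 0 else b] += 1
    return max(classes, key=lambda c: votes[c])
```

With k enrolled people, one-against-one needs k(k − 1)/2 binary classifiers, each trained only on the images of two people.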

Tools
This project was developed in Matlab 6.5/7.0 on a PC platform. The recognition system uses the OSU_SVM toolbox. The input images were captured with the cheapest analog video camera available at the time and transferred to the computer using a FlyVideo capture card.

Conclusions

The project has shown that the novel algorithm by Viola and Jones, together with the improvements made by Lienhart et al., provides a viable solution to the detection problem, both for faces in a scene and for eyes within a face image. This solution not only achieves results comparable to those of the best known systems, but surpasses those systems by orders of magnitude in terms of frame rate and false-alarm rate.

A complete system, consisting of a Viola & Jones face detector, two Viola & Jones eye detectors, an SVM-based eye detector using OSU_SVM, and a multiclass face recognizer also based on SVM, was implemented and integrated using Matlab, with C code for the core segments. The result is a working access-control mechanism for the VISL door that uses only very rudimentary (and cheap!) components: a basic analog video camera comparable to today’s webcams, a FlyVideo TV/capture card, and a standard PC (1.7 GHz P4, 0.5 GB RAM). The system achieves impressive frame rates and accuracy, more than sufficient for use as an identification system.

Acknowledgment

We would like to thank the following people:
Dori Peleg, our supervisor, for his helpful suggestions and for his theoretical explanations.
The VISL lab staff: Aharon, Ina, and the lab engineer Johanan, for their willingness to assist with every problem we encountered during the development of this project, and for supplying the required equipment and technical support.
We are also grateful to the Ollendorff Minerva Center for supporting the VISL lab and its research projects.