Eye Tracking And LOS Recognition Using Standard Webcam

Abstract

This project is concerned with finding a subject’s line of sight (LOS) through real-time analysis using standard home equipment, and with using it to control a common media player (WMP). The system is expected to be stable, to work for a wide variety of people with minimal sensitivity to environmental conditions, to run on everyday computers, and to have a small margin of error. We reviewed two implementation methods: one using the Hough transform and one using template matching.

The problem
The problem consisted of determining the subject’s LOS from a frame captured by a standard webcam while overcoming environmental noise and the variation between people in skin color, eye color, and facial outline. After processing the image, the recognized LOS had to be translated to screen coordinates, and the relevant action had to be determined from those coordinates. There were also technical difficulties to overcome, such as constraints on the camera’s position, the quality of the frames, and the maximal sampling rate. Another main concern was keeping the system’s overhead low, with minimal impact on the computer’s performance.

The solution
As mentioned above, two different algorithms were reviewed as possible solutions. The first uses the Hough transform to recognize the user’s pupil; once the pupil is found, its location is translated to screen coordinates. This method was abandoned at a later stage of the project. Its base algorithm relied on a previous project that used it with an infrared camera, and we learned that the algorithm did not suit a standard webcam. Due to time constraints we decided to use the second, more mature method.

The second method is based on template matching, using image processing to minimize environmental noise. This method was chosen for the implementation. The algorithm uses “past results” to reduce processing time, determining the user’s current location from previous locations (relative movement).

Template Selection & Calibration
The subject must look at the center of the screen for calibration.
The user is prompted to hand-select a template of the iris. The center of the template is set as “ground zero”. Around this center point a “map” is defined whose size reflects the screen size. This map is used for translating the LOS to screen coordinates.
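
The calibration map can be sketched as follows. This is a minimal illustration, not the project's code: the names CalibrationMap and to_screen, the scale parameter (how many screen pixels correspond to one image pixel of iris movement), and the clamping to the screen edges are all assumptions, since the text does not say how the map size is derived. Image mirroring by the webcam is ignored.

```python
from dataclasses import dataclass


@dataclass
class CalibrationMap:
    """Hypothetical map centred on the calibrated iris position ("ground zero")."""
    cx: float       # ground-zero x in image pixels
    cy: float       # ground-zero y in image pixels
    screen_w: int   # screen width in pixels
    screen_h: int   # screen height in pixels
    scale: float    # assumed: screen pixels per image pixel of iris movement

    @property
    def half_width(self) -> float:
        # Map half-extent in image pixels, so the map's size mirrors the screen's.
        return self.screen_w / (2 * self.scale)

    @property
    def half_height(self) -> float:
        return self.screen_h / (2 * self.scale)

    def to_screen(self, x, y):
        """Translate an iris position in image pixels to screen pixels."""
        sx = self.screen_w / 2 + (x - self.cx) * self.scale
        sy = self.screen_h / 2 + (y - self.cy) * self.scale
        # Clamp so that positions outside the map land on the screen edge.
        sx = min(max(sx, 0), self.screen_w - 1)
        sy = min(max(sy, 0), self.screen_h - 1)
        return round(sx), round(sy)


cal = CalibrationMap(cx=320, cy=240, screen_w=1280, screen_h=720, scale=40)
centre = cal.to_screen(320, 240)   # subject looks at the calibration point
right = cal.to_screen(330, 240)    # iris moved 10 image pixels to the right
```

With these assumed numbers the map spans 32 by 18 image pixels around ground zero, which shows why small iris movements must be amplified considerably to cover the whole screen.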

Pre Processing & Template Matching
A Gaussian filter is used to reduce noise and increase system stability. A search window is defined around “ground zero”, and the template is searched for only inside this window to enable real-time processing. The search is based on the SQDIFF NORMED equation (Figures 3-4) and a given threshold. When the template is found, a new “ground zero” is defined and the system continues. When it is not found, the system retries on the following frame, to overcome cases such as blinking.
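
The windowed search can be illustrated with a small pure-Python sketch. It assumes the normalised squared-difference score in the form used by OpenCV's TM_SQDIFF_NORMED mode (the sum of squared differences divided by the square root of the product of the template and patch energies); the project's own equation is the one in Figures 3-4. The function names, the radius parameter, and the use of the template's top-left corner (rather than its center) as ground zero are simplifications made for this example, and the Gaussian pre-filter is left out.

```python
import math


def sqdiff_normed(image, template, x, y):
    """Normalised squared difference between the template and the patch at (x, y).

    0.0 means a perfect match; larger values mean a worse match.
    """
    diff = t_energy = i_energy = 0.0
    for ty, row in enumerate(template):
        for tx, t in enumerate(row):
            i = image[y + ty][x + tx]
            diff += (t - i) ** 2
            t_energy += t * t
            i_energy += i * i
    denom = math.sqrt(t_energy * i_energy)
    return diff / denom if denom else float("inf")


def track(image, template, ground_zero, radius, threshold):
    """Search for the template only inside a window around ground_zero.

    Returns (position, found). If no score is at or below the threshold
    (for example during a blink), the old ground zero is kept so the
    caller can simply retry on the next frame.
    """
    th, tw = len(template), len(template[0])
    gx, gy = ground_zero
    x_min, x_max = max(gx - radius, 0), min(gx + radius, len(image[0]) - tw)
    y_min, y_max = max(gy - radius, 0), min(gy + radius, len(image) - th)
    best_score, best_pos = float("inf"), None
    for y in range(y_min, y_max + 1):
        for x in range(x_min, x_max + 1):
            score = sqdiff_normed(image, template, x, y)
            if score < best_score:
                best_score, best_pos = score, (x, y)
    if best_pos is not None and best_score <= threshold:
        return best_pos, True
    return ground_zero, False


template = [[10, 200], [200, 10]]
frame = [[50] * 6 for _ in range(6)]
frame[3][2], frame[3][3] = 10, 200
frame[4][2], frame[4][3] = 200, 10   # the "iris" now sits at x=2, y=3

found = track(frame, template, ground_zero=(1, 2), radius=2, threshold=0.1)
blank = [[50] * 6 for _ in range(6)]  # a frame with no match, e.g. a blink
missed = track(blank, template, ground_zero=(1, 2), radius=2, threshold=0.1)
```

Restricting the loops to the window is what keeps the cost per frame bounded: the number of candidate positions depends on the radius, not on the frame size.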

Determining Screen Coordinates
Based on the predefined map, screen coordinates are calculated using the main center point determined at calibration. The location is relative to that point, and some corrections are applied. The location found is then checked against four “hot zones”. These zones are predefined and static; they change only in proportion to the screen resolution. If the location falls inside a hot zone, the relevant action is performed.
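
The hot-zone check can be sketched as below. The text does not give the zones' positions, sizes, or the actions bound to them, so the four edge strips, the 15 percent margin, and the action names here are purely hypothetical; only the idea that the zones are fixed and scale with the resolution comes from the description above.

```python
def hot_zones(screen_w, screen_h, margin_percent=15):
    """Four hypothetical edge zones as (left, top, right, bottom) rectangles.

    margin_percent is the assumed share of the screen each strip occupies,
    so the zones scale with the resolution but are otherwise fixed.
    """
    mw = screen_w * margin_percent // 100
    mh = screen_h * margin_percent // 100
    return {
        "previous": (0, mh, mw, screen_h - mh),                        # left strip
        "next": (screen_w - mw, mh, screen_w, screen_h - mh),          # right strip
        "volume_up": (mw, 0, screen_w - mw, mh),                       # top strip
        "volume_down": (mw, screen_h - mh, screen_w - mw, screen_h),   # bottom strip
    }


def action_for(point, zones):
    """Return the action whose zone contains point, or None if outside all zones."""
    x, y = point
    for action, (left, top, right, bottom) in zones.items():
        if left <= x < right and top <= y < bottom:
            return action
    return None


zones = hot_zones(1280, 720)
looking_left = action_for((100, 360), zones)
looking_centre = action_for((640, 360), zones)
```

Returning None for the center of the screen lets the subject watch the media normally without triggering any command.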

Conclusions
The project demonstrates good tracking at high resolution in a real-time environment, delivering a product that works on standard computers with webcams and enables full control over a standard media player.