Automatic Detection and Tracking of Soccer Players

Abstract
This project is one chapter in the series of projects dealing with automatic offside recognition by means of image processing. Prior projects have implemented mapping of field coordinates, tracking of the ball and identification of soccer players. This project aims to create a system that detects and tracks soccer players in a given soccer game video, so that the location of every player is known at all times. The basic idea is to treat a movie as a sequence of consecutive frames, each adding some information to its predecessor. Thus, by maintaining a database that is updated every frame, we can create a robust system that performs reliable identification and tracking. The goal is to know the locations of all players at any given time, in order to be able to decide whether a player is offside.

The problem
The input is a soccer movie, but until now image processing was performed on each frame separately. This approach has several drawbacks:

Occlusions (which are likely to happen) make it impossible to detect all the players in a frame with certainty. In fact, for any detected object it is impossible to tell whether it contains one player or several
Results such as the number of players, or the color of the player’s shirt, may be inconsistent from one frame to the next
Analysis time is too long for a system expected to work in real time

All the above indicate that a different approach is required.

The solution
In order to cope with occlusions, instead of treating all objects in a frame as players, three objects were defined:

Player: a single player in full certainty
Occluded: several players that cannot be separated by means of image processing, but whose number is known for certain
Group: several players that cannot be separated by means of image processing and whose exact number is not known; only the minimum and maximum possible numbers are known
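The three object types can be pictured as one record that stores a lower and an upper bound on the number of players it contains. The sketch below is only an illustration under that assumption; the names `Kind`, `TrackedObject` and `merge`, and the rule that merging adds the bounds, are hypothetical and are not taken from the project's implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    PLAYER = "player"      # exactly one player
    OCCLUDED = "occluded"  # several players, exact count known
    GROUP = "group"        # several players, only bounds known


@dataclass
class TrackedObject:
    min_players: int  # smallest possible number of players in the object
    max_players: int  # largest possible number of players in the object

    @property
    def kind(self) -> Kind:
        # The state follows directly from how tight the bounds are.
        if self.min_players == self.max_players == 1:
            return Kind.PLAYER
        if self.min_players == self.max_players:
            return Kind.OCCLUDED
        return Kind.GROUP


def merge(a: TrackedObject, b: TrackedObject) -> TrackedObject:
    # Assumed rule: when two objects merge, their bounds add up.
    return TrackedObject(a.min_players + b.min_players,
                         a.max_players + b.max_players)


single = TrackedObject(1, 1)
print(single.kind)                                # a single player
print(merge(single, single).kind)                 # two known players
print(merge(single, TrackedObject(1, 3)).kind)    # count uncertain
```

Under this representation, two Player objects that merge become an Occluded object with a known count of two, while merging with an object of uncertain size yields a Group.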

These three defined objects cover all the possible states an object can be in. The transition from one state to another is as follows:

The system collects data about all objects over time. This data includes the color of a player’s shirt, the number of players in an Occluded object, and the possible numbers of players in a Group object.

The processing of each frame starts with removal of the background, leaving only the areas that differ between two consecutive frames. These areas in the resulting binary image are the only areas of interest.
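As a rough illustration of this step, the following sketch builds a binary image by thresholding the absolute difference between two consecutive grayscale frames. The frame representation (nested lists of intensities), the function name `difference_mask` and the threshold value are assumptions made for the example, not details of the project.

```python
def difference_mask(prev_frame, curr_frame, threshold=25):
    """Return a binary image: 1 where the intensity changed by more than
    `threshold` between the two frames, 0 elsewhere."""
    return [
        [1 if abs(c - p) > threshold else 0
         for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


# Tiny 2x3 example: only the middle column changes noticeably.
prev_frame = [[10, 10, 10],
              [10, 10, 10]]
curr_frame = [[10, 200, 10],
              [10, 190, 12]]
print(difference_mask(prev_frame, curr_frame))  # [[0, 1, 0], [0, 1, 0]]
```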

Then, each of the currently known objects is located in the current frame. Each object is associated with its current location by a maximum likelihood criterion: two appearances of the same object in consecutive frames always partially overlap. From this overlap, the location and direction of motion can be deduced. The new location is updated in the database, to be used in the next frame. Merges and splits of objects are discovered in the same way, resulting in a change in the object’s state.
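The association step can be sketched with bounding boxes. The example below picks, for each known object, the region in the current frame with the largest overlap area; this simple largest-overlap rule stands in for the maximum likelihood criterion, whose exact form is not given here. The box format `(x0, y0, x1, y1)` and all function names are assumptions for illustration.

```python
def overlap_area(a, b):
    """Area of the intersection of two boxes given as (x0, y0, x1, y1)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)


def center(box):
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def associate(objects, blobs):
    """objects: dict id -> box in the previous frame.
    blobs: list of boxes found in the current frame.
    Returns dict id -> (blob index, displacement of the box center)."""
    matches = {}
    for obj_id, box in objects.items():
        areas = [overlap_area(box, blob) for blob in blobs]
        if not areas or max(areas) == 0:
            continue  # no overlapping region: the object was not found
        best = areas.index(max(areas))
        (px, py), (cx, cy) = center(box), center(blobs[best])
        matches[obj_id] = (best, (cx - px, cy - py))
    return matches


def merged_blobs(matches):
    """Blob indices claimed by more than one object, hinting at a merge."""
    seen, merged = set(), set()
    for blob_index, _ in matches.values():
        if blob_index in seen:
            merged.add(blob_index)
        seen.add(blob_index)
    return merged


objects = {"a": (0, 0, 10, 20), "b": (30, 0, 40, 20)}
blobs = [(2, 0, 12, 20), (28, 0, 38, 20)]
matches = associate(objects, blobs)
print(matches)  # {'a': (0, (2.0, 0.0)), 'b': (1, (-2.0, 0.0))}
print(merged_blobs(matches))  # set()
```

When two known objects both select the same region, `merged_blobs` reports it, which is one plausible way to notice a merge; a split could be noticed symmetrically, when one object overlaps several regions.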

In addition to tracking known objects, the system must allow objects to enter and leave the frame. A detection strip is defined at the edge of the frame, where these events may take place, and each newly detected object is defined as a Group object, since its number of players is unknown. The image processing is performed only in the areas of objects and the detection strip, reducing the processing time significantly.
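A minimal sketch of the detection strip test follows, assuming the strip is a border of fixed width along all four edges of the frame. The strip width, the frame size and the function name `in_detection_strip` are hypothetical values chosen for the example.

```python
def in_detection_strip(box, frame_width, frame_height, strip=20):
    """True if the box (x0, y0, x1, y1) reaches into the border strip
    of width `strip` pixels along any edge of the frame."""
    x0, y0, x1, y1 = box
    return (x0 < strip or y0 < strip
            or x1 > frame_width - strip
            or y1 > frame_height - strip)


# A region near the left edge may be a new object entering the frame;
# a region in the middle of the frame must belong to a known object.
print(in_detection_strip((5, 100, 25, 140), 720, 576))     # True
print(in_detection_strip((300, 200, 320, 240), 720, 576))  # False
```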

The shirt color is also identified in every frame, and the results are collected along the sequence of frames. To avoid inconsistencies, however, the color of a player’s shirt is determined only after it has been verified over a large enough sample of frames. In Occluded objects, the shirt colors are used to identify players that cannot be separated by image processing.
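The idea of committing to a shirt color only after enough consistent observations can be sketched as a simple vote. The class name `ShirtColorVote`, the minimum sample size and the required majority share are assumptions for illustration; the project's actual decision rule and thresholds are not stated here.

```python
from collections import Counter


class ShirtColorVote:
    """Accumulates per-frame color labels and decides only when the
    evidence is large and consistent enough."""

    def __init__(self, min_samples=30, min_share=0.8):
        self.counts = Counter()
        self.min_samples = min_samples  # frames needed before deciding
        self.min_share = min_share      # fraction the top label must reach

    def add(self, label):
        self.counts[label] += 1

    def decided(self):
        total = sum(self.counts.values())
        if total < self.min_samples:
            return None  # not enough frames observed yet
        label, votes = self.counts.most_common(1)[0]
        return label if votes / total >= self.min_share else None


vote = ShirtColorVote(min_samples=5, min_share=0.75)
for label in ["red", "red", "red"]:
    vote.add(label)
print(vote.decided())  # None: only 3 samples so far
for label in ["blue", "red"]:
    vote.add(label)
print(vote.decided())  # red: 4 of 5 samples agree
```

A single misclassified frame, such as the `"blue"` sample above, does not change the decision, which is the kind of frame-to-frame inconsistency the accumulation is meant to absorb.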

Given all of the above, the location and team of each player are known at all times, and it can be determined whether a player is offside.

Results
These are the results of execution over 4 movies with 700-1100 frames each.

The percentage of success in tracking is defined as the number of times each object is successfully tracked out of the number of occurrences of this object in the entire movie. The average success percentage is ~97%, i.e. erroneous tracking of an object occurs once in ~40 frames. Most of these errors happen because the object splits in the binary image.

The breakdown of objects by type shows that 24% of the objects are Group objects. Such objects cannot participate in the offside decision, because of the uncertainty they represent.

Conclusions
A system that uses image processing for detecting and tracking football players in a movie was fully implemented and shows good results. It manages to cope with challenges such as occlusion, shading, and changes in lighting. Offside situations are identified. Still, the level of uncertainty is too high to enable correct operation in all situations. Suggestions for further improving system performance:

Use of multiple cameras for cross-tracking
Observation mechanism to discover impossible-to-track situations
Integration with football field coordinate mapping

Acknowledgment
We are grateful to our project supervisor Guy Gilboa and the Vision and Image Science Lab staff for their support and guidance.
We are also grateful to the Ollendorff Minerva Center Fund for supporting this project.