Abstract
Motion estimation is an important subject in the field of computer vision, used by many systems for various purposes, such as warning and alarm systems, video compression, and speed measurement. This project implements a system that identifies a moving car in a sequence of video frames and calculates its speed and acceleration.
Project Goal
The project objective is to measure the speed and acceleration of cars in a sequence of video frames taken by a standard home video camera. Since one of the important applications is determining whether or not a car is moving faster than the permissible speed, the measurement should be as accurate and reliable as possible, and these qualities take first priority. The system should also function over a wide range of operating conditions: various car sizes, colors, and speeds, various luminance levels, and noise. Finally, the hardware should be common – a home video camera and a desktop computer.
The Solution
Visual motion is perceived not through recognition of texture and color but through changes of luminance and color over time. Analysis of visual motion mostly consists of two stages: extracting the movement information (direction, speed, displacement) from a series of pictures, and processing this information.
One way to obtain the movement information is motion estimation – finding a two-dimensional velocity vector for every small area in the picture, assuming adjacent pixels have the same speed.
The relation between two consecutive pictures X[k,j,t] and X[k,j,t-1] in a video sequence, where X[k,j,t] is the pixel at coordinates (k,j) at time t, can be modeled as:
X[k,j,t] = X[k-dx(k,j,t),j-dy(k,j,t),t-1]
i.e., every pixel in the present picture originates from a pixel in the previous one. The pixels are the same; only their positions have changed. If we find where a pixel went and we know the time elapsed between the two pictures, we have its velocity.
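The model can be sketched in C (part of the project was implemented in C) for the special case of a constant, integer displacement; the function and variable names here are illustrative:

```c
/* The displacement model above, for a constant integer displacement
   (dx, dy): every pixel of the current picture is taken from the
   previous one, shifted as in X[k,j,t] = X[k-dx, j-dy, t-1].
   Pictures are MxN, stored row-major. Pixels whose source falls
   outside the picture are set to 0. */
void shift_frame(const double *prev, double *cur,
                 int M, int N, int dx, int dy)
{
    for (int k = 0; k < M; k++)
        for (int j = 0; j < N; j++) {
            int sk = k - dx, sj = j - dy;  /* source coordinates at t-1 */
            cur[k*N + j] = (sk >= 0 && sk < M && sj >= 0 && sj < N)
                         ? prev[sk*N + sj]
                         : 0.0;
        }
}
```

In reality dx and dy vary over the picture, which is exactly what motion estimation must recover.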
Finding the displacement of each individual pixel is problematic. The number of unknowns is enormous – twice the number of pixels – which means heavy computation and memory requirements. The main problem is that a single pixel can be found in the previous picture in more than one place, and noise can severely degrade the quality of the match.
How do we choose the right original place? A natural solution is to assume homogeneity of the displacement field – adjacent pixels move together, a reasonable assumption for rigid bodies. The picture is divided into blocks, for example 8×8 pixels, and we search for the origin of each block. The match is now far more robust to noise and false origins, and far fewer vectors need to be calculated and stored. If a block contains two groups of pixels moving differently, no match will be found. This imposes a maximum on the block size, so that the traced object contains at least one whole block.
This method is called 'block matching'. Its main disadvantage is the large number of computations compared to other algorithms. However, none of the alternatives was accurate and reliable enough or allowed isolating the main moving object, and since we do not need real-time performance, we chose block matching.
The Algorithm
1. Video capture – the video is sampled from the camera into an AVI file.
2. Picture sequence – the program reads the movie into a sequence of gray-scale pictures, matrices of size M×N.
3. Mask generation – for each pair of consecutive pictures, the pictures are divided into B×B blocks (B = block size) and the program calculates the correlation between each block in one picture and the block at the same position in the second picture. A block with low correlation has moved, and its new location should be found. The result is an [M/B]×[N/B] matrix called the “mask”, defining the area where a change occurred. Its purpose is to filter out the static blocks and limit the search to where the action really occurred.
4. Block matching – for each pair of consecutive pictures, each block marked by the mask is searched for in the second picture, within the area marked by the mask, in one-pixel steps. A block is considered 'found' if it matches a block in the second picture with a correlation higher than a threshold T. If there is more than one match, the one with the highest correlation is chosen.

Now we calculate the average speed for each picture pair by screening out blocks with a low-correlation match and blocks whose speed diverges too much from the rest. Such blocks are caused by other moving objects, by noise, and by blocks whose pixels do not all belong to the moving object.
5. Adaptation by correlation – each frame pair now has a speed. If more than a certain percentage of the pair-speed calculations failed, stage 4 is repeated with a lower T. A high T is good for daylight pictures and gives high accuracy, but in darkness, for example, it should be lower, otherwise there are no matches at all.
6. Velocity and/or acceleration calculation – the vector of pair speeds is fitted with a two-coefficient polynomial to obtain the speed and acceleration.
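A two-coefficient polynomial over time, v(t) = v0 + a·t, has the initial speed as its constant term and the acceleration as its slope. Assuming an ordinary least-squares fit (the fitting method is not spelled out above), a minimal sketch:

```c
/* Step 6: fit the per-pair speeds v[i], measured at times t[i],
   with v(t) = v0 + a*t by least squares. The slope a is the
   acceleration, v0 the speed at t = 0. Assumes n >= 2 samples
   with at least two distinct times. */
void fit_speed(const double *t, const double *v, int n,
               double *v0, double *a)
{
    double st = 0, sv = 0, stt = 0, stv = 0;
    for (int i = 0; i < n; i++) {
        st  += t[i];        sv  += v[i];
        stt += t[i]*t[i];   stv += t[i]*v[i];
    }
    double det = n*stt - st*st;      /* nonzero for distinct times */
    *a  = (n*stv - st*sv) / det;
    *v0 = (sv - *a * st) / n;
}
```

For example, pair speeds 10, 12, 14, 16 at times 0, 1, 2, 3 give v0 = 10 and a constant acceleration of 2.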
7. Unit conversion – the speed/acceleration is converted to real-world units based on the dimensions of the captured area, the camera's position relative to the road, and the time elapsed between each picture pair. The program also reports the standard deviation and the success percentage.
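The conversion of step 7 amounts to scaling the pixel displacement per frame pair by the meters-per-pixel factor of the captured area and the frame rate (the reciprocal of the time between consecutive pictures). A minimal sketch; the parameter values in the comments are illustrative, not the project's actual calibration:

```c
/* Step 7: convert a displacement in pixels per frame pair to m/s.
   meters_per_pixel is derived from the known dimensions of the
   captured area and the camera's position; fps is the frame rate,
   so 1/fps is the time between consecutive pictures. */
double to_mps(double pixels_per_frame, double meters_per_pixel, double fps)
{
    return pixels_per_frame * meters_per_pixel * fps;
}

/* convenience conversion for the usual road-speed units */
double to_kmh(double mps)
{
    return mps * 3.6;
}
```

For instance, with an (assumed) calibration of 0.125 m per pixel at 24 frames per second, a displacement of 5 pixels per frame corresponds to 15 m/s, i.e. 54 km/h.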
Tools
Video shots were taken with a regular home video camera and sampled by a desktop PC equipped with a Miro DC30 video capture card. The algorithm was implemented in MATLAB V6 and C.
Conclusions
The algorithm performed well over a wide range of speeds, car colors, and sizes, and withstood difficulties such as other (small) moving objects, an unfocused camera, glare, and artificial noise. The standard deviation was mostly under 2%. The exception was black cars in darkness, where the algorithm failed – but it reported the failure and did not return false results. Computation cost turned out to be a problem; we had to convert part of the MATLAB code to C to achieve usable performance.
Block Matching proved to be suitable for accurate and reliable motion estimation.
If we settle for speed only, without acceleration, computation time can be reduced significantly by taking only 2-3 pictures or by lowering the sample rate of the video card. Another advantage of the latter is that a simpler, cheaper card can be used.
Lens distortion and other limitations of the camera were found to be insignificant, so a video camera is reliable as a speed measurement device.
Acknowledgments
We would like to thank our supervisor, Mr. Victor Yossef, for his support and guidance throughout this project. We would also like to thank the laboratory staff, and Mr. Johanan Erez for his attention and help during the whole project.
We also thank the Ollendorff Minerva Center Fund, which supported this project.

