In this project we introduce a new medium called the “Video Texture”, which combines the qualities of still images and video. On the one hand, a still photo captures time-invariant information but cannot represent the dynamics of an object: a photo of a dynamic scene can only hint at the movement involved. On the other hand, a video can describe motion, but it is limited in time (because of disk space considerations) and always has the same beginning, middle and end. If we stitch its end to its start in order to view the subject for longer, we get a self-repeating clip with unpleasant jumps at the stitch points. A video texture has the timeless quality of a photograph together with the motion characteristics of a video. Just as a photo has spatial texture, a video texture has texture in time. In our project we take a sample video clip as input, analyze it to extract the necessary parameters, and synthesize from it a video texture: a new, time-invariant clip that can have any duration and still looks as natural as the original input clip.
Video Textures are a medium somewhere between static images and streaming video. One way to describe them is by analogy with image texture synthesis, whose goal has been stated as follows: “such an algorithm should be able to take a sample of texture and generate an unlimited amount of image data, which, while not exactly like the original, will be perceived by humans to be the same texture.” This is also the goal of video textures. However, instead of generating an unlimited amount of image data by extrapolating in the spatial domain, video textures generate an unlimited amount of video data by extrapolating in the temporal domain. Doing this requires three steps: analysis, synthesis and rendering. In the analysis stage, all the frames of the original video are examined and some features are extracted. The synthesis stage takes the results of the analysis and constructs a new frame sequence in which certain sections of the video may repeat, but the same sequence is unlikely to be played twice. Rendering removes any artifacts that may result from the first two stages, so that the new video is displayed as nicely as possible.
The goal of the analysis stage is to produce a matrix of probabilities in which each entry P_{i,j} is the probability of transitioning from frame i to frame j. The first step in creating the probability matrix is to compute a distance matrix between all pairs of frames,
D_{i,j} = \| I_i - I_j \|_2
which denotes the L2 distance between each pair of frames I_i and I_j. When synthesizing the video texture, the best transitions from frame i to frame j are those where the original successor of I_i, namely I_{i+1}, is similar to I_j. That is, when D_{i+1,j} is small we want the probability P_{i,j} to be high. Any monotonically decreasing function would work; we used the one presented in the paper.
Examples of the D-matrix (on the right) and the P-matrix (on the left) are shown in Figure 1. Note the white diagonal running through the P-matrix (the main diagonal). It represents the natural flow of the input movie: it effectively compares frame i+1 with itself, so the distance is 0, which means a transition probability of 1.
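The analysis step above can be sketched in a few lines of Python. This is only an illustration, not the project's Matlab code: the function names are hypothetical, frames are assumed to be flat lists of pixel values, and the exponential mapping exp(-D/sigma) is an assumption standing in for "the one presented in the paper" (any monotonically decreasing function would serve).

```python
import math


def distance_matrix(frames):
    """L2 distance between every pair of frames (each a flat list of pixel values)."""
    n = len(frames)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(frames[i], frames[j])))
            D[i][j] = D[j][i] = d  # the distance is symmetric
    return D


def transition_probabilities(D, sigma):
    """P[i][j] proportional to exp(-D[i+1][j] / sigma) (assumed mapping).

    Each row is normalised to sum to 1. The last frame has no successor,
    so only rows 0 .. n-2 are produced.
    """
    n = len(D)
    P = []
    for i in range(n - 1):
        weights = [math.exp(-D[i + 1][j] / sigma) for j in range(n)]
        total = sum(weights)
        P.append([w / total for w in weights])
    return P


# Three tiny two-pixel "frames"; frames 0 and 2 are identical.
frames = [[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]]
D = distance_matrix(frames)
P = transition_probabilities(D, sigma=1.0)
```

In this sketch the natural successor j = i+1 always receives the largest weight in row i, because D[i+1][i+1] is 0. A smaller sigma concentrates the probability on the best transitions, a larger one spreads it out.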
Image distance is not the only constraint we want to impose on the frame sequence; we also want to preserve the dynamics of the system. Consider the motion of a pendulum. Frame i-1 on the upswing is very similar to frame i+1 on the downswing (Figure 2). However, if such frames are substituted for one another often in the video texture, the viewer will see sudden and unacceptable changes in direction. We need to enforce some coherence in the direction of movement. This can be achieved very simply by factoring the neighboring frames into the cost of a transition, so that the transitions we make are similar in both space and velocity. This matching of subsequences can be done by convolving the cost matrix with a weighted or uniform diagonal kernel. To do so, we apply the following formula:
D'_{i,j} = \sum_k w_k D_{i+k,j+k}
where k runs over the filter taps and w_k are the filter weights.
Examples: [Pendulum Clip – dynamics preservation not used]
[Pendulum Clip – 5×5 filter was used to preserve dynamics]
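The effect of the diagonal filter can be illustrated with a toy pendulum. The sketch below is an assumption-laden illustration rather than the project's implementation: `preserve_dynamics` is a hypothetical name, each frame is reduced to a single pendulum position, and a uniform three-tap filter is used in place of the 5×5 one.

```python
def preserve_dynamics(D, weights):
    """Filter D along its diagonals: D'[i][j] = sum_k w_k * D[i+k][j+k].

    `weights` has odd length 2*m + 1 and is centred on k = 0. Border frames
    whose window would leave the matrix are dropped, so the result has size
    (n - 2m) x (n - 2m) and entry [a][b] refers to original frames a+m, b+m.
    """
    m = len(weights) // 2
    size = len(D) - 2 * m
    offsets = range(-m, m + 1)
    Dp = [[0.0] * size for _ in range(size)]
    for a in range(size):
        for b in range(size):
            i, j = a + m, b + m
            Dp[a][b] = sum(w * D[i + k][j + k] for k, w in zip(offsets, weights))
    return Dp


# Pendulum positions over time: up (0, 1, 2), down (1, 0), up again (1, 2).
positions = [0, 1, 2, 1, 0, 1, 2]
D = [[abs(a - b) for b in positions] for a in positions]

uniform = [1 / 3, 1 / 3, 1 / 3]
Dp = preserve_dynamics(D, uniform)

# Frames 1 (upswing) and 3 (downswing) look identical: D[1][3] == 0,
# but after filtering their cost is 4/3, since their neighbours differ.
# Frames 1 and 5 are both on the upswing and keep a cost of 0.
upswing_vs_downswing = Dp[0][2]
upswing_vs_upswing = Dp[0][4]
```

Dropping the border frames is one possible design choice; padding the matrix or renormalising the weights near the border would keep all frames at the price of less reliable costs there.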
Avoiding Dead Ends and Anticipating the Future
One disadvantage of the method described so far is that we may jump to a frame that has no good transitions out of it, which mostly happens towards the end of the original input movie. An example is the hand that appears in the last frames of the Clock Pendulum clip. This, of course, is something we want to avoid. By propagating the cost of each frame's transitions backwards in time, we can learn which transitions lead to dead ends and prevent them.
The transition probability P''_{i,j} is then computed from the propagated cost D''_{i,j} instead of D'_{i,j}.
During our work we concluded that this algorithm alone does not yield satisfactory results. Although it does reduce the values of the problematic frames on the main diagonal, they remain too high compared with the others. Therefore, if the probability value of a frame on the main diagonal drops below some predetermined value (0.96, for example), we set it to zero, which prevents other frames from leading into the “dead end” frames.
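The backward propagation of costs can be sketched as a fixed-point iteration. Everything specific here is an assumption: the recurrence D''[i][j] = D'[i][j]^p + alpha * min_k D''[j][k] is a commonly used formulation rather than one quoted from this report, the input is treated directly as a matrix of transition costs from frame i to frame j, and the name `anticipate_future` and the default values are hypothetical. The additional thresholding step described above is not included.

```python
def anticipate_future(cost, p=1.0, alpha=0.5, iterations=100):
    """Propagate future costs backwards (assumed recurrence, see lead-in).

    cost[i][j] is the immediate cost of jumping from frame i to frame j.
    The returned matrix adds, to each transition, a discounted estimate of
    the cheapest cost that can be achieved after arriving at frame j.
    With 0 < alpha < 1 the iteration converges.
    """
    n = len(cost)
    future = [[c ** p for c in row] for row in cost]
    for _ in range(iterations):
        # Cheapest continuation available from each frame.
        best_next = [min(row) for row in future]
        future = [
            [cost[i][j] ** p + alpha * best_next[j] for j in range(n)]
            for i in range(n)
        ]
    return future


# Frame 2 is a dead end: every transition out of it is expensive.
cost = [
    [1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0],
    [10.0, 10.0, 10.0],
]
future = anticipate_future(cost)
# Jumping into frame 2 looked as cheap as any other jump (cost 1.0),
# but its propagated cost is now higher than jumping to frame 1.
```

A larger alpha looks further into the future and penalises dead ends more strongly, while p controls how sharply large immediate costs are weighted.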
There are two basic ways to sequence the frames into a video texture: random transitions and video loops.
Random play is the simplest way of making a video texture and the easiest to implement. The video texture can generally start at any frame that has at least one transition probability larger than 0. The next frame is then selected by scanning all the other frames and choosing one whose transition probability exceeds some predetermined value. Repeating this step as many times as the desired length of the video texture yields the final movie. If this process runs in real time, a virtually infinite video texture can be produced. This technique, however, produces rather poor results, with many visible jumps and discontinuities, as can be seen in this Flashlight Example.
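A minimal sketch of random play follows, assuming a square probability matrix `P` with one row per frame. The function name, the uniform choice among frames above the threshold, and the early stop at a frame with no candidates are all assumptions made for illustration; the report does not specify these details.

```python
import random


def random_play(P, start, length, threshold, rng=None):
    """Build a frame sequence by repeated thresholded random transitions.

    At each step, every frame j with P[current][j] > threshold is a
    candidate, and one candidate is picked uniformly at random. The walk
    stops early if the current frame has no candidate.
    """
    rng = rng or random.Random()
    sequence = [start]
    current = start
    while len(sequence) < length:
        candidates = [j for j, prob in enumerate(P[current]) if prob > threshold]
        if not candidates:
            break  # no acceptable transition out of this frame
        current = rng.choice(candidates)
        sequence.append(current)
    return sequence


# Toy matrix in which each frame has exactly one strong transition.
P = [
    [0.0, 0.9, 0.1],
    [0.1, 0.0, 0.9],
    [0.9, 0.1, 0.0],
]
sequence = random_play(P, start=0, length=7, threshold=0.5)
# With a single candidate per frame the walk is forced: 0, 1, 2, 0, 1, 2, 0
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which is convenient when comparing parameter settings.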
Video Loops is a more complex synthesis method that also produces better results. The basic idea is to find segments of the original movie whose start and end are very similar. We can then jump from the end of such a segment back to its start without a visible disturbance. Such a segment is called a primitive loop. Finding several of these loops and jumping from one to another results in a video texture that looks much smoother and more natural to the human eye. An example can be seen in Figure 3.
To implement Video Loops we first find, for every frame, the frame with the largest probability of being jumped to from it. The results are put in a lookup table (LUT), along with the distance between the two frames (we do not want jumps between very close frames). The results are then refined to a final average value (Figure 4). The desired number of frames with the best average values are then taken, and these are our primitive loops.
Walking through the movie from the first frame, whenever we reach a frame stored in the LUT we randomly decide whether to take the loop or continue to the next frame. The last loop is always taken. Once a given number of frames has been played (a frame visited n times counts as n frames), a video texture of the desired length has been created. Video Loops give much better results than random transitions, as can be seen by comparing the Random Transition Synthesized Pendulum Clip with the Video Loops Synthesized Pendulum Clip.
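The playback walk described above can be sketched as follows. The sketch assumes the primitive loops have already been selected and are given as a hypothetical dictionary `loops` mapping the frame where a loop ends to the earlier frame it jumps back to; how the LUT averages choose these loops is not modelled. The name `play_loops` and the `take_probability` parameter are also assumptions.

```python
import random


def play_loops(num_frames, loops, length, take_probability=0.5, rng=None):
    """Walk the movie from frame 0, optionally taking backward loops.

    loops: dict {source_frame: target_frame} with target < source.
    The loop with the largest source frame is always taken, so playback
    never runs past it. Revisited frames count once per visit.
    """
    if not loops:
        raise ValueError("at least one primitive loop is required")
    for source, target in loops.items():
        if not 0 <= target < source < num_frames:
            raise ValueError("each loop must jump backwards within the movie")
    rng = rng or random.Random()
    last_source = max(loops)
    sequence = []
    frame = 0
    while len(sequence) < length:
        sequence.append(frame)
        take = frame in loops and (
            frame == last_source or rng.random() < take_probability
        )
        frame = loops[frame] if take else frame + 1
    return sequence


# Two primitive loops in an 8-frame movie: 3 -> 1 and 6 -> 2.
loops = {3: 1, 6: 2}
sequence = play_loops(num_frames=8, loops=loops, length=50, rng=random.Random(0))
```

Because the last loop is always taken, frame 7 of this toy movie is never shown, which mirrors the way the method keeps playback away from the end of the clip.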
Even after finding transitions that introduce only small discontinuities in the motion, some transitions in the video texture remain noticeable. There are several techniques for eliminating discontinuities in the video texture, and for blending independently analyzed regions together.
In our work we tested several algorithms and decided to implement one of them: Crossfade.
Instead of simply jumping from one frame to another when a transition is made, the images of the sequence before and after the transition can be blended together with standard cross-fading: frames from the sequence near the source of the transition are linearly faded out while frames from the sequence near the destination are faded in. The fade is positioned so that it is halfway complete where the transition was scheduled.
A schematic representation of the algorithm is shown in Figure 5.
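The linear fade can be sketched as below. This is an illustration under stated assumptions: `crossfade` is a hypothetical name, frames are flat lists of pixel values, and the two input runs are assumed to be already aligned so that the scheduled transition falls in the middle of the window.

```python
def crossfade(source_frames, dest_frames):
    """Blend two equally long runs of frames with a linear fade.

    source_frames: frames around the source of the transition (faded out).
    dest_frames:   frames around the destination (faded in).
    The destination weight rises linearly and passes 0.5 exactly in the
    middle of the window, where the transition was scheduled.
    """
    n = len(source_frames)
    if n != len(dest_frames):
        raise ValueError("both runs must contain the same number of frames")
    blended = []
    for t in range(n):
        w = (t + 0.5) / n  # weight of the destination frame at step t
        blended.append(
            [(1 - w) * s + w * d for s, d in zip(source_frames[t], dest_frames[t])]
        )
    return blended


# One-pixel frames: a black source run fading into a run of value 8.
source = [[0.0], [0.0], [0.0], [0.0]]
dest = [[8.0], [8.0], [8.0], [8.0]]
blended = crossfade(source, dest)
# Destination weights are 0.125, 0.375, 0.625, 0.875.
```

A longer window gives a softer transition but blurs more frames, so the window length is a trade-off that depends on how fast the scene moves.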
To enable users to create Video Textures easily, we provide a GUI (Graphical User Interface) environment. In our Matlab-based GUI the user can produce a Video Texture in a relatively short time by supplying the required parameters in a simple and understandable way. A complete guide to using the GUI is provided in the project report.
- The project was implemented in Matlab
- The video files were preprocessed with the VirtualDub application
In this project we became familiar with a new type of media, the Video Texture. During the work we studied, implemented and improved basic Video Texture algorithms, and we provided a friendly and convenient way to apply the developed techniques by building a GUI. We characterized the effects of the various parameters (such as sigma, alpha and p) and concluded that, although there are no absolute restrictions on them, there is a narrow range of values that gives optimal results in most cases. We also concluded that certain new parameters had to be defined in order to achieve the best implementation of the algorithms, but these are not meant to be set by the user, since they are either determined by the input movie or already set to an optimal value.
After the basic implementation, we applied the Crossfade technique in order to get smoother results and more accurate transitions from frame to frame.
Finally, we considered the idea of combining two similar yet different clips into one texture that presents characteristics of both clips. In the Result Texture, the blue stripe represents frames from the First Clip and the red stripe represents frames from the Second Clip.
There are many more exciting and interesting cases where Video Textures can be used: Internet applications, desktop backgrounds, commercials and computer games are only some suggestions. Combined with more advanced techniques, it is possible to achieve even more outstanding effects, such as video-based animation and video textures with sound.
We would like to thank our project supervisor Hilit Unger for her help, guidance and support throughout the whole project.
We are also grateful to the VISL laboratory staff, and personally to Johanan and Inna for their help.