Video Compression by Cached Vector Quantization


Abstract
Vector quantization (VQ) is used in sophisticated digital signal processing, where in most cases the input signal already has some form of digital representation and the desired output is a compressed version of the original signal. VQ is usually used for the purpose of data compression; in the rate-distortion sense, no other coding technique can do better than VQ. A vector quantizer Q of dimension k and size N is a mapping from a vector in k-dimensional Euclidean space, R^k, into a finite set C containing N output or reproduction points, called code vectors or codewords. The set C is called the codebook and has size N, meaning it has N distinct elements, each a vector in R^k.
The resolution, code rate, or simply rate of a vector quantizer is r = (log2 N)/k, which measures the number of bits per vector component used to represent the input and gives an indication of the accuracy or precision that is achievable with a vector quantizer if the codebook is well designed. A vector quantizer can be decomposed into two component operations, the vector encoder and the vector decoder.

Figure: A vector quantizer as the cascade of an encoder and a decoder.

Occasionally it is convenient to regard a quantizer as generating both an index, I, and a quantized output value, Q(x). The decoder is sometimes referred to as an "inverse quantizer".

Associated with every N-point vector quantizer is a partition of R^k into N regions or cells, and each cell has an index. The encoder outputs the index of the cell containing a given input vector; the decoder generates an output vector from a given index. The task of the encoder is to identify in which of the N geometrically specified regions of k-dimensional space the input vector lies. The decoder is simply a table lookup, fully determined by specifying the codebook; it does not need to know the geometry of the partition to perform its job.
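To make the two operations concrete, here is a minimal NumPy sketch (ours, for illustration; not the project's code) in which the encoder is a nearest-neighbor search over the codebook and the decoder is a plain table lookup:

```python
import numpy as np

def vq_encode(x, codebook):
    """Encoder: return the index of the nearest code vector (squared Euclidean distance)."""
    distances = np.sum((codebook - x) ** 2, axis=1)   # codebook: (N, k), x: (k,)
    return int(np.argmin(distances))

def vq_decode(index, codebook):
    """Decoder: a plain table lookup, fully determined by the codebook."""
    return codebook[index]

# Example with a size-4 codebook of 2-dimensional vectors
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
i = vq_encode(np.array([0.9, 0.2]), codebook)   # -> 1
x_hat = vq_decode(i, codebook)                  # -> array([1., 0.])
```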

Vector quantization has been widely used in image-coding systems because of its superior rate-distortion (R-D) performance over traditional scalar quantization schemes and its simple decoder structure.

The encoder and the decoder can be separately modeled with a corresponding primary structure. Let A denote the index generator, i.e., the mapping A: B -> I, and A^-1: I -> B its inverse, the index decoder, where B is the set of binary N-vectors b = (b_1, ..., b_N) restricted to have exactly one nonzero component. Then A(b) = i if b_i = 1 and b_j = 0 for j != i. Writing S_i(x) for the selector function that equals 1 when x lies in cell R_i and 0 otherwise, the encoder operation can be written as

    i = A(S_1(x), S_2(x), ..., S_N(x))

and the corresponding decoder operation is given by

    x_hat = y_1 b_1 + y_2 b_2 + ... + y_N b_N,   with b = A^-1(i).

Figure: Structural diagrams of the encoder and the decoder.

The index I may be regarded as the identifier of a channel symbol whose physical nature can take many forms. The index generator and index decoder correspond, respectively, to the address encoder and address decoder used in digital memory circuits.

The codebook
Although there are many different algorithms for creating a codebook, they all perform the same basic tasks. For L codebook entries, the M-dimensional vector space is partitioned into L non-overlapping cells. In many implementations the code vector y_i is the centroid of the training vectors within cell i, i.e., the multidimensional mean of the training vectors falling in that cell. The centroids of the cells are the output code vectors associated with the corresponding cells. In other words, during the encoding process, when an input vector falls within a particular cell, the index of that cell is transmitted as the codeword; during the decoding process, the centroid of that cell is the output vector.

The best codebook for a given set of training data is the one that minimizes the quantization error over the vectors to be quantized. However, even for moderately sized codebooks, an exhaustive search for the best codebook requires an impractically large number of computations. So codebooks are generated, or trained, using statistical clustering algorithms. The LBG algorithm, named after Linde, Buzo and Gray, is a widely used clustering method for generating a codebook. The algorithm begins with an initial set of codebook vectors, assigns the training vectors to the nearest codebook vector, then recomputes each codebook vector as the centroid of all the vectors assigned to that codebook entry.

The process is repeated until convergence, i.e., until there is no further reduction in the overall quantization error.
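A minimal sketch of this LBG iteration, assignment to the nearest code vector followed by a centroid update, written in NumPy for illustration (the variable names and stopping rule are ours):

```python
import numpy as np

def lbg(training, codebook, max_iter=50, tol=1e-6):
    """One LBG (generalized Lloyd) refinement loop.

    training: (M, k) array of training vectors
    codebook: (N, k) initial codebook (modified and returned)
    """
    prev_error = np.inf
    for _ in range(max_iter):
        # Assign each training vector to its nearest code vector.
        d = np.sum((training[:, None, :] - codebook[None, :, :]) ** 2, axis=2)
        nearest = np.argmin(d, axis=1)
        error = np.mean(np.min(d, axis=1))
        # Recompute each code vector as the centroid of its cell.
        for i in range(len(codebook)):
            members = training[nearest == i]
            if len(members) > 0:
                codebook[i] = members.mean(axis=0)
        # Stop when the overall quantization error no longer decreases.
        if prev_error - error < tol:
            break
        prev_error = error
    return codebook
```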

Splitting technique (LBG)
Splitting is a technique that resembles the product code initialization in that it grows a large code from a small one, but differs in that it does not require an integral number of bits per symbol. The method is called the splitting algorithm and it produces increasingly larger codebooks of a fixed dimension. The globally optimal resolution-0 codebook of a training sequence is the centroid of the entire sequence. The one codeword, say y0, in this codebook can be "split" into two codewords, y0 and y0 + eps, where eps is a vector of small Euclidean norm. One choice of eps is to make it proportional to the vector whose i-th component is the standard deviation of the i-th component of the set of training vectors. Another choice is to make it proportional to the eigenvector corresponding to the largest eigenvalue of the covariance matrix of the training set. The new codebook has two words and can be no worse than the previous codebook, since it contains the previous codebook. The iterative improvement algorithm can be run on this codebook to produce a good resolution-1 code. When complete, all of the codewords in the new codebook can be split, forming an initial guess for a resolution-2 codebook. One continues in this manner, using a good resolution-r codebook to form an initial resolution-(r+1) codebook by splitting. This algorithm provides a complete design technique from scratch on a training sequence and suggests a vector analog to successive-approximation quantizers.
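The splitting initialization could be sketched as follows (illustrative only, using a constant perturbation vector as one simple choice of eps, and the lbg() routine from the previous sketch for refinement):

```python
import numpy as np

def split_codebook(codebook, eps=1e-3):
    """Double the codebook size: each codeword y is replaced by y and y + eps."""
    perturbation = eps * np.ones_like(codebook)   # one simple choice of eps
    return np.concatenate([codebook, codebook + perturbation], axis=0)

def design_codebook(training, target_size):
    """Grow a codebook from resolution 0 by repeated splitting and LBG refinement."""
    codebook = training.mean(axis=0, keepdims=True)   # centroid of the entire sequence
    while len(codebook) < target_size:
        codebook = split_codebook(codebook)
        codebook = lbg(training, codebook)            # lbg() from the previous sketch
    return codebook
```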

Localized compression of video conferencing
A new localized approach to video teleconferencing is introduced using three-dimensional vector quantization (VQ). According to the proposed model, the recent localized history of the sequence is used to encode the present frames, based on a hierarchical approach.
The model has several parameters. In terms of history, two parameters relate to the duration of the history used and to the typical length of each movement, respectively; the latter also determines the minimum expected delay of the system. The localization parameter relates to the size of the areas considered likely to contain repeated motions during the history period. As to the transformation process, this size also determines where similar transformed details are searched for. A further parameter designates the spatial extent of a typical motion. The hierarchical approach introduces additional parameters, mainly the factor by which the area size is changed between adjacent levels of the pyramidal representation; a factor of 2 is used in the present study.

Figure: Blocks and super blocks. Each block and super block contains 4 consecutive frames (t = 4).
The figure illustrates a portion of a frame sequence, frame (n+1) to frame (n+16), which has been divided into super blocks. Each of the super blocks is in turn processed into a set of three-dimensional (x, y, t) vectors.


The goal of the quantizer is to reduce the bit rate below the entropy. We use vector quantization with a sparse codebook. In general, each frame is divided into super blocks, each comprising 3-D vectors (x, y, t); in our model we take t = 4. For each super block we build the codebook based on the recent history of the transmitted sequence, and the same codebook is built at the receiver; thus only one index per codebook has to be transmitted over the transmission medium.
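As an illustration of how a group of frames is cut into (x, y, t) vectors, the sketch below assumes grayscale frames and 8x8x4 blocks; only t = 4 is taken from the text, the other names and sizes are ours:

```python
import numpy as np

def extract_vectors(frames, block_x=8, block_y=8):
    """Cut a group of frames into (t, block_y, block_x) blocks, flattened to vectors.

    frames: array of shape (t, H, W), e.g. t = 4 consecutive grayscale frames.
    """
    t, H, W = frames.shape
    vectors = []
    for y in range(0, H - block_y + 1, block_y):
        for x in range(0, W - block_x + 1, block_x):
            block = frames[:, y:y + block_y, x:x + block_x]   # shape (t, block_y, block_x)
            vectors.append(block.reshape(-1))
    return np.array(vectors)

# Example: 4 frames of a 288x352 (CIF-sized) sequence -> 8x8x4 vectors of length 256
frames = np.zeros((4, 288, 352))
vecs = extract_vectors(frames)        # shape (1584, 256)
```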

The encoding process:

1. The dedicated localized codebook, with base vectors of size 8x8x4, from the history-based VQ is searched. The blocks having an SNR greater than threshold T1 are output. For a source image of a person's head and shoulders, more than 90% of the vectors can be successfully coded at 1/40 bpp.

2. For the vectors having an SNR less than T1, the search expands to include the codebooks of adjacent super blocks. From this search, vectors having an SNR greater than T2 are output. Most of these vectors are satisfactorily encoded at about 1/10 bpp (a sketch of the first two stages is given after this list).

3. The vectors unsatisfactorily encoded in steps 1 and 2 are further coded using a more refined 4x4x4 codebook. Thereafter a search is made for vectors having an SNR greater than T3.

4. The remaining vectors are encoded using a 2x2x4 codebook and are output if the encoded vectors have an SNR greater than T4.

5. Those vectors inadequately coded using the codebooks in the above steps are coded by scalar quantization.
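A schematic of the first two stages of this cascade is sketched below (hypothetical helper names; the SNR test and the codebook layout are placeholders, and the re-partitioning into 4x4x4 and 2x2x4 sub-blocks used in stages 3-5 is omitted). It reuses vq_encode() from the first sketch:

```python
import numpy as np

def snr_db(original, reconstructed):
    """Signal-to-noise ratio of the reconstruction, in dB."""
    noise = np.sum((original - reconstructed) ** 2)
    return np.inf if noise == 0 else 10.0 * np.log10(np.sum(original ** 2) / noise)

def encode_block(x, local_cb, neighbor_cbs, T1, T2):
    """Stages 1-2 of the cascade for one 8x8x4 vector x.

    Stage 1: search the block's own localized codebook; accept if SNR > T1.
    Stage 2: otherwise widen the search to the codebooks of adjacent super blocks
             and accept if SNR > T2.
    Stages 3-5 (finer codebooks and scalar quantization) are omitted from this sketch.
    """
    i = vq_encode(x, local_cb)                       # nearest-neighbor search (see above)
    if snr_db(x, local_cb[i]) > T1:
        return ("local", i)
    for cb_id, cb in enumerate(neighbor_cbs):
        j = vq_encode(x, cb)
        if snr_db(x, cb[j]) > T2:
            return ("neighbor", cb_id, j)
    return None                                      # to be handled by the finer stages
```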

In our coder we use three-dimensional vector quantization (x, y, t); the present frames are encoded based on the recent localized history of the sequence.

The model
We developed a model specific to teleconferencing. Teleconferencing has specific properties that we exploit in the quantizer we build.

The properties are:

  • Sequence history: Typical motions from the recent history are likely to appear repeatedly in the next frames. This is one of the major distinctions between teleconferencing and other video sequences.
  • Localization: Such movements are primarily likely to reappear in the same area of the image where they appeared before.
  • Transformation: If not repeated similarly (up to the allowed distortion, usually 25-40 dB), details can be encoded as transformed versions of previously appearing ones.
  • Hierarchy: If a specific motion cannot be matched accurately according to a coarse partitioning of the frame, a more refined description is used.

The encoding process
1. The dedicated localized codebook, with base vectors of size 8x8x4, from the history-based VQ is searched. The blocks having an SNR greater than threshold T1 are output.

2. For the vectors having an SNR less than T1, the search expands to include the codebooks of adjacent super blocks. From this search, vectors having an SNR greater than T2 are output.

3. The vectors unsatisfactorily encoded in steps 1 and 2 are further coded using a more refined 4x4x4 codebook. Thereafter a search is made for vectors having an SNR greater than T3.

4. The remaining vectors are encoded using a 2x2x4 codebook and are output if the encoded vectors have an SNR greater than T4.

5. Those vectors inadequately coded using the codebooks in the above steps are coded by scalar quantization.

Figure: Flow chart of the hierarchical encoding process.

Referring to this figure, B1 to B5 represent the encoded codes output to the receiver at the respective stages, and b1 to b4 represent the control bits signifying whether the codes of the next stage are to be used.

Experimental results and analysis
The coding scheme was implemented and tested on the Miss America sequence, which is typical of teleconferencing material.


These graphs show the performance of the basic LBG algorithm.

Low quality means PSNR ~ 32 dB.


These graphs show the performance of the basic LBG algorithm.

High quality means PSNR ~ 36 dB.

Cached VQ codebook
The codebooks discussed above do not vary with time. For optimal performance, however, it is important to retrain the codebook so that it follows the time-varying, and hence input-vector-varying, characteristics of the source.

The following algorithm adapts the codebook as time progresses.

For each new vector, we search for the nearest existing vector in the codebook. If they are close enough, we compute their average, weighted by the codebook vector's weight, and store it in place of the original codebook vector. If no codebook vector is close enough to the new vector and the codebook is not full, we add the new vector to the codebook. If the codebook is full, we first remove the LRU (least recently used) vector.
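A sketch of this cache-style update is given below; the variable names, the distance threshold and the exact weighting rule are our assumptions, not taken from the text:

```python
import numpy as np
from collections import OrderedDict

class CachedCodebook:
    """Adaptive codebook: weighted averaging of near matches, LRU eviction when full."""

    def __init__(self, capacity, match_threshold):
        self.capacity = capacity
        self.match_threshold = match_threshold   # max squared distance to count as "close"
        self.entries = OrderedDict()             # id -> (vector, weight), ordered by recency

    def update(self, x):
        # Find the nearest existing codebook vector.
        best_id, best_dist = None, np.inf
        for key, (vec, _) in self.entries.items():
            d = np.sum((vec - x) ** 2)
            if d < best_dist:
                best_id, best_dist = key, d

        if best_id is not None and best_dist <= self.match_threshold:
            # Close enough: replace by the weight-aware average and mark as recently used.
            vec, w = self.entries.pop(best_id)
            self.entries[best_id] = ((w * vec + x) / (w + 1), w + 1)
            return best_id

        # No close match: insert x, evicting the least recently used entry if full.
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)      # least recently used
        new_id = max(self.entries.keys(), default=-1) + 1
        self.entries[new_id] = (x.copy(), 1)
        return new_id
```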

This algorithm is intended to replace the LBG algorithm in order to improve compression performance.

Experimental results and analysis
The coding scheme was implemented and tested on the Miss America sequence, which is typical of teleconferencing material.


These graphs show the performance of the Cached VQ algorithm.

Low quality means PSNR ~ 32 dB.
High quality means PSNR ~ 36 dB.

Compression with Huffman code
The Huffman code is a source-coding algorithm. The entropy of a source gives a sharp bound on the rate at which the source can be compressed while still allowing reliable reconstruction.

This means that at rates above the entropy it is possible to design a code with an error probability as small as desired, whereas at rates below the entropy such a code does not exist. This important result, however, does not provide a specific algorithm for designing codes that approach this bound.

The Huffman source-coding algorithm
The idea is to map more frequently occurring fixed-length sequences to shorter binary sequences and less frequently occurring ones to longer binary sequences. In variable-length coding, synchronization is a problem: there should be one and only one way to break the received binary sequence into codewords.

Huffman encoding algorithm

  1. Sort the source outputs in decreasing order of their probabilities.
  2. Merge the two least-probable outputs into a single output whose probability is the sum of the corresponding probabilities (a sketch of this merge procedure is given after the list).
  3. If the number of remaining outputs is 2, go to the next step; otherwise go to step 1.
  4. Arbitrarily assign 0 and 1 as codewords for the two remaining outputs.
  5. If an output is the result of the merger of two outputs in a preceding step, append 0 and 1 to the current codeword to obtain the codewords for the preceding outputs and repeat step 5. If no output is preceded by another output in a preceding step, stop.
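The merge procedure can be sketched with a small heap-based routine (illustrative; it assumes the symbol probabilities are already known):

```python
import heapq

def huffman_code(probabilities):
    """Build a Huffman code; probabilities is a dict {symbol: probability}."""
    # Each heap entry: (probability, tie_breaker, {symbol: partial codeword}).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        # Merge the two least-probable outputs into one.
        p0, _, code0 = heapq.heappop(heap)
        p1, _, code1 = heapq.heappop(heap)
        merged = {sym: "0" + c for sym, c in code0.items()}
        merged.update({sym: "1" + c for sym, c in code1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

# Example: codewords for a 4-symbol source
print(huffman_code({"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.10}))
# e.g. {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
```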

The Huffman code achieves the entropy bound when the codeword probabilities are negative powers of 2, i.e., p_i = 2^(-l_i) for integer codeword lengths l_i. For our codebook we calculate the entropy, which this encoding approaches.

The probability of the i-th codeword is

    p_i = w_i / (w_1 + w_2 + ... + w_N)

and the entropy is

    H = - (p_1 log2 p_1 + p_2 log2 p_2 + ... + p_N log2 p_N)

where w_i is the weight of the i-th codeword in the codebook and N is the number of codewords in the codebook.
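In code, the entropy computed from the codeword weights is simply:

```python
import numpy as np

def codebook_entropy(weights):
    """Entropy (bits per index) from codeword usage weights w_i."""
    w = np.asarray(weights, dtype=float)
    p = w / w.sum()                    # p_i = w_i / sum_j w_j
    p = p[p > 0]                       # ignore unused codewords
    return float(-np.sum(p * np.log2(p)))

print(codebook_entropy([8, 4, 2, 2]))  # 1.75 bits per index
```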

Experimental results and analysis

These graphs show the performance of the Cached VQ + Huffman encoding algorithm.

Low quality means PSNR ~ 32 dB.

High quality means PSNR ~ 36 dB.

Energy
An efficient codebook post-processing technique and a window-based fast-search algorithm for vector quantization

The post-processing reorders the codebook according to the codewords' potentials. We want the vector quantizer to generate more strongly dependent indices without sacrificing fidelity. The inter-index correlation results from two properties of a vector quantizer: 1) the correlation preservation property and 2) the correlation generation property. In other words, the inter-index correlation is 1) inherited from the inter-vector correlation in the spatial domain and 2) generated by the vector quantizer itself. For large codebook sizes the first property dominates, while for small codebook sizes the second becomes dominant.

In practice the codebook is large, so the influence of the second source on the inter-index correlation is negligible. Here we set our objective at designing a vector quantizer with a high correlation preservation capability. A natural way to achieve this is to order the codewords so that "similar" codewords have close indices, and the more similar, the closer. Moreover, an effective measure of similarity should be used to order the codewords. We adopt the potential of a vector as the measure of similarity: the potential of a vector is defined as the sum of the squares of its components, so the potential is identical to the energy of the vector.
We use the window-based fast-search (WBFS) algorithm, which first searches a window built around the predicted index, to accelerate the codebook search procedure.

Figure: The current vector X and its neighboring vectors A, B and C.

Vectors A, B and C denote neighboring vectors of the current vector X, located to its left and above. A horizontal potential gradient and a vertical potential gradient are defined from the potentials of vectors A, B and C. The prediction is then made by comparing these gradients: depending on the outcome, the index of vector A or the index of vector C is taken as the predicted index.
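A sketch of the codebook reordering by potential (energy) together with a window search around a predicted index is given below; since the exact gradient test is not recoverable from the text, the neighbor roles and the comparison used for the prediction are only assumptions in the same spirit:

```python
import numpy as np

def reorder_by_potential(codebook):
    """Post-process the codebook: sort the codewords by their potential (energy)."""
    potential = np.sum(codebook ** 2, axis=1)        # P(y) = sum of squared components
    return codebook[np.argsort(potential)]

def predict_index(idx_A, idx_C, pot_A, pot_B, pot_C):
    """Predict the current index from neighbors A (left), B (above-left) and C (above).

    Assumption: follow the direction with the smaller potential change."""
    if abs(pot_A - pot_B) <= abs(pot_C - pot_B):
        return idx_A
    return idx_C

def window_search(x, codebook, predicted_index, window=8):
    """WBFS-style search: look only in a window of indices around the prediction."""
    lo = max(0, predicted_index - window)
    hi = min(len(codebook), predicted_index + window + 1)
    d = np.sum((codebook[lo:hi] - x) ** 2, axis=1)
    return lo + int(np.argmin(d))
```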

Experimental results and analysis
These graphs show the performance of the Cached VQ + Huffman + Energy encoding algorithm.

Low quality means PSNR ~ 32 dB.

High quality means PSNR ~ 36 dB.

Comparison and conclusions


Results & conclusions
We started with the LBG algorithm and observed C.R. = 52 for PSNR = 32, and C.R. = 13.7 for PSNR = 35.6.

For Cached VQ we observed C.R. = 144 for PSNR = 32, and C.R. = 28.2 for PSNR = 36.

Adding Huffman encoding to the Cached VQ algorithm gives C.R. = 203 for PSNR = 32, and C.R. = 41 for PSNR = 36.

Adding Energy encoding to the previous algorithm gives C.R. = 215 for PSNR = 32, and C.R. = 51 for PSNR = 36.

In all experiments the codebook size was limited to 512 codewords.

For codebook updates we used the last four 4-frame groups for both the LBG and Cached VQ algorithms.

We saw that the Energy algorithm is useful for scalar encoding only and is harmful for the encoding of larger vectors. This is caused by the use of hierarchical codebooks; the Energy algorithm is suited to encoding with a single codebook.

The improvement of Cached VQ over LBG increases as the PSNR decreases.

Energy (on scalar encoding) has a larger effect when encoding for higher quality, since at higher quality more vectors are encoded by scalar quantization.

Improvements in encoding quality can be achieved by optimizing encoder parameters such as the codebook size, the thresholds for using vectors of a specific size, and other codebook-building parameters.

Acknowledgments
We would like to thank our supervisor, Dmitry Furman, for his support and guidance throughout this project.
We would also like to thank the Ollendorff Minerva Center Fund, which supported this project.