TU Berlin

Fachgebiet Nachrichtenübertragung

Compression of Images and Video with SMoE Gating Networks

Thomas Sikora, Rolf Jongebloed, Lieven Lange and Erik Bochinski

Contact: sikora@nue.tu-berlin.de

Short Summary

Our challenge is to efficiently identify and harvest longest-range correlations in images and video - to allow for leaps in compression. Our strategies completely depart from current JPEG and MPEG/ITU type compression approaches with block processing, block transforms and motion vectors. For compression we develop and investigate SMoE Gating Networks.

Live video results show how SMoE Gating Network compression can remarkably improve the quality of coded video compared with the state-of-the-art MPEG HEVC standard in our initial laboratory experiments. The HEVC coded results were generated with approximately 15% more bits than the SMoE coded results, yet the HEVC quality falls well short of the quality of the novel SMoE Gating Network coder.

The most recent draft standard MPEG-VVC (H.266) is reported to improve on HEVC by approximately 30%. A comparison between SMoE and VVC at the same bit rate shows excellent quality for the SMoE Network coder. This appears to be a very promising result for a novel compression approach that has been investigated by only a few researchers over the last 4 years, particularly considering that the MPEG-type compression philosophy has been developed over the last 25 years with hundreds if not thousands of researchers involved.

SMoE Gating Networks – Swarms of Steered Atoms for Compression

In our recent work, we develop SMoE Gating Networks designed specifically for compression. These networks are based on Steered Mixture-of-Experts (SMoE) networks that distribute swarms of N steered “atoms” into arrays of image pixels (for images) or into 3D stacks of video pixels (for video). Simple “atoms” may consist of steered 2D Gaussian kernels (for images) or steered 3D Gaussian kernels (for video). Kernel parameters include the location of each kernel as well as its steering and bandwidth parameters.
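
To make the role of the kernel parameters concrete, the following is a minimal sketch of one steered 2D Gaussian atom. It assumes that steering is expressed as a rotation angle and bandwidth as two standard deviations along the principal axes; the function name `steered_gaussian_2d` and this parameterization are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def steered_gaussian_2d(points, center, bandwidths, angle):
    """Evaluate an unnormalized steered 2D Gaussian kernel (illustrative sketch).

    points:     (M, 2) array of (x, y) pixel coordinates
    center:     (2,) kernel location
    bandwidths: (2,) standard deviations along the two principal axes (assumed form)
    angle:      steering angle in radians, rotating the first axis away from x
    """
    points = np.asarray(points, dtype=float)
    center = np.asarray(center, dtype=float)
    c, s = np.cos(angle), np.sin(angle)
    rotation = np.array([[c, -s], [s, c]])
    # Covariance = R diag(bandwidths^2) R^T: bandwidths set the extent, R the steering.
    covariance = rotation @ np.diag(np.square(bandwidths)) @ rotation.T
    diff = points - center
    # Squared Mahalanobis distance of every point to the kernel center.
    dist_sq = np.einsum("mi,ij,mj->m", diff, np.linalg.inv(covariance), diff)
    return np.exp(-0.5 * dist_sq)
```

With bandwidths (4, 1) and angle 0 the kernel is elongated along x; rotating it by 90 degrees moves the long axis to y, which is the sense in which an atom is "steered".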

A machine learning strategy is employed to identify the location and steering parameters of the N atoms. For compression these atom parameters are coded into bits and stored or transmitted to the decoder. The decoder reconstructs the coded images or video based on the atom parameters.

In such an approach, the N atoms compete to explain directional (spatial or spatio-temporal) correlations in their respective neighborhoods, and each atom seeks to extend its influence as far as possible into the 2D image pixel array (for images) or into the 3D stack of pixels (for video).
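
One common way to model this competition is to normalize the kernel responses at each pixel so that they sum to one. The sketch below does this for atoms in any dimension (2D or 3D). It is a simplification under stated assumptions: it uses unnormalized Gaussian kernels without mixing weights, and the name `soft_gates` is hypothetical.

```python
import numpy as np

def soft_gates(points, centers, covariances):
    """Return an (M, N) matrix of gate weights; every row sums to 1.

    points:      (M, D) pixel coordinates
    centers:     (N, D) atom locations
    covariances: (N, D, D) steering/bandwidth matrices, one per atom
    """
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    diffs = points[:, None, :] - centers[None, :, :]        # (M, N, D)
    precisions = np.linalg.inv(covariances)                 # (N, D, D), batched inverse
    dist_sq = np.einsum("mnd,nde,mne->mn", diffs, precisions, diffs)
    log_kernels = -0.5 * dist_sq
    # Subtract the row maximum before exponentiating for numerical stability.
    log_kernels -= log_kernels.max(axis=1, keepdims=True)
    kernels = np.exp(log_kernels)
    return kernels / kernels.sum(axis=1, keepdims=True)
```

Because of the normalization, an atom with a larger bandwidth in some direction takes over pixels further away in that direction, at the expense of its neighbors.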

SMoE Gating Networks for Video Compression

The SMoE Gating Network concept is illustrated for video compression in Figure 1 below, using a video sequence of 64 frames (each frame has 128x128 color pixels). Approximately N=1000 3D atoms are distributed into the 3D video pixel stack (64x128x128 pixels). Their location, bandwidth and steering parameters are optimized in a machine learning approach.
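
A rough count shows how sparse such a representation is. The sketch below assumes 12 parameters per 3D atom (3 for location, 6 for a symmetric 3x3 steering/bandwidth matrix, 3 for a color); this per-atom count is an assumption for illustration and is not stated in the text above. The result is a ratio of raw samples to parameters, not a bit-rate figure.

```python
# Dimensions taken from the example above.
frames, height, width, channels = 64, 128, 128, 3
num_atoms = 1000

# Assumed parameterization per 3D atom (hypothetical, for illustration only).
location_params = 3      # (t, y, x)
covariance_params = 6    # unique entries of a symmetric 3x3 matrix
color_params = 3         # one color value per channel
params_per_atom = location_params + covariance_params + color_params

raw_samples = frames * height * width * channels
total_params = num_atoms * params_per_atom
samples_per_param = raw_samples / total_params

print(raw_samples, total_params, round(samples_per_param, 1))
```

Under these assumptions, roughly 3.1 million color samples are described by 12,000 atom parameters, i.e. about 262 samples per parameter before any quantization or entropy coding.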

In Figure 1, selected atoms are shown as “cigars” to illustrate their locations, bandwidth and steering features. The 3D pixel stack itself is not shown, to keep the illustration simple. Here t is the temporal coordinate of the pixel stack; each atom explains spatial (in x,y-coordinates) as well as temporal correlations in a unified approach. The atoms evidently capture the motion within the scene along the temporal direction.
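
The link between a “cigar” and motion can be sketched as follows: a 3D Gaussian whose long axis points along (1, v_y, v_x) in (t, y, x) coordinates responds strongly to pixels that move with velocity (v_y, v_x) per frame. The helper names and the two-bandwidth parameterization are assumptions made for this illustration.

```python
import numpy as np

def tube_covariance(direction, long_bandwidth, short_bandwidth):
    """Covariance of a 3D 'cigar' with its long axis along `direction` (t, y, x)."""
    u = np.asarray(direction, dtype=float)
    u = u / np.linalg.norm(u)
    # Variance long^2 along u and short^2 in the two perpendicular directions.
    return (short_bandwidth**2 * np.eye(3)
            + (long_bandwidth**2 - short_bandwidth**2) * np.outer(u, u))

def kernel_response(point, center, covariance):
    """Unnormalized Gaussian response of one atom at one (t, y, x) point."""
    diff = np.asarray(point, dtype=float) - np.asarray(center, dtype=float)
    return float(np.exp(-0.5 * diff @ np.linalg.solve(covariance, diff)))

# Atom in the middle of a 64x128x128 stack, steered along a motion of 2 pixels/frame in x.
cov = tube_covariance(direction=(1.0, 0.0, 2.0), long_bandwidth=30.0, short_bandwidth=2.0)
center = (32.0, 64.0, 64.0)
on_trajectory = kernel_response((42.0, 64.0, 84.0), center, cov)   # moved 20 px in 10 frames
off_trajectory = kernel_response((42.0, 64.0, 64.0), center, cov)  # same frame, did not move
```

Ten frames later the atom still responds strongly on the moving trajectory, while its response at the static position has essentially vanished.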

Figure 1: Swarm of steered atoms of a SMoE Gating Network for a video sequence with 64 frames.

At the decoder, for each atom, a 3D soft-gating function is derived with associated color values.

In essence, the 3D stack of pixels is divided by the SMoE Gating Network into N more or less overlapping 3D soft-gates. In Figure 2, selected 3D soft-gates are depicted using different colors (shown as non-overlapping hard-gates because the overlapping nature of the gates is difficult to illustrate). Many 3D gates evidently have directional properties and appear as “tubes” that are responsible for explaining correlations between hundreds if not thousands of pixels.

At the decoder, the 3D color pixel stack is reconstructed using the N 3D soft-gating functions with associated color values. It is apparent that the “tubes” in Figure 2 can be seen as sparse representations of adjacent motion trajectories with similar motion patterns. In Figure 2, two individual motion trajectories are depicted for illustration purposes and show how object pixels in the first frame of a video scene move over time. The associated 3D soft-gates bundle adjacent trajectories with identical or similar motion patterns into “tubes” that are explained by a few atom parameters.
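
The reconstruction step can be sketched as a gate-weighted blend of atom colors. The sketch assumes that each atom carries one constant color (the text above only says "associated color values") and that the gate weights have already been computed; `reconstruct_pixels` is a hypothetical name.

```python
import numpy as np

def reconstruct_pixels(gates, colors, stack_shape):
    """Blend atom colors with soft-gate weights into a (T, H, W, C) pixel stack.

    gates:       (M, N) gate weights, one row per pixel, rows summing to 1
    colors:      (N, C) one color per atom (assumed constant per atom)
    stack_shape: (T, H, W) with T * H * W == M
    """
    gates = np.asarray(gates, dtype=float)
    colors = np.asarray(colors, dtype=float)
    if gates.shape[0] != int(np.prod(stack_shape)):
        raise ValueError("number of gate rows must equal the number of pixels")
    # Each pixel is a convex combination of the atom colors.
    return (gates @ colors).reshape(*stack_shape, colors.shape[1])

# Two pixels, two atoms: the first pixel belongs fully to atom 0,
# the second is shared 25% / 75% between the atoms.
gates = [[1.0, 0.0], [0.25, 0.75]]
colors = [[255.0, 0.0, 0.0], [0.0, 0.0, 255.0]]
stack = reconstruct_pixels(gates, colors, stack_shape=(1, 1, 2))
```

In this toy example the shared pixel comes out as a mix of the two atom colors, which is how overlapping soft-gates produce smooth transitions between neighboring tubes.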

Figure 2: The 3D soft-gating functions appear as “tubes” that explain the long-range correlations in the video sequence.

SMoE Gating Networks provide sparse representations of the pixels in images and video sequences. As depicted in Figure 2, individual atoms (3D SMoE kernels) can be responsible for tubes that reach out over many frames (if not over the entire sequence of 64 frames) and over many spatially adjacent pixels, thus harvesting longest-range spatio-temporal correlations in video. Individual tubes may explain the color values of hundreds if not several thousands of pixels with only a few kernel parameters.

Figure 3: Comparison of compression performance for the “Mobile” video test sequence. Coding results with the SMoE Gating Network are compared with the latest MPEG HEVC compression standard.

Figure 3 provides a comparison of visual quality for the test sequence “Mobile” coded with our SMoE Gating Network coder and with the MPEG HEVC coder. The “Mobile” test sequence contains complex motion (camera motion and complex object motion, such as the rolling ball and the pendulum), which is difficult to code. The SMoE Gating Network is able to identify and code long-range tubes in the video even for complex object motion patterns (such as the ball) and can thus code and reconstruct the ball with excellent and coherent quality over time. The HEVC coder, on the other hand, quantizes the video sequence frame-by-frame, which results in significantly reduced quality.

It is apparent that, in this coding example, the visual quality of the SMoE approach is much better than that of the HEVC approach, even though coding with HEVC uses 20% more bits than SMoE.

SMoE Gating Networks for Image Compression

For image compression, it is possible to design SMoE Gating Networks with 2D Gaussian kernels as atoms. Figures 4 and 5 provide a comparison of images coded with the recent JPEG 2000 image coder and an SMoE Gating Network coder. It is apparent that the SMoE coder provides a much sharper representation of details than JPEG 2000. Edges in particular are represented with much better quality.

Figure 4: Comparison of compression performance for test image “Lena” between JPEG 2000 and SMoE Gating Network coder (Proposed) at same bit rate.
Figure 5: Comparison of compression performance for test image “Barbara” between JPEG 2000 and SMoE Gating Network coder (Proposed) at same bit rate.

Figure 6 illustrates the partitions generated by the SMoE Gating Networks. In contrast to the above video compression examples, the soft-gating partitions are now 2D segments (overlapping, but depicted here with “hard” borders). It is apparent that some gates reach out over almost the entire image, thus harvesting longest-range correlations in images.

Figure 6: 2D soft-gates derived from the SMoE Gating Networks for the two test images.


