A collaboration with YouTube to optimize video compression in the open-source VP9 codec.
In 2016, we introduced AlphaGo, the first artificial intelligence program to defeat humans at the ancient game of Go. Its successors, AlphaZero and then MuZero, each represented a major step forward in the pursuit of general-purpose algorithms, mastering a greater number of games with even less predefined knowledge. MuZero, for example, mastered Chess, Go, Shogi, and Atari without needing to be told the rules. But so far these agents have focused on games. Now, in pursuit of DeepMind’s mission to solve intelligence, MuZero has taken a first step towards a real-world task: optimizing video on YouTube.
In a preprint published on arXiv, we detail our collaboration with YouTube to explore MuZero’s potential to improve video compression. Analysts predicted that streaming video would account for the vast majority of Internet traffic in 2021. With video use growing during the COVID-19 pandemic and overall Internet traffic expected to keep increasing, video compression is an increasingly important problem, and a natural area in which to apply Reinforcement Learning (RL) to improve the state of the art in a challenging domain. Since launching on a portion of YouTube’s live traffic, we have demonstrated an average bitrate reduction of 4% across a large, diverse set of videos.
Most online video relies on a program called an encoder to compress, or encode, the video at its source, transmit it over the Internet to the viewer, and then decompress, or decode, it for playback. These encoders make multiple decisions for each frame of a video. Decades of craftsmanship have gone into optimizing these codecs, which are responsible for many of the video experiences now possible on the Internet, including video on demand, video calls, video games, and virtual reality. Because an encoder’s choices form exactly the kind of sequential decision-making problem that RL is well suited to, we investigated how a learned RL agent could help.
Our initial focus is on the VP9 codec (specifically its open-source version, libvpx), as it is widely used by YouTube and other streaming services. As with other codecs, service providers using VP9 must consider the bitrate: the number of ones and zeros needed to send each frame of a video. Bitrate is a major determinant of the computation and bandwidth required to serve and store video, affecting everything from how long a video takes to load to its resolution, buffering, and data usage.
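As a concrete illustration (with assumed numbers, not taken from the paper), bitrate is simply bits divided by playback time, so a 4% bitrate reduction on the same video translates directly into 4% fewer bits to store and transmit:

```python
# Toy illustration with assumed numbers; not tied to any real encoder.
def bitrate_kbps(total_bits: int, duration_s: float) -> float:
    """Bitrate in kilobits per second: bits sent divided by playtime."""
    return total_bits / duration_s / 1000.0

# A 60-second clip encoded into 18,000,000 bits streams at 300 kbps.
print(bitrate_kbps(18_000_000, 60))  # 300.0
```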
In VP9, bitrate is controlled most directly via the Quantisation Parameter (QP) in the rate-control module. For each frame, this parameter specifies the level of compression to apply. Given a target bitrate, QPs for a video’s frames are decided sequentially to maximize overall video quality. Intuitively, complex scenes should be assigned higher bitrates (lower QP) and static scenes lower bitrates (higher QP). QP selection is inherently sequential: the QP chosen for one frame determines how many bits remain for the rest of the video, and so affects both the bitrate distribution over the remaining frames and the overall video quality. RL is particularly well suited to solving this kind of sequential decision-making problem.
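To make the sequential nature concrete, here is a minimal sketch of a feedback-style rate controller. The per-frame "complexity" values, the linear toy rate model, and the fixed QP step are illustrative assumptions of ours; libvpx's actual rate control is far more sophisticated:

```python
def rate_control(complexities, target_bits_per_frame, qp_start=32):
    """Pick a QP for each frame in sequence, nudging it so that
    cumulative bits spent track the cumulative bit budget."""
    qp, qps, spent_total = qp_start, [], 0.0
    for i, complexity in enumerate(complexities, start=1):
        qps.append(qp)
        # Toy rate model: bits spent fall linearly as QP rises (0-63 in VP9).
        spent_total += complexity * (1.0 - qp / 63.0)
        budget_so_far = i * target_bits_per_frame
        if spent_total > budget_so_far:
            qp = min(63, qp + 4)   # over budget: compress harder
        elif spent_total < budget_so_far:
            qp = max(0, qp - 4)    # under budget: spend more bits
    return qps

# A complex third frame (8.0) blows the budget, forcing higher QPs afterwards.
print(rate_control([2.0, 2.0, 8.0, 2.0], target_bits_per_frame=1.0))
# → [32, 28, 32, 36]
```

The key point the sketch shows is the coupling: overspending on one frame forces every later frame to be compressed harder, which is why a learned, planning-based controller can outperform a fixed heuristic.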
MuZero achieves superhuman performance across a variety of tasks by combining the power of search with its ability to learn a model of the environment and plan with it. This works particularly well in large, combinatorial action spaces, making it an ideal candidate for the rate-control problem in video compression. However, getting MuZero to work in this real-world application required solving a whole new set of problems. For example, the videos uploaded to platforms like YouTube vary widely in content and quality, so any agent must generalize across videos, including completely new videos uploaded after deployment; board games, by comparison, offer a single, known environment. In addition, many other metrics and constraints beyond bitrate savings affect the end-user experience, such as Peak Signal-to-Noise Ratio (PSNR) and hard bitrate limits.
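PSNR, the quality metric mentioned above, has a standard definition worth spelling out: it compares the peak possible pixel value against the mean squared error between the original and compressed frames, on a logarithmic (decibel) scale. The helper below is a straightforward transcription of that formula for 8-bit video:

```python
import math

def psnr_db(mse: float, peak: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in decibels: higher means the
    compressed frame is closer to the original."""
    return 10.0 * math.log10(peak * peak / mse)

# Halving the mean squared error raises PSNR by about 3 dB.
print(round(psnr_db(42.25), 2))
```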
To address these challenges, we created a mechanism called self-competition, which converts the complex objective of video compression into a simple WIN/LOSS signal by comparing the agent’s current performance with its historical performance. This transforms a rich set of encoder requirements into a simple signal that our agent can optimize.
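The self-competition idea can be sketched as follows. This is a minimal illustration under assumptions of ours: the metric names, the averaged historical baseline, and the ±1 reward are illustrative, and the paper's actual construction is more involved. An episode counts as a win only if it beats the agent's own past encodes:

```python
# Illustrative sketch of a self-competition reward; names and the
# win condition are assumptions, not the paper's exact formulation.
def self_competition_reward(bitrate, psnr, history):
    """Return +1 if this encode beats the historical baseline
    (lower bitrate at no worse quality), else -1."""
    baseline_bitrate = sum(h["bitrate"] for h in history) / len(history)
    baseline_psnr = sum(h["psnr"] for h in history) / len(history)
    win = bitrate < baseline_bitrate and psnr >= baseline_psnr
    return 1.0 if win else -1.0

past = [{"bitrate": 1000.0, "psnr": 40.0}, {"bitrate": 980.0, "psnr": 40.0}]
print(self_competition_reward(bitrate=950.0, psnr=40.5, history=past))   # 1.0 (win)
print(self_competition_reward(bitrate=1100.0, psnr=41.0, history=past))  # -1.0 (loss)
```

The design choice here is that a single binary signal sidesteps the need to hand-tune weights between competing objectives like bitrate and quality: the agent simply has to keep beating its own record.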
By learning the dynamics of video encoding and determining how best to allocate bits, our MuZero Rate Controller (MuZero-RC) is able to reduce bitrate without degrading quality. QP selection is only one of many decisions made during the encoding process. While decades of research and engineering have produced efficient algorithms, we envision a single algorithm that can automatically learn to make all of these encoding decisions and achieve the optimal rate-distortion trade-off.
Beyond video compression, this first application of MuZero outside research environments shows how our RL agents can solve real-world problems. By building agents equipped with a range of new capabilities that improve products across the board, we can help various computer systems become faster, less resource-intensive, and more automated. Our long-term vision is to develop a single algorithm capable of optimizing thousands of real-world systems across a variety of domains.
Hear Jackson Broshear and David Silver discuss MuZero with Hannah Fry in episode 5 of DeepMind: The Podcast. Listen now on your favorite podcast app by searching for “DeepMind: The Podcast”.