Meet us at IBC 2018!

Make sure to visit us at our kiosk in the Intel booth at 5.B65, where we will be demonstrating enhancements to x265 that improve encoding throughput for ABR streaming by more than 2X by leveraging machine learning on the CPU. We continue to innovate in this space to bring ground-breaking improvements in performance and quality by drawing on our expertise in video compression algorithms, machine learning technology, and microarchitecture-aware optimizations that enabled us to use Intel AVX-512 instructions while encoding.
You can reach out to us on our usual channels (the developer mailing list, doom9, or Facebook) to talk.

Adaptive Multi-Resolution Encoding in x265

By Bhavna Hariharan

x265’s ability to leverage AI to accelerate encoding for Adaptive Bitrate (ABR) streaming is presented in the paper titled “Adaptive Multi-Resolution Encoding Scheme for ABR Streaming”, which is to appear in the proceedings of the IEEE International Conference on Image Processing (ICIP), 2018. Adaptive streaming enables dynamic adaptation to changes in network conditions by encoding video content at multiple bitrates and resolutions. Multi-pass x265 encodes exploit the structural redundancy across these resolutions by sharing analysis data in order of increasing resolution, thereby significantly reducing the computational burden.

Static and dynamic refinement features were designed for the adaptive streaming topology in x265. As the first step, the input video sequence is scaled down to the lowest resolution of the bitrate ladder and encoded. Coding metadata such as the CU quadtree structure, PU predictions, coding modes, and motion vectors (MVs) is stored at CTU granularity during this first-pass encode. The subsequent higher-resolution encodes invoke either the static or the dynamic refinement algorithm. The figure below depicts this architecture for a three-pass system. In principle, this can be extended to N passes.
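To make the flow of analysis data across passes concrete, here is a minimal Python sketch. It is not x265 code: the CtuAnalysis record, the scale_analysis and run_ladder helpers, and the assumption that motion vectors scale linearly with the resolution ratio are all hypothetical, chosen only to illustrate the idea of seeding each higher-resolution pass from the one below it. It also ignores the fact that one low-resolution CTU covers several CTUs at the higher resolution.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CtuAnalysis:
    """Hypothetical per-CTU record saved by a lower-resolution pass."""
    cu_depth: int            # depth of the CU quadtree chosen for this CTU
    mode: str                # coding mode, e.g. "intra" or "inter"
    mv: Tuple[int, int]      # motion vector in pixels at the saved resolution


def scale_analysis(saved: List[CtuAnalysis], factor: float) -> List[CtuAnalysis]:
    """Map saved analysis to a pass `factor` times larger in each dimension.

    Assumption: motion vectors scale linearly with resolution, while depth
    and mode are carried over unchanged as a starting point for refinement.
    """
    return [
        CtuAnalysis(a.cu_depth, a.mode,
                    (round(a.mv[0] * factor), round(a.mv[1] * factor)))
        for a in saved
    ]


def run_ladder(heights: List[int],
               first_pass: List[CtuAnalysis]) -> Dict[int, List[CtuAnalysis]]:
    """Walk the ladder from lowest to highest resolution, reusing analysis."""
    ordered = sorted(heights)
    analysis = first_pass
    seeds = {ordered[0]: analysis}
    for prev, cur in zip(ordered, ordered[1:]):
        # Each pass is seeded from the pass immediately below it.
        analysis = scale_analysis(analysis, cur / prev)
        seeds[cur] = analysis
    return seeds


if __name__ == "__main__":
    first = [CtuAnalysis(cu_depth=1, mode="inter", mv=(3, -2))]
    for height, seed in run_ladder([540, 1080, 2160], first).items():
        print(height, seed)
```

In a real encoder the seeded data would then be refined rather than used verbatim; the sketch only shows how a three-pass ladder chains its passes and why the same loop extends to N passes.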

The static refinement algorithm can be further classified into intra-picture and inter-picture refinement, in which the information from the previous pass is used at different levels of granularity, giving the current encode a head start. The encoder refines the analysis data based on certain heuristics and later uses it in the Rate Distortion Optimization (RDO) process. By adjusting how much information is reused, the user can control the performance vs quality balance of the encode.

The dynamic refinement algorithm applies Bayesian classification to the information saved in the previous pass and identifies patterns between the complexity of the content and the refinement level being used. Based on these patterns, the dynamic refinement algorithm switches between the static refinement levels. This allows the encoder to find the optimal balance between performance and quality while taking the content complexity into account, making it more efficient than the static refinement methods.
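As a rough illustration of how a Bayesian classifier could map content complexity to a refinement level, the following Python sketch fits a one-feature Gaussian naive Bayes model and picks the most probable level for a new block. The scalar complexity feature, the level numbers, and the fit and choose_level helpers are assumptions made for this example; the features and model actually used in x265 are described in the paper and are not reproduced here.

```python
import math
from collections import defaultdict


def fit(samples):
    """Fit per-level Gaussians from (complexity, refinement_level) pairs.

    Returns {level: (prior, mean, variance)}. A small constant is added to
    the variance so a level with identical samples does not divide by zero.
    """
    by_level = defaultdict(list)
    for complexity, level in samples:
        by_level[level].append(complexity)

    total = len(samples)
    model = {}
    for level, xs in by_level.items():
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs) + 1e-6
        model[level] = (len(xs) / total, mean, var)
    return model


def choose_level(model, complexity):
    """Return the level with the highest posterior for this complexity."""
    def log_posterior(params):
        prior, mean, var = params
        # log prior + log Gaussian likelihood (shared evidence term omitted)
        return (math.log(prior)
                - 0.5 * math.log(2 * math.pi * var)
                - (complexity - mean) ** 2 / (2 * var))

    return max(model, key=lambda level: log_posterior(model[level]))


if __name__ == "__main__":
    # Hypothetical history: low-complexity blocks did well with level 1,
    # high-complexity blocks needed level 3.
    history = [(0.10, 1), (0.20, 1), (0.15, 1),
               (0.80, 3), (0.90, 3), (0.85, 3)]
    model = fit(history)
    print(choose_level(model, 0.12), choose_level(model, 0.88))
```

The point of the sketch is the switching behaviour: once the model has seen which level suited which complexity, the choice of level is made per block instead of being fixed for the whole encode.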

Compared to conventional single-resolution approaches, the computational complexity of higher-resolution encodes is drastically reduced with the use of the AI-based adaptive multi-resolution technique. The adaptive multi-resolution scheme achieves a whopping speedup of about 2.5X, with a negligible drop in quality. The above table from the research paper compares the speed vs quality of static and dynamic refinement levels. Since the AI tools used in x265 are not based on CNN models, x265 is able to leverage the advances in CPU technology to achieve these improvements.