x265 contains a significant amount of assembly optimization for its compute kernels, which enables speed-ups on the order of 5x compared to running pure C code. While support for the x86 architecture is extensive (there are kernels for everything from SSSE3 all the way up to AVX-512), support for other architectures such as ARM is limited. Until now, x265’s ARM support has been limited to the 32-bit ARMv7 architecture.
The recently released v3.4 of x265 introduces a new set of hand-tuned assembly implementations of several compute-intensive kernels for the 64-bit ARM architecture (aarch64). The figure below shows the acceleration these kernels enable across presets. On average, the kernels speed up encoding by 10%, with up to 21% acceleration for the default medium preset.
Fig. 1. Speed-up across presets for crowd_run_1080p
These kernels were developed by video codec engineers at Huawei, expanding the set of companies that contribute to x265. When asked why they chose to focus on x265, the Huawei team explained that they intend to build the open-source ARM ecosystem for use cases such as big data, web, storage, databases, and acceleration libraries. Given x265’s popularity in the video domain, they chose to contribute to this project, creating a win-win for both the open-source community and the ARM ecosystem.
In addition, Huawei has made ARM resources available to the open-source community. The community can use the donated VMs (two 8U16G and one 32U64G) to work on further aarch64-focused optimizations.