Get The Picture With Video Acceleration In The Datacenter

With so many of us now living, working, and learning from home, it is becoming apparent how critical video streaming is to our new normal. Live video streaming platforms in particular are helping us stay connected to loved ones, keep our children's education on track, and see our physicians face to face when needed. Live streaming will continue to play a significant role in our lives.

This area is seeing meteoric growth: the live video streaming market is expected to approach $94 billion by 2026. Live streaming platforms like Twitch and YouTube Gaming also command significantly larger audiences than on-demand streaming services like Netflix and HBO.

Cisco’s Visual Networking Index forecasts that video alone will represent 82 percent of all network traffic by 2022, and live video accounts for a growing share of that traffic. Live video transcoding and processing is a computationally intensive problem, and one that Xilinx has helped its customers address with greater compute efficiency and a lower total cost of ownership (TCO).

The biggest issue facing live video streaming providers is the egress bandwidth cost they must pay to support their networks. Reducing those costs by, for example, 30 percent would have a material impact on operating expenses. Consider a streaming platform with approximately 100,000 concurrent streamers: its nominal bandwidth costs would be about $73 million per year. If the operator could cut that cost by 30 percent while preserving video quality, it would save roughly $22 million per year.
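The arithmetic behind this hypothetical is easy to check directly. The sketch below uses only the figures quoted above (about $73 million per year, a 30 percent reduction, roughly 100,000 concurrent streamers); the function and constant names are illustrative, and the inputs are the article's hypothetical numbers, not measured data.

```python
def annual_savings(annual_cost_usd: float, reduction: float) -> float:
    """Savings from cutting an annual egress bandwidth bill by `reduction` (a fraction from 0 to 1)."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return annual_cost_usd * reduction


# Hypothetical figures from the scenario above (not measured data).
CONCURRENT_STREAMERS = 100_000
ANNUAL_BANDWIDTH_COST_USD = 73_000_000

savings = annual_savings(ANNUAL_BANDWIDTH_COST_USD, 0.30)
cost_per_streamer = ANNUAL_BANDWIDTH_COST_USD / CONCURRENT_STREAMERS

print(f"Savings at 30%: ${savings / 1e6:.1f}M per year")  # Savings at 30%: $21.9M per year
print(f"Cost per concurrent streamer: ${cost_per_streamer:,.0f} per year")  # Cost per concurrent streamer: $730 per year
```

The exact result, $21.9 million, rounds to the $22 million quoted in the text; the per-streamer figure simply makes the scale of the bill easier to reason about.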

This is not just a hypothetical; it is an acute problem affecting live streaming platforms today. Take Huya.com, China’s #1 game streaming platform. According to its recent filings with the US Securities and Exchange Commission, Huya.com saw bandwidth costs rise by 40 percent in the fourth calendar quarter of 2019, driven by subscriber growth and the company’s efforts to improve the user experience by raising video quality on the platform. Rising bandwidth and operating costs are very real, and they are challenging these providers’ business models.

The following figure illustrates the dynamic facing most live video streaming platform operators. As with most things in life, the Pareto principle applies to the traffic patterns in the network: about 20 percent of broadcasters command about 80 percent of the traffic and are therefore responsible for the majority of bandwidth costs. This head end of the network is certainly a key area of focus for the operator, but there is also the other end of the network, the tail. There, the operator’s concern is minimizing the cost of the infrastructure needed to support all the other broadcasters, who command much smaller audiences. The objective at the tail is less about managing operating expenses and more about managing capital expenditures, which means achieving the lowest possible cost per channel at the highest density. Balancing these two requirements continues to challenge operators.
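The head/tail split described above can be made concrete with a few lines of code. This is a minimal sketch that ranks broadcasters by audience size, takes the top 20 percent as the head, and reports the head's share of the total. It assumes audience size is a reasonable proxy for egress traffic, and the audience numbers are a toy list constructed to show an exact 80/20 split, not real platform data; the function names are illustrative.

```python
from typing import List, Sequence, Tuple


def split_head_tail(viewers: Sequence[int], head_fraction: float = 0.2) -> Tuple[List[int], List[int]]:
    """Rank broadcasters by audience size and split them into a head and a long tail."""
    if not viewers:
        raise ValueError("need at least one broadcaster")
    ranked = sorted(viewers, reverse=True)
    head_count = max(1, round(len(ranked) * head_fraction))
    return ranked[:head_count], ranked[head_count:]


def head_traffic_share(viewers: Sequence[int], head_fraction: float = 0.2) -> float:
    """Fraction of total audience (used here as a proxy for egress traffic) held by the head."""
    head, _ = split_head_tail(viewers, head_fraction)
    total = sum(viewers)
    return sum(head) / total if total else 0.0


# Toy audience sizes for ten broadcasters, chosen to illustrate an exact 80/20 split.
toy_viewers = [25, 400, 25, 25, 25, 400, 25, 25, 25, 25]
head, tail = split_head_tail(toy_viewers)
print(len(head), len(tail), head_traffic_share(toy_viewers))  # 2 8 0.8
```

In the terms of the paragraph above, channels in `head` are where bitrate savings (operating expense) matter most, while the many channels in `tail` are where cost per channel and density (capital expense) dominate.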

To address the challenges facing live streaming platform operators, Xilinx recently announced a series of purpose-built video transcoding appliances based on the new Xilinx Real-Time (RT) Server reference architectures, which are built around its Alveo family of data center accelerator cards. The initial Xilinx RT Server reference architecture supports two video transcoding editions: one that provides the best video quality at the lowest bitrate, and another that targets very high-density deployments at the lowest possible cost per channel. Xilinx resellers and OEMs are implementing server appliances based on these reference architectures, which will come preconfigured with up to eight Alveo U50 cards or the new Alveo U30 accelerators.

The recently launched Alveo U30 is a dedicated media accelerator capable of supporting up to two 4Kp60 channels per card, a capacity that can be subdivided into as many as 16 1080p30 channels. Integrated into a 1RU RT Server, the appliance supports 128 channels of 1080p video, an extremely dense implementation suited to the long tail of broadcasters with very few followers. The Alveo U30 is powered by the Xilinx Zynq UltraScale+ MPSoC, specifically the EV variant of the family, which includes a hardened video codec unit (VCU). Because the Alveo U30 relies largely on the VCU, much of the FPGA fabric remains available for additional capabilities, and Xilinx is working to port its deep learning processing unit (DPU) to the card to support additional use cases in the video analytics domain.

The Alveo U50 implementation of the HEVC codec is based on the well-regarded HEVC codec developed by NGCodec. Xilinx acquired NGCodec about a year ago, and this is the first commercialization of its HEVC codec since the acquisition.
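The density figures can be cross-checked from pixel rates. A 3840x2160 frame contains exactly four 1920x1080 frames, so 16 channels of 1080p30 carry the same pixel rate as two 4K channels at 60 frames per second. The sketch below assumes encoder capacity scales linearly with pixel rate, which is a simplification (real throughput also depends on codec, preset, and stream settings), and that all eight cards in the 1RU server run at full capacity; the function names are illustrative.

```python
def pixel_rate(width: int, height: int, fps: int, streams: int = 1) -> int:
    """Pixels per second for `streams` channels at the given resolution and frame rate."""
    return width * height * fps * streams


def channels_per_appliance(channels_per_card: int, cards: int) -> int:
    """Total channels, assuming every card runs at full capacity with no host bottleneck."""
    return channels_per_card * cards


uhd_2x60 = pixel_rate(3840, 2160, 60, streams=2)
hd_16x30 = pixel_rate(1920, 1080, 30, streams=16)

print(uhd_2x60 == hd_16x30)           # True: both are 995,328,000 pixels per second
print(channels_per_appliance(16, 8))  # 128 channels of 1080p30 in an eight-card 1RU server
```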

Xilinx is positioning these appliances to address several existing and emerging use cases in the live video streaming market. One primary focus is eSports: platforms like Twitch have become extremely popular, not just for game content but as general live streaming platforms, and many platforms in China and other markets outside the United States provide similar services. Enterprise collaboration is another important focus, and use cases emerging amid COVID-19, such as telemedicine, distance learning, and social video collaboration sites, demand robust and scalable video streaming platforms.

The RT Server appliances are built on a validated Xilinx software stack. The system is heterogeneous, pairing up to eight Alveo U50 or Alveo U30 accelerators with an x86 processor. Although video transcoding is offloaded to the FPGA or VCU, a performant CPU like the AMD “Rome” EPYC can handle other complex workloads such as audio processing and ad insertion, or host GUI applications like the Wowza Streaming Engine server, which can be used to manage transcoding channels. Above the hardware layer, Xilinx has developed accelerator binaries for the specific algorithmic functions that run on the FPGA. The Xilinx Runtime (XRT) runs on top of these binaries and exposes the hardware kernels to the software layer. The Xilinx Media Accelerator (XMA) API exposes the hardware-accelerated decoding, video processing, and encoding functions required to build a transcoding microservice. Users address the Xilinx APIs through the FFmpeg command line. FFmpeg is widely used in the video transcoding domain, so most users will already be familiar and comfortable with this CLI. For non-CLI use, Xilinx is working with its resellers and Wowza to integrate the Wowza Streaming Engine media server into the RT Server appliances offered through Xilinx resellers. The Wowza integration is expected to be complete in the summer quarter of 2020.
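Because the accelerators are driven through FFmpeg, a transcoding microservice typically ends up assembling FFmpeg command lines. The sketch below builds a single-input, multi-output adaptive-bitrate ladder using only standard FFmpeg options (`-y`, `-i`, `-c:v`, `-s`, `-b:v`, `-c:a`). It defaults to the software `libx265` encoder so that it works with any FFmpeg build that includes it; on an RT Server you would substitute the hardware encoder name and any device-selection options from Xilinx’s own documentation, which are not reproduced here. The `Rendition` class, file names, and bitrates are hypothetical.

```python
import shlex
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Rendition:
    """One rung of an adaptive-bitrate ladder (hypothetical example values below)."""
    width: int
    height: int
    bitrate_kbps: int
    output: str


def build_ffmpeg_args(source: str, ladder: Sequence[Rendition],
                      video_encoder: str = "libx265") -> List[str]:
    """Build a single-input, multi-output FFmpeg command as an argument list.

    `video_encoder` defaults to the software libx265 encoder; on accelerated
    hardware, substitute the encoder name given in the vendor's FFmpeg documentation.
    """
    args = ["ffmpeg", "-y", "-i", source]
    for r in ladder:
        # FFmpeg output options apply to the output file that follows them.
        args += [
            "-c:v", video_encoder,
            "-s", f"{r.width}x{r.height}",
            "-b:v", f"{r.bitrate_kbps}k",
            "-c:a", "copy",
            r.output,
        ]
    return args


ladder = [
    Rendition(1280, 720, 3000, "out_720p.mp4"),
    Rendition(640, 360, 1000, "out_360p.mp4"),
]
args = build_ffmpeg_args("input_1080p.mp4", ladder)
print(shlex.join(args))
```

Passing the list to `subprocess.run(args, check=True)` executes it without going through a shell, which sidesteps quoting problems; `shlex.join` is used here only to print a copy-pasteable command.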

The bitrate-optimized version of the RT Server reference architecture is quite competitive with traditional software-based implementations. The diagram below shows that a single HPE ProLiant DL385 Gen10 Plus server configured with eight Alveo U50s (available to configure at hpe.com) is the equivalent of five HPE ProLiant DL380 Gen10 servers with a total of ten Intel 8275CL 3.0GHz CPUs. The Xilinx reference architecture delivers 5X the throughput per node at 6X lower hardware cost and 3X lower power than the alternative implementation. For a detailed tutorial demonstrating how a powerful transcoding appliance can be built from COTS equipment available from HPE, refer to the video here.

The high-density version of the RT Server is built from eight Alveo U30s integrated into a 1RU appliance, which is currently available from Xilinx resellers. Processing the equivalent of 1080p480 HEVC at quality equivalent to the NVENC “medium” preset would require four HPE ProLiant DL380 Gen10 servers equipped with 32 Nvidia Tesla T4 accelerators. The Xilinx appliance delivers 4X the throughput per node of the equivalent Nvidia implementation at 6X lower hardware cost and only 20 percent of its power consumption.
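Both comparisons in the two preceding paragraphs are framed as consolidations: one accelerated server replaces several baseline servers at the same total throughput, so the per-node multipliers follow directly from server counts. The sketch below reproduces that arithmetic using only the configurations quoted above. The class and method names are illustrative, and the code takes the vendor’s equal-throughput and equal-quality premises as given rather than verifying them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Consolidation:
    """Baseline vs. accelerated configuration assumed to deliver the same total throughput."""
    name: str
    baseline_servers: int
    baseline_devices: int  # CPUs or GPUs in the baseline cluster
    accelerated_servers: int
    accelerated_cards: int

    def per_node_throughput_gain(self) -> float:
        # Equal total throughput, so the per-node gain is simply the server ratio.
        return self.baseline_servers / self.accelerated_servers

    def baseline_devices_per_card(self) -> float:
        # How many baseline CPUs or GPUs each Alveo card stands in for.
        return self.baseline_devices / self.accelerated_cards


u50 = Consolidation("U50 vs. Intel 8275CL", 5, 10, 1, 8)
u30 = Consolidation("U30 vs. Tesla T4", 4, 32, 1, 8)

for c in (u50, u30):
    print(f"{c.name}: {c.per_node_throughput_gain():.0f}X per node, "
          f"{c.baseline_devices_per_card():.2f} baseline devices per card")

# "Only 20 percent of the power" corresponds to a 5X reduction in power draw.
power_reduction = 1 / 0.20
```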

The Xilinx RT Server video appliances are currently available for evaluation. The bitrate-optimized Alveo U50 version can be purchased through hpe.com, and the appliances are also available through authorized Xilinx value-added resellers. Visit https://www.xilinx.com/rtserver to learn more about the RT Servers or to request an evaluation.
