H265 / HEVC hardware acceleration in #WebRTC

During the last IETF Hackathon, at the WebRTC table, and then at our offices in Singapore, Intel and Apple came together to add HEVC support in WebRTC.

Intel chips have been supporting HEVC Encoding and Decoding for some time now. They support it in non-GPU hardware, making it a big deal for devices that can’t afford a full discrete GPU.

Most Apple devices integrate Intel chips, so adding support for H265 hardware acceleration in WebRTC is required to enable H265 in all Apple devices at once. Of course, the more support there is for Intel features in the media stacks above, the more hardware they sell. It’s a Win-Win.

Intel’s WebRTC group in Shanghai had an implementation for Desktop, Android and iOS. We acted as a catalyst between the two teams, and the two languages / cultures — Chinese and French :-).

This blog post is about the technical details of H265, and the hardware-accelerated codec implementation for libwebrtc in particular. This includes information about GPU acceleration support, in addition to Intel CPU hardware acceleration support.

To clarify: CPU Hardware acceleration means the CPU has a dedicated silicon circuit to implement the functionality, as opposed to running a software implementation in the generic x86 CPU. Even though both use the CPU, the dedicated silicon circuit is much faster.

First things first, if you want to see the commits in WebKit: they should appear in the next Safari Tech Preview release or two. There will not be a specific announcement on the Apple/WebKit/Safari blogs, so you will have to look at the SDP yourself.

This is the commit about CPU Hardware Acceleration:

This is the commit about GPU Hardware Acceleration:

Now let’s take a step back. Codecs (AV1, H265) are a little bit like any other big features, e.g. End-to-End-Encryption. While I’m happy to know what Apple, or Zoom, have or do not have, the most important question is: how do I do it myself?

So here it is, an introductory course on default Codecs, Injectable Codecs and hardware acceleration support in native libwebrtc:


Here is a satellite view of the native WebRTC stack.

Leaving the negotiation of the media and codec aside, the flow of media through the WebRTC stack is pretty much linear and represents the normal data flow in any media engine:

The design related to codecs is mainly in the Codec and RTP (segmentation / fragmentation) section. Everything before (frame capture) and after (encryption, ICE, network) is pretty much codec agnostic. The codec and RTP part of the code, for historical reasons, is referred to in libwebrtc as the “Call” API, which is illustrated below:

The Call API. The PeerConnection on top acts as the controller. Images come from the capturer (top left), go down to the encoder, and then to RTP packetization. Upon reception, a packet is decrypted, then sent to the depacketizer (bottom right); once the encoded frame is reconstructed, it is sent up to the decoder.

For the sake of simplicity we will only focus on the sending / encoding / packetizing side of things, and only for video. The receiving / decoding / aggregation is completely symmetric and can be inferred from this guide.

A. PeerConnection level

Local codec support needs to be advertised during the handshake, so it must be known BEFORE the handshake starts, and it cannot vary over time. As usual in C++, information that does not change during the lifetime of an object is passed through the constructor of that object.

Here, the libwebrtc design gives applications the ability to inject their own codec support via a factory on top of libwebrtc. A default, “internal” video codec factory is otherwise provided. This is how, for example, Chrome reuses its own GPU acceleration support in libwebrtc, extending the standalone libwebrtc capacity (see footnote).

The PeerConnection factory can take an external video encoder factory and an external video decoder factory as creation arguments. These factories are then passed on to each peer connection object when it is created, in the same way (through the constructor).

webrtc::CreatePeerConnectionFactory(
    networkThread,
    networkThread,
    signalingThread,
    audioModule,
    webrtc::CreateBuiltinAudioEncoderFactory(),
    webrtc::CreateBuiltinAudioDecoderFactory(),
    createEncoderFactory(),  // injected video encoder factory
    createDecoderFactory(),  // injected video decoder factory
    nullptr,
    nullptr);

B. VideoEncoderFactory https://cs.chromium.org/chromium/src/third_party/webrtc/api/video_codecs/video_encoder_factory.h

The VideoEncoderFactory takes a VideoCodecInfo structure as input and returns the corresponding VideoEncoder if the VideoCodecType is supported. This can be done multiple times, with different inputs during the lifetime of the factory (i.e. of the peer connection), so it is done through the createEncoder() API:

public VideoEncoder createEncoder(VideoCodecInfo input)

There can be more than one encoder for a given codec type, for example to support software fallback when the hardware encoder fails or does not support all of the possible profiles and options the user requests.
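To make the factory idea and the fallback idea concrete, here is a minimal, self-contained sketch. It does not use the real libwebrtc headers: EncoderSketch, EncoderFactorySketch, ListFactory and HardwareFirstFactory are hypothetical stand-ins, and codecs are identified by a plain string instead of a VideoCodecInfo. It only covers fallback at creation time, not a hardware encoder failing in the middle of a call.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a video encoder.
class EncoderSketch {
 public:
  explicit EncoderSketch(std::string impl) : impl_(std::move(impl)) {}
  const std::string& ImplementationName() const { return impl_; }

 private:
  std::string impl_;
};

// Hypothetical, simplified stand-in for a video encoder factory.
class EncoderFactorySketch {
 public:
  virtual ~EncoderFactorySketch() = default;
  // Returns nullptr when the codec name is not supported.
  virtual std::unique_ptr<EncoderSketch> Create(const std::string& codec) = 0;
};

// A factory that supports a fixed list of codec names.
class ListFactory : public EncoderFactorySketch {
 public:
  ListFactory(std::string impl, std::vector<std::string> codecs)
      : impl_(std::move(impl)), codecs_(std::move(codecs)) {}

  std::unique_ptr<EncoderSketch> Create(const std::string& codec) override {
    for (const std::string& supported : codecs_) {
      if (supported == codec) {
        return std::unique_ptr<EncoderSketch>(new EncoderSketch(impl_));
      }
    }
    return nullptr;
  }

 private:
  std::string impl_;
  std::vector<std::string> codecs_;
};

// Asks the hardware factory first and falls back to the software one.
class HardwareFirstFactory : public EncoderFactorySketch {
 public:
  HardwareFirstFactory(std::unique_ptr<EncoderFactorySketch> hardware,
                       std::unique_ptr<EncoderFactorySketch> software)
      : hardware_(std::move(hardware)), software_(std::move(software)) {}

  std::unique_ptr<EncoderSketch> Create(const std::string& codec) override {
    std::unique_ptr<EncoderSketch> encoder = hardware_->Create(codec);
    if (encoder) return encoder;
    return software_->Create(codec);
  }

 private:
  std::unique_ptr<EncoderFactorySketch> hardware_;
  std::unique_ptr<EncoderFactorySketch> software_;
};
```

With a hardware factory that knows H264 and H265 and a software factory that knows VP8 and H264, a request for H265 lands on the hardware encoder, a request for VP8 lands on the software one, and an unknown codec yields nullptr.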

C. VideoEncoder level https://cs.chromium.org/chromium/src/third_party/webrtc/api/video_codecs/video_encoder.h

Each Video Encoder can set a certain number of flags that allow the calling code to know not only the type of codec it supports, but also whether it supports HW encoding (native frame type => texture on GPU). See below for the settings in the case of the H265 HW accelerated codec in the original Intel implementation, for illustration.

EncoderInfo info;
// Disable texture support, take I420 or I444 as input
info.supports_native_handle = false;
info.is_hardware_accelerated = true;
info.has_internal_source = false;  // external capturer to feed us with frame
info.implementation_name = "IntelMediaSDK";
// Disable frame-dropper for MSDK.
info.has_trusted_rate_controller = true;
// Disable SVC / Simulcast for MSDK.
info.scaling_settings = VideoEncoder::ScalingSettings::kOff;
return info;
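As an illustration of how calling code might act on such flags, here is a small sketch. EncoderInfoSketch, FrameKind and NeedsCpuCopy are hypothetical names that mirror only a few of the fields above; the actual decision logic inside libwebrtc is more involved, so treat this as an assumption about the general idea rather than the real code path.

```cpp
#include <string>

// Hypothetical mirror of a few of the EncoderInfo fields shown above.
struct EncoderInfoSketch {
  bool supports_native_handle = false;
  bool is_hardware_accelerated = false;
  std::string implementation_name;
};

// Hypothetical frame storage kinds: CPU-side I420 buffer or GPU texture.
enum class FrameKind { kI420Buffer, kNativeTexture };

// Returns true when a frame has to be copied into a CPU-side buffer before
// encoding, i.e. when it is a texture and the encoder cannot take textures.
inline bool NeedsCpuCopy(const EncoderInfoSketch& info, FrameKind kind) {
  return kind == FrameKind::kNativeTexture && !info.supports_native_handle;
}
```

With the settings above (supports_native_handle = false), a texture frame would need a copy into an I420 buffer first, while an I420 frame could be handed over directly.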

The following call sequence diagram represents the full call path from the media capture to the encoder. It assumes the JSEP handshake has been done and the VideoCodecInfo structure has been populated from the info found in the SDP offers and answers.

The Codec section is in charge of taking in a raw frame (VideoFrame) and generating an encoded frame (EncodedFrame) using a given VideoCodec. The VideoCodec is only passed a raw frame (whose dimensions can change dynamically during the call depending on the available bandwidth) and a target bitrate. The details of this part, especially when it comes to hardware support, are provided in section 1.

The RTP section implements the RTP protocol and the specific RTP payload standards that correspond to the supported codecs. It takes an encoded frame as input, and generates several RTP packets. The details of this part are provided in section 2.
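The "one encoded frame in, several RTP packets out" step can be sketched in a codec-agnostic way as below. SplitIntoPayloads is a hypothetical helper, not a libwebrtc function, and it only cuts the frame into chunks that fit a maximum payload size; a real packetizer also adds the codec-specific payload headers and the RTP header.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Splits one encoded frame into chunks of at most max_payload bytes each.
// Returns an empty list for an empty frame or a zero payload size.
inline std::vector<std::vector<uint8_t>> SplitIntoPayloads(
    const std::vector<uint8_t>& frame, size_t max_payload) {
  std::vector<std::vector<uint8_t>> payloads;
  if (max_payload == 0) return payloads;
  for (size_t offset = 0; offset < frame.size(); offset += max_payload) {
    const size_t length = std::min(max_payload, frame.size() - offset);
    const auto first = frame.begin() + static_cast<std::ptrdiff_t>(offset);
    payloads.emplace_back(first, first + static_cast<std::ptrdiff_t>(length));
  }
  return payloads;
}
```

For example, a 2500-byte frame with a 1200-byte payload budget yields three chunks of 1200, 1200 and 100 bytes.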

Those are then handed down to the encryption layer to generate Secure RTP packets. Encryption, double encryption (SFrame), ICE, and network transport are all out of the scope of this post.

Intermediate classes manage bitrate adaptation. This is quite complicated, and includes a lot of heuristics, so we are not going to detail it here either.

Section 1: Adding a codec implementation for H264, VP8 or VP9

Here we do not need to extend internal support for a new VideoCodecType, nor for a new RTP payload type. This is the simpler case, and also the case most people are apparently interested in.

A. OWN VideoEncoderFactory

The main way to achieve this is to create a new VideoEncoderFactory, which supports a new type of VideoEncoder, hardware-accelerated. You can then inject it through the peerconnectionFactory constructor. Your VideoEncoderFactory code can reside in the app, in which case you do not have to modify the libwebrtc code.

INTEL has its own hardware-accelerated H264 and H265 MSDKVideoDecoderFactory that does not reside in this copy of libwebrtc, but in its app.

B. libwebrtc iOS Hardware acceleration support

The iOS hardware acceleration support in libwebrtc is a direct extension of the VideoEncoderFactory design:

C. libwebrtc Android Hardware Acceleration Support

The Android hardware acceleration support is pretty simple to read:

sdk/android/src/java/org/webrtc/VideoCodecType.java

The global class diagram can appear complicated, which is a direct result of both the higher fragmentation of Android hardware compared to iOS, and the fact that one level of indirection is needed to mix Java and C++ code, which is not the case when mixing Objective-C and C++. We provide it with annotations for the reader to follow.

D. Examples of some H264 HW implementations

H265 example from INTEL [win, mac, ios, android]

INTEL H264 android HW acceleration support

NVIDIA NvEnc, NvPipe, Video Codec SDK implementations


Microsoft 3D streaming toolkit (old)


You can see in the repositories above that the original implementation (same file name as in the libwebrtc repository) was replaced by Microsoft with an NVIDIA-compatible one.

MS MixedReality Toolkit (new)

You can see in the new toolkit traces of the same UWP_H264_encoder.

Section 2: Add support for new codecs, like H265 and AV1


Here we have a completely new codec, so we need to extend all the codec structures and add support in the corresponding RTP layer.

The changes percolate through the entire pipeline, instead of being relatively limited to the VideoEncoder and VideoEncoderFactory as before.

You can see in the simplified drawing above, which represents adding support for AV1, that the changes go beyond modifying the VEF (VideoEncoderFactory) and VE (VideoEncoder) classes and adding a dedicated AV1 encoder: the RTP packetizer had to be extended to support the corresponding RTP payload, and so did the EncodedImageCallback, which links the encoder with the packetizer.

Video Codec and Factory themselves are here:

[win] — https://github.com/open-webrtc-toolkit/owt-client-native/tree/master/talk/owt/sdk/base/win

Protect your code by using a GN variable, and translate it into a C++ define so that the corresponding code can be guarded with preprocessor checks. By tradition, all the GN variables specific to webrtc are prefixed by “rtc_”.

You can then protect it and only enable it on platforms that support it.
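On the C++ side, the guard can look like the sketch below. DISABLE_H265 is the macro used in the snippets of this post; how the build sets it from a GN variable is not shown and is assumed here, and SupportedVideoCodecNames is a hypothetical function made up for the illustration.

```cpp
#include <string>
#include <vector>

// Returns the codec names compiled into this build. The H265 entry is only
// present when the build did not define DISABLE_H265.
inline std::vector<std::string> SupportedVideoCodecNames() {
  std::vector<std::string> names = {"VP8", "VP9", "H264"};
#ifndef DISABLE_H265
  names.push_back("H265");
#endif
  return names;
}
```

Compiling the same file with -DDISABLE_H265 would remove the H265 entry without touching any other line, which is the point of guarding every H265-specific addition the same way.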

Specific Codec Implementation code


Codec RTP Payload code

static const char* kPayloadNameVp8 = "VP8";
static const char* kPayloadNameVp9 = "VP9";
static const char* kPayloadNameH264 = "H264";
#ifndef DISABLE_H265
static const char* kPayloadNameH265 = "H265";
#endif
static const char* kPayloadNameGeneric = "Generic";
static const char* kPayloadNameMultiplex = "Multiplex";

If building for mobile, or a MacOS framework:

Extending the android External Video Factory:

The first thing you want to do then is to extend the list of supported codecs. For this you just add a codec entry in the VideoCodecType enum. A good way to get a feeling for how many changes are needed is to grep the source code for any instance of one of those enum fields.

In api/video/video_codec_type.h

[...]
  kVideoCodecVP8,
  kVideoCodecVP9,
  kVideoCodecH264,
#ifndef DISABLE_H265
  kVideoCodecH265,
#endif
  kVideoCodecMultiplex,
};
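To give a feeling for why a single new enum entry ripples through the code, here is a sketch of the kind of mapping that has to be updated in both directions. The enum and the function names (SketchCodecType, SketchCodecTypeToName, SketchNameToCodecType) are hypothetical and only resemble what a real code base would contain; the name comparison ignores case, on the assumption that codec names coming from SDP should be matched that way.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical codec type enum with the new H265 entry added.
enum SketchCodecType {
  kSketchGeneric,
  kSketchVP8,
  kSketchVP9,
  kSketchH264,
  kSketchH265,  // new entry
};

// Enum to payload name: one new case per new codec.
inline const char* SketchCodecTypeToName(SketchCodecType type) {
  switch (type) {
    case kSketchVP8:  return "VP8";
    case kSketchVP9:  return "VP9";
    case kSketchH264: return "H264";
    case kSketchH265: return "H265";
    case kSketchGeneric: break;
  }
  return "Generic";
}

// Case-insensitive comparison of two ASCII strings.
inline bool EqualsIgnoreCase(const std::string& a, const std::string& b) {
  if (a.size() != b.size()) return false;
  for (size_t i = 0; i < a.size(); ++i) {
    if (std::tolower(static_cast<unsigned char>(a[i])) !=
        std::tolower(static_cast<unsigned char>(b[i]))) {
      return false;
    }
  }
  return true;
}

// Payload name to enum: the reverse mapping needs the same new entry.
inline SketchCodecType SketchNameToCodecType(const std::string& name) {
  if (EqualsIgnoreCase(name, "VP8")) return kSketchVP8;
  if (EqualsIgnoreCase(name, "VP9")) return kSketchVP9;
  if (EqualsIgnoreCase(name, "H264")) return kSketchH264;
  if (EqualsIgnoreCase(name, "H265")) return kSketchH265;
  return kSketchGeneric;
}
```

Forgetting either direction typically shows up as the codec silently being treated as "Generic", which is why grepping for every use of the enum is worth the time.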

In api/video_codecs/video_codec.h

#ifndef DISABLE_H265
struct VideoCodecH265 {
  bool operator==(const VideoCodecH265& other) const;
  bool operator!=(const VideoCodecH265& other) const {
    return !(*this == other);
  }
  bool frameDroppingOn;
  int keyFrameInterval;
  const uint8_t* vpsData;
  size_t vpsLen;
  const uint8_t* spsData;
  size_t spsLen;
  const uint8_t* ppsData;
  size_t ppsLen;
};
#endif
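The struct only declares operator==, so a definition is needed somewhere. Here is one possible sketch, assuming that two settings are equal when the scalar fields match and the VPS/SPS/PPS buffers have the same length and content. The SameBytes helper is hypothetical, and the struct is repeated so that the example compiles on its own.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Same layout as the struct shown in the post, repeated to be self-contained.
struct VideoCodecH265 {
  bool operator==(const VideoCodecH265& other) const;
  bool operator!=(const VideoCodecH265& other) const {
    return !(*this == other);
  }
  bool frameDroppingOn;
  int keyFrameInterval;
  const uint8_t* vpsData;
  size_t vpsLen;
  const uint8_t* spsData;
  size_t spsLen;
  const uint8_t* ppsData;
  size_t ppsLen;
};

// Hypothetical helper: compares two buffers by length, then by content.
// Assumes the pointers are valid whenever the length is non-zero.
inline bool SameBytes(const uint8_t* a, size_t a_len,
                      const uint8_t* b, size_t b_len) {
  if (a_len != b_len) return false;
  return a_len == 0 || std::memcmp(a, b, a_len) == 0;
}

inline bool VideoCodecH265::operator==(const VideoCodecH265& other) const {
  return frameDroppingOn == other.frameDroppingOn &&
         keyFrameInterval == other.keyFrameInterval &&
         SameBytes(vpsData, vpsLen, other.vpsData, other.vpsLen) &&
         SameBytes(spsData, spsLen, other.spsData, other.spsLen) &&
         SameBytes(ppsData, ppsLen, other.ppsData, other.ppsLen);
}
```

Comparing content rather than pointer values means two settings that carry identical parameter sets in different buffers still compare equal.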

Then in the same file, extend the VideoCodecUnion

union VideoCodecUnion {
  VideoCodecVP8 VP8;
  VideoCodecVP9 VP9;
  VideoCodecH264 H264;
#ifndef DISABLE_H265
  VideoCodecH265 H265;
#endif
};

In the same file, expose const and non-const accessors for the new struct on the VideoCodec class:

class RTC_EXPORT VideoCodec {
  [...]
  const VideoCodecVP9& VP9() const;
  VideoCodecH264* H264();
  const VideoCodecH264& H264() const;
#ifndef DISABLE_H265
  VideoCodecH265* H265();
  const VideoCodecH265& H265() const;
#endif
  [...]
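The pattern behind these accessors is a tagged union: the codec type says which member of the union is valid, and each accessor checks the tag before handing out the member. The sketch below uses hypothetical names (VideoCodecSketch, H264Settings, H265Settings) and a plain assert for the check; the check macro a real code base uses may differ.

```cpp
#include <cassert>

// Hypothetical codec tags and per-codec settings.
enum SketchType { kTypeH264, kTypeH265 };
struct H264Settings { int keyFrameInterval; };
struct H265Settings { int keyFrameInterval; };

class VideoCodecSketch {
 public:
  SketchType codecType = kTypeH264;

  // Each accessor first checks that the tag matches the requested member.
  H264Settings* H264() {
    assert(codecType == kTypeH264);
    return &specific_.H264;
  }
  const H264Settings& H264() const {
    assert(codecType == kTypeH264);
    return specific_.H264;
  }
  H265Settings* H265() {
    assert(codecType == kTypeH265);
    return &specific_.H265;
  }
  const H265Settings& H265() const {
    assert(codecType == kTypeH265);
    return specific_.H265;
  }

 private:
  union Specific {
    H264Settings H264;
    H265Settings H265;
  } specific_ = {};
};
```

The non-const version returns a pointer for writing the settings, and the const version returns a reference for reading them, mirroring the pairs declared above.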

Payload names need to include a new string:

Management of corresponding FMTP options in SDP:

Main RTP/RTCP codec switch to be extended: modules/rtp_rtcp/source/rtp_format.c

Specific Packetizer to be used from the switch above: modules/rtp_rtcp/source/rtp_format_h265.h
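One of the first things an H265 packetizer has to do is read the two-byte NAL unit header, for instance to recognize the VPS, SPS and PPS parameter sets. The sketch below follows the header layout of the H.265 specification (1 forbidden bit, 6 bits of type, 6 bits of layer id, 3 bits of temporal id plus one) and the type values 32, 33 and 34 for VPS, SPS and PPS. The struct and function names are hypothetical and are not taken from rtp_format_h265.h.

```cpp
#include <cstddef>
#include <cstdint>

// Fields of the two-byte H265 NAL unit header.
struct H265NalHeaderSketch {
  uint8_t type;               // 6 bits
  uint8_t layer_id;           // 6 bits
  uint8_t temporal_id_plus1;  // 3 bits
};

// NAL unit types for the three parameter sets.
enum : uint8_t { kH265Vps = 32, kH265Sps = 33, kH265Pps = 34 };

// Parses the first two bytes of a NAL unit. Returns false if too short.
inline bool ParseH265NalHeader(const uint8_t* data, size_t size,
                               H265NalHeaderSketch* out) {
  if (data == nullptr || out == nullptr || size < 2) return false;
  out->type = static_cast<uint8_t>((data[0] >> 1) & 0x3F);
  out->layer_id =
      static_cast<uint8_t>(((data[0] & 0x01) << 5) | (data[1] >> 3));
  out->temporal_id_plus1 = static_cast<uint8_t>(data[1] & 0x07);
  return true;
}

// True for VPS, SPS and PPS NAL units.
inline bool IsH265ParameterSet(const H265NalHeaderSketch& header) {
  return header.type == kH265Vps || header.type == kH265Sps ||
         header.type == kH265Pps;
}
```

The header bytes 0x40 0x01, for example, decode to type 32 (VPS), layer id 0 and temporal id plus one equal to 1.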

If adding to the built-in video encoder factory, you can either let the bitrate allocator fall through to the default case, or (recommended) handle your codec explicitly in the corresponding switch statement:
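The difference between the two options can be sketched as follows. The allocator classes and the CreateAllocatorSketch function are hypothetical; which allocator each codec should really get depends on the code base, so the choices below are only an assumption made for the illustration.

```cpp
#include <memory>
#include <string>

// Hypothetical codec tags for this sketch.
enum AllocCodecType { kAllocVP8, kAllocVP9, kAllocH264, kAllocH265 };

// Hypothetical allocator hierarchy; Name() only exists to make the choice
// visible in this example.
struct BitrateAllocatorSketch {
  virtual ~BitrateAllocatorSketch() = default;
  virtual std::string Name() const = 0;
};
struct SimulcastAllocatorSketch : BitrateAllocatorSketch {
  std::string Name() const override { return "simulcast"; }
};
struct SvcAllocatorSketch : BitrateAllocatorSketch {
  std::string Name() const override { return "svc"; }
};

inline std::unique_ptr<BitrateAllocatorSketch> CreateAllocatorSketch(
    AllocCodecType type) {
  switch (type) {
    case kAllocVP9:
      return std::unique_ptr<BitrateAllocatorSketch>(new SvcAllocatorSketch());
    case kAllocH265:
      // Handled explicitly: the choice is visible and survives future
      // changes to the default branch.
      return std::unique_ptr<BitrateAllocatorSketch>(
          new SimulcastAllocatorSketch());
    default:
      return std::unique_ptr<BitrateAllocatorSketch>(
          new SimulcastAllocatorSketch());
  }
}
```

Both branches return the same allocator here, but the explicit case documents that the behavior for H265 was chosen on purpose rather than inherited by accident.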

ACKNOWLEDGEMENTS

Acknowledging people who did the work, and/or provided resources which enabled it, is, IOHO, a basic courtesy.

We would like to thank the Google team first for coming up with the original WebRTC implementation and maintaining it.

We would also like to thank the Intel WebRTC group in Shanghai for providing an H265 hardware-accelerated implementation and spending time going through the code with us. Special thanks to manager Lei, for making it happen, and to the original H265 HW acceleration implementor Qiu for spending time with us.

We would like to thank the Apple team, and especially Youenn Fablet for additional help on the MacOS side of things.

Some of this work has been done within the scope of the free IETF hackathon, especially the IETF 106 in Singapore in November 2019, for which we have Cisco to thank. Special thanks to Charles Eckel for the Hackathon vision and kind leadership. Thanks to all the usual participants and volunteers including but not limited to Haral T., Lorenzo M., Bernard A., Jonathan L., Sergio M.

Finally, thanks to all the people who are sharing projects out there for other people to learn from. In this case, we took a look at the nicotyze project. We have surely missed many, and can’t cite them all anyway, but we would like to thank everyone nonetheless.

Originally published at http://webrtcbydralex.com on April 3, 2020.
