Mastering the AV1 SVC chains

SVC initialization via native APIs

AV1 SVC Dependency Descriptor

  • Decode targets: the subset of Frames needed to decode a coded video sequence at a given spatial and temporal layer.
  • Chains: the minimal sequence of Frames that must be received and forwarded so that the stream remains decodable without any further recovery mechanism.
  1. The RTP packet belongs to a frame which is required for the currently selected Decode target. This is provided by the Decode Target Indication (DTI) values: Not present, Discardable, Switch and Required.
  2. All frames referenced by the current frame have been forwarded to the endpoint. This is provided by specifying the frame numbers of the previous frames which this frame references, expressed as differences relative to the current frame number.
  3. All frames referenced by the current frame are decodable. This is provided by Chain information which is present in all packets, allowing an SFM to detect a broken Chain regardless of loss patterns.
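The three conditions above can be sketched as a single check. This is only an illustration, not code from any SFU: the names `DTI`, `can_forward` and `forwarded` are hypothetical, the frame is handled as one unit instead of packet by packet, and treating a chain difference of 0 as "no previous frame needed" is an assumption of this sketch.

```python
from enum import Enum, auto


class DTI(Enum):
    """Decode Target Indication values named in the text."""
    NOT_PRESENT = auto()
    DISCARDABLE = auto()
    SWITCH = auto()
    REQUIRED = auto()


def can_forward(frame_number, dti, fdiffs, chain_fdiff, forwarded):
    """Return True when the three conditions hold for one frame.

    frame_number: number of the current frame
    dti:          DTI of this frame for the selected Decode target
    fdiffs:       differences to the frames this frame references
    chain_fdiff:  difference to the previous frame in the protecting Chain
                  (0 is treated here as "no previous frame needed")
    forwarded:    set of frame numbers already forwarded to the endpoint
    """
    # 1. The frame must be part of the selected Decode target.
    if dti is DTI.NOT_PRESENT:
        return False
    # 2. Every referenced frame must have been forwarded.
    if any(frame_number - d not in forwarded for d in fdiffs):
        return False
    # 3. The Chain must be intact: its previous frame was forwarded too.
    if chain_fdiff != 0 and frame_number - chain_fdiff not in forwarded:
        return False
    return True
```

Keeping the three checks separate makes it easy to see which one failed, which matters later because a missing reference only costs one frame while a broken Chain needs recovery.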
  • The frame is from the spatial layer 2 and temporal layer 2.
  • The Decode Target Indications show that it is not used by any Decode Target except the last one, in which it is discardable. That means that the SFU can discard this packet, as it will never be referenced by any future Frame.
  • The current frame depends on previous frames with a [3,1] difference in frame number, that is, frames #101 and #103. If those frames have not been received/forwarded, the receiver will not be able to decode the current one.
  • The previous frame in each of the Chains, also in diff format, so [11,10,9] means frame #93 for Chain 0, #94 for Chain 1 and #95 for Chain 2 (more on this later).
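The diff arithmetic in this example can be checked with a few lines. This is a minimal sketch: the helper `resolve_diffs` is hypothetical, the current frame number 104 is inferred from the values above (104 - 3 = 101), and the 16-bit wrap-around is an assumption about how frame numbers roll over.

```python
FRAME_NUMBER_MODULO = 1 << 16  # assumption: frame numbers wrap at 16 bits


def resolve_diffs(frame_number, diffs):
    """Turn differential values into absolute frame numbers."""
    return [(frame_number - d) % FRAME_NUMBER_MODULO for d in diffs]


current_frame = 104  # inferred from the example, not stated explicitly

referenced = resolve_diffs(current_frame, [3, 1])
chain_previous = resolve_diffs(current_frame, [11, 10, 9])

print(referenced)      # [101, 103]
print(chain_previous)  # [93, 94, 95]
```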
  • Decode Target mapping to spatial and temporal layers. This is not explicitly sent in the Dependency Descriptor, but can be obtained by parsing the Decode Target Indications in each Template Dependency Structure (the actual algorithm is in the spec).
  • Decode Target protected-by-Chain information, which indicates for each Decode Target which Chain protects it. That is, if a Decode Target is being forwarded, this tells us which Chain we have to track in order to detect that the stream is no longer decodable and request an I frame from the encoder, or use a Layer Refresh Request (LRR) once it is implemented.
  • The resolution of each spatial layer, which is optional.
  • Which Decode Targets are active. This is a very important piece of information, as it allows the SFU to immediately detect when a layer has been stopped on the encoding side, so it can decide to switch to a different one without having to wait for further packets or timeouts. The SFU will also need to modify this information when doing layer selection, so that the receiver is able to pass the frame to the decoder as soon as possible without having to wait for inactive Decode Targets.
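To make the last point concrete, here is a rough sketch of how an SFU could read the active Decode Target mask and rewrite it for the layers it actually forwards. The `DecodeTarget` type and both helper functions are hypothetical, and the bit layout (bit i set means Decode Target i is active) is an assumption of this sketch that should be checked against the spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecodeTarget:
    index: int      # position of the Decode Target in the structure
    spatial: int    # spatial layer id
    temporal: int   # temporal layer id
    chain: int      # index of the Chain protecting this Decode Target


def active_targets(targets, active_mask):
    """Keep the Decode Targets whose bit is set in the active mask."""
    return [t for t in targets if active_mask & (1 << t.index)]


def restrict_mask(targets, active_mask, max_spatial, max_temporal):
    """Clear the bits of every Decode Target above the forwarded layers."""
    mask = 0
    for t in active_targets(targets, active_mask):
        if t.spatial <= max_spatial and t.temporal <= max_temporal:
            mask |= 1 << t.index
    return mask


# Example layout with two spatial and two temporal layers.
targets = [
    DecodeTarget(index=0, spatial=0, temporal=0, chain=0),
    DecodeTarget(index=1, spatial=0, temporal=1, chain=0),
    DecodeTarget(index=2, spatial=1, temporal=0, chain=1),
    DecodeTarget(index=3, spatial=1, temporal=1, chain=1),
]

# Forwarding only spatial layer 0 leaves the two lowest bits set.
print(bin(restrict_mask(targets, 0b1111, max_spatial=0, max_temporal=1)))  # 0b11
```

If the encoder stops spatial layer 1, the incoming mask becomes `0b0011` and `active_targets` returns only the first two entries, which is the signal the SFU can use to switch layers right away.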
  • The WebRTC APIs do not expose the Decode Target information, nor do they offer any way to control Decode Targets directly. While there is a mapping between the Decode Targets and the temporal and spatial layers, it is possible for two different Decode Targets to be associated with the same spatial and temporal layer (although this doesn’t happen in any of the scalability modes currently implemented in libwebrtc or specified in the AV1 spec).
  • The Decode Target selected for forwarding may be deactivated at any moment, so the algorithm has to be flexible enough to react to it and switch to a different layer, even in K-SVC or “S” modes.
  1. Retrieve the Dependency Descriptor from the current RTP packet's header extension, together with the latest received Template Structure and Active Decode Target mask.
  2. Check whether all the frames referenced by the Frame have been forwarded, to decide whether the Frame is decodable.
  3. Filter out the Decode Targets that are inactive and sort the rest in reverse spatial and temporal layer order.
  4. Find the first Decode Target which is lower than or equal to the spatial and temporal layers that we wish to forward and is protected by a Chain that is not broken, that is, a Chain whose frames have all been received and forwarded.
  5. If there is no Decode Target available with a valid Chain, drop the packet and request an I frame from the sender.
  6. Check the Decode Target Indication to see whether the Frame is present in the selected Decode Target, and drop the packet if it is “Not present”.
  7. Set the RTP packet marker bit if it is the last packet of the current Frame and the forwarded Decode Target's spatial and temporal layers are the highest ones that we are going to forward, based on the application settings and the Active Decode Target mask.
  8. For the last packet of the Frame, check whether all the packets of the Frame have been forwarded and, if so, add it to the internal list of forwarded Frames.
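Steps 2 to 6 and 8 can be sketched roughly as follows. This is a simplified illustration under several assumptions, not the code of any real SFU: all names are hypothetical, each Frame is handled as a single unit (so the per-packet bookkeeping and the marker bit of step 7 are left out), a Chain counts as broken when its previous frame was never forwarded, and a chain difference of 0 is taken to mean that no previous frame is needed.

```python
from dataclasses import dataclass

NOT_PRESENT = "not_present"  # label chosen for this sketch


@dataclass(frozen=True)
class DecodeTarget:
    index: int     # position in the DTI list and in the active mask
    spatial: int
    temporal: int
    chain: int     # index of the Chain protecting this Decode Target


@dataclass
class Frame:
    number: int
    fdiffs: list        # differences to the referenced frames
    chain_fdiffs: list  # difference to the previous frame, one per Chain
    dtis: list          # one Decode Target Indication per Decode Target


def select_target(targets, active_mask, broken_chains, max_spatial, max_temporal):
    """Steps 3-4: highest active target within limits with an intact Chain."""
    candidates = [t for t in targets if active_mask & (1 << t.index)]
    candidates.sort(key=lambda t: (t.spatial, t.temporal), reverse=True)
    for t in candidates:
        within_limits = t.spatial <= max_spatial and t.temporal <= max_temporal
        if within_limits and t.chain not in broken_chains:
            return t
    return None


def process_frame(frame, targets, active_mask, forwarded, max_spatial, max_temporal):
    """Return 'forward', 'drop' or 'request_keyframe' for one Frame."""
    # Step 2: decodable only if every referenced frame was forwarded.
    decodable = all(frame.number - d in forwarded for d in frame.fdiffs)
    # A Chain is broken when its previous frame is missing from the list.
    broken = {c for c, d in enumerate(frame.chain_fdiffs)
              if d != 0 and frame.number - d not in forwarded}
    # Steps 3-5: pick a Decode Target, or ask for an I frame if none is valid.
    target = select_target(targets, active_mask, broken, max_spatial, max_temporal)
    if target is None:
        return "request_keyframe"
    # Step 6: drop frames that are not part of the selected Decode Target.
    if not decodable or frame.dtis[target.index] == NOT_PRESENT:
        return "drop"
    # Step 8: remember the Frame so later frames can reference it.
    forwarded.add(frame.number)
    return "forward"
```

A quick usage example with one spatial and two temporal layers, both protected by Chain 0:

```python
targets = [DecodeTarget(0, 0, 0, 0), DecodeTarget(1, 0, 1, 0)]
forwarded = set()
key = Frame(1, [], [0], ["switch", "switch"])
print(process_frame(key, targets, 0b11, forwarded, 0, 1))  # forward
```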

The Fastest Streaming on Earth. Realtime WebRTC CDN built for large-scale video broadcasting on any device with sub-500ms latency.

Millicast
