Millicast has been using #WebRTC for large-scale streaming for many years now. One request comes back very often: “Can I add latency and keep the quality of the media as high as possible?” To answer that question, one needs to look at the concept of reliability, and how it varies depending on the protocol of choice.
Word of warning: this is an oversimplification of the technology with only some of the core concepts presented. It serves as an introduction to specific concepts for people from outside that field of expertise, and absolutely not as a complete, thorough, and organised presentation on the subject. References of that kind are provided toward the end of the post for those interested in knowing more.
Story of partial reliability and RTP.
At the base of the internet, there are two network transport protocols: TCP and UDP.
TCP is a reliable network transport: it never gives up on a packet. If a packet is lost in transit, TCP keeps retrying until the packet eventually gets through. Unfortunately, when that packet is needed for some operation on the receiving side, e.g. the reconstruction of an encoded video frame that was transported in several packets, that operation is delayed until the packet arrives. Retransmissions also lead to bandwidth usage overhead. Given the best-effort nature of the public internet, reliability was seen as a must-have in most cases, whatever the impact on latency or throughput.
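The blocking effect described above can be illustrated with a minimal sketch. It assumes a hypothetical list of packet arrival times (in milliseconds) in which the third packet is lost once and only shows up after a retransmission. The function name and the numbers are made up for illustration; this is not how a real TCP stack is written.

```python
from itertools import accumulate


def in_order_delivery_times(arrival_ms):
    """Time at which each packet can be handed to the application when
    delivery is strictly in order: a packet is released only once it and
    every packet before it have arrived."""
    return list(accumulate(arrival_ms, max))


# Hypothetical arrival times: packet 2 was lost and retransmitted,
# so it arrives at 220 ms instead of around 20 ms.
arrivals = [0, 10, 220, 30, 40]
print(in_order_delivery_times(arrivals))  # [0, 10, 220, 220, 220]
```

Packets 3 and 4 arrived on time, yet the application only sees them at 220 ms, once the retransmitted packet has filled the gap.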
UDP is an unreliable network transport, a fire-and-forget kind of transport. Because it does not try to do anything special, its performance on good-quality network links is great. Unfortunately, when a packet that is needed for some operation on the receiving side, e.g. the reconstruction of an encoded video frame that was transported in several packets, is lost, that operation fails. Since there is no mechanism for checking whether delivery happened, the only way to know that a packet did not arrive is to time out, which in turn leads to delays in processing.
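As a rough illustration of why a single lost packet matters when a frame is spread over several packets, here is a small sketch. It assumes, as a simplification, that every packet is lost independently with the same probability; real networks tend to lose packets in bursts. The function name and the example figures are hypothetical.

```python
def frame_loss_probability(packet_loss_rate: float, packets_per_frame: int) -> float:
    """Probability that at least one packet of a frame is lost, assuming
    independent losses (a simplifying assumption) and no retransmission."""
    return 1.0 - (1.0 - packet_loss_rate) ** packets_per_frame


# Hypothetical example: 1% packet loss, 10 packets per video frame.
print(round(frame_loss_probability(0.01, 10), 4))
```

Under these assumptions, a 1% packet loss rate already breaks a little under one frame in ten when nothing is retransmitted.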
Real-time media had a dilemma: TCP was too slow and blocking, while UDP led to frame drops and delays. … Scylla and Charybdis.
Digging a little deeper, it becomes apparent that for real-time media, full reliability is overkill. In real-time media, where real-time is a hard constraint (unlike “live” media, which can accommodate a few seconds of delay), it’s all about windows of opportunity. Let’s phrase that as an imaginary discussion:
receiver: I did not receive a packet, what should I do?
sender: Well, it depends. How long do you have before you really need it?
receiver: It is a 30fps media stream, so I have around 30ms to get the packet, reconstruct the encoded frame and push it to the decoder before the next frame’s packets come knocking. Is it enough?
sender: Well, it depends. How long would it take to send the packet back?
receiver: If I send you a message to get the packet sent again (NACK) and you send it back to me, it would take a single round-trip to get it back (hoping it does not get lost again).
sender: OK, let’s monitor the Round Trip Time (RTT) between us, and decide on the fly whether it makes sense to send the packet back or to just drop the frame, since we won’t do anything with it if it’s too late.
That scheme, consisting of a feedback signal, some real-time measurement, and a dynamic decision on whether to be reliable or not, is called Partial Reliability. It does not fit either TCP or UDP by design, and neither network protocol should be extended to do this. Why, you ask? Because a network transport protocol should not know what it is transporting, while the partially reliable protocol we have defined above assumes that media frames are being transported. To make that distinction between protocols, TCP and UDP are called Network Transport Protocols, and our protocol is part of a family of protocols called Media Transport Protocols. The media transport protocol we succinctly described above is the basis of the Real-Time Transport Protocol (RTP).
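The decision from the imaginary discussion above can be sketched in a few lines. This is only a minimal illustration, not the way any particular RTP or WebRTC stack implements it: the function names, the parameters, and the processing_ms allowance for reassembly and hand-off to the decoder are all hypothetical.

```python
def frame_budget_ms(fps: float) -> float:
    """Time between two consecutive frames, in milliseconds."""
    return 1000.0 / fps


def should_request_retransmission(rtt_ms: float, elapsed_ms: float, fps: float,
                                  processing_ms: float = 5.0) -> bool:
    """Return True if one NACK round trip can still complete before the
    frame deadline.

    rtt_ms:        currently measured round-trip time
    elapsed_ms:    time already spent waiting on this frame
    processing_ms: hypothetical allowance for rebuilding the frame and
                   pushing it to the decoder
    """
    remaining_ms = frame_budget_ms(fps) - elapsed_ms - processing_ms
    return rtt_ms <= remaining_ms


# 30fps stream, 5 ms already elapsed when the gap is noticed.
print(should_request_retransmission(rtt_ms=20, elapsed_ms=5, fps=30))  # True
print(should_request_retransmission(rtt_ms=80, elapsed_ms=5, fps=30))  # False
```

With a 20 ms round trip the retransmitted packet can still make it in time, so sending a NACK is worth it; with an 80 ms round trip it cannot, and dropping the frame is the better choice.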
A media transport protocol needs an underlying network transport protocol. Since RTP implements its own specific reliability mechanism, it is better used with a network transport protocol that does not have one, otherwise the two would compete. Originally, the only candidate for the job was UDP. Combining the speed and simplicity of UDP with the smarter-than-TCP-for-media RTP gives you the protocol that has been ruling real-time media transport since 1996 in one flavour or another (RTSP, WebRTC, RIST, …).
That is a blog post for another time, but RTP was the beginning of the integration of several OSI layers into the concept of a Media Engine, by adding feedback processes between the bottom and top layers. NACK was one of the first such feedback mechanisms, between network and media transports, and further mechanisms were added to better integrate encoders and even the source feed. It is no surprise that later “protocols” which “inspired” themselves from RTP, like SRT, work better with “Network-Aware” encoders from the same vendors.
What are the alternatives when Real-Time is not a hard constraint?
If you can accommodate some delay, and quality is your hard constraint, then reliability is your friend. You would first choose a reliable transport, if possible one that scales well and passes through all firewalls and caches. Candidates include HTTP 1.0 and 1.1, and variations like HTTPS and WebSockets.
If there is not enough bandwidth to accommodate your stream, and you do not want to lower the quality, then you have a hard problem. You could use adaptability, but then your original hard constraint of not lowering the quality would no longer hold, right? Why compromise on latency if you can’t get high-quality media anyway? Adaptability only makes sense if at least some of your viewers can handle the full resolution sent by the source. That, in turn, depends on what your source’s upload bandwidth can handle.
If you suffer from packet loss, you get choppy or stuttering video playback. For smooth playback, you then buy yourself some time for the missing packets to be retransmitted by adding a buffer on the receiving side, which adds just as much latency in the process.
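The trade-off between buffer size and residual loss can be put into numbers with a small sketch. It rests on loudly simplifying assumptions: each retransmission attempt costs one full round trip, packets are lost independently with the same probability, and the feedback messages themselves never get lost. The function names and the example values are hypothetical.

```python
def buffer_for_retries_ms(rtt_ms: float, retries: int, margin_ms: float = 0.0) -> float:
    """Receive-side buffer (and therefore added latency) needed to leave
    room for `retries` retransmission round trips."""
    return retries * rtt_ms + margin_ms


def residual_loss(loss_rate: float, retries: int) -> float:
    """Probability that a packet is still missing after the original send
    plus `retries` retransmissions, assuming independent losses."""
    return loss_rate ** (retries + 1)


# Hypothetical link: 100 ms RTT, 10% packet loss.
for retries in range(4):
    print(retries, buffer_for_retries_ms(100, retries), residual_loss(0.10, retries))
```

Under these assumptions, each extra round trip of buffering divides the residual loss by ten on that link, at the cost of another 100 ms of latency.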
Ta-da: adaptability and security aside, you have just defined the basis for HLS and WebSocket-based media transports.
That was interesting, where can I learn more?
Obviously, this is an oversimplification of what is happening. I hope it still proves useful as an introduction to the concepts.
The usual answer to this question is: a master’s degree in media systems. If you are in Singapore, NUS’ Computer Science course CS5248 is spot on; for those who are not, the course materials are public and online.
If a master’s course is not for you, a good book is always a good option. Thank God, RTP is now textbook technology. At CoSMo, these are the mandatory readings for fresh graduates who are not from this field:
“RTP: Audio and Video for the Internet” by Colin Perkins. The entire book is a godsend, and pointers to the IETF specifications are provided for the most hardcore learners. The version we carry is from 2003, though, and does not contain the latest advances.
“SIP: Understanding the Session Initiation Protocol” by Alan Johnston has many interesting chapters that complement Colin Perkins’s book. The fourth edition we carry was printed in 2016 and includes material that was not out when the previous book was written. Chapters 10, 12, 13, 14, 16 and 19 are must-reads for our tech employees.
“Networked AV System” is an interesting book to read to get a foundational understanding of media systems over IP before WebRTC. It helps to understand the broadcast industry’s point of view, and provides some practical info on setting up and debugging such systems.
“WebRTC: APIs and RTCWEB protocols” by Alan B. Johnston and Daniel C. Burnett is the Bible. Translated into many languages, including Japanese and Chinese, it is THE book to read last, to put two decades of RTC into WebRTC scope. While the API part might now be outdated, the protocol part is still very accurate. With all the RTCWEB specifications that were blocked in what we called Cluster 238 having been released today, and WebRTC 1.0 becoming a standard toward the end of January 2021, I hope there will be a final update to the book. Even without one, it is a must-have.
Today, with the above three or four textbooks, you have yourself more than covered to dig inside WebRTC implementations and understand more or less what’s going on.
Eventually, what was a crazy idea yesterday becomes a published paper or standard today, and a textbook tomorrow. Those books cover the well-established base, and are already quite a lot to absorb.
For the latest tech, people can follow the standards committees (or this blog), and read the scientific literature.