Chapter 5. RTP Control Protocol

Components of RTCP
Transport of RTCP Packets
RTCP Packet Formats
Security and Privacy
Packet Validation
Participant Database
Timing Rules

There are two parts to RTP: the data transfer protocol, which was described in Chapter 4, and an associated control protocol, which is described in this chapter. The control protocol, RTCP, provides for periodic reporting of reception quality, participant identification and other source description information, notification on changes in session membership, and the information needed to synchronize media streams.

This chapter describes the uses of RTCP, the format of RTCP packets, and the timing rules used to scale RTCP over the full range of session sizes. It also discusses the issues in building a participant database, using the information contained in RTCP packets.

Components of RTCP

An RTCP implementation has three parts: the packet formats, the timing rules, and the participant database.

There are several types of RTCP packets. The five standard packet types are described in the section titled RTCP Packet Formats later in this chapter, along with the rules by which they must be aggregated into compound packets for transmission. Algorithms by which implementations can check RTCP packets for correctness are described in the section titled Packet Validation.

The compound packets are sent periodically, according to the rules described in the section titled Timing Rules later in this chapter. The interval between packets is known as the reporting interval. All RTCP activity happens in multiples of the reporting interval. In addition to being the time between packets, it is the time over which reception quality statistics are calculated, and the time between updates of source description and lip synchronization information. The interval varies according to the media format in use and the size of the session; typically it is on the order of 5 seconds for small sessions, but it can increase to several minutes for very large groups. Senders are given special consideration in the calculation of the reporting interval, so their source description and lip synchronization information is sent frequently; receivers report less often.

Each implementation is expected to maintain a participant database, based on the information collected from the RTCP packets it receives. This database is used to fill out the reception report packets that have to be sent periodically, but also for lip synchronization between received audio and video streams and to maintain source description information. The privacy concerns inherent in the participant database are mentioned in the section titled Security and Privacy later in this chapter. The Participant Database section, also in this chapter, describes the maintenance of the participant database.

Transport of RTCP Packets

Each RTP session is identified by a network address and a pair of ports: one for RTP data and one for RTCP data. The RTP data port should be even, and the RTCP port should be one above the RTP port. For example, if media data is being sent on UDP port 5004, the control channel will be sent to the same address on UDP port 5005.

All participants in a session should send compound RTCP packets and, in turn, will receive the compound RTCP packets sent by all other participants. Note that feedback is sent to all participants in a multiparty session: either unicast to a translator, which then redistributes the data, or directly via multicast. The peer-to-peer nature of RTCP gives each participant in a session knowledge of all other participants: their presence, reception quality, and—optionally—personal details such as name, e-mail address, location, and phone number.

RTCP Packet Formats

Five types of RTCP packets are defined in the RTP specification: receiver report (RR), sender report (SR), source description (SDES), membership management (BYE), and application-defined (APP). They all follow a common structure—illustrated in Figure 5.1—although the format-specific information changes depending on the type of packet.

Figure 5.1. The Basic RTCP Packet Format

The header that all five packet types have in common is four octets in length, comprising five fields:

Version number (V). The version number is always 2 for the current version of RTP. There are no plans to introduce new versions, and previous versions are not in widespread use.
Padding (P). The padding bit indicates that the packet has been padded out beyond its natural size. If this bit is set, one or more octets of padding have been added to the end of this packet, and the last octet contains a count of the number of padding octets added. Its use is much the same as the padding bit in RTP data packets, which was discussed in Chapter 4, RTP Data Transfer Protocol, in the section titled Padding. Incorrect use of the padding bit has been a common problem with RTCP implementations; the correct usage is described in the sections titled Packing Issues and Packet Validation later in this chapter.
Item count (IC). Some packet types contain a list of items, perhaps in addition to some fixed, type-specific information. The item count field is used by these packet types to indicate the number of items included in the packet (the field has different names in different packet types depending on its use). Up to 31 items may be included in each RTCP packet, limited also by the maximum transmission unit of the network. If more than 31 items are needed, the application must generate multiple RTCP packets. An item count of zero indicates that the list of items is empty (this does not necessarily mean that the packet is empty). Packet types that don't need an item count may use this field for other purposes.
Packet type (PT). The packet type identifies the type of information carried in the packet. Five standard packet types are defined in the RTP specification; other types may be defined in the future (for example, to report additional statistics or to convey other source-specific information).
Length. The length field denotes the length of the packet contents following the common header. It is measured in units of 32-bit words because all RTCP packets are multiples of 32 bits in length, so counting octets would only allow the possibility of inconsistency. Zero is a valid length, indicating that the packet consists of only the four-octet header (the IC header field will also be zero in this case).

Following the RTCP header is the packet data (the format of which depends on the packet type) and optional padding. The combination of header and data is an RTCP packet. The five standard types of RTCP packets are described in the sections that follow.
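The common header described above can be decoded with a few shifts and masks. The following is a minimal sketch in Python, not a complete parser; the names RtcpHeader, parse_rtcp_header, and packet_size_octets are illustrative and do not come from the RTP specification.

```python
import struct
from typing import NamedTuple


class RtcpHeader(NamedTuple):
    version: int       # V: 2 bits
    padding: bool      # P: 1 bit
    item_count: int    # IC: 5 bits (RC or SC in specific packet types)
    packet_type: int   # PT: 8 bits
    length_words: int  # 32-bit words that follow the four-octet header


def parse_rtcp_header(data: bytes) -> RtcpHeader:
    """Decode the four-octet header shared by all RTCP packet types."""
    if len(data) < 4:
        raise ValueError("an RTCP header needs at least 4 octets")
    first, packet_type, length_words = struct.unpack("!BBH", data[:4])
    return RtcpHeader(
        version=first >> 6,
        padding=bool(first & 0x20),
        item_count=first & 0x1F,
        packet_type=packet_type,
        length_words=length_words,
    )


def packet_size_octets(header: RtcpHeader) -> int:
    """Total packet size: the header plus the words counted by the length field."""
    return 4 * (header.length_words + 1)


# An RR packet with no report blocks: header plus the reporter SSRC.
empty_rr = bytes([0x80, 201, 0x00, 0x01]) + (1).to_bytes(4, "big")
header = parse_rtcp_header(empty_rr)
```

Because the length field counts 32-bit words after the header, the size of the whole packet in octets is always four times the length plus one, which is what makes it possible to step from one packet to the next inside a compound packet.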

RTCP packets are never transported individually; instead they are always grouped together for transmission, forming compound packets. Each compound packet is encapsulated in a single lower-layer packet—often a UDP/IP packet—for transport. If the compound packet is to be encrypted, the group of RTCP packets is prefixed by a 32-bit random value. The structure of a compound packet is illustrated in Figure 5.2.

Figure 5.2. Format of a Compound RTCP Packet

A set of rules governs the order in which RTCP packets are grouped to form a compound packet. These rules are described later in the chapter, in the section titled Packing Issues, after the five types of RTCP packets have been described in more detail.

RTCP RR: Receiver Reports

One of the primary uses of RTCP is reception quality reporting. This is accomplished through RTCP receiver report (RR) packets, which are sent by all participants who receive data.

THE RTCP RR PACKET FORMAT

A receiver report packet is identified by a packet type of 201 and has the format illustrated in Figure 5.3. It contains the SSRC (synchronization source) of the participant who is sending the report (the reporter SSRC), followed by zero or more report blocks; the RC field gives the number of report blocks present.

Figure 5.3. Format of an RTCP RR Packet

Each report block describes the reception quality of a single synchronization source from which the reporter has received RTP packets during the current reporting interval. A total of 31 report blocks can be in each RTCP RR packet. If there are more than 31 active senders, the receiver should send multiple RR packets in a compound packet. Each report block has seven fields, for a total of 24 octets.

The reportee SSRC identifies the participant to whom this report block pertains. The statistics in the report block denote the quality of reception for the reportee synchronization source, as received at the participant generating the RR packet.

The cumulative number of packets lost is a 24-bit signed integer denoting the number of packets expected, less the number of packets actually received. The number of packets expected is defined to be the extended last sequence number received, less the initial sequence number received. The number of packets received includes any that are late or duplicated, and hence may be greater than the number expected, so the cumulative number of packets lost may be negative. The cumulative number of packets lost is calculated for the entire duration of the session, not per interval. This field saturates at the maximum positive value of 0x7FFFFF if more packets than that are lost during the session.

The extended highest sequence number received in the RTP data packets from this synchronization source is calculated as discussed in Chapter 4, RTP Data Transfer Protocol, in the section titled Sequence Number. Because of possible packet reordering, this is not necessarily the extended sequence number of the last RTP packet received. The extended highest sequence number is calculated per session, not per interval.

The loss fraction is defined as the number of packets lost in this reporting interval, divided by the number expected. The loss fraction is expressed as a fixed-point number with the binary point at the left edge of the field, which is equivalent to the integer part after multiplying the loss fraction by 256 (that is, if 1/4 of the packets were lost, the loss fraction would be 1/4 × 256 = 64). If the number of packets received is greater than the number expected, because of the presence of duplicates, making the number of packets lost negative, then the loss fraction is set to zero.
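The two loss fields can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: the function names are hypothetical, the expected count adds one so that the first packet is included (as the sample code in the RTP specification does), and the negative clamp at -0x800000 is likewise taken from the specification rather than from the text above.

```python
def cumulative_lost(ext_highest_seq: int, base_seq: int, received: int) -> int:
    """Cumulative loss for the whole session, clamped to a 24-bit signed value."""
    expected = ext_highest_seq - base_seq + 1  # assumption: count the first packet
    lost = expected - received                 # negative when duplicates arrive
    return max(-0x800000, min(0x7FFFFF, lost))


def fraction_lost(expected_interval: int, received_interval: int) -> int:
    """Loss fraction for one reporting interval as an 8-bit fixed-point value."""
    lost = expected_interval - received_interval
    if expected_interval == 0 or lost <= 0:
        return 0  # duplicates can make the loss negative; report zero
    return (lost << 8) // expected_interval  # integer part of fraction * 256


# One quarter of the packets lost in an interval gives 64, as in the text.
example = fraction_lost(expected_interval=100, received_interval=75)
```

Keeping the per-interval counts separate from the session totals mirrors the distinction in the text: the cumulative field never resets, whereas the fraction is recomputed for each report.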

The interarrival jitter is an estimate of the statistical variance in network transit time for the data packets sent by the reportee synchronization source. Interarrival jitter is measured in timestamp units, so it is expressed as a 32-bit unsigned integer, like the RTP timestamp.

To calculate the variance in network transit time, it is necessary to measure the transit time. Because sender and receiver typically do not have synchronized clocks, however, it is not possible to measure the absolute transit time. Instead the relative transit time is calculated as the difference between a packet's RTP timestamp and the receiver's RTP clock at the time of arrival, measured in the same units. This calculation requires the receiver to maintain a clock for each source, running at the same nominal rate as the media clock for that source, from which to derive these relative timestamps. (This clock may be the receiver's local playout clock, if that runs at the same rate as the source clocks.) Because of the lack of synchronization between the clocks of sender and receiver, the relative transit time includes an unknown constant offset. This is not a problem, because we are interested only in the variation in transit time: the difference in spacing between two packets at the receiver versus the spacing when they left the sender. In the following computation the constant offset due to unsynchronized clocks is accounted for by the subtraction.

If S_i is the RTP timestamp from packet i, and R_i is the time of arrival in RTP timestamp units for packet i, then the relative transit time is (R_i – S_i), and for two packets, i and j, the difference in relative transit time may be expressed as

$$D(i,j) = (R_j - R_i) - (S_j - S_i) = (R_j - S_j) - (R_i - S_i)$$

Note that the timestamps, R_x and S_x, are 32-bit unsigned integers, whereas D(i,j) is a signed quantity. The calculation is performed with modulo arithmetic (in C, this means that the timestamps are of type unsigned int, provided that sizeof(unsigned int) == 4).

The interarrival jitter is calculated as each data packet is received, using the difference in relative transit times D(i,j) for that packet and the previous packet received (which is not necessarily the previous packet in sequence number order). The jitter is maintained as a moving average, according to the following formula:

$$J_i = J_{i-1} + \frac{|D(i-1,i)| - J_{i-1}}{16}$$

Whenever a reception report is generated, the current value of J_i for the reportee SSRC is included as the interarrival jitter.
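The jitter calculation can be sketched in Python as shown here. The sketch assumes the moving average defined in the RTP specification, in which each new sample moves the estimate by 1/16 of the difference between |D| and the previous estimate. The class and method names are illustrative, and explicit masking stands in for the 32-bit unsigned arithmetic that C provides natively.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Interpret a 32-bit modulo result as a signed quantity."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


class JitterEstimator:
    """Running interarrival jitter estimate for one synchronization source."""

    def __init__(self) -> None:
        self.prev_transit = None  # relative transit time of the previous packet
        self.jitter = 0.0         # current estimate, in timestamp units

    def update(self, arrival_ts: int, rtp_ts: int) -> float:
        """Process one packet: arrival time and RTP timestamp, same units."""
        transit = (arrival_ts - rtp_ts) & MASK32  # includes an unknown offset
        if self.prev_transit is not None:
            # The subtraction cancels the constant clock offset.
            d = to_signed32(transit - self.prev_transit)
            self.jitter += (abs(d) - self.jitter) / 16.0
        self.prev_transit = transit
        return self.jitter

    def report_value(self) -> int:
        """Value to place in the 32-bit interarrival jitter field."""
        return int(self.jitter) & MASK32
```

For instance, three packets sent 160 timestamp units apart and arriving at 1000, 1180, and 1320 give differences of +20 and -20, so the estimate moves from 0 to 1.25 and then to 2.421875; the value reported would be 2.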

The last sender report (LSR) timestamp is the middle 32 bits out of the 64-bit NTP (Network Time Protocol) format timestamp included in the most recent RTCP SR packet received from the reportee SSRC. If no SR has been received yet, the field is set to zero.

The delay since last sender report (DLSR) is the delay, expressed in units of 1/65,536 seconds, between receiving the last SR packet from the reportee SSRC and sending this reception report block. If no SR packet has been received from the reportee SSRC, the DLSR field is set to zero.

INTERPRETING RR DATA

The reception quality feedback in RR packets is useful not only for the sender, but also for other participants and third-party monitoring tools. It allows the sender to adapt its transmissions to the conditions that receivers report. In addition, other participants can determine whether problems are local or common to several receivers, and network managers may use monitors that receive only the RTCP packets to evaluate the performance of their networks.

A sender can use the LSR and DLSR fields to calculate the round-trip time between it and each receiver. On receiving an RR packet pertaining to it, the sender subtracts the LSR field from the current time, to give the delay between sending the SR and receiving this RR. The sender then subtracts the DLSR field to remove the offset introduced by the delay in the receiver, to get the network round-trip time. The process is shown in Figure 5.4, an example taken from the RTP specification. (Note that RFC 1889 contains an error in this example, which has been corrected in the new version of the RTP specification.)

Figure 5.4. Sample Round-Trip Time (RTT) Computation.

Note that the calculated value is the network round-trip time, and it excludes any processing at the endpoints. For example, the receiver must buffer the data to smooth the effects of jitter before it can play the media (see Chapter 6, Media Capture, Playout, and Timing).
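The round-trip calculation amounts to two subtractions in units of 1/65,536 seconds. The sketch below uses the values from the example in the RTP specification; the function names are illustrative, and the arrival time is assumed to have already been reduced to the same middle-32-bit format as the LSR field.

```python
MASK32 = 0xFFFFFFFF


def ntp_middle32(ntp_seconds: int, ntp_fraction: int) -> int:
    """Middle 32 bits of a 64-bit NTP timestamp, as carried in the LSR field."""
    return ((ntp_seconds & 0xFFFF) << 16) | ((ntp_fraction >> 16) & 0xFFFF)


def round_trip_time(arrival: int, lsr: int, dlsr: int):
    """Round-trip time in seconds, or None if the receiver has seen no SR yet."""
    if lsr == 0:
        return None
    rtt_units = (arrival - lsr - dlsr) & MASK32  # units of 1/65,536 second
    return rtt_units / 65536.0


# SR sent with NTP timestamp 0xB44DB705.20000000; the RR arrives at 0xB710.8000
# after the receiver held the SR for 5.25 seconds (DLSR = 0x0005.4000).
lsr = ntp_middle32(0xB44DB705, 0x20000000)
rtt = round_trip_time(arrival=0xB7108000, lsr=lsr, dlsr=0x00054000)
```

With these inputs the LSR is 0xB7052000 and the result is 6.125 seconds. The modulo-32-bit subtraction keeps the result correct when the 16-bit seconds part wraps between the SR and the RR.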

The round-trip time is important in interactive applications because delay hinders interactivity. Studies have shown that it is difficult to conduct conversations when the total round-trip time exceeds about 300 milliseconds⁶¹ (this number is approximate and depends on the listener and the task being performed). A sender may use knowledge of the round-trip time to optimize the media encoding—for example, by generating packets that contain less data to reduce packetization delays—or to drive the use of error correction codes (see Chapter 9, Error Correction).

The fraction lost gives an indication of the short-term packet loss rates to a receiver. By watching trends in the reported statistics, a sender can judge whether the loss is a transient or a long-term effect. Many of the statistics in RR packets are cumulative values, to allow long-term averaging. Differences can be calculated between any two RR packets, making measurements over both short and long periods possible and giving resilience to the loss of reports.

For example, the packet loss rate over the interval between RR packets can be derived from the cumulative statistics, as well as being directly reported. The difference in the cumulative number of packets lost gives the number lost during that interval, and the difference in the extended last sequence numbers gives the number of packets expected during the interval. The ratio of these values is the fraction of packets lost. This number should be equal to the fraction lost field in the RR packet if the calculation is done with consecutive RR packets, but the ratio also gives an estimate of the loss fraction if one or more RR packets have been lost, and it can show negative loss when there are duplicate packets. The advantage of the fraction lost field is that it provides loss information from a single RR packet. This is useful in very large sessions, in which the reporting interval is long enough that two RR packets may not have been received.
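A sender or monitor might derive the interval loss from two report blocks along these lines. This is a sketch only: the function name is hypothetical, and each report is represented as a plain tuple of the extended highest sequence number and the cumulative number of packets lost (already sign-extended from 24 bits).

```python
MASK32 = 0xFFFFFFFF


def interval_loss(older, newer):
    """Fraction of packets lost between two report blocks for the same source.

    Each argument is (extended_highest_seq, cumulative_lost). The reports do
    not need to be consecutive, which gives resilience to lost RR packets.
    """
    expected = (newer[0] - older[0]) & MASK32  # packets expected in the interval
    lost = newer[1] - older[1]                 # may be negative with duplicates
    if expected == 0:
        return None                            # nothing was expected
    return lost / expected


loss = interval_loss(older=(1000, 10), newer=(1200, 60))  # 50 lost of 200
```

Unlike the fraction lost field, the value computed this way is not clamped at zero, so duplicate packets show up as a negative loss.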

Loss rates can be used to influence the choice of media format and error protection coding used (see Chapter 9, Error Correction). In particular, a higher loss rate indicates that a more loss-tolerant format should be used, and that, if possible, the data rate should be reduced (because most loss is caused by congestion; see Chapter 2, Voice and Video Communication over Packet Networks, and Chapter 10, Congestion Control).

The jitter field may also be used to detect the onset of congestion: A sudden increase in jitter will often precede the onset of packet loss. This effect depends on the network topology and the number of flows, with high degrees of statistical multiplexing reducing the correlation between increased jitter and the onset of packet congestion.

Senders should be aware that the jitter estimate depends on packets being sent with spacing that matches their timestamp. If the sender delays some packets, that delay will be counted as part of the network jitter. This can be an issue with video, where multiple packets are often generated with the same timestamp but are spaced for transmission rather than being sent as a burst. This is not necessarily a problem, because the jitter measure still gives an indication of the amount of buffer space that the receiver will require (because the buffer space needs to accommodate both the jitter and the spacing delay).

RTCP SR: Sender Reports

In addition to reception quality reports from receivers, RTCP conveys sender report (SR) packets sent by participants that have recently sent data. These provide information on the media being sent, primarily so that receivers can synchronize multiple media streams (for example, to lipsync audio and video).

THE RTCP SR PACKET FORMAT

A sender report packet is identified by a packet type of 200 and has the format illustrated in Figure 5.5. The payload contains a 24-octet sender information block followed by zero or more receiver report blocks, the number of which is given by the RC field, exactly as if this were a receiver report packet. Receiver report blocks are present when the sender is also a receiver.

Figure 5.5. Format of an RTCP SR Packet

The NTP timestamp is a 64-bit unsigned value that indicates the time at which this RTCP SR packet was sent. It is in the format of an NTP timestamp, counting seconds since January 1, 1900, in the upper 32 bits, with the lower 32 bits representing fractions of a second (that is, a 64-bit fixed-point value, with the binary point after 32 bits). To convert a UNIX timestamp (seconds since 1970) to NTP time, add 2,208,988,800 seconds.
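The conversion from UNIX time can be sketched as follows. The function name is illustrative, and the sketch assumes a non-negative UNIX time expressed as a floating-point number of seconds.

```python
import math

NTP_UNIX_OFFSET = 2_208_988_800  # seconds from January 1, 1900, to January 1, 1970
MASK32 = 0xFFFFFFFF


def unix_to_ntp(unix_time: float):
    """Split a UNIX time into the two 32-bit halves of an NTP timestamp."""
    whole = math.floor(unix_time)
    seconds = (whole + NTP_UNIX_OFFSET) & MASK32               # upper 32 bits
    fraction = int((unix_time - whole) * (1 << 32)) & MASK32   # lower 32 bits
    return seconds, fraction


# 11:33:25.125 UTC on November 10, 1995
seconds, fraction = unix_to_ntp(816003205.125)
```

Here the seconds part is 0xB44DB705 and the fraction is 0x20000000, because 0.125 of a second is one-eighth of the 32-bit fractional range.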

Although the NTP timestamp in RTCP SR packets uses the format of an NTP timestamp, the clock does not have to be synchronized with the Network Time Protocol or have any particular accuracy, resolution, or stability. For a receiver to synchronize two media streams, however, those streams must be related to the same clock. The Network Time Protocol⁵ is occasionally useful for synchronizing the sending clocks, although it is needed only if the media streams to be synchronized are generated by different systems. These issues are discussed further in Chapter 7, Lip Synchronization.

The RTP timestamp corresponds to the same instant as the NTP timestamp, but it is expressed in the units of the RTP media clock. The value is generally not the same as the RTP timestamp of the previous data packet, because some time will have elapsed since the data in that packet was sampled. Figure 5.6 shows an example of the SR packet timestamps. The SR packet has the RTP timestamp corresponding to the time at which it is sent, which does not correspond to either of the surrounding RTP data packets.

Figure 5.6. Use of Timestamps in RTCP SR Packets

The sender's packet count is the number of data packets that this synchronization source has generated since the beginning of the session. The sender's octet count is the number of octets contained in the payload of those data packets (not including the headers or any padding).

The packet count and octet count fields are reset if a sender changes its SSRC (for example, because of a collision). They will eventually wrap around if the source continues to transmit for a long time, but generally this is not a problem. Subtraction of an older value from a newer value will give the correct result if 32-bit modulo arithmetic is used and fewer than 2³² counts occurred in between, even if there was a wrap-around (in C, this means that the counters are of type unsigned int, as long as sizeof(unsigned int) == 4). The packet and octet counts enable receivers to calculate the average data rate of the source.

INTERPRETING SR DATA

From the SR information, an application can calculate the average payload data rate and the average packet rate over an interval without receiving the data. The ratio of the two is the average payload size. If it can be assumed that packet loss is independent of packet size, the number of packets received by a particular receiver, multiplied by the average payload size (or the corresponding packet size), gives the apparent throughput available to that receiver.
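The rate calculations can be sketched from two sender reports as shown below. The function names and the tuple layout are assumptions made for illustration; the counter subtraction uses the 32-bit modulo arithmetic described earlier, so that it survives a wrap-around.

```python
MASK32 = 0xFFFFFFFF


def counter_delta(newer: int, older: int) -> int:
    """Difference between two 32-bit counters, correct across one wrap-around."""
    return (newer - older) & MASK32


def average_rates(older, newer):
    """Average rates between two SR packets from the same source.

    Each argument is (time_in_seconds, packet_count, octet_count), where the
    time is derived from the NTP timestamp in the SR packet. Returns packets
    per second, payload bits per second, and average payload size in octets.
    """
    elapsed = newer[0] - older[0]
    packets = counter_delta(newer[1], older[1])
    octets = counter_delta(newer[2], older[2])
    if elapsed <= 0 or packets == 0:
        return None
    return packets / elapsed, octets * 8 / elapsed, octets / packets


rates = average_rates((100.0, 1000, 160000), (105.0, 1250, 200000))
```

In this example, 250 packets carrying 40,000 octets were sent in 5 seconds: 50 packets per second, 64,000 bits per second, and an average payload of 160 octets.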

The timestamps are used to generate a correspondence between media clocks and a known external reference (the NTP format clock). This makes lip synchronization possible, as explained in Chapter 7.

RTCP SDES: Source Description

RTCP can also be used to convey source description (SDES) packets that provide participant identification and supplementary details, such as location, e-mail address, and telephone number. The information in SDES packets is typically entered by the user and is often displayed in the graphical user interface of an application, although this depends on the nature of the application (for example, a system providing a gateway from the telephone system into RTP might use the SDES packets to convey caller ID).

THE RTCP SDES PACKET FORMAT

Each source description packet has the format illustrated in Figure 5.7 and uses RTCP packet type 202. SDES packets comprise zero or more lists of SDES items, the exact number denoted by the SC header field, each of which contains information on a single source.

Figure 5.7. Format of an RTCP SDES Packet

It is possible for an application to generate packets with empty lists of SDES items, in which case the SC and length fields in the RTCP common header will both be zero. In normal use, SC is equal to one (mixers and translators that are aggregating forwarded information will generate packets with larger lists of SDES items).

Each list of SDES items starts with the SSRC of the source being described, followed by one or more entries with the format shown in Figure 5.8. Each entry starts with a type and a length field, then the item text itself in UTF-8 format.¹³ The length field indicates how many octets of text are present; the text is not null-terminated.

Figure 5.8. Format of an SDES Item

The entries in each list of SDES items are packed into the packet contiguously, with no separation or padding between them. The list of items is terminated by one or more null octets, the first of which is interpreted as an item of type zero to denote the end of the list. No length octet follows the null item type octet, but additional null octets must be included if needed to pad until a 32-bit boundary is reached. Note that this padding is separate from that indicated by the P bit in the RTCP header. A list with zero items (four null octets) is valid but useless.
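The packing and termination rules can be illustrated by a small encoder for one list of SDES items. This is a sketch under stated assumptions: the function name is hypothetical, and the type and length fields are taken to be one octet each, as in the RTP specification.

```python
import struct


def encode_sdes_chunk(ssrc: int, items) -> bytes:
    """Encode one SSRC and its list of (item_type, text) SDES entries."""
    chunk = struct.pack("!I", ssrc)
    for item_type, text in items:
        value = text.encode("utf-8")
        if not 1 <= item_type <= 255:
            raise ValueError("item type must fit in one octet and be nonzero")
        if len(value) > 255:
            raise ValueError("item text is limited to 255 octets")
        # Type, length, then the text itself; no null terminator, no padding.
        chunk += bytes([item_type, len(value)]) + value
    chunk += b"\x00"                      # item type zero ends the list
    chunk += b"\x00" * (-len(chunk) % 4)  # pad with nulls to a 32-bit boundary
    return chunk


chunk = encode_sdes_chunk(0x12345678, [(1, "a@b")])  # CNAME is item type 1
```

The terminating null is always written before the padding is computed, so a list whose entries already end on a 32-bit boundary gains four null octets rather than none.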

Several types of SDES items are defined in the RTP specification, and others may be defined by future profiles. Item type zero is reserved and indicates the end of the list of items. The other standard item types are CNAME, NAME, EMAIL, PHONE, LOC, TOOL, NOTE, and PRIV.

STANDARD SDES ITEMS

The CNAME item (type = 1) provides a canonical name (CNAME) for each participant. It provides a stable and persistent identifier, independent of the synchronization source (because the SSRC will change if an application restarts or if an SSRC collision occurs). The CNAME can be used to associate multiple media streams from a participant across different RTP sessions (for example, to associate voice and video that need to be synchronized), and to name a participant across restarts of a media tool. It is the only mandatory SDES item; all implementations are required to send SDES CNAME items.

The CNAME is allocated algorithmically from the user name and host IP address of the participant. For example, if the author were using an IPv4-based application, the CNAME might be [email protected]. IPv6 applications use the colon-separated numeric form of the address.¹⁶ If the application is running on a system with no notion of user names, the host IP address only is used (with no user name or @ symbol).

As long as each participant joins only a single RTP session—or a related set of sessions that are intended to be synchronized—the use of user name and host IP address is sufficient to generate a consistent unique identifier. If media streams from multiple hosts, or from multiple users, are to be synchronized, then the senders of those streams must collude to generate a consistent CNAME (which typically is the one chosen algorithmically by one of the participants).

The NAME item (type = 2) conveys the participant's name and is intended primarily to be displayed in lists of participants as part of the user interface. This value is typically entered by the user, so applications should not assume anything about its value; in particular, it should not be assumed to be unique.

The EMAIL item (type = 3) conveys the e-mail address of a participant formatted as in RFC 822²—for example, [email protected]. Sending applications should attempt to validate that the EMAIL value is a syntactically correct e-mail address before including it in an SDES item; receivers cannot assume that it is a valid address.

The PHONE item (type = 4) conveys the telephone number of a participant. The RTP specification recommends that this be a complete international number, with a plus sign replacing the international access code (for example, +1 918 555 1212 for a number in the United States), but many implementations allow users to enter this value with no check on format.

The LOC item (type = 5) conveys the location of the participant. Many implementations allow the user to enter the value directly, but it is possible to convey location in any format. For example, an implementation could be linked to the Global Positioning System and include GPS coordinates as its location.

The TOOL item (type = 6) indicates the RTP implementation—the tool—in use by the participant. This field is intended for debugging and marketing purposes. It should include the name and version number of the implementation. Typically the user is not able to edit the contents of this field.

The NOTE item (type = 7) allows the participant to make a brief statement about anything. It works well for a “back in five minutes” type of note, but it is not really suitable for instant messaging, because of the potentially long delay between RTCP packets.

PRIV items (type = 8) are a private extension mechanism, used to define experimental or application-specific SDES extensions. The text of the item begins with an additional single-octet length field and prefix string, followed by a value string that fills the remainder of the item. The intention is that the initial prefix names the extension and is followed by the value of that extension. PRIV items are rarely used; extensions can more efficiently be managed if new SDES item types are defined.

The CNAME is the only SDES item that applications are required to transmit. An implementation should be prepared to receive any of the SDES items, even if it ignores them. There are various privacy issues with SDES (see the section titled Security and Privacy later in this chapter), which means that an implementation should not send any information in addition to the CNAME unless the user has explicitly authorized it to do so.

Figure 5.9 shows an example of a complete RTCP source description packet containing CNAME and NAME items. Note the use of padding at the end of the list of SDES items, to ensure that the packet fits into a multiple of 32 bits.

Figure 5.9. A Sample SDES Packet

PARSER ISSUES

When implementing a parser for SDES packets, you should remember three important points:

The text of SDES items is not null-terminated, implying that manipulating SDES items in languages that assume null-terminated strings requires care. In C, for example, SDES items should be manipulated with strncpy(), which allows strings up to a specified length to be copied (use of strcpy() is inappropriate because the text is not null-terminated). Careless implementations may be susceptible to buffer overflow attacks, which are a serious security risk.
The text of SDES items is in UTF-8 format; local character sets require conversion before use. It is often necessary to query the locale in use on the system, and to convert between the system character set and UTF-8. Some applications inadvertently generate SDES packets with the wrong character set; an implementation should be robust to this mistake (for example, if the use of an incorrect character set causes the UTF-8 parser to produce an invalid Unicode character).
The text of SDES items may be entered by the user and cannot be trusted to have safe values. In particular, it may contain metacharacters that have undesirable side effects. For example, some user interface scripting languages allow command substitution to be triggered by metacharacters, potentially giving an attacker the means to execute arbitrary code. Implementers should take steps to ensure safe handling of SDES data in their environment.
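The first two points can be illustrated with a defensive parser for the entries that follow the SSRC in a list of SDES items. The sketch is in Python, where slices are bounded by explicit lengths, so the null-termination problem takes the form of a bounds check. The function name is illustrative, and replacing invalid UTF-8 sequences with U+FFFD is one possible policy, not a requirement of the specification.

```python
def parse_sdes_items(body: bytes):
    """Parse (item_type, text) entries from the octets that follow the SSRC."""
    items = []
    pos = 0
    while pos < len(body):
        item_type = body[pos]
        if item_type == 0:
            break  # end of list; anything after this is null padding
        if pos + 2 > len(body):
            raise ValueError("SDES item is missing its length octet")
        length = body[pos + 1]
        end = pos + 2 + length
        if end > len(body):
            raise ValueError("SDES item overruns the packet")
        # Use the stated length, never a search for a terminator, and
        # tolerate text that was sent in the wrong character set.
        text = body[pos + 2:end].decode("utf-8", errors="replace")
        items.append((item_type, text))
        pos = end
    return items


items = parse_sdes_items(b"\x01\x03a@b\x00\x00\x00")
```

The parser returns the text unchanged apart from the character set repair. Filtering metacharacters is left to the code that displays or forwards the text, because what counts as unsafe depends on that environment.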

RTCP BYE: Membership Control

RTCP provides for loose membership control through RTCP BYE packets, which indicate that some participants have left the session. A BYE packet is generated when a participant leaves the session, or when it changes its SSRC—for example, because of a collision. BYE packets may be lost in transit, and some applications do not generate them; so a receiver must be prepared to time out participants who have not been heard from for some time, even if no BYE has been received from them.

The significance of a BYE packet depends, to some extent, on the application. It always indicates that a participant is leaving the RTP session, but there may also be a signaling relationship between the participants (for example, SIP, RTSP, or H.323). An RTCP BYE packet does not terminate any other relationship between the participants.

BYE packets are identified by packet type 203 and have the format shown in Figure 5.10. The RC field in the common RTCP header indicates the number of SSRC identifiers in the packet. A value of zero is valid but useless. On receiving a BYE packet, an implementation should assume that the listed sources have left the session and ignore any further RTP and RTCP packets from those sources. It is important to keep state for departing participants for some time after a BYE has been received, to allow for delayed data packets.

Figure 5.10. Format of an RTCP BYE Packet

The section titled Participant Database later in this chapter further describes the state maintenance issues relating to timeout of participants and RTCP BYE packets.

A BYE packet may also contain text indicating the reason for leaving a session, suitable for display in the user interface. This text is optional, but an implementation must be prepared to receive it (even though the text may be ignored).
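
The BYE layout described above (common header, a list of SSRC identifiers, then an optional length-prefixed reason) can be parsed along the following lines. This is a sketch under stated assumptions: the function name and calling convention are invented for illustration, the packet is assumed to be a single BYE already split out of its compound, and trailing padding is not interpreted.

```c
#include <stddef.h>
#include <stdint.h>

#define RTCP_PT_BYE 203

/* Read a 32-bit value stored in network (big-endian) byte order. */
static uint32_t get_u32(const uint8_t *p)
{
    return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
           ((uint32_t) p[2] << 8)  |  (uint32_t) p[3];
}

/* Parse one BYE packet of pkt_len bytes. Up to max_ssrc identifiers are
 * stored in ssrc[]. *reason and *reason_len describe the optional reason
 * text (pointing into pkt, not null-terminated), or NULL/0 when absent.
 * Returns the number of SSRCs listed in the packet (which may exceed
 * max_ssrc), or -1 if the packet is malformed. */
int parse_bye(const uint8_t *pkt, size_t pkt_len,
              uint32_t *ssrc, int max_ssrc,
              const uint8_t **reason, size_t *reason_len)
{
    int    count, i;
    size_t declared, offset;

    *reason = NULL;
    *reason_len = 0;
    if (pkt_len < 4 || (pkt[0] >> 6) != 2 || pkt[1] != RTCP_PT_BYE)
        return -1;

    count    = pkt[0] & 0x1F;                         /* 5-bit count field */
    declared = ((((size_t) pkt[2] << 8) | pkt[3]) + 1) * 4;
    if (declared > pkt_len || declared < 4 + (size_t) count * 4)
        return -1;

    for (i = 0; i < count && i < max_ssrc; i++)
        ssrc[i] = get_u32(pkt + 4 + (size_t) i * 4);

    offset = 4 + (size_t) count * 4;
    if (offset < declared) {              /* optional reason text follows */
        size_t n = pkt[offset];           /* one-octet length prefix */
        if (offset + 1 + n > declared)
            return -1;
        *reason = pkt + offset + 1;
        *reason_len = n;
    }
    return count;
}
```

A caller that does not display the reason can simply ignore the two output parameters; the parser still accepts packets that carry one.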

RTCP APP: Application-Defined RTCP Packets

The final class of RTCP packet (APP) allows for application-defined extensions. It has packet type 204, and the format shown in Figure 5.11. The application-defined packet name is a four-character prefix intended to uniquely identify this extension, with each character being chosen from the ASCII character set, and uppercase and lowercase characters being treated as distinct. It is recommended that the packet name be chosen to match the application it represents, with the choice of subtype values being coordinated by the application. The remainder of the packet is application-specific.

Figure 5.11. Format of an RTCP APP Packet

Application-defined packets are used for nonstandard extensions to RTCP, and for experimentation with new features. The intent is that experimenters use APP as a first place to try new features, and then register new packet types if the features have wider use. Several applications generate APP packets, and implementations should be prepared to ignore unrecognized APP packets.
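
As an illustration of how a receiver might recognize its own APP packets and skip everyone else's, the sketch below extracts the subtype, SSRC, and four-character name, and compares names case-sensitively as described above. The structure and function names are hypothetical, and the packet is assumed to have been split out of its compound already.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RTCP_PT_APP 204

struct rtcp_app {
    uint8_t        subtype;   /* 5-bit application-defined subtype */
    uint32_t       ssrc;      /* source that generated the packet */
    char           name[5];   /* four ASCII characters plus a terminator */
    const uint8_t *data;      /* application-specific data (points into pkt) */
    size_t         data_len;  /* may be zero */
};

/* Returns 0 on success, -1 if pkt is not a well-formed APP packet. */
int parse_app(const uint8_t *pkt, size_t pkt_len, struct rtcp_app *app)
{
    size_t declared;

    if (pkt_len < 12 || (pkt[0] >> 6) != 2 || pkt[1] != RTCP_PT_APP)
        return -1;
    declared = ((((size_t) pkt[2] << 8) | pkt[3]) + 1) * 4;
    if (declared < 12 || declared > pkt_len)
        return -1;

    app->subtype = pkt[0] & 0x1F;
    app->ssrc = ((uint32_t) pkt[4] << 24) | ((uint32_t) pkt[5] << 16) |
                ((uint32_t) pkt[6] << 8)  |  (uint32_t) pkt[7];
    memcpy(app->name, pkt + 8, 4);
    app->name[4] = '\0';
    app->data = pkt + 12;
    app->data_len = declared - 12;
    return 0;
}

/* Case-sensitive comparison; name must hold at least four characters. */
int app_name_is(const struct rtcp_app *app, const char *name)
{
    return memcmp(app->name, name, 4) == 0;
}
```

An implementation would typically call app_name_is() for each name it understands and silently drop the packet when none match.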

Packing Issues

As noted earlier, RTCP packets are never sent individually, but rather are packed into a compound packet for transmission. Various rules govern the structure of compound packets, as detailed next.

If the participant generating the compound RTCP packet is an active data sender, the compound must start with an RTCP SR packet. Otherwise it must start with an RTCP RR packet. This is true even if no data has been sent or received, in which case the SR/RR packet contains no receiver report blocks (the RC header field is zero). On the other hand, if data is received from many sources and there are too many reports to fit into a single SR/RR packet, the compound should begin with an SR/RR packet followed by several RR packets.

Following the SR/RR packet is an SDES packet. This packet must include a CNAME item, and it may include other items. The frequency of inclusion of the other (non-CNAME) SDES items is determined by the RTP profile in use. For example, the audio/video profile⁷ specifies that other items may be included with every third compound RTCP packet sent, with a NAME item being sent seven out of eight times within that slot and the remaining SDES items cyclically taking up the eighth slot. Other profiles may specify different choices.

BYE packets, when ready for transmission, must be placed as the last packet in a compound. Other RTCP packets to be sent may be included in any order. These strict ordering rules are intended to make packet validation easier because it is highly unlikely that a misdirected packet will meet these constraints.

A potentially difficult issue in the generation of compound RTCP packets is how to handle sessions with larger numbers of active senders. If there are more than 31 active senders, it is necessary to include additional RR packets within the compound. This may be repeated as often as is required, up to the maximum transmission unit (MTU) of the network. If there are so many senders that the receiver reports cannot all fit within the MTU, the receiver reports for some senders must be omitted. In that case, reports that are omitted should be included in the next compound packet generated (requiring a receiver to keep track of the sources reported on in each interval).
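
The arithmetic behind this packing can be sketched as follows. The sizes used are those of the standard formats (a 24-octet report block, an 8-octet RR header with SSRC, a 28-octet SR header with sender information); the function name, the structure, and the idea of passing in a byte budget (the MTU less IP/UDP headers and whatever is reserved for SDES) are assumptions made for illustration.

```c
#define REPORT_BLOCK_SIZE 24   /* one reception report block */
#define RR_HEADER_SIZE     8   /* common header + reporter SSRC */
#define SR_HEADER_SIZE    28   /* RR header + 20 octets of sender info */
#define MAX_BLOCKS        31   /* limit of the 5-bit RC field */

struct report_plan {
    int packets;   /* SR/RR packets needed in the compound */
    int blocks;    /* sources that can be reported on this time */
};

/* Work out how many of `sources` report blocks fit into `budget` octets,
 * starting with an SR (if we are a sender) or an RR, and adding further
 * RR packets each time 31 blocks have been used. Sources that do not fit
 * are left for the next compound packet. */
struct report_plan plan_reports(int sources, int is_sender, int budget)
{
    struct report_plan plan = { 0, 0 };
    int used = is_sender ? SR_HEADER_SIZE : RR_HEADER_SIZE;
    int in_packet = 0;

    if (used > budget)
        return plan;               /* not even the first header fits */
    plan.packets = 1;

    while (plan.blocks < sources) {
        if (in_packet == MAX_BLOCKS) {
            /* Start another RR only if it can carry at least one block. */
            if (used + RR_HEADER_SIZE + REPORT_BLOCK_SIZE > budget)
                break;
            used += RR_HEADER_SIZE;
            plan.packets++;
            in_packet = 0;
        }
        if (used + REPORT_BLOCK_SIZE > budget)
            break;
        used += REPORT_BLOCK_SIZE;
        plan.blocks++;
        in_packet++;
    }
    return plan;
}
```

When plan.blocks is smaller than the number of sources, the caller needs to remember where it stopped so that the omitted sources are reported first in the next interval.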

A similar issue arises when the SDES items to be included within the packet exceed the maximum packet size. The trade-off between including additional receiver reports and including source description information is left to the implementation. There is no single correct solution.

Sometimes it is necessary to pad a compound RTCP packet out beyond its natural size. In such cases the padding is added to the last RTCP packet in the compound only, and the P bit is set in that last packet. Padding is an area where some implementations are incorrect; the section titled Packet Validation later in this chapter discusses common problems.

Security and Privacy

Various privacy issues are inherent in the use of RTCP—in particular, source description packets. Although these packets are optional, their use can expose significant personal details, so applications should not send SDES information without first informing the user that the information is being made available.

The use of SDES CNAME packets is an exception because these packets are mandatory. The inclusion of an IP address within CNAME packets is a potential issue. However, the same information is available from the IP header of the packet. If the RTP packets pass through Network Address Translation (NAT), the translation of the address in the IP header that is performed should also be performed on the address in the CNAME. In practice, many NAT implementations are unaware of RTP, so there is a potential for leakage of the internal IP address.

The exposure of user names may be a greater concern—in which case applications may omit or rewrite the user name, provided that this is done consistently among the set of applications using CNAME for association.

Some receivers may not want their presence to be visible. It is acceptable if those receivers do not send RTCP at all, although doing so prevents senders from using the reception quality information to adapt their transmission to match the receivers.

To achieve confidentiality of the media stream, RTCP packets may be encrypted. When encrypted, each compound packet contains an additional 32-bit random prefix, as illustrated in Figure 5.12, to help avoid plain-text attacks.

Figure 5.12. Example of an Encrypted RTCP Packet, Showing the Correct Use of Padding

Security and privacy are discussed in more detail in Chapter 13, Security Considerations.

Packet Validation

It is important to validate whether received packets really are RTP or RTCP. The packing rules, mentioned earlier, allow RTCP packets to be rigorously validated. Successful validation of an RTCP stream gives high assurance that the corresponding RTP stream is also valid, although it does not negate the need for validation of the RTP packets.

Example 5.1 shows the pseudocode for the validation process. These are the key points:

All packets must be compound RTCP packets.
The version field of all packets must equal 2.
The packet type field of the first RTCP packet in a compound packet must be equal to SR or RR.
If padding is needed, it is added to only the last packet in the compound. The padding bit should be zero for all other packets in the compound RTCP packet.
The length fields of the individual RTCP packets must total the overall length of the compound RTCP packet as received.

Because new RTCP packet types may be defined by future profiles, the validation procedure should not require each packet type to be one of the five defined in the RTP specification.

Example 5.1. Pseudocode for Packet Validation

validate_rtcp(rtcp_t *packet, int length)
{
    rtcp_t    *end  = (rtcp_t *) (((char *) packet) + length);
    rtcp_t    *r    = packet;
    int        l    = 0;
    int        p    = 0;

    // All RTCP packets must be compound packets, so the first packet
    // must not account for the entire length received
    if (((packet->length + 1) * 4) == length) {
        ...error: not a compound packet
    }
    // Check the RTCP version, packet type, and padding of the first
    // in the compound RTCP packet...
    if (packet->version != 2) {
        ...error: version number != 2 in the first subpacket
    }
    if (packet->p != 0) {
        ...error: padding bit is set on first packet in compound
    }
    if ((packet->pt != RTCP_SR) && (packet->pt != RTCP_RR)) {
        ...error: compound packet does not start with SR or RR
    }
    // Check all following parts of the compound RTCP packet. The RTP
    // version number must be 2, and the padding bit must be zero on
    // all except the last packet.
    do {
        if (p == 1) {
            ...error: padding before last packet in compound
        }
        if (r->p) {
            p = 1;
        }
        if (r->version != 2) {
            ...error: version number != 2 in subpacket
        }
        l += (r->length + 1) * 4;
        r  = (rtcp_t *) (((uint32_t *) r) + r->length + 1);
    } while (r < end);

    // Check that the length of the packets matches the length of the
    // UDP packet in which they were received...
    if ((l != length) || (r != end)) {
        ...error: length does not match UDP packet length
    }
    ...packet is valid
}

One common implementation problem causes packets to fail their validity test: When you're padding compound RTCP packets beyond their natural length, you need to ensure that the padding is added to only the last packet in the compound. A common mistake has been to add the padding to the last packet, but to set the P bit in the header of the first packet in the compound. The P bit must be set only in the last packet.
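
For readers who want something they can compile, the same checks can be written against the raw bytes of the received datagram, which avoids relying on a bit-field layout for rtcp_t and on the byte order of the length field. This is one possible sketch rather than a reference implementation; the function name is an assumption, and it additionally refuses to read a header or packet that would extend past the end of the buffer.

```c
#include <stddef.h>
#include <stdint.h>

#define RTCP_SR 200
#define RTCP_RR 201

/* Returns 1 if buf[0..len) passes the compound-packet checks, else 0. */
int rtcp_compound_is_valid(const uint8_t *buf, size_t len)
{
    size_t offset = 0;
    int    count  = 0;

    /* First packet: version 2, no padding, and type SR or RR. */
    if (len < 4)                                return 0;
    if ((buf[0] >> 6) != 2)                     return 0;
    if (buf[0] & 0x20)                          return 0;
    if (buf[1] != RTCP_SR && buf[1] != RTCP_RR) return 0;

    while (offset < len) {
        size_t plen;

        if (len - offset < 4)            return 0;   /* truncated header */
        if ((buf[offset] >> 6) != 2)     return 0;   /* version must be 2 */

        /* Length field counts 32-bit words minus one, in network order. */
        plen = ((((size_t) buf[offset + 2] << 8) | buf[offset + 3]) + 1) * 4;
        if (plen > len - offset)         return 0;   /* runs past the datagram */

        /* The P bit may be set only on the final packet in the compound. */
        if ((buf[offset] & 0x20) && offset + plen != len)
            return 0;

        offset += plen;
        count++;
    }
    return count >= 2;    /* a lone packet is not a compound */
}
```

Note that the packet type is checked only for the first packet, so compounds containing packet types defined by future profiles still validate.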

It is possible to detect RTCP misdirected onto the RTP port via the packet type field. The standard RTCP packets have packet type values with the high bit set; if they are misdirected onto the RTP port, the high bit of the packet type field will fall into the place of the M bit in the RTP header. With the top bit stripped, the standard RTCP packet types correspond to an RTP payload type in the range 72 to 76. This range is reserved in the RTP specification and will not be used for valid RTP data packets, so detection of packets in this range implies that the stream is misdirected. Similarly, RTP packets sent to the RTCP port may clearly be distinguished by their packet type, which will be outside the valid range for RTCP packet types.
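
The check on the RTP port amounts to masking off the marker bit and testing for the reserved range. A minimal sketch, with a hypothetical function name:

```c
#include <stddef.h>
#include <stdint.h>

/* Called for packets arriving on the RTP port. Returns 1 if the payload
 * type falls in the reserved range 72 to 76, which is what the standard
 * RTCP packet types 200 to 204 look like once the top bit is read as the
 * RTP marker bit. */
int rtp_port_packet_is_rtcp(const uint8_t *pkt, size_t len)
{
    unsigned pt;

    if (len < 2)
        return 0;
    pt = pkt[1] & 0x7F;        /* strip the M bit */
    return pt >= 72 && pt <= 76;
}
```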

Participant Database

Each application in an RTP session will maintain a database of information about the participants and about the session itself. The session information, from which the RTCP timing is derived, can be stored as a set of variables:

The RTP bandwidth—that is, the typical session bandwidth, configured when the application starts.
The RTCP bandwidth fraction—that is, the percentage of the RTP bandwidth devoted to RTCP reports. This is usually 5%, but profiles may define a means of changing this (0% also may be used, meaning that RTCP is not sent).
The average size of all RTCP packets sent and received by this participant.
The number of members in the session, the number of members when this participant last sent an RTCP packet, and the fraction of those who have sent RTP data packets during the preceding reporting interval.
The time at which the implementation last sent an RTCP packet, and the next scheduled transmission time.
A flag indicating whether the implementation has sent any RTP data packets since sending the last two RTCP packets.
A flag indicating whether the implementation has sent any RTCP packets at all.

In addition, the implementation needs to maintain variables to include in RTCP SR packets:

The number of packets and octets of RTP data it has sent.
The last sequence number it used.
The correspondence between the RTP clock it is using and an NTP-format timestamp.

A session data structure containing these variables is also a good place to store the SSRC being used, the SDES information for the implementation, and the file descriptors for the RTP and RTCP sockets. Finally, the session data structure should contain a database for information held on each participant.

In terms of implementation, the session data can be stored simply: a single structure in a C-based implementation, a class in an object-oriented system. With the exception of the participant-specific data, each variable in the structure or class is a simple type: integer, text string, and so on. The format of the participant-specific data is described next.
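
In C, the session variables listed above might be gathered as shown here. The field names, the units, the fixed CNAME buffer, and the choice to start the member count at one (counting the local participant) are all assumptions made for this sketch; the participant database is left as an opaque pointer.

```c
#include <stdint.h>
#include <string.h>

struct participant_db;               /* per-participant state, defined elsewhere */

struct rtp_session {
    /* Variables from which the RTCP timing is derived */
    double   rtp_bandwidth;          /* session bandwidth, octets per second */
    double   rtcp_fraction;          /* share given to RTCP, usually 0.05 */
    double   avg_rtcp_size;          /* average RTCP packet size, octets */
    int      members;                /* current estimate of session size */
    int      prev_members;           /* size when we last sent RTCP */
    double   sender_fraction;        /* fraction of members sending data */
    double   last_rtcp_send_time;    /* seconds */
    double   next_rtcp_send_time;    /* seconds */
    int      we_sent;                /* sent RTP since the last two RTCP packets? */
    int      initial;                /* nonzero until the first RTCP is sent */

    /* Variables reported in RTCP SR packets */
    uint32_t packets_sent;
    uint32_t octets_sent;
    uint16_t last_seq;
    uint32_t rtp_ts_ref;             /* RTP timestamp that corresponds to ... */
    uint64_t ntp_ts_ref;             /* ... this NTP-format timestamp */

    /* Identity and I/O */
    uint32_t ssrc;
    char     cname[256];
    int      rtp_fd;
    int      rtcp_fd;
    struct participant_db *participants;
};

/* Put a session into a known starting state. */
void rtp_session_init(struct rtp_session *s, double rtp_bandwidth, uint32_t ssrc)
{
    memset(s, 0, sizeof *s);
    s->rtp_bandwidth = rtp_bandwidth;
    s->rtcp_fraction = 0.05;
    s->members       = 1;            /* assumption: count ourselves */
    s->prev_members  = 1;
    s->initial       = 1;
    s->ssrc          = ssrc;
    s->rtp_fd        = -1;           /* sockets not yet opened */
    s->rtcp_fd       = -1;
}
```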

To generate RTCP packets properly, each participant also needs to maintain state for the other members in the session. A good design makes the participant database an integral part of the operation of the system, holding not just RTCP-related information, but all state for each participant. The per-participant data structure may include the following:

SSRC identifier.
Source description information: the CNAME is required; other information may be included (note that these values are not null-terminated, and care must be taken in their handling).
Reception quality statistics (packet loss and jitter), to allow generation of RTCP RR packets.
Information received from sender reports, to allow lip synchronization (see Chapter 7).
The last time this participant was heard from so that inactive participants can be timed out.
A flag indicating whether this participant has sent data within the current RTCP reporting interval.
The media playout buffer, and any codec state needed (see Chapter 6, Media Capture, Playout, and Timing).
Any information needed for channel coding and error recovery—for example, data awaiting reception of repair packets before it can be decoded (see Chapters 8, Error Concealment, and 9, Error Correction).

Within an RTP session, members are identified by their synchronization source identifier. Because there may be many participants and they may need to be accessed in any order, the appropriate data structure for the participant database is a hash table, indexed by SSRC identifier. In applications that deal with only a single media format, this is sufficient. However, lip synchronization also requires the capability to look up sources by their CNAME. As a result, the participant database should be indexed by a double hash table: once by SSRC and once by CNAME.

Some implementations use less-than-perfect random number generators when choosing their SSRC identifier. This means that a simple hashing function—for example, using the lowest few bits of the SSRC as an index into a table—can lead to unbalanced and inefficient operation. Even though SSRC values are supposed to be random, they should be used with an efficient hashing function. Some have suggested using the MD5 hash of the SSRC as the basis for the index, although that may be considered overkill.
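
A doubly indexed table of this kind can be sketched with two sets of chained buckets. The version below is illustrative only: the names, the bucket count, and the particular hash functions (a multiplicative hash that mixes all 32 bits of the SSRC, and FNV-1a for the CNAME) are choices made for the example, not something the text prescribes. For brevity the CNAME is stored as a null-terminated copy, and only a few of the per-participant fields are shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DB_BUCKETS 64                 /* must stay 2^6 to match hash_ssrc() */

struct participant {
    uint32_t ssrc;
    char     cname[256];
    double   last_heard;
    int      is_sender;
    struct participant *next_by_ssrc;   /* chain within an SSRC bucket */
    struct participant *next_by_cname;  /* chain within a CNAME bucket */
};

struct participant_db {
    struct participant *by_ssrc[DB_BUCKETS];
    struct participant *by_cname[DB_BUCKETS];
};

/* Multiplicative hash: every input bit influences the top 6 output bits,
 * so poorly randomized SSRCs still spread across the buckets. */
static unsigned hash_ssrc(uint32_t ssrc)
{
    return (unsigned) ((uint32_t) (ssrc * UINT32_C(2654435761)) >> 26);
}

/* FNV-1a over the bytes of the CNAME. */
static unsigned hash_cname(const char *cname)
{
    uint32_t h = UINT32_C(2166136261);
    while (*cname != '\0') {
        h ^= (uint8_t) *cname++;
        h *= UINT32_C(16777619);
    }
    return (unsigned) (h % DB_BUCKETS);
}

/* Add a participant to both indexes. Returns NULL if out of memory. */
struct participant *db_add(struct participant_db *db, uint32_t ssrc,
                           const char *cname)
{
    struct participant *p = calloc(1, sizeof *p);
    unsigned a, b;

    if (p == NULL)
        return NULL;
    p->ssrc = ssrc;
    strncpy(p->cname, cname, sizeof p->cname - 1);   /* calloc left the NUL */
    a = hash_ssrc(ssrc);
    b = hash_cname(p->cname);
    p->next_by_ssrc  = db->by_ssrc[a];   db->by_ssrc[a]  = p;
    p->next_by_cname = db->by_cname[b];  db->by_cname[b] = p;
    return p;
}

struct participant *db_find_ssrc(const struct participant_db *db, uint32_t ssrc)
{
    struct participant *p = db->by_ssrc[hash_ssrc(ssrc)];
    while (p != NULL && p->ssrc != ssrc)
        p = p->next_by_ssrc;
    return p;
}

/* Returns the most recently added participant with this CNAME; others that
 * share it (for example, the audio and video of one user) follow on the chain. */
struct participant *db_find_cname(const struct participant_db *db,
                                  const char *cname)
{
    struct participant *p = db->by_cname[hash_cname(cname)];
    while (p != NULL && strcmp(p->cname, cname) != 0)
        p = p->next_by_cname;
    return p;
}
```

Removal (not shown) has to unlink the entry from both chains before freeing it.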

Participants should be added to the database after a validated packet has been received from them. The validation step is important: An implementation does not want to create a state for a participant unless it is certain that the participant is valid. Here are some guidelines:

If an RTCP packet is received and validated, the participant should be entered into the database. The validity checks on RTCP are strong, and it is difficult for bogus packets to satisfy them.
An entry should not be made on the basis of RTP packets only, unless multiple packets are received with consecutive sequence numbers. The validity checks possible for a single RTP packet are weak, and it is possible for a bogus packet to satisfy the tests yet be invalid.

This implies that the implementation should maintain an additional, lightweight table of probationary sources (sources from which only a single RTP packet has been received). To prevent bogus sources of RTP and RTCP data from using too much memory, this table should be aggressively timed out and should have a fixed maximum size. It is difficult to protect against an attacker who purposely generates many different sources to use up all memory of the receivers, but these precautions will prevent accidental exhaustion of memory if a misdirected non-RTP stream is received.
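
One way to build such a probation table is sketched below: a fixed array of slots, each remembering the next sequence number expected from a source, with stale entries expired on every call and the oldest entry overwritten when the table is full. The slot count, the two-second timeout, the requirement of exactly two consecutive packets, and all names are assumptions chosen for illustration.

```c
#include <stdint.h>

#define PROBATION_SLOTS   32     /* fixed maximum size (arbitrary) */
#define PROBATION_TIMEOUT 2.0    /* seconds; deliberately short (arbitrary) */

struct probation_entry {
    int      in_use;
    uint32_t ssrc;
    uint16_t next_seq;           /* sequence number that would confirm the source */
    double   seen;               /* time the last packet arrived, seconds */
};

struct probation_table {
    struct probation_entry slot[PROBATION_SLOTS];
};

/* Record an RTP packet from a source that is not yet in the participant
 * database. Returns 1 when the packet directly follows an earlier one from
 * the same SSRC, meaning the source can be promoted; returns 0 otherwise. */
int probation_rtp_packet(struct probation_table *t, uint32_t ssrc,
                         uint16_t seq, double now)
{
    struct probation_entry *victim = &t->slot[0];
    int i;

    for (i = 0; i < PROBATION_SLOTS; i++) {
        struct probation_entry *e = &t->slot[i];

        if (e->in_use && now - e->seen > PROBATION_TIMEOUT)
            e->in_use = 0;                        /* expire stale entries */

        if (e->in_use && e->ssrc == ssrc) {
            if (seq == e->next_seq) {             /* consecutive: promote */
                e->in_use = 0;
                return 1;
            }
            e->next_seq = (uint16_t) (seq + 1);   /* restart the run */
            e->seen = now;
            return 0;
        }

        /* Prefer a free slot for reuse; failing that, the oldest entry. */
        if (!e->in_use) {
            if (victim->in_use)
                victim = e;
        } else if (victim->in_use && e->seen < victim->seen) {
            victim = e;
        }
    }

    victim->in_use   = 1;
    victim->ssrc     = ssrc;
    victim->next_seq = (uint16_t) (seq + 1);
    victim->seen     = now;
    return 0;
}
```

Because the table never grows, a flood of bogus sources can at worst evict other probationary entries; it cannot consume additional memory.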

Each CSRC (contributing source) in a valid RTP packet also counts as a participant and should be added to the database. You should expect to receive SDES information for participants identified only by CSRC.

When a participant is added to the database, an application should also update the session-level count of the members and the sender fraction. Addition of a participant may also cause RTCP forward reconsideration, which will be discussed shortly.

Participants are removed from the database after a BYE packet is received or after a specified period of inactivity. This sounds simple, but there are several subtle points.

There is no guarantee that packets are received in order, so an RTCP BYE may be received before the last data packet from a source. To prevent state from being torn down and then immediately reestablished, a participant should be marked as having left after a BYE is received, and its state should be held over for a few seconds (my implementation uses a fixed two-second delay). The important point is that the delay is longer than both the maximum expected reordering and the media playout delay, thereby allowing for late packets and for any data in the playout buffer for that participant to be used.

Sources may be timed out if they haven't been heard from for more than five times the reporting interval. If the reporting interval is less than 5 seconds, the 5-second minimum is used here (even if a smaller interval is used when RTCP packets are being sent).
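
The two time limits just described reduce to a pair of small predicates. In this sketch the function names are hypothetical, times are in seconds, and the two-second holdover is simply the author's example value:

```c
#define RTCP_MIN_INTERVAL   5.0   /* seconds; floor used for timeouts */
#define TIMEOUT_MULTIPLIER  5     /* reporting intervals of silence allowed */
#define BYE_HOLDOVER        2.0   /* seconds to keep state after a BYE */

/* Returns 1 if a participant last heard from at last_heard should be
 * timed out, given the current reporting interval. */
int participant_timed_out(double now, double last_heard,
                          double reporting_interval)
{
    double base = reporting_interval < RTCP_MIN_INTERVAL
                      ? RTCP_MIN_INTERVAL
                      : reporting_interval;
    return (now - last_heard) > TIMEOUT_MULTIPLIER * base;
}

/* Returns 1 once the state of a participant that sent BYE may be freed. */
int bye_state_expired(double now, double bye_received_at)
{
    return (now - bye_received_at) >= BYE_HOLDOVER;
}
```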

When a BYE packet is received or when a member times out, RTCP reverse reconsideration takes place, as described in the section titled BYE Reconsideration later in this chapter.

Timing Rules

The rate at which each participant sends RTCP packets is not fixed but varies according to the size of the session and the format of the media stream. The aim is to restrict the total amount of RTCP traffic to a fixed fraction—usually 5%—of the session bandwidth. This goal is achieved by a reduction in the rate at which each participant sends RTCP packets as the size of the session increases. In a two-party telephone call using RTP, each participant will send an RTCP report every few seconds; in a session with thousands of participants—for example, an Internet radio station—the interval between RTCP reports from each listener may be many minutes.

Each participant decides when to send RTCP packets on the basis of the set of rules described later in this section. It is important to follow these rules closely, especially for implementations that may be used in large sessions. If implemented correctly, RTCP will scale to sessions with many thousands of members. If not, the amount of control traffic will grow linearly with the number of members and will cause significant network congestion.

Reporting Interval

Compound RTCP packets are sent periodically, according to a randomized timer. The average time each participant waits between sending RTCP packets is known as the reporting interval. It is calculated on the basis of several factors:

The bandwidth allocated to RTCP. This is a fixed fraction—usually 5%—of the session bandwidth. The session bandwidth is the expected data rate for the session; typically this is the bit rate of a single stream of audio or video data, multiplied by the typical number of simultaneous senders. The session bandwidth is fixed for the duration of a session, and supplied as a configuration parameter to the RTP application when it starts.
The fraction of the session bandwidth allocated to RTCP can be varied by the RTP profile in use. It is important that all members of a session use the same fraction; otherwise state for some members may be prematurely timed out.
The average size of RTCP packets sent and received. The average size includes not just the RTCP data, but also the UDP and IP header sizes (that is, add 28 octets per packet for a typical IPv4 implementation).
The total number of participants and the fraction of those participants who are senders. This requires an implementation to maintain a database of all participants, noting whether they are senders (that is, if RTP data packets or RTCP SR packets have been received from them) or receivers (if only RTCP RR, SDES, or APP packets have been received). The earlier section titled Participant Database explained this in detail.
To guard against buggy implementations that might send SR packets when they have not sent data, a participant that does listen for data should consider another participant to be a sender only if data packets have been received. An implementation that only sends data and does not listen for others' data (such as a media server) may use RTCP SR packets as an indication of a sender, but it should verify that the packet and byte count fields are nonzero and changing from one SR to the next.

If the number of senders is greater than zero but less than one-quarter of the total number of participants, the reporting interval depends on whether we are sending. If we are sending, the reporting interval is set to the number of senders multiplied by the average size of RTCP packets, divided by 25% of the desired RTCP bandwidth. If we are not sending, the reporting interval is set to the number of receivers multiplied by the average size of RTCP packets, divided by 75% of the desired RTCP bandwidth:

if ((senders > 0) and (senders < (25% of total number of participants))) {
    if (we are sending) {
        Interval = average RTCP size * senders / (25% of RTCP bandwidth)
    } else {
        Interval = average RTCP size * receivers / (75% of RTCP bandwidth)
    }
}

If there are no senders, or if more than one-quarter of the members are senders, the reporting interval is calculated as the average size of the RTCP packets multiplied by the total number of members, divided by the desired RTCP bandwidth:

if ((senders == 0) or (senders > (25% of total number of participants))) {
    Interval = average RTCP size * total number of members / RTCP bandwidth
}

These rules ensure that senders have a significant fraction of the RTCP bandwidth, sharing at least one-quarter of the total RTCP bandwidth. The RTCP packets required for lip synchronization and identification of senders can therefore be sent comparatively quickly, while still allowing reports from receivers.

The resulting interval is always compared to an absolute minimum value, which by default is chosen to be 5 seconds. If the interval is less than the minimum interval, it is set to the minimum:

if (Interval < minimum interval) {
    Interval = minimum interval
}
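
Put together, the three pseudocode fragments above might look like this in C. The function name and parameter list are assumptions. The text does not say what happens when senders make up exactly one-quarter of the membership; this sketch assigns that case to the split-bandwidth branch, which is an assumption. The RTCP bandwidth must be nonzero.

```c
#define RTCP_SENDER_SHARE   0.25   /* share of RTCP bandwidth for senders */
#define RTCP_RECEIVER_SHARE 0.75   /* share of RTCP bandwidth for receivers */

/* Compute the (unrandomized) reporting interval in seconds.
 *   members       total number of participants, including senders
 *   senders       number of participants sending data
 *   we_sent       nonzero if the local participant is a sender
 *   avg_rtcp_size average RTCP packet size in octets, with UDP/IP headers
 *   rtcp_bw       bandwidth available to RTCP, octets per second (> 0)
 *   min_interval  lower bound on the result, seconds */
double rtcp_reporting_interval(int members, int senders, int we_sent,
                               double avg_rtcp_size, double rtcp_bw,
                               double min_interval)
{
    double interval;

    if (senders > 0 && senders <= members * RTCP_SENDER_SHARE) {
        if (we_sent) {
            interval = avg_rtcp_size * senders
                       / (RTCP_SENDER_SHARE * rtcp_bw);
        } else {
            interval = avg_rtcp_size * (members - senders)
                       / (RTCP_RECEIVER_SHARE * rtcp_bw);
        }
    } else {
        /* No senders, or senders are more than a quarter of the session. */
        interval = avg_rtcp_size * members / rtcp_bw;
    }
    return interval < min_interval ? min_interval : interval;
}
```

For instance, with 90-octet packets and 800 octets per second of RTCP bandwidth, a receiver in a session of one sender and 1,000 receivers gets rtcp_reporting_interval(1001, 1, 0, 90.0, 800.0, 5.0), which evaluates to 150 seconds.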

In some cases it is desirable to send RTCP more often than the default minimum interval allows. For example, if the data rate is high and the application demands more timely reception quality statistics, a shorter minimum interval will be required. The latest revision of the RTP specification allows for a reduced minimum interval in these cases:

Minimum interval = 360 / (session bandwidth in Kbps)

This reduced minimum is smaller than 5 seconds for session bandwidths greater than 72 Kbps. When the reduced minimum is being used, it is important to remember that some participants may still be using the default value of 5 seconds, and to take this into account when determining whether to time out a participant because of inactivity.
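
As a small worked check of the formula, the sketch below computes the reduced minimum and selects between it and the default. The function names and the use_reduced flag are hypothetical; the bandwidth is in kilobits per second and must be greater than zero.

```c
#define RTCP_DEFAULT_MIN_INTERVAL 5.0   /* seconds */

/* Reduced minimum interval in seconds for a session bandwidth in Kbps. */
double rtcp_reduced_min_interval(double session_bw_kbps)
{
    return 360.0 / session_bw_kbps;
}

/* Minimum interval to apply when sending, depending on whether the
 * application has opted in to the reduced minimum. */
double rtcp_min_interval(double session_bw_kbps, int use_reduced)
{
    return use_reduced ? rtcp_reduced_min_interval(session_bw_kbps)
                       : RTCP_DEFAULT_MIN_INTERVAL;
}
```

At 72 Kbps the two values coincide at 5 seconds; at 128 Kbps the reduced minimum is 2.8125 seconds. Timeout decisions for other participants should still be based on the 5-second default, as noted above.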

The resulting interval is the average time between RTCP packets. The transmission rules described next are then used to convert this value into the actual send time for each packet. The reporting interval should be recalculated whenever the number of participants in a session changes, or when the fraction of senders changes.

Basic Transmission Rules

When an application starts, the first RTCP packet is scheduled for transmission on the basis of an initial estimate of the reporting interval. When the first packet is sent, the second packet is scheduled, and so on. The actual time between packets is randomized, between one-half and one and a half times the reporting interval, to avoid synchronization of the participants' reports, which could cause them to arrive all at once, every time. Finally, if this is the first RTCP packet sent, the interval is halved to provide faster feedback that a new member has joined, thereby allowing the next send time to be calculated as shown here:

I = (Interval * random[0.5, 1.5])

if (this is the first RTCP packet we are sending) {
    I *= 0.5
}
next_rtcp_send_time = current_time + I

The routine random[0.5, 1.5] generates a random number in the interval 0.5 to 1.5. On some platforms it may be implemented with the rand() library function; on others, a call such as drand48() may be a better source of randomness.
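
A portable rendering of this scheduling step, using rand() from the C standard library, might look like the following. The function names are assumptions, and rand() is used only because it is available everywhere; as noted above, a better generator may be preferable where one exists.

```c
#include <stdlib.h>

/* Uniformly distributed factor in the range [0.5, 1.5]. */
double rtcp_random_factor(void)
{
    return 0.5 + (double) rand() / (double) RAND_MAX;
}

/* Time at which the next RTCP packet should be sent, given the current
 * time and the (unrandomized) reporting interval, both in seconds. The
 * interval is halved for the very first packet. */
double rtcp_next_send_time(double now, double interval, int first_packet)
{
    double i = interval * rtcp_random_factor();

    if (first_packet) {
        i *= 0.5;
    }
    return now + i;
}
```

With a 5-second interval, the first packet is therefore scheduled between 1.25 and 3.75 seconds from now, and later packets between 2.5 and 7.5 seconds after the previous one.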

As an example of the basic transmission rules, consider an Internet radio station sending 128-Kbps MP3 audio using RTP-over-IP multicast, with an audience of 1,000 members. The default values for the minimum reporting interval (5 seconds) and RTCP bandwidth fraction (5%) are used, and the average size of RTCP packets is assumed to be 90 octets (including UDP/IP headers). When a new audience member starts up, it will not be aware of the other listeners, because it has not yet received any RTCP data. It must assume that the only other member is the sender and calculate its initial reporting interval accordingly. The fraction of members who are senders (the single source) is more than 25% of the known membership (the source and this one receiver), so the reporting interval is calculated like this:

Interval = average RTCP size * total number of members / RTCP bandwidth
         =   90 octets       *         2               / (5% of 128 Kbps)
         =   180 octets      /  800 octets per second
         =   0.225 seconds

Because 0.225 seconds is less than the minimum, the minimum interval of 5 seconds is used as the interval. This value is then randomized and halved because this is the first RTCP packet to be sent. Thus the first RTCP packet is sent between 1.25 and 3.75 seconds after the application is started.

During the time between starting the application and sending the first RTCP packet, several receiver reports will have been received from the other members of the session, allowing the implementation to update its estimate of the number of members. This updated estimate is used to schedule the second RTCP packet.

As we will see later, 1,000 listeners is enough that the average interval will be greater than the minimum, so the rate at which RTCP packets are received in aggregate from all listeners is 75% × 800 bytes per second ÷ 90 bytes per packet = 6.66 packets per second. If the application sends its first RTCP packet after, say, 2.86 seconds, the known audience size will be approximately 2.86 seconds × 6.66 per second = 19.

Because the fraction of senders is now less than 25% of the known membership, the reporting interval for the second packet is then calculated in this way:

Interval = receivers * average RTCP size / (75% of RTCP bandwidth)
         =     19    *        90         / (75% of (5% of 128 Kbps))
         =         1710          / (0.75 * (0.05 * 16000 octets/second))
         =         1710          / 600
         =           2.85 seconds

Again, this value is increased to the minimum interval and randomized. The second RTCP packet is sent between 2.5 and 7.5 seconds after the first.

The process repeats, with an average of 33 new receivers being heard from between sending the first and second RTCP packets, for a total known membership of 52. The result will be an average interval of 7.8 seconds, which, because it is greater than the minimum, is used directly. Consequently the third packet is sent between 3.9 and 11.7 seconds after the second. The average interval between packets increases as the other receivers become known, until the complete audience has been heard from. The interval is then calculated in this way:

Interval = receivers * average RTCP size / (75% of RTCP bandwidth)
         =    1000   *      90           / (75% of (5% of 128 Kbps))
         =         90000        / (0.75 * (0.05 * 16000 octets/second))
         =         90000        / 600
         =           150 seconds

An interval of 150 seconds is equivalent to 1/150 = 0.0066 packets per second, which with 1,000 listeners gives the average RTCP reception rate of 6.66 packets per second.

The proposed standard version of RTP⁶ uses only these basic transmission rules. Although these are sufficient for many applications, they have some limitations that cause problems in sessions with rapid changes in membership. The concept of reconsideration was introduced to avoid these problems.

Forward Reconsideration

As the preceding section suggested, when the session is large, it takes a certain number of reporting intervals before a new member knows the total size of the session. During this learning period, the new member is sending packets faster than the “correct” rate, because of incomplete knowledge. This issue becomes acute when many members join at once, a situation known as a step join. A typical scenario in which a step join may occur is at the start of an event, when an application starts automatically for many participants at once.

In the case of a step join, if only the basic transmission rules are used, each participant will join and schedule its first RTCP packet on the basis of an initial estimate of zero participants. It will send that packet after an average of half of the minimum interval, and it will schedule the next RTCP packet on the basis of the observed number of participants at that time, which can now be several hundreds or even thousands. Because of the low initial estimate for the size of the group, there is a burst of RTCP traffic when all participants join the session, and this can congest the network.

Rosenberg has studied this phenomenon¹⁰⁰ and reports on the case in which 10,000 members join a session at once. His simulations show that in such a step join, all 10,000 members try to send an RTCP packet within the first 2.5 seconds, which is almost 3,000 times the desired rate. Such a burst of packets will cause extreme network congestion—not the desired outcome for a low-rate control protocol.

Continually updating the estimate of the number of participants and the fraction who are senders, and then using these numbers to reconsider the send time of each RTCP packet, can solve this problem. When the scheduled transmission time arrives, the interval is recalculated on the basis of the updated estimate of the group size, and this value is used to calculate a new send time. If the new send time is in the future, the packet is not sent but is rescheduled for that time.

This procedure may sound complex, but it is actually simple to implement. Consider the pseudocode for the basic transmission rules, which can be written like this:

if (current_time >= next_rtcp_send_time) {
    send RTCP packet
    next_rtcp_send_time = rtcp_interval() + current_time
}

With forward reconsideration, this changes to the following:

if (current_time >= next_rtcp_check_time) {
    new_rtcp_send_time = (rtcp_interval() / 1.21828) + last_rtcp_send_time
    if (current_time >= new_rtcp_send_time) {
        send RTCP packet
        last_rtcp_send_time = current_time
        next_rtcp_check_time = (rtcp_interval() / 1.21828) + current_time
    } else {
        next_rtcp_check_time = new_rtcp_send_time
    }
}

Here the function rtcp_interval() returns a randomized sampling of the reporting interval, based on the current estimate of the session size. Note the division of rtcp_interval() by a factor of 1.21828 (Euler's number e minus 1.5). This is a compensating factor for the effects of the reconsideration algorithm, which converges to a value below the desired 5% bandwidth fraction.
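
The reconsideration step can be expressed as a small state machine. In the sketch below the structure and function names are assumptions, and the randomized interval is passed in by the caller so that the logic is deterministic and easy to test. As a simplification, a single sample is used both for the reconsidered send time and for the next check time, whereas the pseudocode above draws a fresh sample for each.

```c
#define RTCP_COMPENSATION 1.21828   /* e - 1.5 */

struct rtcp_timer {
    double last_send_time;    /* when we last sent RTCP, seconds */
    double next_check_time;   /* when the timer next expires, seconds */
};

/* Call when the clock reaches (or passes) next_check_time. `interval` is a
 * fresh randomized sample of the reporting interval, computed from the
 * current membership estimate. Returns 1 if an RTCP packet should be sent
 * now, or 0 if transmission has been deferred to next_check_time. */
int rtcp_timer_expired(struct rtcp_timer *t, double now, double interval)
{
    double scaled = interval / RTCP_COMPENSATION;
    double new_send_time;

    if (now < t->next_check_time) {
        return 0;                         /* timer has not expired yet */
    }
    new_send_time = t->last_send_time + scaled;
    if (now >= new_send_time) {
        t->last_send_time  = now;         /* caller sends the packet */
        t->next_check_time = now + scaled;
        return 1;
    }
    t->next_check_time = new_send_time;   /* group grew: wait longer */
    return 0;
}
```

A growing membership estimate yields a larger interval, so the reconsidered send time moves into the future and the packet is held back; a stable or shrinking estimate lets it go out on schedule.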

The effect of reconsideration is to delay RTCP packets when the estimate of the group size is increasing. This effect is shown in Figure 5.13, which illustrates that the initial burst of packets is greatly reduced when reconsideration is used, comprising only 75 packets—rather than 10,000—before the other participants learn to scale back their reporting interval.

Figure 5.13. The Effect of Forward Reconsideration on RTCP Send Rates (Adapted from J. Rosenberg and H. Schulzrinne, “Timer Reconsideration for Enhanced RTP Scalability,” Proceedings of IEEE Infocom '98, San Francisco, CA, March 1998. © 1998 IEEE.)

As another example, consider the scenario discussed in the previous section, Basic Transmission Rules, in which a new listener is joining an established Internet radio station using multicast RTP. When the listener is joining the session, the first RTCP packet is scheduled as before, between 1.25 and 3.75 seconds after the application is started. The difference comes when the scheduled transmission time arrives: Rather than sending the packet, the application reconsiders the schedule on the basis of the current estimate of the number of members. As was calculated before, assuming a random initial interval of 2.86 seconds, the application will have received about 19 RTCP packets from the other members, and a new average interval of 2.85 seconds will be calculated:

Interval = number of receivers * average RTCP size / (75% of RTCP bandwidth)
         = 19 * 90 / (0.75 * (0.05 * 16000 octets/second))
         = 1710 / 600
         = 2.85 seconds

The result is less than the minimum, so the minimum of 5 seconds is used, randomized and divided by the scaling factor. If the resulting value is less than the current time (in this example 2.86 seconds after the application started), then the packet is sent. If not (for example, if the new randomized value is 5.97 seconds), the packet is rescheduled for the later time.

After the new timer expires (in this example 5.97 seconds after the application started), the reconsideration process takes place again. At this time the receiver will have received RTCP packets from approximately 5.97 seconds × 6.66 per second = 40 other members, and the recalculated RTCP interval will be 6 seconds before randomization and scaling. The process repeats until the reconsidered send time comes out before the current time. At that point the first RTCP packet is sent, and the second is scheduled.
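The arithmetic in this example can be checked with a few lines of Python. This is only a sketch of the calculation as described in the text: the function name reporting_interval and the constant ARRIVAL_RATE are illustrative, and the values (90-octet packets, 16,000 octets/second session bandwidth, 6.66 RTCP packets per second) are taken from the example.

```python
def reporting_interval(members, avg_rtcp_size=90.0, session_bw=16000.0,
                       min_interval=5.0):
    """Return (calculated, enforced) interval in seconds, before randomization."""
    rtcp_bw = 0.05 * session_bw      # RTCP gets 5% of the session bandwidth
    receiver_bw = 0.75 * rtcp_bw     # receivers share 75% of that
    calculated = members * avg_rtcp_size / receiver_bw
    return calculated, max(calculated, min_interval)


ARRIVAL_RATE = 6.66  # RTCP packets per second heard by the new listener

for elapsed in (2.86, 5.97):
    members = round(elapsed * ARRIVAL_RATE)
    calculated, enforced = reporting_interval(members)
    print(f"t={elapsed:.2f}s members={members} "
          f"calculated={calculated:.2f}s enforced={enforced:.2f}s")
```

At the first check the calculated interval is below the 5-second minimum, so the minimum applies. By the second check the estimate has grown enough that the calculated value takes over.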

Reconsideration is simple to implement, and it is recommended that all implementations include it, even though it has significant effects only after the number of participants reaches several hundred. An implementation that includes forward reconsideration will be safe no matter what size the session, or how many participants join simultaneously. One that uses only the basic transmission rules may send RTCP too often, causing network congestion in large sessions with synchronized joins.

Reverse Reconsideration

If there are problems with step joins, one might reasonably expect there to be problems due to the rapid departure of many participants (a step leave). This is indeed the case with the basic transmission rules, although the problem is not with RTCP being sent too often and causing congestion, but with it not being sent often enough, causing premature timeout of participants.

The problem occurs when most, but not all, of the members leave a large session. As a result the reporting interval decreases rapidly, perhaps from several minutes to several seconds. With the basic transmission rules, however, packets are not rescheduled after the change, although the timeout interval is updated. The result is that those members who did not leave are marked as having timed out; their packets do not arrive within the new timeout period.

The problem is solved in a similar way to that of step joins: When each BYE packet is received, the estimate of the number of participants is updated, and the send time of the next RTCP packet is reconsidered. The difference from forward reconsideration is that the estimate will be getting smaller, so the next packet is sent earlier than it would otherwise have been.

When a BYE packet is received, the new transmission time is calculated on the basis of the fraction of members still present after the BYE, and the amount of time left before the original scheduled transmission time. The procedure is as follows:

if (BYE packet received) {
  member_fraction = num_members_after_BYE / num_members_before_BYE
  time_remaining  = next_rtcp_send_time - current_time
  next_rtcp_send_time = current_time + member_fraction * time_remaining
}

The result is a new transmission time that is earlier than the original value, but later than the current time. Packets are therefore scheduled early enough that the remaining members do not time each other out, preventing the estimate of the number of participants from erroneously falling to zero.
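As a rough illustration, the reverse reconsideration step can be written as a single Python function. The name reverse_reconsider and the guard clauses (doing nothing if the timer is already due or the membership count did not shrink) are assumptions added for robustness; they are not part of the procedure given above.

```python
def reverse_reconsider(next_send_time, now, members_before, members_after):
    """Pull the next RTCP send time forward after a BYE shrinks the group."""
    if members_before <= 0 or next_send_time <= now:
        return next_send_time  # nothing to scale, or the timer is already due
    # Fraction of the group still present, capped so we never delay a packet.
    member_fraction = min(1.0, members_after / members_before)
    time_remaining = next_send_time - now
    return now + member_fraction * time_remaining


# Example: 1000 members drop to 250 with 80 seconds left on the timer.
new_time = reverse_reconsider(next_send_time=100.0, now=20.0,
                              members_before=1000, members_after=250)
print(new_time)  # 20 + 0.25 * 80 = 40.0
```

The remaining wait shrinks in proportion to the surviving fraction of the group, so the new send time always falls between the current time and the original schedule.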

Implementation of reverse reconsideration is a secondary concern: It's an issue only in sessions with several hundred participants and rapid changes in membership, and failing to implement it may result in false timeouts but no networkwide problems.

BYE Reconsideration

In the proposed standard version of RTP,⁶ a member desiring to leave a session sends a BYE packet immediately, then exits. If many members decide to leave at once, this can cause a flood of BYE packets and can result in network congestion (much as happens with RTCP packets during a step join, if forward reconsideration is not employed).

To avoid this problem, the current version of RTP allows BYE packets to be sent immediately only if there are 50 or fewer members when a participant decides to leave. If there are more than 50 members, the leaving member should delay sending a BYE if other BYE packets are received while it is preparing to leave, a process called BYE reconsideration.

BYE reconsideration is analogous to forward reconsideration, but based on a count of the number of BYE packets received, rather than the number of other members. When a participant wants to leave a session, it suspends normal processing of RTP/RTCP packets and schedules a BYE packet according to the forward reconsideration rules, calculated as if there were no other members and as if this were the first RTCP packet to be sent. While waiting for the scheduled transmission time, the participant ignores all RTP and RTCP packets except for BYE packets. The BYE packets received are counted, and when the scheduled BYE transmission time arrives, it is reconsidered on the basis of this count. The process continues until the BYE is sent, and then the participant leaves the session.

As this description suggests, the delay before a BYE can be sent depends on the number of members leaving. If only a single member decides to leave, the BYE will be delayed between 1.026 and 3.078 seconds (based on a 5-second minimum reporting interval, halved because BYE packets are treated as if they're the initial RTCP packet). If many participants decide to leave at once, there may be a considerable delay between deciding to leave a session and being able to send the BYE packet. If a fast exit is needed, it is safe to leave the session without sending a BYE; other participants will time out their state eventually.
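The following Python sketch illustrates BYE reconsideration and reproduces the 1.026 to 3.078 second bounds quoted above. It is a simplified model, not the algorithm as specified: the class name ByeReconsideration, the 40-octet BYE size, and the 800 octets/second bandwidth figure are assumptions made for illustration.

```python
import random

COMPENSATION = 2.71828 - 1.5  # e - 1.5 = 1.21828


class ByeReconsideration:
    """Hypothetical BYE scheduler: counts BYEs instead of session members."""

    def __init__(self, now, bye_size=40.0, bye_bw=800.0, min_interval=5.0):
        self.leaving = 1                      # as if we were the only member
        self.bye_size = bye_size              # assumed BYE size in octets
        self.bye_bw = bye_bw                  # assumed bandwidth, octets/second
        self.initial_min = min_interval / 2   # halved: treated as initial packet
        self.start_time = now
        self.sent = False
        self.next_check_time = now + self._interval()

    def _interval(self):
        base = max(self.initial_min, self.leaving * self.bye_size / self.bye_bw)
        return base * random.uniform(0.5, 1.5) / COMPENSATION

    def on_bye_received(self):
        """Every BYE heard while waiting increases the count."""
        self.leaving += 1

    def on_timer(self, now, send_bye):
        """Returns True once the BYE has been sent."""
        if self.sent or now < self.next_check_time:
            return self.sent
        new_send_time = self.start_time + self._interval()
        if now >= new_send_time:
            send_bye()
            self.sent = True
        else:
            self.next_check_time = new_send_time  # many others leaving: wait
        return self.sent
```

With no other BYE packets heard, the first scheduled time falls between 1.25/1.21828 and 3.75/1.21828 seconds, matching the bounds in the text. A large count of received BYEs pushes the send time out much further.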

The use of BYE reconsideration is a relatively minor decision: It is useful only when many participants leave a session at once, and when the others care about receiving notification that a participant has left. It is safe to leave large sessions without sending a BYE, rather than implementing the BYE reconsideration algorithm.

Comments on Reconsideration

The reconsideration rules were introduced to allow RTCP to scale to very large sessions in which the membership changes rapidly. I recommend that all implementations include reconsideration, even if they are initially intended only for use in small sessions; this will prevent future problems if the tool is used in a way the designer did not foresee.

On first reading, the reconsideration rules appear complex and difficult to implement. In practice, they add a small amount of additional code. My implementation of RTP and RTCP consists of about 2,500 lines of C code (excluding sockets and encryption code). Forward and reverse reconsideration together add only 15 lines of code. BYE reconsideration is more complex, at 33 lines of code, but still not a major source of difficulty.

Correct operation of the reconsideration rules depends to a large extent on the statistical average of the behavior of many individual participants. A single incorrect implementation in a large session will cause little noticeable difference to the behavior, but many incorrect implementations in a single session can lead to significant congestion problems. For small sessions, this is largely a theoretical problem, but as the session size increases, the effects of bad RTCP implementations are magnified and can cause network congestion that will affect the quality of the audio and/or video.

Common Implementation Problems

The most common problems observed with RTCP implementations relate to the basic transmission rules, and to the bandwidth calculation:

Incorrect scaling with the number of participants. A fixed reporting interval will cause traffic to grow linearly with the number of members, eventually far exceeding the amount of audio/video data sent and causing network congestion.
Lack of randomization of the reporting interval. Implementations that use a nonrandom reporting interval have the potential to unintentionally synchronize their reports, causing bursts of RTCP packets that can overwhelm receivers.
Forgetting to include lower-layer overheads in the bandwidth calculations. All packet sizes, when calculating the reporting interval, should include the IP and UDP headers (28 octets, for a typical IPv4-based implementation).
Incorrect use of padding. If padding is needed, it should be added to only the last packet in a compound RTCP packet.
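The first three problems in this list can be made concrete with a short Python sketch. The helper names and the 64-octet RTCP packet size are hypothetical; the 28-octet IPv4/UDP overhead and the 0.5 to 1.5 randomization range are the values used in this chapter. The sender/receiver bandwidth split is left out to keep the comparison simple.

```python
import random

IPV4_UDP_OVERHEAD = 28  # 20-octet IPv4 header + 8-octet UDP header


def on_wire_size(rtcp_octets):
    """Packet size for bandwidth calculations, including lower-layer headers."""
    return rtcp_octets + IPV4_UDP_OVERHEAD


def scaled_interval(members, packet_size, rtcp_bw, min_interval=5.0):
    """Deterministic interval that grows with the number of members."""
    return max(min_interval, members * packet_size / rtcp_bw)


def randomized(interval):
    """Spread reports over 0.5 to 1.5 times the interval to avoid synchronization."""
    return interval * random.uniform(0.5, 1.5)


def rtcp_load(members, interval, packet_size):
    """Aggregate RTCP traffic in octets/second for the whole session."""
    return members * packet_size / interval


size = on_wire_size(64)  # hypothetical 64-octet compound RTCP packet
for members in (10, 1000, 10000):
    fixed = rtcp_load(members, 5.0, size)
    scaled = rtcp_load(members, scaled_interval(members, size, 800.0), size)
    print(f"{members:>6} members: fixed={fixed:.0f} scaled={scaled:.0f} octets/s")
```

With a fixed 5-second interval the aggregate load grows linearly with the session size, whereas the scaled interval holds it at or below the configured RTCP bandwidth.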

When testing the behavior of an RTCP implementation, it is important to use a range of scenarios. Problems can be found in tests of both large and small sessions, sessions in which the membership changes rapidly, sessions in which a large fraction of the participants are senders and in which few are senders, and sessions in which step joins and leaves occur. Testing large-scale sessions is inherently difficult. If an implementation can be structured to be independent of the underlying network transport system, it will allow the simulation of large sessions on a single test machine.

The IETF audio/video transport working group has produced a document describing testing strategies for RTP implementations,⁴⁰ which may also be useful.

Summary

This chapter has described the RTP control protocol, RTCP, in some detail. There are three components:

The RTCP packet formats, and the means by which compound packets are generated
The participant database as the main data structure for an RTP-based application, and the information that needs to be stored for correct operation of RTCP
The rules governing timing of RTCP packets: periodic transmission, adaptation to the size of the session, and reconsideration

We have also briefly discussed the correct validation of RTCP packets, along with security and privacy issues, which are covered in depth in Chapter 13, Security Considerations.

The RTP control protocol is an integral part of RTP, used for reception quality reporting, source description, membership control, and lip synchronization. Correct implementation of RTCP can significantly enhance an RTP session: It permits the receiver to lip-sync audio and video, identifies the other members of a session, and allows the sender to make an informed choice of error protection scheme to achieve optimum quality.
