Chapter 10. The Media Plane

We have already seen that IP multimedia communications comprise two planes: the signaling plane and the media plane. We dealt with the signaling plane in the chapters dedicated to SIP. Now we will look into the media plane. First we will introduce general concepts related to the media plane, and then we will examine two concrete examples of protocols used in the media plane for IP multimedia services: RTP and MSRP.

Overview of the Media Plane

As has been stated many times already, SIP plays the part of the signaling protocol in IP multimedia communication services. A key aspect is that SIP is used to control multimedia communications irrespective of the actual nature of the session; voice, video, messaging, a game, and so on. This works perfectly well because SIP, in order to deliver its functions, does not need to care about the nature of the session. There is still, at specific moments, such as session creation, the need to exchange session descriptions, which are actually dependent on the nature of the session. This is done using SDP, and the SDP content is carried in SIP messages. So SIP does not need to know about session specifics.

In the media plane, what is needed is a protocol that takes care of the media transport. Someone might think that, as occurs in the signaling plane, there might also be a protocol in the media plane suitable for all the types of media sessions. However, that is not the case. The types of sessions that can be established using SIP are so different that it is impractical to use a single media transport protocol fitting all of them. Let us recall that in a multimedia session, we may have strict real-time media (e.g., voice, video), quasi-real-time media (e.g., instant messaging, a game, whiteboard), or even other types of media (e.g., an image, a file).

All these types of media are quite different in nature, and introduce different requirements for the protocol used to transport them. For instance, in a voice communication, timely delivery is crucial, therefore packet retransmission schemes to cope with transmission errors are not applicable there. If a packet is received that contains an error, it is just discarded, and the receiver builds a sample by interpolating between the adjacent sample values.

On the other hand, in instant messaging, the delay requirements are not so strict, and therefore we can cope with a retransmission mechanism to recover from errors and so always present the original, error-free message to the recipient.

A packet network such as the Internet typically impacts packet transmission in three ways:

  • It introduces delay.

  • It may lose packets.

  • It may deliver packets out of sequence.

Depending on the media type, these aspects may be more or less important, and thus the requirements for the transport protocol are different. For instance, as we said before, delay has quite harmful effects on voice transmission, but may not be so critical in a chess game. On the other hand, for a chess game, the loss of a few packets (e.g., representing a checkmate move) might be very damaging, whereas, in the case of voice, it may not be so relevant.

The number of media transport protocols for IP communications may be very large and dependent on the specific application. This is, by the way, an area that is constantly evolving, and we may see new protocol proposals arising in the next months to cover specific types of applications. The most common media transport protocols for IP communications are listed next. SIP can be used to set up media sessions carried by all these protocols:

  • Real-time Transport Protocol (RTP): It is an Internet standard (STD 64, RFC 3550) for the transport of strict real-time data such as voice or video. Virtually all the voice and video over IP deployments nowadays use RTP as the media transport protocol. RTP can also be used for transport of real-time Text over IP (ToIP).[1]

  • Message Session Relay Protocol (MSRP): It is work in progress in the IETF, and covers the transport of messages related to a session. There are already a number of commercial products that implement the MSRP draft for utilization in instant messaging. Additionally, MSRP is also being used for image sharing between mobile devices. The main terminal vendors already support MSRP (version 19 of the draft), thanks to successful interoperability meetings since Q1 2007. See [GSMA_MSRP].

  • Transmission Control Protocol (TCP): Session media can also be transported by TCP for certain applications (e.g., online file transfer). The media session would be negotiated via an SDP exchange, and the session created using SIP. Such a use of TCP is specified in [RFC 4145] and [RFC 4572].

  • T.38 fax transmission over UDP: The ITU T.38 recommendation describes the media transport for sending fax messages over IP networks (FOIP) in real-time.[2] [RFC 3362] defines the “image/t38” media type intended to indicate a T.38 media stream in SDP.

Next we will examine with a bit more detail the first two media protocols: RTP and MSRP.

Real-time Transport Protocol (RTP)

Motivation

Let us consider that we want to exchange strict real-time media such as voice or video in the Internet. We saw that IP networks produce some undesirable effects when carrying traffic. In order to understand what the requirements are for a protocol capable of conveying real-time media, we next analyze how these effects impact real-time traffic.

End-to-End Delay and Packet Loss

End-to-end delay is caused by the processing delay at each endpoint (operating system, codecs, and so on), plus the delay caused by the IP network itself (queuing and processing time in routers, transmission delay, and so on). The end-to-end delay may have a negative impact on the interactivity of IP multimedia communication services in general. More specifically, for real-time media such as voice or video, interactivity requirements result in very little tolerance to delay. Imagine a conversation between John and Alice. John starts talking; when he stops, he waits for Alice’s answer. If the end-to-end delay is large, John may think he was not heard. He will repeat what he said, only to be interrupted by the delayed response from Alice. John and Alice will stop talking, and then commence again simultaneously. In this situation, it is very difficult to maintain the interactivity in the conversation in a natural way.

Table 10.1 shows a qualitative estimation of voice interactivity depending on the one-way delay.

Table 10.1. 

One-Way Delay            | Interactivity
Less than 100 ms         | Good
Between 100 and 250 ms   | Acceptable
Between 250 and 400 ms   | Bad
More than 400 ms         | No interactivity

Another aspect of transmission in an IP network is packet loss. This is typically produced by congested routers dropping packets. The common approach to recover a packet that was lost by the network is to ask for a retransmission. This implies an extra delay, which, in the case of real-time communications, is unacceptable.

Therefore, the delay requirement for real-time media transport means that no end-to-end packet retransmission scheme can be used. This requirement rules out TCP as the transport protocol for this type of media. At this point of the discussion, we might think that UDP is a good candidate because it does not include retransmissions, adds little overhead on top of IP, and its checksum and multiplexing services may come in handy.

UDP can certainly transport real-time media directly, but let us look at other requirements to understand why functionality beyond what UDP offers is needed.

Out-of-Sequence Delivery

This is another undesirable effect of IP networks. It is due to the fact that IP packets from a source to a destination may go through different network paths. If there is a congested router in the path of a packet, it is possible that subsequent packets that traverse different routers might arrive first at the destination.

It is important, in IP communication scenarios, that packets are fed in order to the application, hence there is the need for the receiver to reorder the packets. Some transport-level protocols such as TCP already include services for achieving in-sequence delivery. However, we already saw that TCP is not an option for real-time media. We could always define some protocol on top of UDP that includes a monotonically increasing sequence number in each packet in the session, and have the receiver reorder the packets. As we will see, this is one of the functions of RTP.
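The reordering idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the class name `ReorderBuffer` is hypothetical, sequence numbers are assumed never to wrap around, and a lost packet would stall delivery because no timeout is modeled.

```python
class ReorderBuffer:
    """Hypothetical receiver-side buffer that releases payloads in sequence order."""

    def __init__(self, first_seq=0):
        self.next_seq = first_seq   # next sequence number the application expects
        self.pending = {}           # packets that arrived ahead of their turn

    def push(self, seq, payload):
        """Store one packet; return the payloads that can now be delivered in order."""
        if seq < self.next_seq:
            return []               # duplicate or already delivered: ignore
        self.pending[seq] = payload
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready


buf = ReorderBuffer(first_seq=1)
first = buf.push(1, "a")    # in order, delivered at once
second = buf.push(3, "c")   # arrived early, held back
third = buf.push(2, "b")    # fills the gap, releases both
```

A real receiver would also bound how long it waits for a missing packet, since for real-time media a late packet is as useless as a lost one.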

Jitter

We already discussed the negative impact of end-to-end delay on the interactivity of a conversation. Actually, delay in an IP network is not constant for all the packets in a session. The time it takes a router to process a packet depends on its congestion situation, and this may vary during the session. The variation in delay is called jitter. So the overall delay introduced by an IP network can be described as composed of a fixed component (L) and a variable component (or jitter, J) that accounts for the delay that is produced in routers due to their congestion state.

D = L + J

Although a big overall delay can cause loss of interactivity, jitter may cause loss of intelligibility. Let us consider, for instance, a voice communication between John and Alice. While John speaks, his voice signal is sampled at a constant rate, coded, packetized, and sent over the network. At the receiver side, the samples are recovered and then sent to the soundcard at a constant rate so that the original voice signal can be fully reconstructed. If a voice sample arrives a bit late after its playback time, then it is useless and needs to be discarded. If jitter is too big, too many samples will have to be discarded, and the voice signal will be unintelligible. That is why jitter is a big issue for real-time communications.

There is, however, a way to reduce the adverse impact of jitter. It relies on using buffers at the receiver. When a packet arrives and the audio content is decoded, instead of playing the voice sample immediately, it is stored in a buffer. After some time (order of milliseconds), the buffer content is sent to the soundcard. By doing this, we have introduced some additional artificial delay, but this has allowed us to compensate for the jitter. The voice samples will now be present in the buffer when they need to be sent to the soundcard even if they arrived with variable delays. This idea is shown in Figure 10.1. In the top diagram, we can see that packet number 3 arrives late, and thus it has to be discarded. In the bottom diagram, a buffer to compensate for the jitter is introduced; we can see that packet 3 is now on time for the playback.

Figure 10.1. 

The bigger the buffer is, the more effective we can be at neutralizing the jitter. However, the buffer implies extra delay, and we saw that overall delay can severely impact the interactivity of the conversation. Therefore, in IP communication services, there is an upper limit to the length of the buffer so as not to impact interactivity. In streaming services, in which media flow is only unidirectional, there is not such an issue with the interactivity; therefore, receivers can accommodate big buffers to better handle jitter. The additional introduced delay is, in these cases, not a big issue. For instance, in a live streaming scenario, if we are watching a soccer match and we see a goal being scored a couple of seconds after it really happened, it does not really matter.
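The trade-off between buffer length and discarded packets can be illustrated with a small simulation. It is a sketch under stated assumptions: the function name `late_packets` and the send and arrival times are invented for the example, and the playout instant of each packet is simply its send time plus a fixed playout delay.

```python
def late_packets(send_times_ms, arrival_times_ms, playout_delay_ms):
    """Return the (1-based) numbers of the packets that miss their playout time."""
    late = []
    pairs = zip(send_times_ms, arrival_times_ms)
    for number, (sent, arrived) in enumerate(pairs, start=1):
        playout_time = sent + playout_delay_ms
        if arrived > playout_time:
            late.append(number)     # arrives after it should have been played
    return late


# One packet every 20 ms; packet 3 suffers 75 ms of network delay, the rest 50 ms.
send = [0, 20, 40, 60]
arrive = [50, 70, 115, 110]

no_margin = late_packets(send, arrive, playout_delay_ms=50)   # no room for jitter
with_margin = late_packets(send, arrive, playout_delay_ms=80) # 30 ms of buffering
```

With no margin the third packet is discarded; with 30 ms of extra buffering every packet is on time, at the cost of 30 ms more end-to-end delay.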

In order to be able to fight against jitter, it is crucial that the receiver can recover the time information of the received signal so as to know at what precise moment it needs to be played. This requirement calls for transporting the timing information associated with each voice packet. In other words, we need a protocol that includes a header to transport such timing information. As we will see, RTP also complies with this requirement.

RTP Overview

RTP is an IETF standard protocol (STD 64, RFC 3550) that provides end-to-end delivery services for data with real-time characteristics, such as voice and video. Among these, it includes sequence numbering and timestamping, which, as we saw previously, are crucial functionalities for transporting real-time media.

RTP defines the concept of RTP session. An RTP session is identified by a transport address, and includes just one type of media. This is different from the concept of SDP session, which includes all the media flowing from senders to receivers. Actually, an SDP session may encompass several RTP sessions. One single SDP multimedia session might, for instance, include a voice RTP session plus a video RTP session.[3]

RTP typically runs on top of UDP (Figure 10.2). An RTP packet consists of a header and the payload data. The payload data contains the actual coded voice or video, whereas the header includes information needed to deliver the services that the protocol provides.

Figure 10.2. 

In Figure 10.3, we can see the RTP header. As we already discussed, the sequence number allows the receiver to reconstruct the packet’s sender sequence, whereas the timestamp information allows it to reconstruct the timing produced by the source and remove jitter.

Figure 10.3. 

Other interesting header fields are:

  • Payload type (PT): identifies format of the payload—that is, the codec.

  • Synchronization source (SSRC): identifies the source of the RTP packet stream.
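As an illustration of how a receiver might read these fields, the following sketch unpacks the 12-byte fixed RTP header. The bit layout used here is the one given in RFC 3550, not one reproduced from the figure; the function name `parse_rtp_header` and the sample values are assumptions made for the example.

```python
import struct


def parse_rtp_header(data: bytes) -> dict:
    """Decode the 12-byte fixed RTP header (layout as in RFC 3550)."""
    if len(data) < 12:
        raise ValueError("an RTP packet needs at least a 12-byte header")
    # Network byte order: two single bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit SSRC.
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": byte0 >> 6,
        "padding": bool(byte0 & 0x20),
        "extension": bool(byte0 & 0x10),
        "csrc_count": byte0 & 0x0F,
        "marker": bool(byte1 & 0x80),
        "payload_type": byte1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Hypothetical packet: version 2, payload type 0, made-up sequence/timestamp/SSRC.
sample = struct.pack("!BBHII", 0x80, 0, 1234, 160, 0xDEADBEEF) + b"payload"
header = parse_rtp_header(sample)
```

The sketch ignores the optional CSRC list and header extension that may follow the fixed header.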

RTP was originally conceived to be used in the remit of multicast conferences in the Internet. In this kind of conference, every participant sends real-time data (e.g., voice) to a multicast address, and this data is received by the rest of the participants. In this kind of environment, it is important to be able to identify each of the senders that are transmitting in the same RTP session. That is achieved by the SSRC field. This is of little use in unicast IP multimedia communications, where we already have powerful signaling means to identify media senders.

The payload type field is also quite interesting for multicast conferences because it allows the receivers to be notified about a change in the codec. However, in unicast IP communications, codec changes are communicated using a signaling protocol (SIP).

RTP offers quite generic functionality. Applications that use it may be quite different. An interesting aspect about RTP is that it allows applications to tailor it to their needs. For instance, an application might include modifications or additions to existing headers. Therefore, in order to use RTP with a particular application—for instance, voice or video—we need to have the information about how RTP is tailored to that particular application. That is defined in two companion documents per application. One is the profile specification, and the other is the payload format specification.

Profile Specification

The profile specification defines which aspects of RTP are tailored to a particular application (e.g., voice). Examples of such aspects are the RTP data header, payload types, RTP header extensions, and so forth.

Payload Format Specification

In the RTP header, the payload type (PT) is a field that identifies the payload format. This payload format must be specified elsewhere, in a payload format specification document. The specification includes aspects such as the clock rate or the number of channels.

RTCP

RTP comes together with a lightweight control protocol called Real-time Transport Control Protocol (RTCP). Its primary function is to provide feedback on the quality of the media distribution. For instance, it can report number of lost packets or measured jitter. It is useful in order to diagnose problems or even trigger a codec change. This feedback information is conveyed in particular types of RTCP packets called SR (Sender Report) and RR (Receiver Report). All participants in an RTP session send RTCP reports. Senders send “sender reports,” and receivers send “receiver reports.” An endpoint that both transmits and receives RTP media would send both types of reports.
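To give an idea of how the jitter figure carried in a report might be obtained, the sketch below applies the running interarrival jitter estimate described in RFC 3550: for each pair of consecutive packets, the difference in spacing at the receiver versus the sender is smoothed with a gain of 1/16. The function name and the sample timestamps are assumptions for illustration; all times are expressed in the same units (e.g., RTP timestamp units).

```python
def interarrival_jitter(send_ts, recv_ts):
    """Running jitter estimate: J = J + (|D| - J) / 16 for each new packet."""
    jitter = 0.0
    for i in range(1, len(send_ts)):
        # D compares the spacing of two packets at the receiver and at the sender.
        d = (recv_ts[i] - recv_ts[i - 1]) - (send_ts[i] - send_ts[i - 1])
        jitter += (abs(d) - jitter) / 16.0
    return jitter


steady = interarrival_jitter([0, 160, 320], [100, 260, 420])   # constant delay
uneven = interarrival_jitter([0, 160, 320], [100, 260, 436])   # last packet 16 units late
```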

RTCP is also used to carry a persistent identifier of the RTP source that can be correlated with the SSRC (SSRC identification is not persistent because it changes among sessions). This identifier is called CNAME, and is carried in yet another type of RTCP packet called SDES (Source DEScription). This is a function not really interesting for unicast IP multimedia communications because this type of information is already conveyed in the signaling. Let us remember that RTP was originally conceived to be used in multicast scenarios in which this kind of information makes sense (because these do not use an additional signaling protocol).

Application Examples

Audio/Video

One of the most interesting applications to be run on top of RTP is audio and video transmission. Such an application has a corresponding profile called the Audio Video Profile (AVP) and a payload format specification. Both of them are defined in a combined document, [RFC 3551]. This RFC includes a definition of several possible payload types for audio and video. Some of the most frequent ones are depicted in Table 10.2.

Table 10.2. 

PT      | Encoding Name | Media Type  | Clock Rate (Hz) | Channels
0       | PCMU[a]       | Audio       | 8,000           | 1
3       | GSM           | Audio       | 8,000           | 1
4       | G723          | Audio       | 8,000           | 1
8       | PCMA[b]       | Audio       | 8,000           | 1
26      | JPEG          | Video       | Variable        | -
31      | H261          | Video       | 90,000          | -
34      | H263          | Video       | 90,000          | -
96–127  | Dynamic       | Audio/Video | -               | -

[a] refers to Pulse Code Modulation μ-law

[b] refers to Pulse Code Modulation A-law

Payload types can be static or dynamic. Static payload types are defined with a fixed identification number. By looking at Table 10.2, we can know, for instance, that a payload type = 3 in an RTP header indicates that the payload contains one channel of voice data with GSM encoding and sampled at 8000 Hz.

Dynamic payload types do not have a number statically assigned. The assignment is done in a dynamic way, typically via signaling (for instance, using SDP, as we saw in the previous chapter). Identification numbers between 96 and 127 are allocated to dynamic payload types.
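The difference between static and dynamic payload types can be sketched as a lookup. The dictionary below copies only a subset of Table 10.2; the function name `resolve_payload_type` and the `negotiated` mapping (standing in for whatever the SDP exchange agreed on) are hypothetical.

```python
# Subset of Table 10.2: PT -> (encoding name, media type, clock rate in Hz, channels)
STATIC_PAYLOAD_TYPES = {
    0: ("PCMU", "audio", 8000, 1),
    3: ("GSM", "audio", 8000, 1),
    4: ("G723", "audio", 8000, 1),
    8: ("PCMA", "audio", 8000, 1),
    31: ("H261", "video", 90000, None),
    34: ("H263", "video", 90000, None),
}


def resolve_payload_type(pt, negotiated=None):
    """Return the format for a PT value, consulting signaling for dynamic ones."""
    if pt in STATIC_PAYLOAD_TYPES:
        return STATIC_PAYLOAD_TYPES[pt]
    if 96 <= pt <= 127:
        # Dynamic range: the meaning exists only if signaling defined it.
        if negotiated and pt in negotiated:
            return negotiated[pt]
        raise LookupError(f"dynamic payload type {pt} was not negotiated")
    raise LookupError(f"payload type {pt} is not in this sketch's table")


gsm = resolve_payload_type(3)
# Assumed outcome of an SDP exchange that bound PT 101 to some format.
dtmf = resolve_payload_type(101, {101: ("telephone-event", "audio", 8000, 1)})
```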

Telephony Tones

DTMF (Dual-Tone Multi-Frequency) tones and telephony signals can also be carried over RTP using a particular payload format defined in [RFC 4733] and [RFC 4734]. The payload format is called “telephone-event.” It does not have a static payload type number; the payload type is established dynamically in the SDP exchange.

The “telephone-event” payload format is treated by the endpoints as just another audio codec.

Real-time Text

Another application that can be conveyed on RTP is real-time Text over IP (ToIP). This refers to real-time transmission of text in a character-by-character fashion for use in conversational services. It can be considered as a text equivalent to voice-based conversational services. Conversational text is defined in [ITU F.700].

Real-time ToIP has special relevance in the context of communication services for deaf, hard of hearing, or speech-impaired individuals.[4] However, it can also be used by mainstream users. For instance, imagine that John and Alice are engaged in a voice over IP conversation. John is at his mobile phone; he is moving, and enters into a very noisy environment that makes voice communication impractical. He might add a new real-time text media to the conversation while he remains in that environment so that John and Alice can still communicate in real-time with text.

Text conversation session contents are specified in ITU-T Recommendation T.140. [RFC 4103] defines how to transport those contents on RTP. It defines the “text/t140” RTP payload type.

Real-time ToIP sessions can be established with SIP.

Message Session Relay Protocol (MSRP)

As we saw in Chapter 9, a series of related instant messages between two or more parties can be viewed as a media session. Such a media session can be negotiated using the SDP offer/answer model. The SDP session descriptors for the messaging session would be carried by SIP. SIP is, in fact, not concerned with the nature of the media session; from SIP’s point of view, all media sessions (voice, video, messaging, and so on) are treated in the same way.

The utilization of SIP and SDP to signal messaging sessions allows an enhanced degree of integration with other media types, and a more complete communication experience. For instance, John might want to communicate with Alice. Because he does not know whether Alice has her phone or her IM client with her, he will offer an SDP that contains both messaging and voice. The SDP will be embedded in a SIP INVITE message sent to Alice’s address-of-record: sip:[email protected]. Alice accepts the invitation at her IM client, and the messaging session can start. This is shown in Figure 10.4.

Figure 10.4. 

The media transport protocol used for transmitting a series of related instant messages is called Message Session Relay Protocol (MSRP). At the time of writing, the MSRP specification is not yet an RFC. It is covered in just two Internet drafts:

  • [draft-ietf-simple-message-sessions]: This covers the core protocol.

  • [draft-ietf-simple-msrp-relays]: This covers the extensions needed for relays support.

Therefore the MSRP specification is still work in progress, though these drafts are about to be published as RFCs.[5]

Figure 10.5. 

This type of instant messaging service where there exists a conversational exchange of messages with a definite beginning and end is called “session-mode” messaging—as opposed to “page-mode” messaging, which refers to just the transmission of individual instant messages.

In this section, we will look at the MSRP protocol, which represents the media plane component in “session-mode” messaging communication systems. Page-mode messaging can be implemented without the need of a media plane, but just having the signaling plane carry the individual messages. Page-mode messaging will be analyzed in Chapter 16.

The main drawback of carrying messages over the signaling plane appears when messages are big (photos, videos, and so on). The SIP signaling network was not designed to carry large messages, so this type of traffic might severely degrade its performance. By using MSRP, the user messages are moved to the media plane using a protocol specifically designed for media transport, thus relieving the SIP network.

Main Features

MSRP is a text-based, connection-oriented protocol for the transmission of instant messages in the context of a session. It sits on top of TCP, and allows the exchange of arbitrary MIME content. Next is a brief description of its main features.

Message Chunking

Instant messages sent using MSRP can be divided into different chunks for transmission. Moreover, a long chunk may be interrupted in mid-transmission, and the remaining content sent in subsequent chunks. This feature is useful in order to ensure fair access to shared transport connections. Each chunk contains a Byte-Range header field that indicates the overall position of the chunk inside the complete instant message.
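A sender-side view of chunking might look like the following sketch. It assumes the Byte-Range value takes the form start-end/total with byte positions counted from 1, as in the MSRP specification; the function name `chunk_message` and the chunk size are illustrative choices.

```python
def chunk_message(body: bytes, max_chunk: int):
    """Split a message into (Byte-Range value, chunk bytes) pairs."""
    total = len(body)
    chunks = []
    for offset in range(0, total, max_chunk):
        piece = body[offset:offset + max_chunk]
        start = offset + 1               # Byte-Range positions start at 1
        end = offset + len(piece)        # inclusive position of the last byte
        chunks.append((f"{start}-{end}/{total}", piece))
    return chunks


parts = chunk_message(b"abcdefghij", max_chunk=4)
```

The receiver can use the range of each chunk to place it at the right offset, even when chunks of different messages are interleaved on the same connection.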

Message Framing

In order to provide the previous feature, MSRP uses a boundary-based framing mechanism. A unique identifier is used to mark the beginning and the end of each message. The identifier at the end of the message indicates whether there are more chunks to come or whether this chunk was the last one in the message.

MSRP Addressing

MSRP clients are identified by an MSRP URI. The MSRP URIs of the parties (clients) involved in a communication are included in SDP session descriptions that are exchanged using SIP at session creation. In this way, each party can know the address (MSRP URI) of its peer so that message transmission in the media plane can take place.

An MSRP URI is used for two purposes:

  • Identify the IP address (or FQDN) and port against which the media plane TCP connection needs to be established. Once established, the instant messages will be sent over that connection.

  • Identify the MSRP session. Each MSRP URI contains a unique session identifier that allows endpoints to identify the session and correlate it with a specific transport connection.

An example of MSRP URI would be:

MSRP Addressing

MSRP relays are also identified by MSRP URIs.
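The two roles of the URI (locating the TCP endpoint and naming the session) can be seen by splitting one apart. The URI below is a made-up example that follows the general shape msrp://host:port/session-id;transport; the function name `parse_msrp_uri` is hypothetical.

```python
from urllib.parse import urlsplit


def parse_msrp_uri(uri: str):
    """Return (host, port, session id, transport) from an MSRP URI."""
    parts = urlsplit(uri)
    if parts.scheme not in ("msrp", "msrps"):
        raise ValueError("not an MSRP URI")
    # The path holds the session identifier, followed by ';' and the transport.
    session_id, _, transport = parts.path.lstrip("/").partition(";")
    return parts.hostname, parts.port, session_id, transport


# Hypothetical URI, invented for this example.
host, port, session_id, transport = parse_msrp_uri(
    "msrp://alice.example.com:7394/2s93i9ek2a;tcp"
)
```

The host and port tell the peer where to open the TCP connection, while the session identifier lets the endpoint match traffic on that connection to the session negotiated in SDP.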

Reporting

MSRP includes support for a very flexible mechanism for reporting on the outcome of message delivery. An MSRP client can specify what the desired reporting mechanism for the messages in a session is. These mechanisms range from no report at all (neither positive nor negative) to reporting absolutely every success or failure situation during message delivery. The way this is implemented will be explained in the next section.

MSRP Nodes

MSRP defines two types of nodes: MSRP clients and MSRP relays.

An MSRP client is the initial sender or final target of messages and delivery status. MSRP clients constitute the endpoints of the MSRP protocol. See Figure 10.6.

Figure 10.6. 

Between sender and receiver, an MSRP session may go through one or more MSRP relays. MSRP relays are intermediary MSRP entities that forward the messages and delivery status. This is shown in Figure 10.7. Relays are typically used for policy enforcement and firewall/NAT traversal.

Figure 10.7. 

MSRP Message Format

MSRP is organized as requests and responses. There are three types of requests—that is, methods:

  • SEND

  • REPORT

  • AUTH

Unlike what happens with SIP, not every MSRP request has an associated response; more specifically, REPORT requests do not have an associated response, and SEND requests may or may not have a corresponding response, depending on the value of specific header fields in the request.

The SEND request is used in order to deliver a complete instant message or a chunk (a portion of a complete message). It includes a request start line, followed by some headers, the instant message content itself, and an end line. See Figure 10.8.

Figure 10.8. 

The REPORT request is used to confirm the delivery of a complete message. Alternatively, it may be used to confirm the delivery of a chunk or group of chunks received so far.

A REPORT request contains a request start line, some headers, and an end line, but no content, as shown in Figure 10.8.

AUTH requests are sent from clients to relays in order to obtain from them an MSRP URI or list of URIs. Clients can then provide this list of URIs to their peers so as to force incoming messages through the relays whose URI is in the list.

Irrespective of the method, the start line in MSRP requests always contains the name of the protocol, a transaction id, and the name of the method (SEND, REPORT, or AUTH). An example of start line might be:

  • MSRP dhe63iy3 SEND

The end line contains a string of seven hyphens, followed by the transaction id plus a character that indicates whether this is the last chunk (“$”) or if there are more chunks to come (“+”).

Example

  • This end line indicates that this chunk is the last one: -------dhe63iy3$

  • This end line indicates that there are more chunks to come: -------dhe63iy3+

The transaction id has two purposes:

  • It is used as a mechanism to frame the MSRP messages (it is a random string that appears both in the start line and in the end line).

  • It is used to correlate requests and responses.
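Putting the start line, headers, content, and end line together, a SEND request could be assembled as in the sketch below. The function name `build_send`, the URIs, and the Message-ID value are invented for illustration, and only a handful of header fields are included.

```python
def build_send(transaction_id, to_path, from_path, message_id, body,
               content_type="text/plain", last_chunk=True):
    """Assemble the text of an MSRP SEND request for one chunk."""
    flag = "$" if last_chunk else "+"       # "$": last chunk, "+": more to come
    lines = [
        f"MSRP {transaction_id} SEND",      # start line
        f"To-Path: {to_path}",
        f"From-Path: {from_path}",
        f"Message-ID: {message_id}",
        f"Content-Type: {content_type}",
        "",                                 # blank line before the content
        body,
        f"-------{transaction_id}{flag}",   # end line: seven hyphens + id + flag
    ]
    return "\r\n".join(lines) + "\r\n"


request = build_send(
    "dhe63iy3",
    "msrp://alice.example.com:7394/2s93i9ek2a;tcp",   # hypothetical URIs
    "msrp://john.example.com:7654/jshA7weztas;tcp",
    "87652491",
    "Hi Alice, are you there?",
)
```

Because the same transaction id appears in the first and last lines, a receiver can find where the request ends without knowing the content length in advance.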

MSRP responses contain a response start line, some headers, and the end line, as shown in Figure 10.8. The start line in responses contains the name of the protocol, transaction id, and a status code. Table 10.3 shows the possible values for the status codes.

Table 10.3. 

Status Code | Description
200         | Successful transaction
400         | Unintelligible request
403         | Action not allowed
413         | Receiver wishes the sender to stop sending the particular message
415         | Media type not understood
423         | Requested parameter out of bounds
481         | Indicated session does not exist
501         | Request method not understood
506         | Request arrived on session already bound to another network connection

MSRP Header Fields

The MSRP Internet draft currently defines just a reduced set of headers, which are described next.

From-Path

The From-Path header field indicates the MSRP URI of the originator of the request or response.

To-Path

The To-Path header field indicates the MSRP URI of the destination of the request or response. Both From-Path and To-Path must be present in all requests and responses.

Message-ID

The Message-ID header field provides a unique identifier for the unit of content that the sender wishes to convey to the recipient. For instance, let us assume that John wants to send an image to Alice. The image file may be split into several chunks that are conveyed in different SEND requests; however, all the SEND requests carry the same Message-ID header field value. The Message-ID is also used to correlate status reports with the original message.

Success-Report and Failure-Report

These two header fields may be present in SEND requests, and are used to determine the reporting scheme that should be used for the messages in the session.

As we will see in the next sections, MSRP supports the concept of relays. In the path from originator to recipient, there may be several relays. When it comes to message acknowledgment, there appear two concepts:

  • Hop-by-hop acknowledgment: This may be done by each hop when receiving the message. It is implemented by sending MSRP responses to requests. Responses contain the “transaction status.”

  • End-to-end acknowledgment: This may be done by the final recipient of the message when it is processed. It is implemented by sending a REPORT request. REPORT requests contain the “delivery status.”

This is illustrated in Figure 10.9.

Figure 10.9. 

The way acknowledgments work can be configured by setting the appropriate values to the Success-Report and Failure-Report header fields. For instance, if John wanted to have both positive and negative acknowledgments for the delivery of a message to Alice, he would set both header field values to “yes.” If, on the other hand, John does not want to receive any acknowledgment whatsoever, he would set both header fields to “no.”

Table 10.4 summarizes the behavior related to reporting based on the possible combinations of values for these two headers. For each possible combination, it is indicated which type of acknowledgments are generated (end-to-end or hop-by-hop) and whether positive, negative, or both.

Table 10.4. 

Success-Report | Failure-Report | Hop-by-Hop Acknowledgments | End-to-End Acknowledgments
Yes            | Yes            | Positive and negative      | Positive and negative
No             | Yes            | Positive and negative      | Only negative
Yes            | No             | None                       | Only positive
No             | No             | None                       | None
Yes            | Partial        | None[a]                    | Positive and negative
No             | Partial        | None[b]                    | Only negative

[a] Or negative, if the recipient is unable to process the response.

[b] Or negative, if the recipient is unable to correlate the response.

If no Success-Report header field value is present in the SEND request, it is treated identically to one with a value of “no.”

If no Failure-Report header field value is present in the SEND request, it is treated identically to one with a value of “yes.”
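The combination of Table 10.4 with these two defaults can be expressed as a small lookup. This is only a sketch: the names `REPORTING` and `reporting_behaviour` are hypothetical, header values are assumed to be lowercase strings, and the exceptions noted in the table footnotes are left out.

```python
# (Success-Report, Failure-Report) -> (hop-by-hop acks, end-to-end acks), per Table 10.4
REPORTING = {
    ("yes", "yes"): ("positive and negative", "positive and negative"),
    ("no", "yes"): ("positive and negative", "only negative"),
    ("yes", "no"): ("none", "only positive"),
    ("no", "no"): ("none", "none"),
    ("yes", "partial"): ("none", "positive and negative"),
    ("no", "partial"): ("none", "only negative"),
}


def reporting_behaviour(headers: dict):
    """Apply the defaults for absent header fields, then look up Table 10.4."""
    success = headers.get("Success-Report", "no")    # absent is treated as "no"
    failure = headers.get("Failure-Report", "yes")   # absent is treated as "yes"
    return REPORTING[(success, failure)]


default_case = reporting_behaviour({})
silent_case = reporting_behaviour({"Success-Report": "no", "Failure-Report": "no"})
```

A SEND request that carries neither header field therefore behaves like the second row of the table.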

Status

This header field is present in responses and REPORT requests, and indicates the outcome of the message delivery.

Byte-Range

The Byte-Range header field may be present in both SEND and REPORT requests. When present in SEND requests, it identifies the specific chunk of a message being carried by the request. When present in a REPORT, it identifies the specific chunk of a message that is being acknowledged.

MSRP Mode of Operation

The protocol operation is quite simple. First we will consider a situation where there are no MSRP relays.

Operation without Relays

Let us assume that John wants to set up a messaging session with Alice. As part of the messaging session, John wants to send Alice text messages, but also some jpeg photos. Therefore, John will generate a SIP INVITE message that contains the description for the messaging session, including two media types: text and jpeg. The SDP offer will also contain John’s MSRP URI.

Alice will answer the INVITE with a 200 OK response in which she includes an SDP answer. The SDP answer accepts both media types, and also includes her MSRP URI. When John receives the 200 OK, his MSRP client establishes a TCP connection to the IP address and port resolved from Alice’s MSRP URI. Once the TCP connection has been set up, John immediately sends the initial MSRP SEND request. This request may or may not already include message content, but it is necessary in order for the recipient to have the assurance that the TCP connection has been established by the party who actually received the SDP. The recipient makes this check by comparing the session id in the MSRP URI in the To-Path of the MSRP SEND request with the session id present in the SDP answer.

Once the initial message has been sent, both John and Alice can exchange messages using the established TCP connection in a conversational fashion. When both parties finish their messaging conversation, one of them (for instance, John) sends a SIP BYE request and closes the session. (See Figure 10.10.)
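The session id check can be sketched as follows. We assume here that the session id is the path component of the MSRP URI, located between the host and port part and the `;tcp` transport parameter, as in the example URIs used later in this chapter. The function names are hypothetical, and a real implementation would perform a stricter comparison of the complete URI.

```python
from urllib.parse import urlsplit


def session_id(msrp_uri: str) -> str:
    """Extract the session id from an MSRP URI.

    For "msrp://host.ocean.com:11644/p33deirfwy2;tcp" the path is
    "/p33deirfwy2;tcp"; the session id is what precedes the ";".
    """
    path = urlsplit(msrp_uri).path
    return path.lstrip("/").split(";", 1)[0]


def sent_by_sdp_peer(to_path_uri: str, uri_in_own_sdp: str) -> bool:
    """Check the first SEND request on a newly accepted TCP connection.

    The recipient compares the session id found in the To-Path of the
    request with the one it placed in its own SDP.
    """
    return session_id(to_path_uri) == session_id(uri_in_own_sdp)


alice_sdp_uri = "msrp://host.ocean.com:11644/p33deirfwy2;tcp"
print(session_id(alice_sdp_uri))
print(sent_by_sdp_peer(alice_sdp_uri, alice_sdp_uri))
```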

Figure 10.10. 

We saw in the last section (Table 10.4) that, by setting specific values in the Success-Report and Failure-Report header fields, a client can specify whether responses or REPORT requests need to be sent back to the client or not. As an example, we will show two possible scenarios:

  1. The client sets Success-Report=yes and Failure-Report=yes. The message contains several chunks, and the delivery is successful. Figure 10.11 depicts this situation. The recipient generates a response per chunk, and also a REPORT for the complete message.[6]

    Figure 10.11. 

  2. The client sets Success-Report=no and Failure-Report=no. The message contains just one chunk, and the delivery is successful. No response or REPORT request is generated. Figure 10.12 shows this scenario. This example may reflect situations where system messages such as “the system is going down in 5 minutes” are sent to many people, and we do not want to flood the sender with responses.

    Figure 10.12. 
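To make the notion of chunks more concrete, the following sketch splits a message into fixed-size chunks and computes the Byte-Range value that each SEND request would carry. The chunk size and the function name `chunk_byte_ranges` are assumptions made for illustration; a real client chooses chunk boundaries according to its own policy.

```python
def chunk_byte_ranges(message: bytes, chunk_size: int):
    """Split a message into chunks, each paired with its Byte-Range value.

    Byte positions are 1-based and inclusive, in the form start-end/total.
    An empty message yields an empty list in this sketch.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    total = len(message)
    chunks = []
    for offset in range(0, total, chunk_size):
        piece = message[offset:offset + chunk_size]
        start = offset + 1
        end = offset + len(piece)
        chunks.append((f"{start}-{end}/{total}", piece))
    return chunks


for byte_range, piece in chunk_byte_ranges(b"Hello, how are you?", 8):
    print(byte_range, piece)
```

In the first scenario above, the recipient would send one response for each of these chunks, plus a REPORT request once the complete message has been received.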

Operation with MSRP Relays

Let us now tackle the protocol operation when there are MSRP relays in the media path. Let us assume that John wishes to start an instant messaging session with Alice. We will consider that there are two relays in the path:

  • relay 1, which acts on John’s behalf

  • relay 2, which acts on Alice’s behalf

MSRP relays are also identified by MSRP URIs, in the same way as clients. For the purpose of this discussion, the MSRP URIs of the four entities involved will be denoted:

  • msrpA: MSRP URI of John

  • msrp1: MSRP URI of relay 1

  • msrp2: MSRP URI of relay 2

  • msrpB: MSRP URI of Alice

Figure 10.13 shows the message flow for the scenario with relays that is described next.

Figure 10.13. 

In order to use a relay, the MSRP client first opens a TLS[7] connection to it. As part of the TLS procedures, the MSRP client authenticates the MSRP relay. Then the MSRP client sends an AUTH request to the relay. This request is rejected with a challenge for client authentication. The MSRP client then generates a new AUTH request that includes the credentials. If the request is authenticated, the relay responds with an MSRP 200 OK that includes the MSRP URI of the relay (msrp1) in the Use-Path header field.[8]

At that point, the client generates an SDP offer whose path attribute contains two MSRP URIs: msrp1 and msrpA.

Then the SIP session establishment process takes place. The SDP offer is typically sent in the SIP INVITE request, and the SDP answer received in the SIP 200 OK response.

Having received the SDP answer, the client reads the MSRP URIs included in the path attribute (msrp2, msrpB), and builds a new list of MSRP URIs by merging the URI received from relay 1 in the Use-Path header field of the AUTH response with the set of URIs received in the SDP answer.

list= msrp1,msrp2,msrpB
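The two list operations described so far (building the path attribute for the SDP offer, and building the URI list once the SDP answer arrives) can be sketched as plain list concatenations. The function names are hypothetical, and the short labels msrpA, msrp1, msrp2, and msrpB stand in for complete MSRP URIs.

```python
def offer_path(use_path, own_uri):
    """Path attribute for the SDP offer: the relay URIs, then our own URI."""
    return list(use_path) + [own_uri]


def merged_uri_list(use_path, answer_path):
    """URI list built after the SDP answer: our relay(s) first, then the
    URIs taken from the path attribute of the answer."""
    return list(use_path) + list(answer_path)


use_path = ["msrp1"]              # from the Use-Path of the AUTH response
answer_path = ["msrp2", "msrpB"]  # from the path attribute of the SDP answer

print(offer_path(use_path, "msrpA"))
print(merged_uri_list(use_path, answer_path))
```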

Then the MSRP client builds an MSRP SEND request setting the To-Path header field value to the previous list, and setting the From-Path header field value to his own MSRP URI. We said in the previous section that the To-Path and From-Path header fields contain the recipient and originator, respectively, of the MSRP request/response. In order to support the operation with MSRP relays, these two headers can actually contain lists of MSRP URIs.

The list of URIs in the To-Path header field identifies the MSRP entities that need to be visited by the MSRP requests and responses in order to reach the final target. The rightmost MSRP URI in the To-Path header identifies the final target; the leftmost MSRP URI is the next hop to deliver the request or response.

The list of URIs in the From-Path header field indicates how to get back to the original sender of a request or response. The leftmost MSRP URI in the list identifies the last visited MSRP node; the rightmost URI is the originator of the message.

When a relay forwards a request, it removes its address from the To-Path header, and inserts it as the first URI in the From-Path header.

When an MSRP entity receives a request for which it needs to send a response, the MSRP entity copies the list of URIs in the From-Path header of the request into the To-Path header of the response, and sets the From-Path header of the response to its own URI.

In Figure 10.13, we can see how the From-Path and To-Path headers are modified as the SEND message progresses through the different relays.
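The path manipulation rules above can be traced with a short sketch. It models the To-Path and From-Path header fields as Python lists whose first element is the leftmost URI, and it follows the rules exactly as stated in the text. The function names are hypothetical, and the short labels stand in for complete MSRP URIs.

```python
def forward_through_relay(to_path, from_path, relay_uri):
    """Apply the forwarding rule: the relay removes its own URI from the
    front of To-Path and inserts it at the front of From-Path."""
    if not to_path or to_path[0] != relay_uri:
        raise ValueError(f"{relay_uri} is not the next hop in To-Path")
    return to_path[1:], [relay_uri] + list(from_path)


def response_paths(request_from_path, own_uri):
    """Apply the response rule: To-Path of the response is the From-Path of
    the request; From-Path of the response is the responder's own URI."""
    return list(request_from_path), [own_uri]


# SEND request as built by John's client
to_path, from_path = ["msrp1", "msrp2", "msrpB"], ["msrpA"]

# The request crosses relay 1 and then relay 2
to_path, from_path = forward_through_relay(to_path, from_path, "msrp1")
to_path, from_path = forward_through_relay(to_path, from_path, "msrp2")
print(to_path, from_path)

# Alice builds the paths for her response
print(response_paths(from_path, "msrpB"))
```

Note how the From-Path that reaches Alice already lists the nodes in the order needed to travel back to John, so that building the response requires no reordering.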

Reporting

We saw in the previous section the effect that the Success-Report and Failure-Report header fields have on MSRP reporting mechanisms when no relay was involved. The behavior expressed in Table 10.4 is valid also for a situation with relays. We just need to take into account that:

  • Relays may generate hop-by-hop acknowledgments (i.e., MSRP responses) depending on the values of Success-Report and Failure-Report header fields.

  • If the values in these headers indicate the need for hop-by-hop acknowledgments, the relay will start a transaction timer when forwarding the SEND request to the next hop.

  • REPORT requests are generated by the endpoints, not by relays. An exception to this is those situations where the relay receives a negative response for a SEND request or the transaction timer expires. In those cases, the relay will generate a negative REPORT request and send it directly to the originator of the original message.

Now we will see some examples that illustrate how reporting works in the presence of relays.

  1. The client sets Success-Report=yes and Failure-Report=yes. The message contains just one chunk, and the delivery is successful. This situation is shown in Figure 10.14. There are hop-by-hop acknowledgments, and also an end-to-end confirmation (REPORT) that the message was delivered.

    Figure 10.14. 

  2. The client sets Success-Report=yes and Failure-Report=no. The message contains just one chunk, and the delivery is successful. This situation is shown in Figure 10.15. According to Table 10.4, there are no hop-by-hop acknowledgments, but there is an end-to-end REPORT request.

    Figure 10.15. 

  3. The client sets Success-Report=yes and Failure-Report=yes. The message contains just one chunk, and the delivery fails in the last hop. The transaction timer expires in relay 2, and it sends back a negative REPORT request to the originator. Figure 10.16 shows this scenario.

    Figure 10.16. 

Detailed MSRP Example

In order to illustrate the concepts learned so far, we will now show the protocol traces of a simple messaging scenario. The scenario does not include any proxies; it is just a simple session between John and Alice with two exchanged messages: “Hello, how are you?” and “I am fine, thanks.” We will include just the most relevant header fields.

(SIP/SDP session establishment)

John to Alice:

INVITE sip:[email protected] SIP/2.0
To: <sip:[email protected]>
From: <sip:[email protected]>;tag=9317
Call-ID: gd3y8r37z3
Content-Type: application/sdp

c=IN IP4 sea.com
m=message 7881 TCP/MSRP *
a=accept-types:text/plain
a=path:msrp://host.sea.com:7881/geiuf4oi3yr;tcp

Alice to John:

SIP/2.0 200 OK
To: <sip:[email protected]>;tag=3y44
From: <sip:[email protected]>;tag=9317
Call-ID: gd3y8r37z3
Content-Type: application/sdp

c=IN IP4 ocean.com
m=message 11644 TCP/MSRP *
a=accept-types:text/plain
a=path:msrp://host.ocean.com:11644/p33deirfwy2;tcp

John to Alice:

ACK sip:alice@biloxi SIP/2.0
To: <sip:[email protected]>;tag=3y44
From: <sip:[email protected]>;tag=9317
Call-ID: gd3y8r37z3

(Message exchange)

John to Alice:

MSRP j4l34uud7 SEND
To-Path: msrp://host.ocean.com:11644/p33deirfwy2;tcp
From-Path: msrp://host.sea.com:7881/geiuf4oi3yr;tcp
Message-ID: 85749983
Byte-Range: 1-19/19
Content-Type: text/plain

Hello, how are you?
-------j4l34uud7$

Alice to John:

MSRP j4l34uud7 200 OK
To-Path: msrp://host.sea.com:7881/geiuf4oi3yr;tcp
From-Path: msrp://host.ocean.com:11644/p33deirfwy2;tcp
Byte-Range: 1-19/19
-------j4l34uud7$

Alice to John:

MSRP fy4u4uu3i SEND
To-Path: msrp://host.sea.com:7881/geiuf4oi3yr;tcp
From-Path: msrp://host.ocean.com:11644/p33deirfwy2;tcp
Message-ID: 83678263
Byte-Range: 1-17/17
Content-Type: text/plain

I am fine, thanks
-------fy4u4uu3i$

John to Alice:

MSRP fy4u4uu3i 200 OK
To-Path: msrp://host.ocean.com:11644/p33deirfwy2;tcp
From-Path: msrp://host.sea.com:7881/geiuf4oi3yr;tcp
Byte-Range: 1-17/17
-------fy4u4uu3i$
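A SEND request such as the ones in the traces can be assembled with a few lines of code. The sketch below handles only a complete message sent as a single chunk; the function name `build_send` is hypothetical, and the URIs are the ones advertised in the SDP path attributes of this example. Each request ends with an end-line made of seven dashes, the transaction identifier of the request, and the `$` flag.

```python
CRLF = "\r\n"


def build_send(transaction_id, to_path, from_path, message_id, body,
               content_type="text/plain"):
    """Build an MSRP SEND request that carries a whole message in one chunk."""
    size = len(body.encode("utf-8"))  # Byte-Range counts bytes, not characters
    lines = [
        f"MSRP {transaction_id} SEND",
        f"To-Path: {to_path}",
        f"From-Path: {from_path}",
        f"Message-ID: {message_id}",
        f"Byte-Range: 1-{size}/{size}",
        f"Content-Type: {content_type}",
        "",                            # blank line between headers and body
        body,
        f"-------{transaction_id}$",   # end-line repeats the transaction id
    ]
    return CRLF.join(lines) + CRLF


print(build_send(
    "j4l34uud7",
    "msrp://host.ocean.com:11644/p33deirfwy2;tcp",
    "msrp://host.sea.com:7881/geiuf4oi3yr;tcp",
    "85749983",
    "Hello, how are you?",
))
```

Because the end-line is derived from the same transaction identifier as the first line, the receiver can tell where the request finishes, and the response to the request reuses that identifier.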

Summary

We have so far described in some detail two protocols that are used at the media level. In the next chapter, we will put into practice some of the concepts learned by building an RTP sender and receiver.



[1] The reader should be aware that the acronym ToIP is sometimes used to mean Telephony over IP. Whether we are referring to Text over IP or Telephony over IP should be clear from the context.

[2] T.38 (real-time FOIP) is not to be confused with T.37, which defines a store and forward mechanism for sending faxes over IP using email as transport (SMTP and TIFF attachments).

[3] Throughout this book, a “session” is, by default, an SDP session. If we want to refer to an RTP session, we will explicitly say “RTP session.”

[4] Generic user requirements for SIP in support of deaf, hard of hearing, and speech-impaired individuals are defined in [RFC 3351].

[5] The status of these Internet drafts at the time of writing is “RFC Editor’s Queue,” which is the status just previous to RFC publication.

[6] The recipient might as well have generated several REPORT requests acknowledging each chunk or the group of chunks received so far.

[7] TLS (Transport Layer Security) is discussed in Chapter 13, dedicated to security.

[8] This header field is present only in responses to AUTH requests.
