Chapter 4. SIP Overview

In this chapter, we will introduce SIP. First of all, we will define some key concepts in the SIP framework, such as sessions and addressing. Then the main functions of the protocol will be presented. After that, we will describe the different SIP entities in the SIP architecture. We will also show a first example of how the SIP entities are involved in a basic SIP call.

What is SIP?

The Session Initiation Protocol is an application-level signaling protocol defined by the IETF in [RFC 3261] for the creation and management of sessions over an IP network. The term “session” refers to the media plane aspect of the communication—that is, to the exchange of media (e.g., voice, video, and so on) among an association of participants.

Sessions can be described using the Session Description Protocol (SDP) defined in [RFC 4566]. In order to create sessions, SIP messages carry SDP session descriptions that allow participants to agree on a set of parameters needed for the multimedia communication, such as transport addresses or media types.

A key aspect here is that SIP, the signaling protocol, is independent of the session being established and of the mechanism used to describe it. SIP provides the way to distribute this information between potential participants in the session.

The Session Description Protocol defines a language to characterize the multimedia session. Some key pieces of information have to be present in a session description:

  1. The types of media in the session

  2. The available codecs for each of the media types

  3. The address (IP and port) where media packets should be sent.

In Chapter 9, we will dive in detail into the Session Description Protocol.
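
As a rough illustration of the three pieces of information listed above, the following Python sketch pulls them out of a small session description. The SDP text is a hypothetical example that is not taken from this chapter: the IP address belongs to a documentation range, the session name is invented, and the function name summarize_sdp is our own. The parser is deliberately minimal and handles only the "c=", "m=", and "a=rtpmap" lines.

```python
# Hypothetical session description; the address comes from the 192.0.2.0/24
# documentation range and is not taken from the chapter.
SDP_TEXT = """\
v=0
o=alice 2890844526 2890844526 IN IP4 192.0.2.10
s=Birthday call
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 0 8
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
m=video 51372 RTP/AVP 31
a=rtpmap:31 H261/90000
"""


def summarize_sdp(text):
    """Return one dict per media stream: type, address, port, and codecs."""
    session_address = None
    streams = []
    for line in text.splitlines():
        if len(line) < 2 or line[1] != "=":
            continue  # skip blank or malformed lines
        kind, value = line[0], line[2:]
        if kind == "c":
            # c=<nettype> <addrtype> <connection-address>
            address = value.split()[2]
            if streams:
                streams[-1]["address"] = address  # media-level override
            else:
                session_address = address
        elif kind == "m":
            # m=<media> <port> <proto> <fmt> ...
            media, port, _proto, *formats = value.split()
            streams.append({
                "media": media,
                "address": session_address,
                "port": int(port),
                "formats": formats,
                "codecs": {},
            })
        elif kind == "a" and value.startswith("rtpmap:") and streams:
            # a=rtpmap:<payload type> <encoding name>/<clock rate>
            payload, encoding = value[len("rtpmap:"):].split(None, 1)
            streams[-1]["codecs"][payload] = encoding
    return streams
```

Calling summarize_sdp(SDP_TEXT) yields one entry for the audio stream and one for the video stream, each with the address and port where media packets should be sent.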

SIP Addressing

In the SIP architecture, users are usually identified using a SIP URI (Uniform Resource Identifier). A SIP URI is a type of URI, and therefore it complies with the general rules for URIs defined in [RFC 3986]. In general, SIP URIs identify communication resources. They contain enough information to initiate and maintain a communication session with the resource.

Example:

sip:john.smith@ocean.com

A SIP URI uses the “sip:” scheme, and is composed of two parts separated by the “@” sign. The two parts are:

  • An optional user part, which identifies a particular resource at the host being addressed. In our previous example: john.smith.

  • A host-port part, which identifies the source providing the resource. It may contain a Fully Qualified Domain Name (FQDN) or an IP address plus an optional port value. In our previous example: ocean.com.

Additionally, the SIP URI may contain a number of parameters that affect the request constructed from the URI. URI parameters are added after the host-port part, and are separated by semicolons.
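
The structure just described can be sketched in a few lines of Python. This is a simplified illustration rather than a complete parser: the function name parse_sip_uri is our own, and the sketch assumes there is no password, no header section, and no IPv6 literal in the URI. The "transport" parameter used in the usage note is one of the URI parameters defined in [RFC 3261]; it is not taken from this chapter.

```python
def parse_sip_uri(uri):
    """Split a simple SIP URI into user, host, port, and parameters.

    Deliberately simplified: no passwords, no header fields after "?",
    and no IPv6 literals.
    """
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() != "sip":
        raise ValueError("not a SIP URI: %r" % uri)

    # URI parameters follow the host-port part, separated by semicolons.
    address, *raw_params = rest.split(";")

    # The user part is optional; without "@" the whole address is host-port.
    user, at, hostport = address.rpartition("@")
    if not at:
        user = None

    host, colon, port = hostport.partition(":")
    params = {}
    for item in raw_params:
        name, _, value = item.partition("=")
        params[name] = value
    return {
        "user": user,
        "host": host,
        "port": int(port) if colon else None,
        "params": params,
    }
```

For instance, parse_sip_uri("sip:john.smith@ocean.com;transport=tcp") separates the user part john.smith, the host ocean.com, and the single parameter, while a URI without an "@" sign is treated as host-port only.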

SIP URIs may refer to SIP users and SIP servers. More specifically, a SIP URI may represent:

  • The public identity of a user—that is to say, the global identifier that anyone could use in order to establish a multimedia communication with that user. For instance, the following SIP URI might represent John’s public user identity:

    sip:[email protected]

    That is the identity that John would advertise and include in his business cards.

  • A user at a specific host or location. For example, the following SIP URI represents a user called Peter at a host whose FQDN is lab.computing.ocean.com:

    sip:[email protected]

    The next SIP URI refers to the same user at location 212.34.100.2:

    sip:[email protected]

  • A SIP server.[1] SIP URIs can also be used to represent SIP servers, as in:

    sip:proxy1.ocean.com or

    sip:193.53.24.3

  • A group of users. For instance, the URI

    sip:[email protected]

    might represent the Human Resources Department in the company Ocean. Whenever someone tries to communicate with that resource, the server responsible for the URI would try all the people in the department until it finds someone who can accept the communication.

  • A service. A URI can also represent a service, as described in [RFC 3087]. For instance, the URI

    sip:dogs:[email protected]

    might refer to a voice-conference service about dogs. Whoever communicates with that URI is joined into the conferencing system.

It is worth mentioning that there are URIs that point to logical identities (for example, sip:[email protected]), whereas other URIs directly indicate locations (FQDNs or IP addresses)—for example, sip:[email protected] or sip:proxy1.ocean.com.

SIP URIs that point to locations can be directly resolved to the corresponding IP address, port, and transport via DNS (Domain Name System)[2] queries. The “logical” SIP URIs, on the other hand, must first be translated into a “location” SIP URI by querying a Location Service; the resulting URI can then be resolved through DNS mechanisms. Location Services will be explained in Section 4.3.2, “Location of Users.”

Other URIs can be used to identify communication resources. Examples are the SIPS URI, which provides secure access to communication resources and implies the utilization of TLS (Transport Layer Security), and the TEL URI, which identifies a resource in the telephone network and is used in interworking scenarios between the PSTN (Public Switched Telephone Network) and the Internet. SIPS URIs are described in Chapter 14, and TEL URIs are tackled in Chapter 18.

SIP can also use generic URIs to identify resources. This is actually a powerful characteristic of SIP because it would allow the combination of other Internet services, such as email or web, with SIP communication services. For instance, Peter would use Alice’s SIP URI in order to initiate a media session with her. Let us assume that she is unavailable. At that point, a SIP redirection might occur, which could convey, in the SIP signaling, her email address (mailto URI) or the HTTP (HyperText Transfer Protocol) URI of her web page. Peter’s application might then send her an email or start the browser and go to her web page automatically. And all this would require only that Peter know her SIP URI. So SIP URIs have the capability of becoming the single identifier for users in the Internet irrespective of the communication method that will eventually be used.

SIP Functions

Let’s now look more closely into the functions of SIP as a signaling protocol. SIP basically solves two key aspects in IP multimedia communications:

  1. Session setup, modification, and termination

  2. Location of users

Session Setup, Termination, and Modification

As its name implies, one of the main functions of the Session Initiation Protocol is the initiation of multimedia sessions. By using SIP, Alice can signal her desire to engage in a multimedia session with John. Likewise, John can use SIP to signal his acceptance or rejection of the communication. During the session setup, session descriptors are exchanged so that both parties can agree on the crucial parameters for the session.

SIP can also be used to modify session parameters of the ongoing session—for instance, in order to add new media components into the session.

The last SIP function related to session management is session termination. Any of the session participants can use SIP to signal his or her desire to terminate the communication, which effectively stops media transmission and reception.

Let’s look in more detail at the specific SIP functions involved in session establishment. For that purpose, let’s use an example of a multimedia call between Alice and John. In the context of SIP, the term “IP multimedia call” (or “call,” for short) is a generic term used to refer to a SIP-based IP real-time communication between peers.[3] That is the meaning of “call” that we will use in the subsequent sections of the book.

It is John’s birthday, and Alice wants to wish him a happy day. She wants to talk to him, but would also like to show him the nice cake she is preparing to celebrate the event. So what Alice wants is a multimedia (voice and video) communication with John. Therefore, Alice opens her multimedia-communication application on her PC, enters John’s address, and presses the call button. John is currently in a work meeting, but he is armed with his IP Multimedia-enabled mobile phone.

In order to set up the exchange of multimedia data between Alice and John (i.e., the session), SIP needs to convey certain control (signaling) information:

  • First of all, if Alice wants to communicate with John, she should signal her desire to communicate with him—in other words Alice should explicitly send John an invitation to participate in a communication. SIP will be responsible for carrying such an invitation from Alice to John.

  • Second, it may take some time for John to respond to such an invitation. In the meantime, we need to give Alice some indication about how the call is progressing. In this case, SIP will convey the progress information back to Alice.

  • Third, if John decides to answer, we need to communicate to Alice his willingness to take the communication. Again, SIP is used to convey to Alice the acceptance by John to take her call.

  • Fourth, some parameters need to be negotiated between the two endpoints before actual delivery of the voice and video takes place. For example, the endpoints need to agree on which codecs for voice and video they will use so as to be sure that the voice samples encoded by the sender will be properly decoded in the receiver. The way SIP can enable this functionality is by distributing the session description between John and Alice.

Once the session has been established, Alice and John start to talk to each other and see each other. At some point in time, Alice may decide that she wants to stop the video communication because she has already shown the cake to John. In such a situation, SIP will be used to modify the session parameters that were negotiated during the establishment phase. Alice’s soft phone will use SIP to send a new session description to John. The new session description will not contain the video component. John agrees to eliminate the video component, and the communication proceeds just with voice.

When, later on in the conversation, John decides to terminate the session (because an important meeting with his boss is about to start), again SIP is used for that purpose.
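
The setup, modification, and termination phases of this call can be summarized as a small state machine. The Python sketch below is only an illustration of the caller's view. The request and response names (INVITE, 180 Ringing, 200 OK, BYE) come from [RFC 3261] and have not yet been introduced in this chapter, the state names are hypothetical labels chosen for the sketch, and the ACK request that completes an INVITE exchange is left out for brevity.

```python
# Allowed transitions for the caller's view of one call. The method and
# response names (INVITE, 180, 200, BYE) come from RFC 3261; the state
# names are hypothetical labels chosen for this sketch.
TRANSITIONS = {
    ("idle", "send INVITE"): "inviting",
    ("inviting", "recv 180 Ringing"): "ringing",
    ("inviting", "recv 200 OK"): "established",
    ("ringing", "recv 200 OK"): "established",
    ("established", "send re-INVITE"): "modifying",
    ("modifying", "recv 200 OK"): "established",
    ("established", "send BYE"): "terminated",
    ("established", "recv BYE"): "terminated",
}


def run_call(events, state="idle"):
    """Apply events in order and return every state the call passes through."""
    history = [state]
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError("event %r not allowed in state %r" % (event, state))
        state = TRANSITIONS[key]
        history.append(state)
    return history


# Alice's view of the birthday call: setup, dropping video, John hanging up.
ALICE_EVENTS = [
    "send INVITE",       # invitation, carrying Alice's session description
    "recv 180 Ringing",  # progress indication while John's phone alerts him
    "recv 200 OK",       # John accepts and returns his session description
    "send re-INVITE",    # new session description without the video component
    "recv 200 OK",       # John agrees; the call continues with voice only
    "recv BYE",          # John terminates the session
]
```

Running run_call(ALICE_EVENTS) walks through the whole example, and an event that makes no sense in the current state (such as terminating a call that was never set up) raises an error.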

In Figure 4.1, we can see an example of the call between Alice and John that highlights the concepts and definitions discussed in the previous sections.

Figure 4.1. 

Note on the Usage of SIP in Multicast Conferences

SIP can also invite participants to already-existing sessions such as Internet multicast conferences. In this case, a multimedia conference may be taking place on the Internet—for instance, a rock concert is being multicast. At one point in time, Alice, who is listening to the conference (the concert), decides to invite John. She indicates the characterization of the session in her invitation, and once John has received the invitation, he can “tune” to the multicast conference.

The example above represents the scenario for SIP utilization that was very much in the minds of SIP designers when the first draft of SIP was produced back in 1996. However, SIP is used today predominantly in communication scenarios such as the ones that we described at the beginning of the section. Therefore, those are the scenarios we will focus on throughout this book.

Location of Users

We have seen in the last section that the session-initiation request needs to be routed from Alice to John. In an IP network, routing of messages relies on the utilization of IP addresses. However, Alice does not know John’s IP address. All she knows is John’s public identity, expressed as a “logical” SIP URI: sip:john@ocean.com.

In fact, John might even want to use his application from different terminals: his PC at home, his IMS (IP Multimedia Subsystem) mobile phone when traveling, or his laptop at work. All of these probably have different IP addresses.

So there is a need to derive John’s location from his SIP public identity. What this highlights is the general problem of user mobility: users are identified by an abstract, “logical” SIP identity irrespective of their location, but in order to route messages to them, it is necessary to derive their “physical” location.

Therefore, what is needed is a system capable of tracking the IP address of the users, mapping it to their public identity, and storing that information in a table. In the process of establishing a new multimedia communication, it will be necessary to query the table containing the mapping in order to derive the right IP address to which the packets should be sent.

One of the main SIP functions is to enable user mobility. To that purpose, SIP defines the registration procedure.[4] Every SIP endpoint that wants to receive multimedia calls must first be registered. That is to say, it has to communicate its present location (expressed as a “location” SIP URI), together with its public identity (expressed as a “logical” SIP URI), to its home SIP server (more specifically, to its registrar server), which will then maintain a table with the mapping.

The table might look like this:

Address of Record        Contact Address
sip:[email protected]    sip:[email protected]
...                      ...

Or like this:

Address of Record        Contact Address
sip:[email protected]    sip:[email protected]
...                      ...

The table has two columns. The first one contains the public identities, so-called Addresses of Record (AORs). The second column contains the corresponding locations, so-called Contact Addresses. AORs are expressed as “logical” SIP URIs, and Contact Addresses are expressed as “location” SIP URIs.
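
A minimal in-memory version of such a table could look like the following Python sketch. It is an illustration under stated assumptions, not a description of any real registrar: the class and method names are our own, the Contact Addresses in the usage note are hypothetical, and real registrations carry further details (for example, a limited lifetime) that are ignored here.

```python
class LocationService:
    """In-memory table mapping an Address of Record to Contact Addresses."""

    def __init__(self):
        # AOR -> list of Contact Addresses, kept in registration order
        self._table = {}

    def register(self, aor, contact):
        """Add a binding between a public identity and a location."""
        contacts = self._table.setdefault(aor, [])
        if contact not in contacts:
            contacts.append(contact)

    def unregister(self, aor, contact):
        """Remove a binding, if it exists."""
        contacts = self._table.get(aor, [])
        if contact in contacts:
            contacts.remove(contact)

    def lookup(self, aor):
        """Return a copy of the locations currently bound to the AOR."""
        return list(self._table.get(aor, []))
```

For example, after register("sip:john@ocean.com", "sip:john@192.0.2.15") and register("sip:john@ocean.com", "sip:john@pc.home.example.net"), a lookup of sip:john@ocean.com returns both hypothetical locations, which is the situation in which a server has to decide which location to try first.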

The registration procedure is shown in Figure 4.2.

Figure 4.2. 

Whenever a call is made to a SIP endpoint, the call will be routed to the endpoint’s associated SIP server, which will query that table and derive the SIP URI representing the location of the destination endpoint. Then it will make one or several DNS queries to finally determine what transport protocol, IP address, and port must be used to deliver the signaling message to John. DNS utilization in the context of SIP is discussed in Chapter 6. Figure 4.3 shows an example of a call routed to John.

Figure 4.3. 

SIP Entities

The SIP specifications define a number of SIP elements as part of the SIP architecture:

  • User Agents (UAs)

  • Registrars

  • Proxies

  • Back-to-Back User Agents (B2BUAs)

User Agents

A SIP UA comprises two components: a User Agent Client (UAC) and a User Agent Server (UAS). The UAC is responsible for the generation of new SIP requests and the reception of the associated responses. The UAS is responsible for receiving SIP requests and generating the appropriate responses. This is shown in Figure 4.4.

Figure 4.4. 
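
The split between the two roles can be sketched in Python as follows. This is only a toy model: a direct method call stands in for the network, requests and responses are plain dictionaries rather than real SIP messages, and the class name, the accept_calls flag, and the example URIs are assumptions made for the illustration. The status codes 200 (OK) and 603 (Decline) are taken from [RFC 3261].

```python
class UserAgent:
    """A UA bundles a client side (UAC) and a server side (UAS)."""

    def __init__(self, uri, accept_calls=True):
        self.uri = uri
        self.accept_calls = accept_calls

    # UAC role: generate a new request and receive the associated response.
    def send_request(self, method, target):
        request = {"method": method, "from": self.uri, "to": target.uri}
        response = target.handle_request(request)
        return response

    # UAS role: receive a request and generate the appropriate response.
    def handle_request(self, request):
        if request["method"] == "INVITE" and not self.accept_calls:
            return {"status": 603, "reason": "Decline"}
        return {"status": 200, "reason": "OK"}
```

The same object plays both roles over the life of a call: a UA acts as a UAC when it sends the session-initiation request, and as a UAS when the other party later sends it a request to terminate the session.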

User Agents are typically located at the SIP endpoints, and the end user can interact with them through a user interface. If we look back to our previous example of a call between Alice and John, the multimedia application running on top of Alice’s PC or John’s mobile device includes a SIP User Agent. When Alice decides to call John, she typically makes use of a Graphical User Interface (GUI) where she can enter John’s SIP URI. Once John’s address has been entered, Alice would press a button in the GUI to signal her wish to initiate the call. The User Agent software will detect the button being pressed, and will generate the proper SIP request in order to initiate the call.

The multimedia application that both Alice and John are using does not just have to implement a user interface and handle the signaling. It also needs to handle the user plane traffic—that is, the voice and video (or other media) data. For that reason, the multimedia application needs to incorporate a voice and video tool or whatever tool is necessary in order to handle the desired media. The voice and video tool will have to include software or hardware components that implement the media coding and decoding (codecs).

So, summarizing, an end-user multimedia-communications application is typically made up of four types of components:

  • A SIP UA, responsible for handling the signaling.

  • A set of media tools, each of them specialized in a particular media. Different media components can be combined in the same call.

  • A piece of service logic that typically maintains a state machine and forms the glue that makes the other components work together.

  • A user interface through which the user gains access to the application.

This is represented in Figure 4.5. Here, we can clearly see how the SIP protocol itself is independent of the media session. SIP is not concerned with the type of media session that needs to be established. It just distributes the descriptors for the session. For example, if we wanted to enhance our multimedia application to support whiteboarding (in addition to voice and video), we would need to add another component, the whiteboarding media tool, and integrate it with the user interface, but the SIP User Agent itself would remain unchanged.

Figure 4.5. 

The SIP UA is a critical part of any multimedia-communication application. SIP User Agents can be implemented in very different ways. A SIP UA can be, for instance, a software program running on top of a PC, or it can be implemented as part of a desktop phone, or it can run as an application on a mobile phone.

Software programs that implement a multimedia-communication application are typically called soft phones. The soft-phone concept is quite general, and does not necessarily imply the utilization of SIP as a signaling protocol. There are SIP soft phones, but also soft phones that use other signaling protocols such as H.323 (Packet-Based Multimedia Communications Systems), SCCP (Skinny Client Control Protocol), and so on.

We have said that SIP endpoints include a SIP UA. Additionally, there are situations in which SIP User Agents can also be included in network servers. Take the example of a SIP Voice Mail System (VMS). Such a system has to be able to receive SIP calls, accept them, play a greeting announcement, and record the message from the caller. Moreover, the VMS may also need to create outgoing calls to notify users of new messages and allow the users to directly listen to those messages. So the VMS has to implement a true multimedia-communication application that will include a SIP User Agent in order to handle the signaling and some media tools.

Registrar

A registrar is a server that accepts registration requests from the User Agents. Registration is the process by which a SIP UA communicates its current location along with its externally visible identifier to the registrar server. A SIP UA needs to be registered before it can receive multimedia calls. When the registrar accepts the registration request, it places the received information (that is, the mapping between user location and globally visible identifier) in a database called the Location Service.

Location Service

The Location Service is not a SIP entity. As has been mentioned previously, a Location Service is a database that contains a list of mappings between Addresses of Record (AORs), which represent public SIP identities, and Contact Addresses (which represent the user location) for a specific domain. Both AORs and Contact Addresses are expressed as SIP URIs. When a registrar receives a registration request from a UA, it populates the Location Service with the received information. The Location Service is also contacted by proxy servers responsible for a specific domain in order to obtain information about possible locations of the called user.

The interface toward a Location Service is not SIP based. There is not a standardized mechanism to access a Location Service. Some SIP servers may use protocols such as LDAP (Lightweight Directory Access Protocol) or others. In many implementations, the Location Service and the SIP server are implemented on the same system, and the interface between them is internal.

Address of Record        Contact Address
sip:[email protected]    sip:[email protected]
...                      ...

Figure 4.6 shows the registration procedure involving a SIP User Agent, a SIP registrar, and a Location Service.

Figure 4.6. 

Proxy Servers

A proxy server is an intermediary entity that makes requests on behalf of other clients. It primarily plays the role of routing (SIP routing),[5] which means that its job is to ensure that a request is sent to another entity “closer” to the targeted user. Proxies are also useful for enforcing policy (for example, making sure a user is allowed to make a call). A proxy interprets and, if necessary, rewrites specific parts of a request message before forwarding it.

There may be a set of proxies between UAC and UAS that help to route requests. Two specific types of SIP proxies deserve our attention: outbound and inbound proxies.

Outbound Proxy

An outbound proxy (Figure 4.7) helps the UAs to route outgoing requests. UAs are usually configured to route all their requests to an outbound proxy, which will route the requests for them.

Figure 4.7. 

Inbound Proxy

An inbound proxy (Figure 4.8) is a proxy server that handles incoming requests for an administrative domain. So it basically helps to route incoming requests to the appropriate UA within the domain it is responsible for. When an inbound proxy receives a request for a user belonging to the domain for which that proxy is responsible, the proxy queries the Location Service, determines the contact address of the UA to which this request is directed, and forwards the request to that address.

Figure 4.8. 

It is quite frequent that the call-initiation request, on its way from originator to recipient, traverses an outbound and an inbound proxy. This arrangement is commonly known as the SIP trapezoid, and is depicted in Figure 4.9.

Figure 4.9. 

Inbound proxies and local outbound proxies may be implemented as part of the same system together with the registrar. In this case, they are referred to simply as SIP servers. This is shown in Figure 4.10.

Figure 4.10. 

Forking

There are cases where a user identified by a logical identity (the Address of Record) may have registered several locations. For instance, he may have registered from different terminals with different IP addresses. When a call reaches his inbound proxy targeted to the AOR, the proxy will discover this situation and apply a specific algorithm to try and reach the user among the different locations. Typically, two approaches can be followed:

  • Sequential search: The proxy tries each location, one after the other.

  • Parallel search: The proxy tries all the locations simultaneously.

Figure 4.11 depicts a scenario for sequential search. Figure 4.12 shows parallel search.

Figure 4.11. 

Figure 4.12. 
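
The difference between the two search strategies can be sketched with Python's asyncio module. Everything in this sketch is simulated: no SIP message is sent, the contact URIs and their answer delays are hypothetical, and the function names are our own. The point is only the control flow, with one attempt at a time in the sequential case and all attempts at once in the parallel case.

```python
import asyncio

# Hypothetical contacts: seconds until each one answers, or None if it never does.
ANSWER_DELAY = {
    "sip:john@pc.example.net": None,
    "sip:john@mobile.example.net": 0.05,
    "sip:john@laptop.example.net": 0.01,
}


async def try_contact(contact, timeout=0.2):
    """Simulate one attempt; return (contact, True) if it answers in time."""
    delay = ANSWER_DELAY.get(contact)
    if delay is None or delay > timeout:
        await asyncio.sleep(timeout)
        return contact, False
    await asyncio.sleep(delay)
    return contact, True


async def sequential_search(contacts):
    """Try each location, one after the other; return the first that answers."""
    for contact in contacts:
        _, answered = await try_contact(contact)
        if answered:
            return contact
    return None


async def parallel_search(contacts):
    """Try all locations simultaneously; return whichever answers first."""
    tasks = [asyncio.create_task(try_contact(c)) for c in contacts]
    try:
        for next_done in asyncio.as_completed(tasks):
            contact, answered = await next_done
            if answered:
                return contact
        return None
    finally:
        for task in tasks:
            task.cancel()  # stop trying the remaining locations
```

With the delays assumed above, the sequential search waits for the first contact to time out and then reaches the second one, whereas the parallel search returns the contact that answers fastest, regardless of its position in the list.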

Proxies are further discussed in Chapter 13.

Redirect Servers

Redirect Servers (Figure 4.13) are User Agent Servers that receive requests from User Agent Clients and generate a specific type of response (the 3xx class) to them. These responses always direct the UAC that generated the request to contact an alternate set of URIs.

Figure 4.13. 
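
The behavior of a redirect server, and of the UAC that receives its response, can be sketched as follows. The redirect table, the URIs in it, and the function names are hypothetical. The 302 (Moved Temporarily) status code is defined in [RFC 3261]; the 404 branch covers the case in which the server knows no alternative, which goes beyond what this section describes.

```python
# Hypothetical redirect table: requested URI -> alternate URIs to try instead.
ALTERNATES = {
    "sip:alice@example.com": [
        "sip:alice@home.example.net",
        "mailto:alice@example.com",
    ],
}


def redirect_server(request_uri, alternates=ALTERNATES):
    """Answer a request with a redirection response; never forward it."""
    targets = alternates.get(request_uri)
    if not targets:
        # Outside the chapter's description: nothing to redirect to.
        return {"status": 404, "reason": "Not Found", "contacts": []}
    return {"status": 302, "reason": "Moved Temporarily", "contacts": list(targets)}


def next_targets(response):
    """UAC side: pick the URIs to contact after a redirection response."""
    if 300 <= response["status"] < 400:
        return response["contacts"]
    return []
```

Unlike a proxy, the redirect server in this sketch never passes the request on. It only tells the client where to go next, and it is the UAC that decides whether to contact the alternate URIs.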

Back-to-Back User Agents

A B2BUA is a logical entity that acts as a User Agent to both ends of a SIP call. It is responsible for handling all SIP signaling between both ends of the call, from call establishment to termination, and it remains in the call path for the complete duration of the call.

A B2BUA is logically made up of two UAs, which are linked through some kind of logic as shown in Figure 4.14.

Figure 4.14. 

B2BUAs are typically used as SIP application servers, where they provide enhanced functionality by manipulating signaling in the call, or as network interworking entities.

B2BUAs can typically work in two modes: routing or initiating. In routing mode, they receive a session-initiation request, apply certain logic, and create a new call. This is shown in Figure 4.15.

Figure 4.15. 

In initiating mode, they initiate two different calls and maintain the signaling linkage between them. Figure 4.16 depicts this case.

Figure 4.16. 

Summary

In this chapter, we introduced SIP at a high level. Its basic functions and architecture elements were presented. In order to complete our outlook on multimedia-communications fundamentals, we will, in the next chapter, show an overview of how Value-Added Services (VAS) can be built on top of the basic SIP functions.



[1] The role of SIP servers will be explained in later sections.

[2] DNS is an Internet system that is used to associate several types of information (e.g., IP addresses) with meaningful high-level names (so-called domain names).

[3] [RFC 3261] defines “call” as “an informal term that refers to some communication between peers, generally set up for the purposes of a multimedia conversation.”

[4] Registrations constitute a possible way to populate the Location Service, but not the only way. Arbitrary mapping functions can be configured at the discretion of the administrator.

[5] Not to be confused with IP routing, which is the role of IP routers. SIP routing is at a higher level than IP routing (application level versus network level), and uses the SIP URI as the key field to determine the next hop, as opposed to the destination IP address used by IP routers.
