Chapter 8. Security Design in Conferencing

This chapter identifies security threats affecting video conferencing deployments and then recommends methods of protecting video communication from these attacks. To be most effective, a video deployment requires several layers of security to protect against internal and external threats.

However, some layers of the security infrastructure can interfere with video conferencing protocols and prevent those protocols from establishing a connection to endpoints in other enterprises over the public Internet. This chapter describes this issue, known as Network Address Translation (NAT)/firewall traversal.

Finally, the last part of the chapter describes how video endpoints may use standard methods of cryptography to prevent eavesdropping.

Security Fundamentals

When the term security comes up, most people think of encryption. However, security encompasses several important areas of protection, which roughly fall into six groups:

  • Confidentiality

  • Availability

  • Authentication

  • Identity

  • Authorization

  • Integrity

Confidentiality between a sender and a receiver means that only the sender and receiver can interpret the data. Two endpoints achieve confidentiality using encryption. To establish an encrypted link, the sender and receiver exchange a cryptographic key in a secure manner, and then each side uses the key to encrypt or decrypt the data stream.
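The idea can be shown with a short Python sketch. This is a toy illustration only, not a real cipher: it XORs data against a shared key to show that the same key both encrypts and decrypts. Deployed systems use AES, as discussed later in this chapter.

```python
import secrets

# Toy illustration of confidentiality with a shared key (NOT a real
# cipher; production systems use AES). Both sides hold the same secret;
# XOR-ing with the key stream encrypts, and XOR-ing again decrypts.

def xor_stream(key: bytes, data: bytes) -> bytes:
    return bytes(d ^ key[i % len(key)] for i, d in enumerate(data))

key = secrets.token_bytes(32)        # exchanged over a secure channel
plaintext = b"conference media frame"
ciphertext = xor_stream(key, plaintext)

# The receiver, holding the same key, recovers the original data.
assert xor_stream(key, ciphertext) == plaintext
```

Only a party holding the key can invert the transformation, which is the essence of confidentiality; the hard part in practice is exchanging the key securely.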

Availability ensures that infrastructure resources are protected from depletion by an attacker. Availability requires protection against denial-of-service (DoS) attacks.

Authentication and identity often describe the same concept, which can refer to either of two things:

  • An endpoint can authenticate data to prove that the data is valid. An endpoint can authenticate data without authenticating identity. A section later in this chapter reveals how cryptographic hashes can authenticate data.

  • An endpoint can authenticate its identity by presenting cryptographic credentials that prove its identity. As explained later in this chapter, the participants in the connection use either preshared secrets or cryptographic certificates to establish identity.

Authorization is not to be confused with authentication. Authorization maps the authenticated identity (an endpoint or user) to a set of permissions or capabilities allowed for that user. Secure video conferencing systems often implement authentication and authorization with an AAA (authentication, authorization, and accounting) server such as RADIUS.

Integrity allows a receiver to detect whether an attacker has tampered with data while in transit on the network. One of the ways for an endpoint to provide integrity for a data packet is to authenticate the contents of the entire data packet.
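Authenticating the contents of a packet is commonly done with a keyed hash (HMAC), which the "Secure Hashes" section discusses later. The following Python sketch, using hypothetical key and payload values, shows how a receiver detects tampering:

```python
import hashlib
import hmac

# Sketch: authenticating a packet's contents so the receiver can detect
# tampering in transit. Sender and receiver share a secret key.

key = b"shared-auth-key"          # hypothetical preshared secret
packet = b"RTP payload bytes"

# Sender computes a tag over the entire packet and appends it.
tag = hmac.new(key, packet, hashlib.sha256).digest()

def verify(key: bytes, packet: bytes, tag: bytes) -> bool:
    # Receiver recomputes the tag and compares in constant time.
    expected = hmac.new(key, packet, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)

assert verify(key, packet, tag)              # intact packet accepted
assert not verify(key, packet + b"x", tag)   # tampered packet rejected
```

Any modification to the packet in transit changes the expected tag, so the receiver discards altered data.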

Threats

Without measures to ensure the six fundamental security protections, the network infrastructure and endpoints are open to threats from attackers. This section describes several types of threats and actions you can take to mitigate those threats.

Confidentiality Attacks

Without confidentiality, an attacker can listen to the audio and video streams between two endpoints. Hacker tools are available on the Internet for eavesdropping on voice packet data. One of these tools is called VOMIT (Voice Over Misconfigured Internet Telephones). VOMIT processes a stream of captured voice packets and plays back the audio.

Solution: Apply encryption to the media packets. Vendors of conferencing products are universally adopting the Advanced Encryption Standard (AES) to encrypt media streams. In IP networks, Voice over IP (VoIP) gear typically uses the Real-time Transport Protocol (RTP) to transmit media streams. Secure Real-time Transport Protocol (SRTP) is an extension of RTP that encrypts media streams, defined in IETF standard RFC 3711. See the “Media Encryption” section later in this chapter for details.

Denial-of-Service Attacks

Attacks on availability are called denial-of-service (DoS) attacks. A DoS attack is any attack that disrupts the availability of service to legitimate users and can take several forms:

  • Depletion of network bandwidth

  • Depletion of server resources

  • Replay attacks

  • Malware

  • Connection hijacking

  • RTP hijacking

The following sections describe each of these DoS attacks in more detail.

Depletion of Network Bandwidth

Depletion of network bandwidth attacks involve flooding the host network with enough data to clog the ingress/egress points in the enterprise network. These attacks appear primarily as a flood of UDP packets. Often, these attacks are launched from a large number of external endpoints on the public Internet, in which case they are referred to as distributed denial-of-service (DDoS) attacks.

Solution 1: When a flood attack overwhelms the bandwidth of the connection that links a service provider to an enterprise, the only way to stop the attack is to discard attack packets in the service provider. Service providers typically perform this type of packet shunning with an anomaly detector device and a guard device. The anomaly detector identifies potential attack traffic and instructs the guard to scrub the traffic. The guard pinpoints and discards attack packets before they reach the enterprise network. The Cisco Anomaly Detector product and Cisco Guard product are examples of these devices.

Solution 2: Routers and switches can implement bandwidth rate limiting. Cisco routers and switches offer a feature called microflow policing to limit the bandwidth of data from an attacker. Enterprises use microflow policing to protect server infrastructure, such as a scheduling server, H.323 gatekeeper, Session Initiation Protocol (SIP) proxy, or CallManager. However, this bandwidth-limiting protection is most effective if it is deployed with two strategies:

  • Place a router with microflow policing close to the attacker, such as at the edge of a network. At this location, it is easier for the policing feature to identify attackers.

  • Distribute the microflow policing at several ingress points of the network. A distributed deployment can more easily block a high-bandwidth attack, while at the same time allowing legitimate users to gain access to the resource.
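A per-flow policer is essentially a token bucket: tokens accumulate at the allowed rate, and packets that arrive when the bucket is empty are dropped. The following sketch (with illustrative rate and burst values, not Cisco defaults) shows the mechanism:

```python
import time

# Sketch of per-flow bandwidth policing in the style of microflow
# policing: each flow gets a token bucket; packets that exceed the
# allowed rate are dropped. Rate and burst values are illustrative.

class TokenBucket:
    def __init__(self, rate_bps: float, burst_bytes: float):
        self.rate = rate_bps / 8.0        # refill rate in bytes/second
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.last = time.monotonic()

    def allow(self, packet_bytes: int) -> bool:
        now = time.monotonic()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= packet_bytes:
            self.tokens -= packet_bytes
            return True
        return False                       # over the policed rate: drop

bucket = TokenBucket(rate_bps=64_000, burst_bytes=1500)
results = [bucket.allow(1500) for _ in range(3)]
# The first full-sized packet fits within the burst allowance; an
# immediate back-to-back flood exceeds the rate and is policed.
```

A legitimate flow that stays under the configured rate never empties the bucket, while a flood from a single attacker is clamped to the burst size.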

Depletion of Server Resources

DoS attacks do not always involve depleting the bandwidth on a link; instead, DoS attacks can attempt to deplete resources inside a server or endpoint. In certain cases, servers allocate resources when they receive a packet from the network, and the attacker might seek to exhaust these resources by sending a flood of packets to the victim machine. The classic resource depletion attack is the SYN attack, which exploits the TCP protocol. In the TCP protocol, an endpoint requests a TCP connection with a target server by first sending a SYN (synchronize) packet to the server, as shown in Figure 8-1.

Figure 8-1. Normal TCP Connection Establishment

The server allocates resources for the TCP connection and then attempts to complete the TCP handshake by sending a response in the opposite direction, consisting of a SYN/ACK packet. The SYN/ACK packet requests a connection with the endpoint and acknowledges receiving the SYN packet. Normally, the client responds with an ACK packet, which acknowledges the SYN/ACK from the server, and the TCP connection can proceed. In a SYN attack, however, the attacking endpoint never sends the final ACK, and the half-open connection at the server eventually times out. By sending a flood of SYN packets, the attacker can exhaust the resources that the target machine allocates for these half-open TCP connections.

Solution: A firewall placed in front of the server can implement SYN cookies, as shown in Figure 8-2.

Figure 8-2. DoS Protection with a SYN Cookie Firewall

The firewall intercepts the SYN packet and replies directly with a response containing a cookie value. If the originator is valid, the originator sends a final response back to the firewall, along with the cookie. This mechanism is also called TCP intercept because the firewall intercepts the TCP setup messages. The firewall does not retain any state for the half-open connection; instead, the firewall uses the cookie to validate the parameters that arrive in the response from the originating endpoint. When the firewall receives the ACK from the client, the firewall validates the cookie and then allows the connection. The firewall then replays the TCP connection handshake sequence to the server. Using this method, the firewall and the protected server do not store state information about half-open TCP connections. Firewalls or intrusion prevention systems typically implement this functionality; however, hosts may implement SYN cookies directly, too.
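The statelessness comes from how the cookie is computed: the firewall derives it from the connection 4-tuple, a coarse timestamp, and a local secret, so a returning ACK can be validated without stored state. The sketch below shows the idea in Python; real implementations also encode details such as the MSS, and the secret shown is hypothetical.

```python
import hashlib
import hmac
import time

# Sketch of SYN-cookie generation and validation. The cookie encodes the
# connection 4-tuple and a coarse time counter, keyed with a local
# secret, so no per-connection state is kept for half-open connections.
# (Simplified: real SYN cookies also encode the MSS.)

SECRET = b"firewall-local-secret"      # hypothetical local key

def syn_cookie(src_ip, src_port, dst_ip, dst_port, t=None) -> int:
    if t is None:
        t = int(time.time() // 64)     # coarse time counter
    msg = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}|{t}".encode()
    mac = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return int.from_bytes(mac[:4], "big")   # used as the SYN/ACK sequence number

def validate(src_ip, src_port, dst_ip, dst_port, cookie, t) -> bool:
    # Recompute the cookie from the ACK's parameters; no table lookup.
    return cookie == syn_cookie(src_ip, src_port, dst_ip, dst_port, t)

t = 12345
c = syn_cookie("198.51.100.7", 4242, "203.0.113.1", 1720, t)
assert validate("198.51.100.7", 4242, "203.0.113.1", 1720, c, t)
assert not validate("198.51.100.9", 4242, "203.0.113.1", 1720, c, t)
```

A spoofed SYN flood therefore costs the firewall only a hash computation per packet rather than a table entry per half-open connection.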

Replay Attacks

Another attack that can cause disruption is the replay attack. The attacker begins by sniffing and recording the packets flowing on the network between two entities during a legitimate connection. The attacker then replays these packets to one of the endpoints. The target endpoint may consider this replayed stream to be legitimate and attempt to process the data, resulting in excessive resource consumption.

Solution: Endpoints thwart a replay attack by using cryptographic authentication, along with a time stamp or sequence number. The receiver verifies the authentication and then verifies that the time stamp or sequence number is valid.
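A minimal sketch of this defense, using a sequence number covered by an HMAC tag (key and payloads are hypothetical):

```python
import hashlib
import hmac

# Sketch of replay protection: each packet carries a sequence number and
# an authentication tag computed over the sequence number and payload.
# The receiver rejects any sequence number it has already accepted.

KEY = b"shared-session-key"            # hypothetical preshared key

def protect(seq: int, payload: bytes) -> bytes:
    msg = seq.to_bytes(4, "big") + payload
    return hmac.new(KEY, msg, hashlib.sha256).digest()

class Receiver:
    def __init__(self):
        self.highest = -1              # highest sequence number accepted

    def accept(self, seq: int, payload: bytes, tag: bytes) -> bool:
        msg = seq.to_bytes(4, "big") + payload
        expected = hmac.new(KEY, msg, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            return False               # forged or tampered packet
        if seq <= self.highest:
            return False               # replayed packet
        self.highest = seq
        return True

rx = Receiver()
tag = protect(1, b"frame")
assert rx.accept(1, b"frame", tag)       # fresh packet accepted
assert not rx.accept(1, b"frame", tag)   # exact replay rejected
```

Because the sequence number is inside the authenticated data, an attacker cannot simply renumber recorded packets to defeat the check.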

Malware

Malware is any type of data that can compromise an endpoint or server. A worm is a type of malware that consists of network packets that cause a server to execute a program. When the worm is running on the machine, the worm can take over the server and cause it to fail.

Solution: Endpoints or servers can use an intrusion prevention system (IPS), which is a standalone network device that identifies malware located in packets and then discards the packets before they reach a host. A host-based IPS (HIPS) is a software-based IPS that resides on the server itself, usually at the kernel level. The HIPS identifies malware packets and discards them before a running process receives them.

Connection Hijacking

After two video conferencing endpoints establish a legitimate connection, an attacker might attempt to hijack the connection by impersonating one of the participants by issuing signaling commands to take over the conversation. The attacker might also use this type of spoofing to cause the connection to fail, in which case the attack is also considered a DoS attack.

Solution: Endpoints can thwart connection hijacking by authenticating the signaling messages.

RTP Hijacking

Whereas connection hijacking is a method that attempts to take over the signaling layer of a conversation, RTP hijacking operates at the media layer and is an attempt by an intruder to inject RTP media packets into a conversation. The intruder essentially becomes an additional, unwanted participant.

Solution: Endpoints can thwart RTP hijacking by authenticating the media packets.

Authentication and Identity Attacks

Attackers may compromise authentication or identity to carry out theft of service or man-in-the-middle (MitM) attacks.

Theft of Service

By compromising identity, attackers can perpetrate theft of service or toll fraud. As you learned in the “Connection Hijacking” section, an attacker can impersonate another user and then take over an existing connection. An attacker may also steal services by spoofing another endpoint directly and then attempting a direct connection.

Solution: Authenticate signaling packets and use cryptographic identity.

Man-in-the-Middle Attacks

A MitM attack occurs when an attacker inserts a rogue device between two connected endpoints. The MitM can then listen to packets that flow between the endpoints and can modify packets in transit. The MitM is invisible to the two endpoints, which are unaware of the attack. One way for an attacker to become a MitM is to spoof the identity of each endpoint to the other. Figure 8-3 shows this scenario.

Figure 8-3. A Man-in-the-Middle Attack Between Two Endpoints

The attacker connects to endpoint A and pretends to be endpoint B, and then connects to endpoint B and pretends to be endpoint A. The MitM acts as a router and can observe packets flowing between endpoints A and B, without either endpoint knowing about the attack. This attack can also work if both endpoints use encryption without authentication; in this case, the MitM sets up an encrypted link with each endpoint and can decrypt and then re-encrypt each packet that passes through it. The MitM can also inject data into the media stream or change the media stream.

Solution: Use authentication and integrity for each signaling message and media packet.

Network Infrastructure Attacks

In a video conferencing deployment, security of the underlying network infrastructure is just as important as the security applied to the upper-layer conferencing protocols. Network security protects against several attacks, including the following:

  • Reconnaissance

  • Layer 2 attacks, including CAM table flooding, ARP cache poisoning, DHCP exhaustion, and rogue DHCP servers

The following sections describe each of these network infrastructure attacks.

Reconnaissance

One vulnerability often overlooked is reconnaissance. Before an attacker attempts to compromise a network, the attacker often gathers as much information about the network as possible. Attackers can attempt to use network-scanning tools to obtain the following information:

  • The network topology

  • The list of services running on each server

  • The ports on each server that are open and active

  • The model of hardware running each server

  • The version of software running on each server

Solution: Firewalls prevent attackers on the outside of the firewall from using network-scanning tools to probe the infrastructure on the inside of the firewall.

Layer 2 Attacks

Several attacks are possible at Layer 2, the Ethernet link layer. These attacks often require the attacker to have direct access to the internal network. Layer 2 attacks are extremely virulent because after an attacker compromises Layer 2, all layers above Layer 2 might not detect the attack.

Solution: Add security at Layer 2 within the network. A deployment that implements Layer 2 protection inside the network and Layer 3 firewall protections at the edge achieves layered security. An enterprise that has only firewalls at the edge is considered to be “crunchy on the outside, soft on the inside.” This weakness means that an attacker who penetrates beyond the firewalls at the edge can easily compromise targets inside the network. Layer 2 protections inside the network result in security that is “crunchy on the inside.”

CAM Table Flooding

One Layer 2 exploit is a content-addressable memory (CAM) table flood, which allows an attacker to make a switch act like a hub. A hub forwards all packets to all ports. A switch learns about Ethernet MAC addresses at each of its ports so that it can forward packets only to the port that provides a link to the destination address of the packet. In a heavily switched environment, an attacker receives only packets destined for the attacker. By exploiting a CAM table flood, the attacker can cause the switch to forward all packets to all destinations, allowing the attacker to sniff all traffic.

The mapping of each MAC address to each physical port is contained in the CAM table within the switch. However, the CAM table has a limited number of entries, which means an attacker can cause the table to overflow by sending the switch a flood of Ethernet packets containing random spoofed source addresses. As a result, the switch might discard old, but valid, entries from the table to accommodate the flood of new mappings from the attacker. When the switch then attempts to forward a packet whose destination MAC address is no longer in the CAM table, the switch acts like a hub and forwards the packet to all ports, allowing the attacker to sniff traffic that would normally go only to a different port.

Solution: Port security is a feature on Cisco switches that limits the number of allowable source MAC addresses per port. Port security can statically assign a list of MAC addresses per port, or it can limit the total number of MAC addresses allowed per port.
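The attack and the defense can be simulated in a few lines of Python. The table sizes and MAC strings below are illustrative, and the eviction policy is simplified; the point is that a per-port MAC limit keeps a flood on one port from pushing out valid entries learned on other ports.

```python
# Simulation of CAM table flooding and of a per-port MAC limit in the
# style of port security. Table sizes and addresses are illustrative.

class Switch:
    def __init__(self, cam_size: int, max_macs_per_port: int = None):
        self.cam = {}                  # MAC address -> port
        self.cam_size = cam_size
        self.max_macs = max_macs_per_port

    def learn(self, mac: str, port: int) -> bool:
        if self.max_macs is not None and mac not in self.cam:
            on_port = sum(1 for p in self.cam.values() if p == port)
            if on_port >= self.max_macs:
                return False           # port security: violation, not learned
        if len(self.cam) >= self.cam_size and mac not in self.cam:
            # Table full: evict an old (but valid) entry to make room.
            self.cam.pop(next(iter(self.cam)))
        self.cam[mac] = port
        return True

    def flood_needed(self, dst_mac: str) -> bool:
        return dst_mac not in self.cam  # unknown MAC: acts like a hub

# Without port security, a flood on port 5 pushes out the victim's entry:
sw = Switch(cam_size=100)
sw.learn("victim-mac", 1)
for i in range(200):
    sw.learn(f"spoofed-{i}", 5)
assert sw.flood_needed("victim-mac")    # victim traffic now goes to all ports

# With port security limiting port 5 to 2 MACs, the flood is contained:
sw2 = Switch(cam_size=100, max_macs_per_port=2)
sw2.learn("victim-mac", 1)
for i in range(200):
    sw2.learn(f"spoofed-{i}", 5)
assert not sw2.flood_needed("victim-mac")
```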

ARP Cache Poisoning

When a host attempts to send a packet to an IP address on the same subnet, the originating host must discover the Ethernet MAC address corresponding to the destination IP address. The originating host learns about this mapping by issuing an ARP request packet, which requests the MAC address used by the destination IP address. The destination machine receives this request and responds with an ARP reply that contains the MAC address. The originating host caches this IP-to-MAC address mapping into its local ARP cache. All hosts listen to all ARP reply messages to build up a table of IP/MAC addresses over time. However, at any time, an attacker can issue a gratuitous ARP reply. A gratuitous ARP reply is an ARP reply without an originating ARP request. Machines on the subnet often store the IP-to-MAC mapping for this gratuitous ARP reply in their ARP cache. As a result, an attacker can issue a gratuitous ARP reply that maps the IP address of a victim to the MAC address of the attacker, which causes any packets intended for the victim to instead go to the attacker. The attacker can then become a MitM by forwarding this packet traffic to the victim.

Solution: Cisco switches implement a feature called Dynamic ARP Inspection (DAI). DAI drops ARP replies if the MAC address in the ARP reply does not match the IP address assigned earlier via DHCP. This feature relies on the capability of Cisco switches to snoop DHCP requests and therefore protects only endpoints that obtain an IP address via DHCP.
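The check itself is simple: compare the sender fields of each ARP reply against the snooped DHCP binding table. The sketch below uses hypothetical bindings and addresses:

```python
# Sketch of a Dynamic ARP Inspection check: the switch validates each
# ARP reply against IP-to-MAC bindings learned by snooping DHCP, and
# drops replies that do not match. Bindings here are illustrative.

dhcp_bindings = {                      # learned via DHCP snooping
    "10.0.0.5": "aa:aa:aa:aa:aa:aa",   # victim's DHCP lease
    "10.0.0.9": "bb:bb:bb:bb:bb:bb",
}

def inspect_arp_reply(sender_ip: str, sender_mac: str) -> bool:
    # Forward the reply only if it matches the snooped binding.
    return dhcp_bindings.get(sender_ip) == sender_mac

assert inspect_arp_reply("10.0.0.5", "aa:aa:aa:aa:aa:aa")      # legitimate
assert not inspect_arp_reply("10.0.0.5", "cc:cc:cc:cc:cc:cc")  # poisoning attempt dropped
```

A gratuitous ARP reply that maps the victim's IP address to the attacker's MAC address fails the lookup and is discarded before it can poison any ARP cache.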

DHCP Exhaustion

DHCP exhaustion is a Layer 2 attack that also implements a DoS. An attacker sends a flood of DHCP request packets to the DHCP server, each requesting an IP address for a random MAC address. Eventually, the DHCP server runs out of available IP addresses and stops issuing DHCP bindings. This failure means that other hosts on the network cannot obtain a DHCP lease, which causes a DoS.

Solution: Cisco switches implement a feature called DHCP snooping, which places a rate limit on DHCP requests.

Rogue DHCP Servers

DHCP servers can provide not only addresses, but also a wide range of information that endpoints may use. This information includes default DNS servers and a default gateway. An attacker can set up a rogue DHCP server on a subnet to provide bad configuration information to an endpoint. Thus, by using a rogue DHCP server, the attacker can effectively reconfigure any endpoint that relies on DHCP-supplied parameters.

Solution: Cisco DHCP snooping also provides a feature that drops DHCP request packets sent to unauthorized DHCP servers, which prevents rogue DHCP servers from issuing DHCP leases.

Endpoint Infrastructure Attacks

Video conferencing endpoints are directly vulnerable to several attacks:

  • Desktop endpoint attacks

  • Firmware attacks

  • Rogue configuration file attacks

The next sections describe each of these attacks.

Desktop Endpoint Attacks

Desktop video conferencing systems that run on PCs are vulnerable to operating system–based exploits:

  • As mentioned in the section “Malware,” a worm can execute a program on a vulnerable machine, causing a DoS attack.

  • As mentioned in the section “Denial-of-Service Attacks,” an attacker can attempt to flood a PC with packets that consume resources.

Solution: A HIPS running on the PC can mitigate operating system vulnerabilities.

Firmware Attacks

Some appliance-based video conferencing endpoints run firmware that users can upgrade. Whenever this upgrade feature is present, there is always a possibility that an attacker could attempt to load a rogue firmware image onto the endpoint. For example, an attacker could attempt to download an older firmware image onto the endpoint that does not have newer security protections.

Solution: Endpoints should load only cryptographically signed firmware, which the endpoint vendor authenticates using a cryptographic hash. The “Secure Hashes” section later in this chapter discusses cryptographic hashes.
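A simplified version of this check can be sketched as follows. Real endpoints verify a vendor signature with public key cryptography; this sketch stands in for that step by comparing the image's SHA-256 digest against a digest assumed to come from a trusted, vendor-signed manifest. The image bytes are hypothetical.

```python
import hashlib

# Sketch of firmware validation before loading. The manifest digest is
# assumed to arrive via a vendor-signed (trusted) channel; the image
# bytes below are hypothetical placeholders.

def firmware_digest(image: bytes) -> str:
    return hashlib.sha256(image).hexdigest()

trusted_image = b"\x7fFIRMWARE-v2.1-image-bytes"
manifest_digest = firmware_digest(trusted_image)   # from signed manifest

def accept_firmware(image: bytes, manifest_digest: str) -> bool:
    # Load only an image whose digest matches the trusted manifest.
    return firmware_digest(image) == manifest_digest

assert accept_firmware(trusted_image, manifest_digest)
assert not accept_firmware(b"rogue or downgraded image", manifest_digest)
```

Note that pinning the digest of a specific version also blocks the downgrade attack described above: an older, unpatched image hashes to a different value and is rejected.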

Rogue Configuration Files

In addition to firmware upgrades, endpoints may also be vulnerable to rogue configuration files. For example, when a Cisco IP phone boots up, it downloads a configuration file from a TFTP server. This configuration file points the IP phone to a list of trusted CallManagers. By compromising this file, an attacker can direct an IP phone to use a rogue CallManager server.

Solution: The endpoint should use only cryptographically signed configuration files. For instance, the configuration file downloaded by a Cisco IP phone is cryptographically signed to prevent forgery.

Server Attacks

Within a video conferencing deployment, servers may run on PCs. These servers may consist of video conference schedulers, H.323 gatekeepers, SIP proxies, video switches, or CallManager servers. Much like the PC-based endpoints, the operating system on these servers is vulnerable to attack. In addition, these servers often represent a single point of failure, which makes them targets for DoS attacks.

General Port-Based Attacks

Much like PC-based endpoints, servers require protection to thwart network port-based attacks such as malware and DoS attacks.

Solution: You can mitigate port-based attacks as follows:

  • Use HIPS to detect attacks on the machine.

  • Install a virus scanner on the server.

  • Place a firewall in front of the server. In addition to typical firewall access control lists (ACLs), the administrator can configure the firewall to allow only call control traffic to the servers. Typically, UDP-oriented media traffic does not flow to the servers; that traffic flows only from endpoint to endpoint.

  • Activate rate limiting and microflow policing on the routers and switches that connect to the servers. These rate-limiting features are more effective when placed near potential attackers, such as at the edges of the network; this placement allows legitimate users to connect to the servers, even in the presence of a high-bandwidth DoS attack.

Web Server Vulnerabilities

Video conferencing servers, such as H.323 gatekeepers and SIP proxy servers, often host a web server to provide a user interface. This user interface typically provides two important functions:

  • It allows the administrator to configure the device.

  • It allows users to join conferences and view the status of conferences in progress.

However, web servers are generally more susceptible to attack than other services, for two reasons. First, for the web server to operate properly, firewalls must allow external users to send high-bandwidth packet streams to the web server on port 80. Hackers can leverage this open port to take advantage of newly discovered flaws that compromise security.

Second, server machines often use one of two popular web services: Apache and Microsoft Internet Information Services (IIS). Because these web servers are so common, hackers target these services in an attempt to find new vulnerabilities that might not be detected by a firewall or HIPS.

Solution: The web server should offer strong confidentiality and authentication. The HTTPS protocol provides this mechanism by verifying identity, typically using digital certificates (discussed in the “Public Key Cryptography” section later in the chapter) and by encrypting the communications. Two weaker alternatives exist for authentication:

  • Basic authentication—This method challenges the user with a username and password. However, basic authentication requires the user to send the password unencrypted, and therefore it is inherently insecure. Basic authentication does not encrypt the communications.

  • HTTP-Digest—This method challenges the user with a username and password and protects the password using a hashing mechanism (discussed in the “Secure Hashes” section of this chapter). This method is more secure than basic authentication. However, HTTP-Digest provides only authentication; it does not encrypt the communication link.
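To see why the password never crosses the wire with HTTP-Digest, consider how the client's response is computed (shown here per RFC 2617, omitting the optional qop extension for brevity; the usernames and nonce are illustrative):

```python
import hashlib

# How an HTTP-Digest response is derived (RFC 2617, without qop):
# only this hash crosses the wire, never the password itself.

def md5_hex(s: str) -> str:
    return hashlib.md5(s.encode()).hexdigest()

def digest_response(user, realm, password, method, uri, nonce) -> str:
    ha1 = md5_hex(f"{user}:{realm}:{password}")   # credentials hash
    ha2 = md5_hex(f"{method}:{uri}")              # request hash
    return md5_hex(f"{ha1}:{nonce}:{ha2}")        # challenge response

# The server issues a fresh nonce in its challenge, which also limits
# replay of captured responses. All values here are illustrative.
resp = digest_response("admin", "gatekeeper", "s3cret", "GET", "/status", "abc123")
assert len(resp) == 32               # 128-bit MD5 digest, hex-encoded
```

The server, which knows the password (or HA1), performs the same computation and compares the result; an eavesdropper sees only hashes bound to a server-chosen nonce.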

Unneeded Services and Insecure Services

The operating system of the server may run additional services, such as an FTP server, Telnet server, TFTP server, and so on. Each of these services opens an active port on the machine. Every active open port represents an additional threat because it provides yet another way for an attacker to compromise the machine.

In addition, some services such as FTP and Telnet are inherently insecure because they send passwords over the network unencrypted, in the clear.

Solution: Harden the operating system by turning off unneeded services that might open ports on the server machine. In particular, enterprise networks should adopt a policy of disallowing inherently insecure services such as FTP and Telnet.

Configuring Basic Security

Figure 8-4 shows a general configuration for video conferencing security. This configuration involves layers of security, with protection both at the edges of the network and inside the network.

Figure 8-4. Basic Configuration for Video Conferencing Security

This topology shows a three-legged firewall. The firewall has connections for the enterprise, the Internet, and a demilitarized zone (DMZ). The DMZ contains servers that are accessible by both the internal network and the public Internet. These servers consist of authoritative DNS servers and call control servers that allow endpoints on the public Internet to connect to endpoints inside the enterprise. The firewall has a relatively loose set of rules to allow internal and external endpoints to connect to servers in the DMZ, but it has a stricter set of rules that protects the interior of the enterprise network from both the DMZ and the public Internet.

In addition, the firewall connection for the inner enterprise network also runs a Network Address Translation (NAT) device. The NAT translates private IP addresses inside the enterprise to public addresses routable on the public Internet. The ability for endpoints inside the network to connect to endpoints outside the network through the NAT and firewall is called NAT/firewall traversal, often abbreviated as NAT/FW. NAT/FW traversal can pose a problem for video conferencing protocols, as you learn later in the “NAT/FW Traversal” section.

The enterprise also has a virtual private network (VPN) concentrator that allows remote workers or small remote branch offices to connect through a firewall. Tunneling authenticated VPN streams from teleworkers through a firewall requires a simple firewall configuration and is highly secure.

Also shown in Figure 8-4 is Layer 2 protection in the form of port security, dynamic ARP inspection, and DHCP snooping, all of which are features of Cisco switches.

The configuration shows three layers of protection for the call control servers: firewalls to allow only call control traffic, microflow policing on the routers to prevent DoS attacks, and a HIPS located on each of the servers to further protect against malware.

Port Usage

Firewalls are designed to block unsolicited signaling and media packets from the outside network. However, firewalls must allow traffic on certain signaling and media ports used by video conferencing gear, and administrators must configure firewalls to open these ports. Therefore, this section covers the ports used by the protocols H.323, SIP, and Skinny Client Control Protocol (SCCP).

H.323 Port Usage

H.323 is a complex protocol that has evolved over time to allow several variations of connection establishment; these variations use different message sequences and ports. In addition, some messages can use either UDP or TCP ports. Certain messages use fixed ports, and other messages may use arbitrary ports negotiated between the endpoints.

H.323 Call Flow

Figure 8-5 shows the call flows for H.323. This diagram shows the original simple call flow specified in H.323v1.

Figure 8-5. Call Flows for H.323v1

The basic H.323v1 case includes the following call flow:

  1. EP1 and the gatekeeper use the Registration, Admission, Status (RAS) protocol to pass high-level connection commands. To discover a gatekeeper on the network, endpoints send the RAS Gatekeeper Request Message (GRQ) to UDP multicast address 224.0.1.41, on port 1718. In the process of defining the H.323 specification, the H.323 standards committee registered port 1718 with the Internet Assigned Numbers Authority (IANA) to be the default port for gatekeeper discovery.

  2. Gatekeepers respond by sending a Gatekeeper Confirm (GCF) message to UDP port 1718. After the endpoint locates a gatekeeper, all further RAS messages switch over to use the IANA-registered UDP port 1719.

  3. The endpoint EP1 registers with its gatekeeper GK1 by sending the RAS Registration Request (RRQ) message to the gatekeeper.

  4. The gatekeeper responds with a RAS Registration Confirm (RCF) message.

  5. When endpoint EP1 initiates a call, it sends a RAS Admission Request (ARQ) message to the gatekeeper to ask permission to connect to a remote endpoint.

  6. Based on locally configured policy, GK1 responds with a RAS Admission Confirm (ACF) message.

  7. EP1 establishes an H.225 connection to EP2 using TCP port 1720. Two endpoints use H.225 to establish a control signaling connection. Because EP1 and EP2 establish direct H.225 links, this mode of H.323 signaling is known as the direct signaling mode. EP1 sends an H.225 Setup message to EP2, requesting a connection.

  8. Before EP2 can complete the connection with a response, it must obtain permission to connect to EP1 by sending a RAS Admission Request (ARQ) message to its local gatekeeper GK2.

  9. If GK2 allows the connection based on locally configured policy, it replies with an ACF message.

  10. EP2 then replies to EP1 by sending EP1 an H.225 Connect message to confirm the connection. In addition, EP1 and EP2 use H.225 to negotiate a port for a new H.245 connection. Because the endpoints negotiate this port at connection time, the port is referred to as an ephemeral port and may have a value in the range 1024–65,535.

  11. EP1 and EP2 then establish an H.245 TCP connection, which they use for low-level signaling. H.245 has no default port and may use any port between 1024 and 65,535, negotiated using the previous H.225 exchange. EP1 and EP2 then exchange H.245 messages, which in turn negotiate the UDP ports to use for RTP and RTCP traffic. RTP and RTCP have no default ports and may use any ephemeral port number between 1024 and 65,535. The endpoints may send additional H.225 or H.245 messages.

  12. EP1 may now send RTP media to EP2.

  13. EP2 may now send RTP media to EP1.

Figure 8-6 shows a more advanced call flow from H.323v4 that permits some of the signaling to use a single port.

Figure 8-6. Call Flows for H.323v4

H.323v4 simplifies the firewall configuration for H.323 endpoint communication by offering a variation of H.323 that tunnels H.245 over an existing open connection:

  • Instead of using the direct signaling mode, this call model uses the Gatekeeper-Routed Call Signaling (GKRCS) mode. In this mode, the H.225 messages pass through the gatekeepers, instead of going directly between EP1 and EP2.

  • If endpoints use GKRCS, they may still create direct H.245 connections. However, this example shows a capability of H.323v4 called the Fast Connect method. Fast Connect sends H.245 information within the H.225 Setup and Connect messages.

  • Even if the endpoints use Fast Connect, they may still establish direct H.245 connections. However, Figure 8-6 shows a mode known as H.245 tunneling: After the initial connection setup is complete, the endpoints tunnel future H.245 messages over the H.225 connection.

H.323 Port Summary

Based on the port usage of H.323, Tables 8-1 and 8-2 list the port configurations needed for a simple firewall configuration for H.323.

Table 8-1. H.323v1 Port Usage

Function              Port                         Transport Type
Gatekeeper discovery  Port 1718                    UDP
Gatekeeper RAS        Port 1719                    UDP
H.225                 Port 1720                    TCP
H.245                 Ephemeral port: 1024–65,535  TCP
RTP and RTCP          Ephemeral port: 1024–65,535  UDP

Table 8-2. H.323v4 Port Usage: Fast Connect + H.245 Tunneling

Function              Port                         Transport Type
Gatekeeper discovery  Port 1718                    UDP
Gatekeeper RAS        Port 1719                    UDP
H.225                 Port 1720                    TCP
RTP and RTCP          Ephemeral port: 1024–65,535  UDP

Table 8-1 shows the port usage necessary to support four major data streams: RAS, H.225, H.245, and RTP/RTCP.

Table 8-2 shows the port usage necessary to support Fast Connect with three major data streams: RAS, H.225, and RTP/RTCP. In this scenario, endpoints tunnel H.245 messages over the H.225 connection.

Tables 8-1 and 8-2 present a significant problem: A firewall must keep a large range of UDP and TCP ports open for the RTP, RTCP, and H.245 packets, which negates the purpose of a firewall. Instead, most firewalls implement a feature called Application Layer Gateway (ALG). In this mode, the firewall inspects the H.323 signaling, snoops the negotiated ephemeral ports, and opens pinholes in the firewall for ephemeral ports used by H.245 (when not tunneled) and RTP/RTCP data. The firewall is stateful because it keeps track of the status of the connection: As soon as the signaling channel (H.225) closes the connection, the firewall closes the pinholes for the other ephemeral ports.

The firewall also implements a timeout: If no media or signaling traverses the firewall for a time period longer than a timeout value, the firewall closes the pinholes.
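The pinhole lifecycle just described can be modeled in a few lines. The following is a toy sketch in Python, not any vendor's firewall code; the class name, method names, and timeout value are illustrative assumptions:

```python
import time

class PinholeTable:
    """Toy model of a stateful firewall's dynamic pinholes (hypothetical API)."""

    def __init__(self, timeout=60.0):
        self.timeout = timeout          # idle seconds before a pinhole closes
        self.pinholes = {}              # (proto, port) -> last-activity timestamp

    def open(self, proto, port, now=None):
        # Called by the ALG after snooping a negotiated ephemeral port.
        self.pinholes[(proto, port)] = now if now is not None else time.time()

    def refresh(self, proto, port, now=None):
        # Each packet matching a pinhole refreshes its idle timer.
        if (proto, port) in self.pinholes:
            self.pinholes[(proto, port)] = now if now is not None else time.time()

    def allows(self, proto, port, now=None):
        # A packet passes only if a fresh pinhole exists for its port.
        now = now if now is not None else time.time()
        ts = self.pinholes.get((proto, port))
        if ts is None or now - ts > self.timeout:
            self.pinholes.pop((proto, port), None)   # expire stale entries
            return False
        return True

    def close_call(self, ports):
        # When the H.225 signaling channel closes, tear down the call's pinholes.
        for key in ports:
            self.pinholes.pop(key, None)

fw = PinholeTable(timeout=60.0)
fw.open("udp", 24000, now=0.0)              # ALG snoops an RTP port from H.245
print(fw.allows("udp", 24000, now=10.0))    # True: within the idle timeout
print(fw.allows("udp", 24000, now=200.0))   # False: pinhole timed out
```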

The firewall must understand all variations of a signaling protocol and must receive updates each time the standard protocol changes or adds new capabilities. Because the firewall is designed to work with any type of H.323 endpoint, the firewall must be constantly tested with many different H.323 endpoint brands, models, and versions. In addition, the firewall must be tested with conference bridges, which use the same signaling as endpoints.

If an ALG firewall is in place, and endpoints use the GKRCS and tunneled mode of H.323, administrators need to statically open only UDP ports 1718 and 1719 and TCP port 1720; the ALG snoops the signaling and opens other ephemeral ports as needed. In addition, the administrator must statically open port 1718 only between the endpoint and the GK, not between the two endpoints.

H.323 endpoints offering encryption almost always use the H.235 standard. The simpler, more widely adopted version of H.235 encrypts the media packets, but not the signaling. Because the signaling is still in the clear, the firewall can snoop the signaling and open pinholes as necessary.

In cases in which a firewall cannot implement an H.323 ALG, a simpler firewall setup may be used, called a UDP ALG firewall, or simply a stateful firewall. In this mode of operation, the administrator statically opens the fixed signaling ports and lets the firewall dynamically open media ports as needed. One side of the firewall is considered trusted; the other side is considered untrusted. When a device on the trusted side of the firewall sends a UDP media packet to a destination address and port on the untrusted side, the firewall automatically opens a pinhole for UDP media to flow in the reverse direction; this open port is often referred to as a reverse pinhole. The destination port number of this newly opened reverse pinhole is always the same as the source port number of the original outgoing connection. Because endpoints typically set the source port equal to the destination port, the reverse pinhole is also the same as the destination port of the outgoing message.

This constraint means that the return flow of packets from the untrusted side of the firewall to the trusted side must use a destination port number that is the same as the source port number used by the sender on the trusted side of the firewall.

This type of firewall ALG is often called a symmetric pinhole or a bidirectional pinhole. When both endpoints in a video conference use identical port values, the endpoints are said to use symmetric ports. H.323 does not mandate the use of symmetric ports, but most H.323 endpoints follow this convention to traverse UDP ALG firewalls. One of the downsides of using a UDP ALG is that RTP media from the untrusted endpoint are not permitted through the firewall until the trusted endpoint sends RTP media out through the firewall. If the trusted endpoint delays sending RTP media, the inbound media might be clipped if the firewall drops early inbound packets. To facilitate firewall traversal, the endpoint on the trusted side should immediately send media packets to open the reverse pinhole for the external endpoint.
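The symmetric-port convention and the priming packet can be illustrated with a short socket sketch. The port value is an assumed negotiated RTP port, and the loopback address stands in for the external endpoint so the sketch runs anywhere:

```python
import socket

RTP_PORT = 24000                       # negotiated ephemeral RTP port (assumed)
REMOTE = ("127.0.0.1", RTP_PORT)       # loopback stand-in for the external endpoint

# Bind the local source port to the SAME value as the remote destination port.
# A UDP ALG firewall opens a reverse pinhole whose destination port equals our
# source port, so symmetric ports ensure inbound media lands where we listen.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("", RTP_PORT))

# Send immediately: the first outbound packet is what opens the reverse
# pinhole, so early inbound RTP from the far end is not dropped.
priming = b"\x80" + b"\x00" * 11       # minimal 12-byte RTP-like header (V=2)
sock.sendto(priming, REMOTE)
sock.close()
print(len(priming))                    # 12
```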

For TCP connections, the firewall allows bidirectional TCP connections that originate from the endpoint on the trusted side of the firewall.

SIP Port Usage

Firewall configuration for SIP is rather simple: Port 5060 (UDP or TCP) carries the SIP signaling. The SIP signaling protocol negotiates the media ports for RTP and RTCP, which are UDP ports in the range of 1024 to 65,535. A firewall with a SIP ALG snoops the signaling and opens the media ports.

However, a SIP ALG does not work with secure SIP. Secure SIP establishes an encrypted signaling channel using Transport Layer Security (TLS) over TCP port 5061. When two endpoints connect using encrypted signaling, the firewall cannot snoop the signaling and must rely on the UDP ALG trusted/untrusted model to open bidirectional reverse pinholes for the UDP media packets.

SCCP Port Usage

SCCP signaling is similar to SIP: The Cisco SCCP protocol uses port 2000 for signaling, and the signaling messages negotiate the media ports for RTP. Cisco firewalls provide ALGs for SCCP. For RTP media, Cisco IP phones use ephemeral UDP ports ranging from 16,384 to 32,768.

The secure SCCP protocol sends signaling messages over an encrypted TLS tunnel on port 2443. In this case, the firewall must use a UDP ALG to open bidirectional reverse pinholes for RTP media.

Preset Port Numbers

Some video conferencing endpoints allow preset port values, which allow the user/administrator to configure the endpoints to use only a small set of fixed port numbers to carry the RTP media. Endpoints often make this feature available in the advanced section of the endpoint user interface. The endpoints use this set of ports when negotiating the ephemeral port number. The administrator can then configure the firewall to permanently allow traffic on this small set of static ports. However, this approach leaves the network open to vulnerabilities if attackers exploit the permanently open ports.

NAT and PAT

Firewalls at the edges of an enterprise often include functionality called Network Address Translation (NAT). One variant of NAT is Port Address Translation (PAT); however, both functions are often generically lumped together as NAPT or simply NAT.

The NAT functionality is often part of the firewall and is therefore sometimes referred to as a NAT/FW. The NAT device translates the private IP addresses inside the enterprise into public IP addresses visible on the public Internet. Endpoints inside the enterprise are internal endpoints, and endpoints in the public Internet are external endpoints. For example, devices inside the enterprise might have private IP addresses in the form 10.0.x.x. When a device inside the enterprise connects out through the NAT, the NAT dynamically assigns a public IP address in the form 128.56.74.x. This public IP address is referred to as the public mapped address or the reflexive transport address. When the NAT forwards this packet to a device on the public Internet, the packet appears to come from 128.56.74.x. When external devices send packets back to the NAT at address 128.56.74.x, the NAT translates the IP addresses back to the internal private addresses and then forwards the packet to the internal network.

PAT is a variant of NAT. In this scenario, the NAT reuses the same external mapped address for multiple internal endpoints but varies the source port to differentiate among the data streams. PAT has the same considerations as NAT.

NATs offer several capabilities:

  • NATs map a large set of internal, private IP addresses into a smaller set of external, public IP addresses. The current public IPv4 address space is limited, and until IPv6 emerges as a ubiquitous protocol, most enterprises will have a limited number of IPv4 public addresses available. The NAT allows an enterprise with a large number of endpoints to make use of a small pool of public IP addresses. The NAT implements this functionality by dynamically mapping an internal IP address to an external IP address at the time an internal endpoint makes a connection out through the NAT. Each of these mappings is called a NAT binding.

  • NATs provide topology hiding. Because of the address mapping, entities on the public Internet are unaware of the internal, private IP addresses inside the enterprise; external endpoints see only the public mapped source address of a packet.

  • In addition, some NATs can use a different mapping each time a device inside the enterprise makes an outgoing connection to a different external endpoint. In this case, the NAT may provide a different public-to-private mapping for the duration of the new connection, which means that an internal endpoint appears to have two different public IP addresses at the same time, one for each external endpoint. Such obfuscation helps thwart attackers trying to perform reconnaissance.

In addition, a NAT has a notion of trusted and untrusted interfaces, much like a firewall: The NAT creates a binding only if a device on the inside of the enterprise sends a packet to an address on the public Internet. After the NAT creates this binding, it opens a reverse pinhole that allows the device on the public Internet to send packets back to the device on the inside of the NAT. The binding times out after the internal endpoint discontinues sending data for a certain period of time. The binding remains open only if the device on the inside of the NAT continually sends packets out through the NAT to the external endpoint on the public Internet.

NAT Classifications

A NAT is classified by two attributes:

  • Mapping characteristics—How the NAT allocates a new external mapped address/port for an internal private address/port

  • Filtering characteristics—How the NAT determines whether to forward a packet from the public address space to the private address space after the NAT creates a binding

For any of these mapping characteristics and filtering modes, the following sequence of events occurs to create a NAT binding:

  1. An internal endpoint with source address Ai uses a source port Pi to send a packet to an external endpoint. The combination of source address and port is denoted using the notation Ai:Pi.

  2. In response to this packet, the NAT sets up a binding and creates an external public mapped address Am and source port Pm for the internal endpoint. This combination of mapped address and port is denoted using the notation Am:Pm.

  3. For each UDP or TCP packet, the NAT replaces the private source address Ai:Pi in the packet with the mapped address Am:Pm before forwarding the packet to the external destination.
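The three-step binding sequence can be sketched as a toy NAT that performs endpoint-independent mapping (one public mapping per internal source, regardless of destination). The class name, port pool, and IP addresses below are illustrative:

```python
import itertools

class Nat:
    """Toy NAT with endpoint-independent mapping: one Am:Pm per internal Ai:Pi."""

    def __init__(self, public_ip="198.51.100.1"):
        self.public_ip = public_ip
        self.next_port = itertools.count(40000)   # pool of public ports Pm
        self.bindings = {}                        # (Ai, Pi) -> (Am, Pm)

    def translate_out(self, src, dst):
        # Step 2: the first outbound packet from Ai:Pi creates the binding.
        if src not in self.bindings:
            self.bindings[src] = (self.public_ip, next(self.next_port))
        # Step 3: rewrite the private source Ai:Pi to the mapped Am:Pm.
        return self.bindings[src], dst

nat = Nat()
m1, _ = nat.translate_out(("10.0.1.1", 5060), ("203.0.113.7", 5060))
m2, _ = nat.translate_out(("10.0.1.1", 5060), ("203.0.113.9", 5060))
print(m1)             # ('198.51.100.1', 40000)
print(m1 == m2)       # True: same mapping regardless of destination
```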

NAT Mapping Characteristics

The mapping characteristic of a NAT describes how the NAT allocates external addresses Am:Pm, based on the internal source address Ai:Pi. The NAT may implement two main types of mapping:

  • Endpoint-independent mapping

  • Endpoint-dependent mapping

The internal endpoint may send packets with source address Ai:Pi to multiple external endpoints, each with different addresses.

Figure 8-7 shows a NAT that implements endpoint-independent mapping. In this case, the NAT uses the same external mapped address Am:Pm for packets destined for different external endpoints.

Endpoint-Independent Mapping

Figure 8-7. Endpoint-Independent Mapping

In contrast, Figure 8-8 shows a NAT that implements endpoint-dependent mapping; the NAT allocates different addresses Am1:Pm1 and Am2:Pm2 for different destination endpoints.

Endpoint-Dependent Mapping

Figure 8-8. Endpoint-Dependent Mapping

NATs that implement endpoint-independent mapping have an advantage in a video conferencing environment. A later section in the chapter, “STUN,” describes how an internal endpoint can determine its public mapped address by communicating with a special server in the public address space called a STUN server. After discovering this public address, the internal endpoint can use it when communicating with other public endpoints, but only if the NAT implements endpoint-independent mapping.
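The STUN discovery mentioned here can be sketched with a minimal Binding Request per RFC 5389: a 20-byte header, and an XOR-MAPPED-ADDRESS attribute in the response that encodes the public mapped address. The server name below is a placeholder; substitute any public STUN server:

```python
import os
import socket
import struct

MAGIC = 0x2112A442    # STUN magic cookie (RFC 5389)

def build_binding_request():
    """STUN Binding Request: type 0x0001, zero length, cookie, 12-byte txid."""
    txid = os.urandom(12)
    return struct.pack("!HHI", 0x0001, 0, MAGIC) + txid, txid

def parse_xor_mapped_address(resp):
    """Extract the public mapped address from XOR-MAPPED-ADDRESS (type 0x0020)."""
    pos, end = 20, 20 + struct.unpack("!H", resp[2:4])[0]
    while pos < end:
        atype, alen = struct.unpack("!HH", resp[pos:pos + 4])
        if atype == 0x0020:
            _, family, xport = struct.unpack("!BBH", resp[pos + 4:pos + 8])
            port = xport ^ (MAGIC >> 16)                      # un-XOR the port
            addr = struct.unpack("!I", resp[pos + 8:pos + 12])[0] ^ MAGIC
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        pos += 4 + alen + (-alen % 4)     # attributes are 32-bit aligned
    return None

if __name__ == "__main__":
    # "stun.example.com" is a placeholder name, not a real server.
    req, _ = build_binding_request()
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.settimeout(2.0)
    try:
        s.sendto(req, ("stun.example.com", 3478))
        print(parse_xor_mapped_address(s.recv(1024)))
    except OSError:
        print("no response (placeholder server)")
    finally:
        s.close()
```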

NAT Filtering Characteristics

In addition to the mapping characteristics of a NAT, the other quality is the filtering mechanism, which determines whether a NAT allows an inbound packet to traverse the NAT. NATs may display three main types of filtering characteristics:

  • Endpoint-independent filtering

  • Address-dependent filtering

  • Address- and port-dependent filtering

Endpoint-Independent Filtering

Figure 8-9 shows a NAT that uses endpoint-independent filtering.

Endpoint-Independent Filtering

Figure 8-9. Endpoint-Independent Filtering

Figure 8-9 includes the following addresses that appear on the internal private network:

  • Ai:Pi—The source address:port of packets from the internal endpoint

  • Ae:Pe—The destination address:port of packets from the internal endpoint

Figure 8-9 also includes the following addresses that appear on the public network:

  • Am:Pm—The source address:port of packets from the NAT to endpoints on the public Internet.

  • Ae:Pe—The source address:port of packets from the external Endpoint 1 to the NAT. This source address:port uses a port Pe that is the same as the port Pe used as the destination for packets from the NAT to external Endpoint 1.

  • Ae:Px—The source address:port of packets from the external Endpoint 1 to the NAT. This source address:port uses a port Px that differs from the port Pe used as the destination for packets from the NAT to external Endpoint 1.

  • Ao:Po—The source address:port of packets from the external Endpoint 2 to the NAT.

When the NAT receives a packet with source address:port Ai:Pi and destination address:port Ae:Pe, the NAT creates a public mapped address Am:Pm. The NAT uses Am:Pm as the source address for the packets forwarded through the NAT to the public address space. After the NAT creates the binding, it forwards a packet from the external network to the internal network if the packet meets one condition: The destination address:port of the packet must be Am:Pm.

This mode has the fewest restrictions. After an internal endpoint sends a packet out through the NAT, any external endpoint can use that binding, because the return packet using that binding may have any source address:port.

Address-Dependent Filtering

Figure 8-10 shows a NAT that implements address-dependent filtering. This type of NAT is also referred to simply as a restricted NAT. Figure 8-10 uses the same address:port examples as Figure 8-9.

Address-Dependent Filtering

Figure 8-10. Address-Dependent Filtering

The internal endpoint with source address Ai:Pi sends a packet to an external endpoint with destination address Ae:Pe. The NAT creates a public mapped address Am:Pm. In addition, after the NAT creates this binding, the NAT forwards a packet from the external network to the internal network if

  • The source address of the packet is Ae. However, the source port can be any port.

  • The destination address:port of the packet is Am:Pm.

In this mode, only the external endpoint that received an outbound packet may send a packet back to the internal endpoint. However, the external endpoint can send a packet from any of its source ports.

Address- and Port-Dependent Filtering

Figure 8-11 shows a NAT that implements address- and port-dependent filtering.

Address- and Port-Dependent Filtering

Figure 8-11. Address- and Port-Dependent Filtering

After the NAT creates the binding, it forwards a packet from the external network to the internal network if

  • The source address:port of the packet is Ae:Pe

  • The destination address:port of the packet is Am:Pm

In this case, only the endpoint that received the packet can send a packet back to the internal network, and the packet must have a source port equal to the destination port of the external endpoint.
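The three filtering modes can be condensed into one decision function. This is an illustrative model, not firewall code; the mode names and the binding structure are assumptions of the sketch, using the Am:Pm and Ae:Pe notation from the figures:

```python
def nat_allows_inbound(mode, packet_src, packet_dst, binding):
    """Decide whether a NAT forwards an inbound packet under each filtering
    mode. binding holds the mapped address Am:Pm and the remote Ae:Pe that
    the internal endpoint originally sent to."""
    if packet_dst != binding["mapped"]:
        return False                                  # every mode requires Am:Pm
    if mode == "endpoint-independent":
        return True                                   # any external source may reply
    if mode == "address-dependent":
        return packet_src[0] == binding["remote"][0]  # same host Ae, any port
    if mode == "address-and-port-dependent":
        return packet_src == binding["remote"]        # exactly Ae:Pe
    raise ValueError(mode)

b = {"mapped": ("198.51.100.1", 40000), "remote": ("203.0.113.7", 5004)}
print(nat_allows_inbound("endpoint-independent", ("192.0.2.9", 9), b["mapped"], b))          # True
print(nat_allows_inbound("address-dependent", ("203.0.113.7", 9), b["mapped"], b))           # True
print(nat_allows_inbound("address-and-port-dependent", ("203.0.113.7", 9), b["mapped"], b))  # False
```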

The Symmetric NAT

A symmetric NAT implements a particular combination of mapping and filtering: endpoint-dependent mapping, along with address- and port-dependent filtering. Figure 8-12 shows a symmetric NAT.

Symmetric NAT

Figure 8-12. Symmetric NAT

Instead of allocating a static mapped address:port for each unique internal endpoint, the NAT allocates a unique Am:Pm for bindings created by packets with different external destination addresses, even when the packets come from the same internal endpoint. In the figure, the two external mapped addresses consist of the following:

  • Am1:Pm1—The source address:port of packets from the NAT to Endpoint 1 on the public Internet

  • Am2:Pm2—The source address:port of packets from the NAT to Endpoint 2 on the public Internet

In addition, the NAT forwards a packet from external to internal networks only if it meets the following criteria:

  • The source address:port of the packet is the same as the original destination of the packet that created the binding.

  • The destination address:port of the packet is the public mapped address associated with the external endpoint.

Most large businesses use symmetric NATs, which have the most restrictive policy.
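A symmetric NAT's combined behavior, endpoint-dependent mapping plus address- and port-dependent filtering, can be modeled together. The class and addresses below are illustrative:

```python
class SymmetricNat:
    """Toy symmetric NAT: endpoint-dependent mapping plus address- and
    port-dependent filtering (illustrative names and addresses)."""

    def __init__(self, public_ip="198.51.100.1", first_port=40000):
        self.public_ip, self.port = public_ip, first_port
        self.bindings = {}    # (Ai:Pi, Ae:Pe) -> Am:Pm, one per destination

    def outbound(self, src, dst):
        key = (src, dst)
        if key not in self.bindings:       # new destination -> new mapping
            self.bindings[key] = (self.public_ip, self.port)
            self.port += 1
        return self.bindings[key]

    def inbound_ok(self, packet_src, packet_dst):
        # Forward only when the exact remote Ae:Pe targets its own Am:Pm.
        return any(mapped == packet_dst and dst == packet_src
                   for (_, dst), mapped in self.bindings.items())

nat = SymmetricNat()
m1 = nat.outbound(("10.0.1.1", 5004), ("203.0.113.7", 5004))
m2 = nat.outbound(("10.0.1.1", 5004), ("203.0.113.9", 5004))
print(m1 != m2)                                      # True: one Am:Pm per destination
print(nat.inbound_ok(("203.0.113.7", 5004), m1))     # True
print(nat.inbound_ok(("203.0.113.9", 5004), m1))     # False: wrong binding
```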

NAT Complications for VoIP Protocols

NAT presents multiple problems for video conferencing and VoIP protocols, such as the following:

  • External endpoints cannot connect to an internal endpoint in the private address space until the internal endpoint creates a NAT binding by sending packets to the external endpoint. In other words, internal endpoints may not receive unsolicited connections. Of course, this restriction may be considered a security feature. However, one of the goals of NAT traversal is to allow authorized external endpoints to connect to internal endpoints.

  • Several of the video conferencing protocols include source addresses/ports in the protocol signaling messages. These source addresses provide the destination addresses that remote endpoints should use for return packets. However, internal endpoints use addresses from the private address space, and a NAT without an ALG does not alter these internal addresses. When the remote endpoint receives a message, it cannot route packets to the private IP address in the message.

  • NAT bindings time out when the internal endpoint fails to send a packet through the NAT before the NAT timeout expires. Some NATs enforce timeouts as short as one minute.

  • NATs allow secure TLS signaling to traverse through them. However, NATs may have problems with IPsec. IPsec is a protocol that encrypts packets at the IP layer. Native IPsec tunnels cannot traverse a NAT because IPsec requires IP addresses and ports in the IP header to stay the same. To allow IPsec to traverse a NAT, endpoints must tunnel the IPsec packets over UDP. This method of NAT traversal is known as NAT-T (NAT Traversal in the IKE). RFC 3947 defines the key exchange method, and RFC 3948 defines the method of UDP encapsulation. Administrators can configure this mode of NAT tunneling, but it requires more configuration management overhead.

  • If two endpoints are behind the same NAT, most commercially available NATs do not allow each endpoint to make a hairpin connection out the NAT and then back into the NAT to the other endpoint. In this scenario, the endpoints must recognize that they are on the same private LAN and use private addresses to establish a direct connection, instead of connecting out the NAT and using public addresses.
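The binding-timeout problem in the list above is usually addressed by sending periodic keepalives. This hypothetical sketch sends a small packet from the media source port at an interval shorter than the NAT's idle timeout; the demo targets loopback so it runs without a NAT in the path:

```python
import socket
import threading
import time

def start_keepalive(sock, remote, interval=20.0):
    """Send a tiny packet from the media source port at an interval shorter
    than the NAT's idle timeout (some NATs time out in 60 s or less, so a
    20 s default is a conservative assumption). Returns a stop function."""
    stop = threading.Event()

    def run():
        while not stop.is_set():
            sock.sendto(b"\x00", remote)   # tiny payload; the far end ignores it
            stop.wait(interval)

    threading.Thread(target=run, daemon=True).start()
    return stop.set

# Demo against loopback: the sender and receiver are the same socket.
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.bind(("127.0.0.1", 0))
s.settimeout(2.0)
stop = start_keepalive(s, s.getsockname(), interval=0.01)
data, _ = s.recvfrom(16)               # we receive our own keepalive
stop()
time.sleep(0.05)                       # let the sender thread exit
s.close()
print(data == b"\x00")                 # True
```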

In addition, when two endpoints in different enterprises are each behind their own NAT, unusual corner cases may result. Figure 8-13 shows a scenario in which each endpoint has the same private IP address of 10.0.1.1. If these endpoints were to exchange messages containing internal private addresses, they would each attempt to use a remote destination address equal to their own address.

Two Endpoints in Different Enterprises, Each Behind a NAT

Figure 8-13. Two Endpoints in Different Enterprises, Each Behind a NAT

NAT ALGs

In addition to mapping the IP addresses in the IP packet headers, NATs may use ALGs to inspect IP addresses/ports inside the protocol headers of signaling messages, and map them as well. A NAT ALG is similar to a firewall ALG, but a NAT ALG actually changes (maps) the addresses/ports in the signaling messages. This address rewrite is called a fixup or a deep-packet rewrite. Much like firewall ALGs, NAT ALGs suffer from the following problems:

  • The administrator must be sure to upgrade the NAT firmware to understand the latest version of video conferencing protocols.

  • The NAT cannot inspect the contents of encrypted signaling messages. Whereas a firewall can use a UDP ALG as a workaround for encrypted signaling, a NAT that attempts to pass encrypted signaling has no similar workaround. Because the NAT cannot rewrite the addresses in the protocol message, the signaling protocol breaks. One way around this problem is discussed later, in a scenario in which each endpoint discovers its public mapped address and performs the fixup directly.

NAT/FW Traversal Solutions

NAT/FW traversal refers to the capability of video conferencing endpoints to connect to each other across NATs and firewalls. Because firewalls often include NAT capability, the term firewall traversal also applies to NAT/FW traversal. Solutions for firewall traversal should ideally satisfy several requirements:

  • One requirement of a firewall traversal solution is simplicity. If a traversal solution requires a special firewall configuration, the firewall configuration must be as simple as possible. Ideally, any special firewall configuration should be limited to opening only a single port on the firewall. Complex firewall configurations for firewall traversal are difficult to manage and do not scale.

  • The traversal solution should allow authorized endpoints on the public Internet to make unsolicited calls to endpoints in the private address space inside the network.

  • A NAT/FW traversal scheme should work for symmetric NATs, the most restrictive type of NAT.

  • The solution should work for NAT/FWs configured with short timeouts for NAT bindings and ALG pinhole lifetimes.

  • If endpoints do not encrypt the call control signaling, the firewall should inspect the signaling to provide two features:

    • By inspecting the signaling, the firewall can implement an ALG. The firewall ALG opens pinholes for the media ports.

    • Many enterprises insist that a firewall should be able to inspect any signaling protocols that pass into or out of the organization to apply security policy to the packets. If a firewall cannot inspect data in a packet, the packet is said to contain opaque data. Opaque data includes protocols the firewall does not understand, as well as encrypted data. Every opaque or encrypted data stream that tunnels out through the firewall is a potential security risk because advanced attack tools that infiltrate an enterprise may use encrypted tunnels to transfer information between the public Internet and the internal network.

  • If video conferencing endpoints use encrypted signaling, the firewall cannot inspect the signaling, and the firewall traversal scheme must work in the absence of a protocol-specific ALG.

  • Ideally, firewall traversal should not require modification to existing endpoints. Modifications to endpoints may be acceptable in the long term if vendors widely adopt them as standards.

  • Ideally, the firewall traversal scheme should not require proxy devices to act as signaling or media gateways, because each proxy server adds another hop to the signaling or media path, which in turn adds more end-to-end delay.

  • In addition, proxies located in the DMZ, outside the internal firewall, have less protection from the public Internet. These servers potentially represent a single point of failure. In addition, these servers can present a possible threat. If a hacker takes control of the server, the attacker might have unfettered access to the enterprise, because the internal network trusts the devices in the DMZ.

  • The solution should not require proprietary modifications to call control servers such as SIP proxies or gatekeepers.

The following sections describe several firewall traversal solutions.

VPN

Administrators can easily configure a firewall to allow an IPsec VPN tunnel through it to allow remote teleworkers to connect to a VPN concentrator in the enterprise. A VPN tunnel can also allow branch offices to connect seamlessly to the campus network. Because the VPN infrastructure enforces authentication and authorization of the remote entities, firewall inspection of the traffic is not necessary. VPN basically avoids the firewall traversal problem.

The downside of the VPN solution is that it only provides a solution for teleworkers or remote offices that can authenticate to the VPN subsystem. This means that administrators must explicitly grant authorization to these endpoints. The VPN approach does not allow connections to or from other endpoints in the public Internet.

ISDN Gateway

In the early days of IP video conferencing, the only practical way to allow NAT/FW traversal between enterprises was to circumvent the problem by using H.320 ISDN gateways to connect two endpoints over the public switched telephone network (PSTN). Figure 8-14 shows the topology for interenterprise H.323 connectivity, in which two endpoints connect over the PSTN WAN.

Using ISDN to Circumvent the NAT/FW Traversal Problem

Figure 8-14. Using ISDN to Circumvent the NAT/FW Traversal Problem

The major downside of this approach is the added delay of converting H.323 to H.320 and then back again. The additional delay reduces the usability of the connection by degrading real-time interaction. In addition, the presence of a gateway complicates the dial plan: Users must dial the gateway, then dial an ISDN phone number, then connect to an Interactive Voice Response (IVR), and then dial the extension of the remote endpoint. In addition, some ISDN gateways may not be compatible with all H.323 endpoints or all modes of H.323.

Universal Plug-and-Play

Universal Plug-and-Play (UPnP) is a standard protocol that allows endpoints to communicate with NAT/FW devices. UPnP allows endpoints to do the following:

  • Request the NAT/FW to allocate a public mapped address

  • Determine the public mapped address assigned to an endpoint

  • Request the NAT/FW to open a pinhole for data arriving at the mapped address

When endpoints use UPnP, they use the mapped addresses directly in all protocol messages, instead of allowing the NAT to perform the fixups. Therefore, the administrator must disable ALG fixups on the NAT. However, UPnP has several downsides:

  • The endpoint client must be modified to use UPnP, and the NAT/FW must implement it, creating a chicken-and-egg problem: Only a handful of NAT/FW routers, mostly consumer-level devices, support UPnP, and only a few endpoints support the protocol. However, Windows XP provides built-in support for UPnP, which means that desktop-based video conferencing endpoints can easily be enhanced to make use of it.

  • UPnP does not solve the situation in which both endpoints are behind the same NAT.

  • The UPnP discovery mechanism is based on multicast IP addresses; in enterprises, multicast packets are often limited to local subnets. Therefore, the protocol is best for small environments, such as a home office, rather than an enterprise.

UPnP presents a possible security risk because a hacker who has infiltrated an enterprise can use UPnP to open many pinholes on a NAT, making the enterprise vulnerable to attack.
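The multicast discovery step that limits UPnP to small networks looks like this on the wire. The sketch sends an SSDP M-SEARCH for the gateway's WANIPConnection service; on an enterprise LAN no gateway may answer, in which case the function returns None:

```python
import socket

# SSDP discovery, the first step of UPnP port mapping: multicast an M-SEARCH
# for an Internet Gateway Device's WANIPConnection service.
MSEARCH = "\r\n".join([
    "M-SEARCH * HTTP/1.1",
    "HOST: 239.255.255.250:1900",
    'MAN: "ssdp:discover"',
    "MX: 2",
    "ST: urn:schemas-upnp-org:service:WANIPConnection:1",
    "", "",
]).encode()

def parse_location(resp):
    """Pull the LOCATION header (the gateway's device-description URL)."""
    for line in resp.split("\r\n"):
        if line.lower().startswith("location:"):
            return line.split(":", 1)[1].strip()
    return None

def discover(timeout=2.0):
    """Return the first responding gateway's LOCATION URL, or None."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.settimeout(timeout)
    try:
        s.sendto(MSEARCH, ("239.255.255.250", 1900))
        return parse_location(s.recv(2048).decode(errors="replace"))
    except OSError:
        return None        # no gateway answered (typical on enterprise LANs)
    finally:
        s.close()

if __name__ == "__main__":
    print(discover())      # device-description URL, or None
```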

IP-IP Gateway Inside the Firewall

Figure 8-15 shows a solution for NAT/FW traversal using an IP-IP media gateway.

NAT/FW Traversal with an IP-IP Gateway Inside the Firewall

Figure 8-15. NAT/FW Traversal with an IP-IP Gateway Inside the Firewall

In this approach, all media streams coming from or going to internal endpoints flow through the gateway. In addition, this topology has two gatekeepers:

  • An internal gatekeeper to facilitate connections between internal endpoints

  • A gatekeeper in the DMZ to allow external endpoints to dial into the network

The IP-IP gateway is analogous to an HTTP proxy; users can configure a web browser to use an HTTP proxy, which acts as a gateway between the internal and external network.

The internal GK and IP-IP gateway must have static IP addresses, and the administrator must configure the NAT to assign static public mapped IP addresses to those devices, with bindings that do not time out.

The firewall must implement an H.323 ALG and snoop the signaling to open pinholes for the media in both directions. In addition, the NAT must implement an ALG to rewrite IP addresses in the protocol headers.

Administrators typically use the following firewall configuration for this topology:

  • The firewall permanently opens pinholes to allow UDP RAS traffic on port 1719 to flow between the internal and external GK.

  • The firewall permanently opens pinholes to allow H.225 traffic on port 1720 to flow between the internal and external GK. This topology generally requires endpoints to use GKRCS so that H.225 signaling does not need to pass between external endpoints and the internal IP-IP gateway.

  • For ease of firewall configuration, the administrator can activate an ALG to facilitate H.245 connection establishment: The firewall ALG opens pinholes to allow H.245 traffic to flow directly between external endpoints and the IP-IP gateway, in case endpoints establish H.245 connections but do not tunnel the connections over H.225.

  • The firewall uses an H.323 ALG to open media pinholes.

H.460

Another solution for NAT/FW traversal is to place an IP-IP gateway outside the firewall; many Session Border Controllers (SBC) implement this feature. SBCs are available from various vendors and perform additional tasks such as adding quality of service (QoS) or call admission control (CAC).

However, an SBC-centric approach gaining traction in the H.323 video conferencing space is the H.460 standard. The standard consists of three major protocols:

  • H.460.17—NAT/FW traversal of H.323 signaling

  • H.460.18—NAT/FW traversal of H.323 signaling

  • H.460.19—NAT/FW traversal of H.323 media

A NAT/FW traversal solution may use either H.460.17 or H.460.18 for signaling and then use H.460.19 for media.

H.460.17 simplifies the firewall traversal somewhat by allowing all H.323 signaling to occur over a single port, whereas H.460.18 still requires multiple ports.

These protocols allow NAT/FW traversal with no additional NAT or firewall configuration and do not use firewall ALGs or NAT ALGs. In fact, administrators must disable the ALG capabilities of the NAT/FW to use these protocols. In addition, these protocols allow traversal of authenticated signaling and encrypted media. However, the H.460 solution requires that endpoints and gatekeepers implement additional signaling inside the H.323 signaling protocols. If an endpoint does not support the additional signaling, a proxy gateway located in the internal network must implement this signaling for the endpoint.

H.460.17

Figure 8-16 illustrates NAT/FW traversal with H.460.17.

Figure 8-16. H.460.17

The DMZ contains a traversal server (TS) consisting of a modified gatekeeper. The DMZ GK operates only in GKRCS mode. Inside the enterprise, the diagram shows two types of endpoints: those that support H.460.17 natively, and those that rely on a gateway proxy to incorporate the additional H.323 signaling required by the traversal protocol.

The only firewall configuration required is to allow stateful bidirectional pinholes: When a signaling packet flows from inside the firewall to outside the firewall, the firewall must open a pinhole for packets to flow from outside to inside on the same port. Therefore, endpoints must also use symmetric ports. However, the firewall can achieve an additional level of security by allowing outgoing port 1720 traffic to flow only to the TS.

The H.460.17 protocol requires the endpoints, or endpoint proxies, to send keepalive packets out through the firewall at frequent regular intervals on the signaling ports to preserve the NAT binding. By using the keepalive mechanism, internal endpoints maintain a persistent bidirectional link to the TS. The endpoints may generate keepalive packets by sending either lightweight RAS RRQ re-registrations or empty H.225 TPKT (transport packet) frames containing no message payload.
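As a hedged illustration of the keepalive mechanism, the following Python sketch builds the kind of empty TPKT frame that could serve as an H.225 keepalive, along with a timer check that decides when the next keepalive is due. The 20-second interval is a hypothetical value; the real interval simply must be shorter than the NAT's binding timeout.

```python
import struct

def build_empty_tpkt() -> bytes:
    """An RFC 1006 TPKT frame: version 3, reserved 0, 2-byte total length.
    An "empty" keepalive frame carries only the 4-byte header, no payload."""
    return struct.pack("!BBH", 3, 0, 4)

def keepalive_due(last_sent: float, now: float, interval: float = 20.0) -> bool:
    """True when the endpoint should send its next keepalive to the TS.
    The interval (hypothetical 20 s here) must be shorter than the NAT's
    binding timeout, which for UDP is often under a minute."""
    return now - last_sent >= interval
```

A real endpoint would send this frame on the long-lived H.225 connection each time `keepalive_due` fires.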

H.460.17 requires one significant modification to the H.323 standard: Instead of sending RAS packets to the TS, an internal H.460.17 endpoint first establishes a long-lived H.225 connection to the TS. The endpoint then sends RAS messages within this H.225 connection. This approach is referred to as RAS tunneling. The TS observes the public-mapped address assigned by the NAT for this endpoint and uses this address as the destination for protocol signaling directed back to the internal endpoint.

H.460.17 requires the use of H.245 tunneling, which means that all signaling—RAS, H.225, and H.245—is transmitted over H.225 TCP port 1720.

The primary feature of H.460.17 is the long-lived H.225 connection established between the internal endpoint and the TS, and the keepalive mechanism that preserves the NAT bindings to allow the DMZ TS to complete an unsolicited connection to an internal endpoint.

H.460.18

H.460.18 specifies an alternative method of NAT/FW traversal for H.323 signaling. It uses some of the same mechanisms as H.460.17, including the following:

  • H.460.18 requires additional signaling messages inside H.323. If the endpoints do not implement this modification, gateway proxies inside the enterprise must provide this functionality.

  • It requires internal endpoints to send keepalive messages to the TS to preserve NAT bindings.

  • The TS operates in GKRCS mode only.

  • The NAT/FW must open symmetric bidirectional pinholes for the signaling.

Unlike H.460.17, which sends all signaling over an H.225 connection, H.460.18 allows the NAT to open separate ports for RAS, H.225, and H.245. Figure 8-17 shows the topology.

Figure 8-17. H.460.18

The principal element of H.460.18 is the ability of the TS GK to send a special RAS message to the internal endpoint, which instructs the endpoint to send a packet out the NAT to open a corresponding inbound pinhole. The TS GK uses the H.323 RAS Service Control Indication (SCI) message to communicate this special command to the endpoint. H.323 SCI messages allow either the endpoint or the gatekeeper to invoke new custom-defined services. After an internal endpoint responds to the SCI message and opens a NAT/FW pinhole, the endpoint must keep the NAT binding active by sending frequent periodic keepalive messages out the NAT/FW. The endpoint may use three different keepalive mechanisms, depending on the signaling channel:

  • RAS—Lightweight RRQ re-registration messages

  • H.225—Empty TPKT packets

  • H.245—Empty TPKT packets

The internal endpoint registers with the TS by sending a RAS message. The TS observes the source address of this message to determine the public mapped address for the endpoint. The internal endpoint then maintains the NAT binding for the RAS channel by issuing keepalive messages.

When an external endpoint wants to connect to an internal endpoint, the external endpoint sends a setup message to the TS, and the TS creates a RAS SCI packet that requests that the internal endpoint send an empty H.225 packet out through the NAT to the TS to open a reverse pinhole for the incoming setup message. The RAS SCI message provides the port number on the TS, which is 1720 for H.225 connections, or an ephemeral port for H.245 connections. Upon receiving this RAS SCI packet, the internal endpoint sends a packet to open the reverse pinhole and then sends keepalive packets to preserve the NAT binding. The TS again observes the source address of the packet to determine the public mapped address for the internal endpoint. The TS then forwards the setup message from the external endpoint through the reverse pinhole. If two endpoints attempt to create a direct H.245 connection via H.225 messages, the TS translates the H.245 addresses in the H.225 messages so that both internal and external endpoints terminate their H.245 connections on the TS.
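The reverse-pinhole sequence above can be modeled as a small simulation. All class and field names here are hypothetical, chosen for illustration only; this is not the wire protocol.

```python
class TraversalServer:
    """Toy model of the H.460.18 reverse-pinhole sequence (not wire-accurate)."""
    def __init__(self):
        self.mapped = {}  # endpoint name -> public mapped (ip, port)

    def incoming_setup(self, endpoint, setup_msg):
        # 1. The TS cannot reach the internal endpoint directly, so it sends
        #    an SCI over the already-open RAS channel, naming the TS port
        #    (1720 for H.225) that the endpoint should target.
        sci = {"type": "SCI", "ts_port": 1720}
        # 2. The endpoint answers the SCI with an empty outbound packet, which
        #    opens the reverse pinhole; the TS observes the NAT-rewritten source.
        public_src = endpoint.send_empty_packet(sci["ts_port"])
        self.mapped[endpoint.name] = public_src
        # 3. The TS forwards the Setup through the newly opened pinhole.
        return (public_src, setup_msg)

class InternalEndpoint:
    def __init__(self, name, nat_public):
        self.name = name
        self.nat_public = nat_public  # address the NAT assigns to this flow

    def send_empty_packet(self, ts_port):
        # The TS observes this value as the packet's source address.
        return self.nat_public
```

The model captures the essential asymmetry: the internal endpoint, not the TS, must send first.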

The administrator can add greater security by writing firewall rules to restrict outgoing RAS, H.225, and H.245 messages to flow only to the TS.

H.460.19

H.460.17 and H.460.18 only provide NAT/FW traversal for signaling. Figure 8-18 shows the approach of H.460.19, which provides NAT/FW traversal for media packets that flow between two endpoints located on either side of a NAT/FW.

Figure 8-18. H.460.19

All media packets flow through the media relay in the DMZ, which must be accompanied by a DMZ GK that implements a NAT/FW traversal scheme for H.323 signaling, such as H.460.17 or H.460.18. The GK must also be able to control the operation of the media gateway. In addition:

  • Internal client endpoints that do not support the H.460.19 protocol must use a gateway proxy endpoint inside the network.

  • The administrator must configure the NAT/FW to allow bidirectional symmetric pinholes between internal endpoints and the media relay.

In normal H.323 signaling, endpoints signal the media channels to each other by exchanging H.245 packets containing the destination addresses/ports for the media. In H.460.19, the GK intercepts these packets and modifies the IP addresses/ports to ensure that media flows through the media gateway. When an external endpoint opens a channel to send media to an internal endpoint, the media first flows to the gateway, and then the gateway forwards the packet to the internal endpoint.

However, before the gateway can send media to the internal endpoint on a new RTP port, the internal endpoint must create a reverse pinhole for the media. The GK sends an H.245 message to the internal endpoint, instructing the endpoint to create this reverse pinhole by sending an empty RTP packet outbound to the gateway. The H.245 message contains the port on the gateway from which packets will originate. The endpoint responds by sending empty RTP packets to this port on the gateway, from a source port on the internal endpoint that will be the destination for inbound packets. After the gateway receives the empty RTP packet, it observes the public mapped address of the source and forwards inbound RTP packets to this public mapped address.

The internal endpoint must also use the same mechanisms to open pinholes and maintain bindings for the RTCP packets. The keepalive packet for RTCP is the Sender Report (SR) message.

H.460.19 has one additional feature that endpoints can use to reduce the number of open RTP media ports. This feature is called media multiplexing, and it allows a sender to multiplex data from different RTP sessions onto the same RTP port. This feature requires the sender to add a 4-byte multiplexID value after the UDP packet header and before the RTP packet header. The multiplexID identifies the stream. For each one-way media stream, the receiver chooses the mapping between sessionID and multiplexID, and the receiver transmits this information to the sender in the H.245 signaling messages.
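The framing described above can be sketched with Python's struct module. The multiplexID value used here is an arbitrary example; the real mapping between sessionID and multiplexID is negotiated in H.245.

```python
import struct

def mux_frame(multiplex_id: int, rtp_packet: bytes) -> bytes:
    """Insert the 4-byte multiplexID between the UDP header and the RTP header."""
    return struct.pack("!I", multiplex_id) + rtp_packet

def demux_frame(payload: bytes):
    """Split a received UDP payload back into (multiplexID, RTP packet)."""
    (multiplex_id,) = struct.unpack("!I", payload[:4])
    return multiplex_id, payload[4:]
```

The receiver uses the recovered multiplexID to route each packet to the correct RTP session, even though all sessions share one port.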

H.460.19 specifies a mandatory antispamming feature that mitigates DoS attacks. To implement antispamming, a sender appends an additional authentication tag to the end of each RTP packet, which authenticates items in the RTP header. The receiver can then determine whether an RTP packet is valid by performing a quick authentication operation on these header values, allowing it to identify malicious RTP packets without extensive processing.
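H.460.19 defines its own tag computation; as a hedged sketch of the general idea, the following uses HMAC-SHA1 truncated to a short tag computed over the 12-byte RTP header. The key size, tag length, and covered fields here are illustrative assumptions, not values taken from the standard.

```python
import hmac
import hashlib

TAG_LEN = 4  # illustrative truncated-tag length, not taken from H.460.19

def append_auth_tag(key: bytes, rtp_packet: bytes) -> bytes:
    """Sender: authenticate the fixed 12-byte RTP header and append the tag."""
    tag = hmac.new(key, rtp_packet[:12], hashlib.sha1).digest()[:TAG_LEN]
    return rtp_packet + tag

def quick_check(key: bytes, packet: bytes) -> bool:
    """Receiver: cheap validity test that recomputes the tag over the header only."""
    body, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    expected = hmac.new(key, body[:12], hashlib.sha1).digest()[:TAG_LEN]
    return hmac.compare_digest(tag, expected)
```

Because only the short header is hashed, the receiver can reject spoofed packets before doing any decryption or decoding.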

Endpoints may use H.460.19 with encrypted media, and the authentication tag added by antispamming provides DoS protection (in addition to any authentication tags added by the media encryption protocol).

H.460.18 and H.460.19 Issues

Video conferencing vendors are moving to adopt H.460.18 along with H.460.19. This protocol combination has the following attributes:

  • Administrators must configure the NAT/FW to allow any device inside the enterprise to send packets to the GK and media relay servers in the DMZ. In addition, the NAT/FW must open bidirectional symmetric pinholes in response to packets sent out the NAT/FW by internal endpoints. These requirements apply to all high-numbered ports, ranging from 1024 to 65,535.

  • The firewall may not implement any protocol-level ALG processing or fixups.

  • Different enterprises may implement peering, which is shown in Figure 8-19.

Figure 8-19. H.460.18/19 Peering

In the peering scenario, the administrators of the two enterprises cooperate and configure the DMZ GKs to work with each other. Without peering, an external H.323 endpoint must switch to a new GK when connecting to an endpoint in a different enterprise.

Figure 8-19 also shows a scenario with a teleworker at a remote location behind a NAT/FW. In this case, the enterprise TS also provides NAT traversal for the endpoint in the remote location. The remote endpoint must either be H.460.18/19 enabled, or it must use an H.460.18/19 proxy. Many teleworkers use PC-based desktop video conferencing endpoints, and the H.460.18/19 proxy can be in the form of a software client that runs on the desktop PC.

NAT/FW Traversal Using STUN/TURN/ICE

One means of NAT/FW traversal is to use a Session Border Controller (SBC) topology similar to the H.460 approach, with a server in the DMZ and a method of sending keepalives out through the NAT to maintain the address bindings.

However, an emerging standard for NAT/FW traversal is the method defined by Interactive Connectivity Establishment (ICE). This method is particularly suitable for SIP. ICE in turn uses other protocols, such as Simple Traversal Underneath NATs (STUN) and Traversal Using Relay NAT (TURN). The next sections discuss these protocols.

STUN

STUN is a client/server protocol that internal endpoints use to obtain their external public mapped address. STUN also provides a way for two endpoints to verify that they have connectivity through a NAT. The STUN protocol is still evolving as a standard in the IETF, but this section discusses the fundamental principles used by STUN that facilitate NAT traversal. These protocols are likely to appear in SIP endpoint products to enable enterprise-to-enterprise connections.

STUN introduces the concept of a server that exists on the public Internet to provide a service to endpoints that reside inside a private address space. The client begins by sending a STUN message to the default port (UDP port 3478) on the STUN server.

The server replies by sending a message back to the apparent source address of the client. This message contains the public mapped address of the client. If the public mapped address is different from the private address of the client, the client knows that it is behind a NAT.
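The mapped address arrives as an attribute in the STUN response. The sketch below builds and parses an IPv4 XOR-MAPPED-ADDRESS attribute in the format of the later RFC 5389 revision of STUN (the earlier RFC 3489 used a plain MAPPED-ADDRESS attribute instead); the address and port values are examples.

```python
import struct

MAGIC = 0x2112A442  # RFC 5389 magic cookie

def xor_mapped_attr(ip: str, port: int) -> bytes:
    """Encode an IPv4 XOR-MAPPED-ADDRESS attribute (type 0x0020)."""
    addr = struct.unpack("!I", bytes(int(o) for o in ip.split(".")))[0]
    value = struct.pack("!BBHI", 0, 0x01, port ^ (MAGIC >> 16), addr ^ MAGIC)
    return struct.pack("!HH", 0x0020, len(value)) + value

def parse_xor_mapped(attr: bytes):
    """Decode the attribute back into (ip, port), as a client would."""
    atype, alen = struct.unpack("!HH", attr[:4])
    assert atype == 0x0020 and alen == 8
    _, family, xport, xaddr = struct.unpack("!BBHI", attr[4:12])
    addr = xaddr ^ MAGIC
    ip = ".".join(str((addr >> s) & 0xFF) for s in (24, 16, 8, 0))
    return ip, xport ^ (MAGIC >> 16)
```

The XOR with the magic cookie exists because some NAT ALGs rewrite anything that looks like a literal IP address in a payload; XORing hides the address from such rewriting.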

To avoid security vulnerabilities, a server that provides STUN functionality must allow the client to establish an authenticated session before exchanging messages.

If a client is behind a NAT that uses address-independent mapping and filtering, the client can use STUN to discover the public mapped address corresponding to one of its internal address/ports. It can then perform its own NAT fixup by using the public address/port combination inside protocol signaling messages. This endpoint-implemented fixup is possible because a NAT that provides address-independent mapping creates the same public mapped address for a single internal endpoint, and then uses that mapped address for all external destination endpoints. Therefore, an internal endpoint can use the public mapped address discovered via STUN as a return address when talking to other endpoints on the public Internet.

However, many NATs in large enterprises are symmetric NATs and create a new public mapped address/port for each external destination endpoint, even when an internal endpoint uses the same source address and port to talk to each of those external endpoints. Therefore, when an internal endpoint is behind a symmetric NAT, the endpoint cannot reuse a public mapped address discovered by STUN to connect with other external endpoints.

The ICE protocol, discussed later, allows endpoints to use a static public mapped address, discovered by STUN, if the client is behind a NAT that uses address-independent mapping and filtering. However, in the more likely case of a symmetric NAT, most clients must use a proxy gateway located in the public address space, discussed next.

TURN

TURN is a protocol under development in the IETF; although it is still evolving, this section discusses the fundamental principles of the TURN approach to NAT traversal. The TURN protocol defines a TURN server, which is a media relay located in the public address space that allocates static public addresses to clients behind a NAT. Clients can then perform their own NAT fixup by using this public address when connecting to other SIP endpoints through the TURN server. Figure 8-20 shows this NAT topology.

Figure 8-20. TURN Server Topology

The client starts the sequence of events by sending an allocate message to the TURN server to allocate a static public IP address. The TURN server allocates the address in the public address space and then replies to the client with a message containing the allocated address. The TURN server observes the apparent source address of the client and associates this source address with the allocated public TURN address. The client can now forward packets to the TURN server, and the TURN server sends the packets out this allocated address. The TURN server also relays packets arriving at the allocated address back to the client. The originating endpoint can implement its own NAT fixup by creating protocol messages using the public address provided by the TURN server.

The client initially forwards packets through the TURN server by encapsulating the packet inside a TURN send message. When the TURN server receives a send message, it strips off the encapsulation and forwards the packet to the specified address. The TURN server sends the packet out the allocated address/port associated with the apparent source address of packets from the client.

There is another way to describe the functionality of the TURN server: It acts like an address-restricted NAT. After the client forwards data through the TURN server to a destination endpoint address/port, the TURN server allows packets from that external endpoint address, and any port, to flow back to the client encapsulated in TURN messages.

The client may also set one of the external address/port destinations to be the active destination. After the active destination has been set:

  • The TURN server forwards packets originating from the active address/port directly to the client, without TURN packet encapsulation.

  • When the TURN server receives nonencapsulated packets from the client, the TURN server forwards those messages to the active destination.
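This relay behavior can be modeled compactly. The following is a conceptual sketch with invented names, not the TURN wire protocol: it captures the allocation, the address-restricted permission rule, and the active-destination shortcut described above.

```python
class TurnModel:
    """Toy model of TURN allocation, permission, and active-destination rules."""
    def __init__(self):
        self.alloc = {}      # client -> allocated public (ip, port)
        self.permitted = {}  # client -> set of peer IPs allowed to send back
        self.active = {}     # client -> active destination (ip, port)

    def allocate(self, client, public_addr):
        self.alloc[client] = public_addr
        self.permitted[client] = set()
        return public_addr

    def send(self, client, peer, data):
        # Relaying to a peer grants its IP permission, like an
        # address-restricted NAT: any port on that IP may now reply.
        self.permitted[client].add(peer[0])
        return (self.alloc[client], peer, data)

    def inbound(self, client, peer, data):
        if peer[0] not in self.permitted.get(client, set()):
            return None  # dropped: this peer address has no permission
        if self.active.get(client) == peer:
            return ("raw", data)            # active destination: no encapsulation
        return ("turn-encapsulated", data)  # otherwise wrapped in a TURN message
```

The active-destination optimization matters for media: once set, RTP flows through the relay without per-packet TURN framing.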

The downside of using TURN is that endpoints require modification to use the protocol. In addition, the client must send keepalive messages to maintain the NAT binding that connects the client and the TURN server. Clients can implement a keepalive by resending the TURN allocate request.

Like STUN, the TURN server operates only if the administrator configures the NAT/FW to allow bidirectional symmetric pinholes.

Like other NAT/FW traversal solutions that use an intermediate proxy or gateway, a TURN server imposes a delay in the signaling and media paths.

ICE

ICE is an evolving protocol in the IETF that allows two endpoints to exchange a set of candidate addresses for connectivity. Some of the addresses may be in the local private address space, and others may be in the public mapped address space. For the endpoints to discover the optimal path, both endpoints must support ICE.

In the ICE protocol, each endpoint gathers a list of possible candidate public IP addresses that could allow an incoming packet to reach the endpoint. Endpoints gather these candidate addresses by locating STUN and TURN servers and then interrogating these servers for public mapped addresses. Endpoints may also use UPnP to obtain a public NAT address. In addition, the endpoint uses a local address as a candidate in case both endpoints are behind the same NAT. The endpoint prioritizes these addresses.

The SIP endpoint that initiates a SIP connection sends a SIP INVITE message containing a list of candidate IP addresses in prioritized order.

When the remote endpoint receives the list in the SIP INVITE, it replies with a list of addresses obtained in a similar manner. Each endpoint proceeds to attempt connectivity to the addresses provided by the other endpoint by sending STUN messages to each address. In this mode, the endpoints themselves must implement STUN server functionality and respond to STUN request messages from the other endpoint. When an endpoint receives a STUN return message, it knows that it has found an IP address that permits connectivity. Each endpoint chooses the highest-ranked address that offers connectivity to the other endpoint. Then the SIP endpoints exchange INVITE messages again, this time using the addresses obtained during the connectivity-testing phase.
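The selection logic can be sketched as follows. The candidate types and priority numbers are illustrative; real ICE computes priorities from a defined formula and runs the connectivity checks in pairs.

```python
# Hypothetical candidate list: (priority, type, address); higher priority wins.
candidates = [
    (100, "relay", ("192.0.2.50", 49170)),    # TURN relay: always reachable, high latency
    (1000, "srflx", ("203.0.113.7", 50000)),  # server-reflexive address from STUN
    (2000, "host", ("10.0.0.5", 50000)),      # local address, works only behind same NAT
]

def select_candidate(cands, connectivity_ok):
    """Pick the highest-priority candidate that passes a STUN connectivity check."""
    for prio, ctype, addr in sorted(cands, reverse=True):
        if connectivity_ok(addr):
            return ctype, addr
    return None

# Example: suppose the host address fails (endpoints behind different NATs)
# but the reflexive address works, so ICE avoids the high-latency relay.
reachable = {("203.0.113.7", 50000), ("192.0.2.50", 49170)}
best = select_candidate(candidates, lambda a: a in reachable)
```

This is why ICE degrades gracefully: the relay candidate sits at the bottom of the list and is used only when nothing more direct succeeds.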

The benefit of ICE is that if a public IP address exists, ICE will find it. In addition, if one of the endpoints is behind a NAT that uses endpoint-independent mapping and endpoint-independent filtering, ICE finds this low-latency direct route, instead of using a high-latency TURN relay. In addition, ICE allows endpoints to use local private addresses if each endpoint is behind the same NAT.

Similar to other NAT traversal approaches, the endpoints must issue periodic STUN keepalive messages to each other to preserve the NAT bindings.

ICE is beneficial even if one endpoint is behind a NAT and one endpoint is on the public Internet. If both endpoints implement ICE, the endpoints may find a direct connection through a NAT that has lower latency than a TURN server. Also, the endpoint behind the NAT can use the STUN keepalive messages to maintain the NAT bindings and reverse pinholes.

Encryption Basics

Before undertaking an analysis of encryption for video conferencing, it is necessary to have a fundamental understanding of cryptography.

Symmetric Encryption

Data encryption allows a sender and receiver to ensure the confidentiality of data. Video conferencing protocols encrypt signaling or media using symmetric encryption schemes, which use a single fixed-length key to both encrypt and decrypt the data. Figure 8-21 shows the operation of symmetric encryption.

Figure 8-21. Symmetric Encryption

The original, unencrypted data is called the cleartext, and the encrypted data is called the ciphertext. The conferencing industry is moving to adopt the Advanced Encryption Standard (AES) for encryption. AES-128, which uses a 128-bit key, is considered highly secure. Symmetric encryption algorithms such as AES-128 are generally fast enough for real-time media. To work effectively, the sending and receiving endpoints must use a method of secure key distribution. The simplest, but also most cumbersome, method of key distribution is to use a preshared key, distributed to the endpoints in an out-of-band, secure manner. A password is an example of a rudimentary preshared key. However, preshared key distribution usually does not scale well. A later section, "Media Encryption," describes other forms of key distribution.
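The defining property of a symmetric scheme is that one shared key both encrypts and decrypts. The Python standard library has no AES, so the sketch below builds a toy counter-mode stream cipher from SHA-256 purely to illustrate that property; it is not AES and must not be used in place of it.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Derive a pseudorandom keystream from the shared key (toy CTR construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def crypt(key: bytes, data: bytes) -> bytes:
    """XOR with the keystream; the identical call both encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))
```

Real AES counter mode has the same shape: a block cipher keyed with the shared secret generates a keystream that is XORed with the media.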

Secure Hashes

Data integrity is the ability of a receiver to guarantee that an attacker has not tampered with data in transit on the network. Data integrity prevents MitM attacks on either signaling or media streams. A sender provides a mechanism for the receiver to verify data integrity by adding a secure hash to the end of the data packet.

A hash is a function that takes any number of bytes as an input and produces a small fixed-length output value. One of the widely adopted hash algorithms is SHA-1, which generates a 160-bit hash output value. Most important, hashes are one-way functions, meaning that it is computationally infeasible to perform the hash in reverse: Given a hash output value, attackers will not be able to assemble a string of input bytes that generate the output hash. Because the hash is a one-way function, it is like a checksum that cannot be spoofed. Another characteristic of a hash is that even the smallest change to the input string of bytes will result in a very different value for the output hash.
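Both properties are easy to observe with Python's hashlib module (SHA-1 shown; its digests are 160 bits, or 20 bytes):

```python
import hashlib

d1 = hashlib.sha1(b"video conference").hexdigest()
d2 = hashlib.sha1(b"video conferencf").hexdigest()  # one character changed

# SHA-1 always produces 160 bits (20 bytes), regardless of input size.
assert len(bytes.fromhex(d1)) == 20
# Avalanche effect: the smallest input change yields a very different digest.
assert d1 != d2
```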

A secure hash adds a feature: In addition to an input stream of bytes, the secure hash incorporates a key value. Given the stream of bytes and the key value, the secure hash generates a unique output value, which changes if an attacker makes any change to either the string of bytes or the key. The universal standard method of using a key with any hash function is referred to as hashed message authentication code (HMAC), defined in RFC 2104. Any hash function may be converted into a secure hash using RFC 2104, and the name of the resulting secure hash is created by prepending HMAC to the hash name. The secure hash that uses SHA-1 is HMAC-SHA1.

Endpoints can authenticate a packet by calculating the HMAC value for the packet and then appending this value to the packet. In this case, the input to the HMAC algorithm is all bytes in the packet and a key. A receiver that has the key can recalculate the HMAC value and verify that it matches the HMAC value appended to the packet. If the values differ, an attacker has changed either a value in the packet or the HMAC value. An attacker cannot modify the packet and create a new valid HMAC without knowing the key.
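Python's hmac module implements RFC 2104 directly, so a minimal sender/receiver check looks like this (the key value is a placeholder for whatever the key-distribution mechanism provides):

```python
import hmac
import hashlib

key = b"distributed-out-of-band"  # placeholder shared key

def protect(packet: bytes) -> bytes:
    """Sender: append the 20-byte HMAC-SHA1 tag to the packet."""
    return packet + hmac.new(key, packet, hashlib.sha1).digest()

def verify(data: bytes) -> bool:
    """Receiver: recompute the tag over everything except the final 20 bytes."""
    packet, tag = data[:-20], data[-20:]
    return hmac.compare_digest(tag, hmac.new(key, packet, hashlib.sha1).digest())
```

`hmac.compare_digest` performs a constant-time comparison, which avoids leaking tag bytes through timing differences.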

However, when a sender and receiver use an HMAC tag for integrity protection, they must still solve the problem of key distribution, just like the case of symmetric encryption.

Video conference endpoints that send encrypted media generally provide both confidentiality and integrity: Encryption of the media provides confidentiality, and an HMAC tag provides integrity.

Asymmetric Encryption: Public Key Cryptography

Unlike symmetric encryption, where both sender and receiver use the same key, public key encryption uses two keys. In this approach, each endpoint creates a public key and a private key. Each endpoint keeps the private key secret but makes the public key widely available. Public key cryptography can perform two major functions: encryption and integrity protection.

Public Key Encryption

When used for encryption, public key cryptography relies on the fact that data encrypted with the public key can be decrypted only using the private key.

Figure 8-22 shows the process of encryption with public key cryptography.

Figure 8-22. Public Key Encryption

After an endpoint encrypts data with a public key, only the holder of the corresponding private key can decrypt the data. In this diagram, Bob has a public/private key pair and publishes his public key widely. Alice uses the public key from Bob to encrypt a message and then sends the encrypted message to Bob. Because only Bob possesses the private key, Alice can send the encrypted message to Bob in the clear, knowing that only Bob can decrypt it.

However, asymmetric encryption or decryption has a problem: It is highly CPU-intensive. For this reason, endpoints do not use asymmetric encryption to encrypt media streams directly. Instead, the endpoints typically use public key encryption to securely share symmetric keys. In this approach, each endpoint uses the public key from the other endpoint to exchange encrypted symmetric keys, and then the endpoints use the symmetric keys for symmetric encryption of the media or signaling streams.
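A toy RSA round-trip shows the key-wrapping idea. This is textbook RSA with tiny primes, utterly insecure; the numbers are chosen only so the arithmetic is visible.

```python
# Toy RSA parameters: p = 61, q = 53, so n = 3233, with e = 17 and d = 2753.
# Real keys use moduli of 2048 bits or more.
n, e, d = 3233, 17, 2753

def wrap_key(sym_key: int) -> int:
    """Encrypt a (tiny) symmetric key with the receiver's public key (n, e)."""
    return pow(sym_key, e, n)

def unwrap_key(ciphertext: int) -> int:
    """Only the holder of the private exponent d can recover the key."""
    return pow(ciphertext, d, n)
```

After the symmetric key is unwrapped, both sides switch to the fast symmetric cipher for the media stream, so the expensive asymmetric operation happens only once per session.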

Digital Signatures

Endpoints can achieve authentication or integrity by using public key cryptography to encrypt hash values, a process called message signing. Message signing relies on the fact that data that can be decrypted with the public key could only have been encrypted with the private key.

Figure 8-23 shows the process of message signing, which is similar to creating an HMAC value.

Figure 8-23. Creating a Digital Signature

The sender calculates the hash of a message, using either MD5 or SHA-1 hashing, and then encrypts the hash using a private key. The resulting encrypted hash is called a digital signature. Any endpoint with the public key of the sender can decrypt the hash and then verify the hash against the contents of the message. Just as with encryption, endpoints must distribute their public key widely to allow other endpoints to perform the secure hash verification.
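Using the same textbook-RSA idea (tiny insecure parameters, for illustration only), signing encrypts the hash with the private key and verification decrypts it with the public key:

```python
import hashlib

# Toy RSA parameters (p = 61, q = 53): never use key sizes like this in practice.
n, e, d = 3233, 17, 2753

def toy_hash(message: bytes) -> int:
    # Reduce a real SHA-1 digest into the toy modulus range (illustration only).
    return int.from_bytes(hashlib.sha1(message).digest(), "big") % n

def sign(message: bytes) -> int:
    return pow(toy_hash(message), d, n)  # encrypt the hash with the private key

def verify(message: bytes, signature: int) -> bool:
    return pow(signature, e, n) == toy_hash(message)  # decrypt with the public key
```

Anyone holding the public exponent e can verify, but forging a signature would require the private exponent d.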

Certificates

X.509 certificates provide a method for endpoints to present their public keys to other endpoints in the network. The X.509 certificate defines a data structure, shown here:

Certificate

  • Version

  • Serial number

  • Algorithm ID

  • Issuer

  • Validity

    • Not before

    • Not after

  • Subject

  • Subject public key info

    • Public key algorithm

    • Subject public key

  • Issuer unique identifier (optional)

  • Subject unique identifier (optional)

  • Extensions (optional)

  • Certificate signature algorithm

  • Certificate signature

The certificate contains the public key of the endpoint and a list of permissions in the extensions item, which includes an indication of whether the certificate is authorized to use its private key to encrypt data or sign messages. The Subject field of the certificate contains subfields that include the identity of the certificate holder. Endpoints often use the distinguished name (DN) subfield to hold the identity.

When two endpoints want to communicate securely, they can exchange their certificates and then use the public keys in the certificates for the purposes of encryption, message authentication, and identity authentication.

However, for the endpoints to trust the identity (such as the Distinguished Name subfield) in the certificate presented to them, the endpoints use a Public Key Infrastructure (PKI). At the heart of a PKI is a device called a certificate authority (CA). The CA creates certificates and issues a certificate to each endpoint. The CA also has its own certificate, called a CA certificate. The CA validates each new certificate by signing the new certificate with the private key of the CA certificate, a process shown in Figure 8-24. To create the signature, the CA calculates the hash over the certificate, then encrypts the hash, and then inserts the result into the signature field of the certificate.

Figure 8-24. Certificate Signature Creation

To complete the PKI, each endpoint must also have a copy of the CA certificate. An endpoint can validate the certificate from another endpoint by confirming the signature, using the public key in the CA certificate. Each certificate holds a pointer to the CA that provided the signature. Figure 8-25 shows the process.

Figure 8-25. Certificate Signature Verification

Web browsers use this method to validate certificates presented by websites. When a browser connects to a website, the website presents a certificate, and the certificate specifies which CA certificate provided the signature. The web browser must have a copy of the corresponding CA certificate. The browser uses this CA certificate to recalculate the signature of the certificate from the website. If the calculated signature matches the signature in the presented certificate, the certificate is valid.

When a CA issues a certificate, the CA sets attribute values in the certificate to specify how the certificate may be used. A CA may grant a certificate with one or more capabilities:

  • The ability to provide security for encrypted TCP connections, such as Transport Layer Security (TLS)

  • The ability to sign downloadable firmware

  • The ability to sign other certificates (to operate as a CA)

  • The ability to allow for nonrepudiation by guaranteeing the identity of endpoints that establish connections

A later section in this chapter, "H.235," shows how endpoints can use certificates to provide authentication and nonrepudiation.

Certificate Management

When a PKI is in place, it provides an elegant way to exchange certificate-based credentials and key material among a large number of endpoints, because each endpoint needs only the CA certificate to validate certificates from other endpoints. However, a certificate-based PKI requires certificate management on both the CA and the endpoints. In addition, the administrator or the endpoint must perform certificate management both at the time of initial certificate distribution and on an ongoing basis.

CA Certificate Installation

When installing certificates on an endpoint, the first step is for the administrator to obtain the CA certificate and install it in the certificate store of the endpoint. The administrator can transfer this certificate in one of several ways:

  • In the most low-tech method, the administrator can use sneakernet. The administrator logs on to the console of the CA, copies the CA certificate to a Universal Serial Bus (USB) drive, and then walks over to the endpoint and transfers the CA certificate to the endpoint.

  • In the most common method for PC-based endpoints, the administrator can log on to the endpoint and then connect to the CA GUI using a web browser. The CA GUI can display the certificate on a web page, and the administrator can copy and paste the certificate into a file on the endpoint. The administrator then places the file into the certificate store on the endpoint. For this method to be secure, the administrator should follow several guidelines:

    • The GUI exposed by the CA should offer Secure Sockets Layer (SSL) connectivity, which allows the CA server to authenticate to the web browser user.

    • The CA should ask the administrator for a password preconfigured on the CA, which allows the administrator to authenticate to the CA server.

    • The administrator should verify the thumbprint of the CA certificate, which is a hash of the certificate contents.

  • The endpoint can use a certificate management protocol to obtain the CA certificate from the CA server. Two such protocols are Simple Certificate Enrollment Protocol (SCEP) and Certificate Management Protocol (CMP). A CA can support one of these protocols to allow endpoints to automatically obtain certificate credentials. SCEP is a popular protocol because it is simple. SCEP allows endpoints to include a password (a preshared key) with the request.

When using the last two methods, the administrator should verify that attackers have not tampered with the CA certificate in transit. The administrator should perform this verification manually using a certificate thumbprint, which contains a hash of the certificate contents. The CA can provide the thumbprint out-of-band, either at the CA console or via e-mail. After the administrator copies over the certificate to the endpoint, the administrator calculates the thumbprint of the certificate using a simple thumbprint generator program and compares the two values.

Requesting an Endpoint Certificate

After installing the CA certificate on the endpoint, the next step is for the administrator to request the CA to issue a unique certificate to the endpoint by creating a certificate request. This process is called enrollment. However, the endpoint needs a public/private key pair before it can create the certificate request. There are several ways to create this key pair:

  • The endpoint can generate the public/private key pair directly and keep the private key stored on the endpoint in a secure manner. Typically, the endpoint stores the private key in an encrypted file on disk, using a password. Applications that use the private key may prompt the user for the password to access the private key.

  • Alternatively, the CA can generate the public/private key pair on behalf of the endpoint. After the CA generates the key pair, the administrator must encrypt the private key with a password, transfer the encrypted private key to the endpoint, and then inform the endpoint what the password is. This method of key generation is secure. However, if possible, it is more desirable for the endpoint to generate the private key directly and for the private key to never leave the endpoint.

  • Another method of obtaining a key pair is to use a special hardware device called a hardware security module (HSM), usually in the form of a small USB device. The HSM contains a keystore, a special-purpose hardware component that holds the public and private keys, along with a processor that performs PKI functions using those keys. The keystore provides its public key but is designed to never expose its private key.

After the administrator creates a public/private key pair, the next step is for the endpoint to create a certificate request. The certificate request contains the public key of the endpoint and other attributes for the certificate such as the name of the endpoint, the requested expiration time of the certificate, and the requested capabilities of the certificate, such as TLS encryption. Typically, the endpoint creates this certificate request, in which case the administrator must transfer this request to the CA. Alternatively, the CA can generate this request on behalf of the endpoint. The CA then processes this certificate request, creates a certificate, and signs the certificate using the private CA key. The administrator then transfers this certificate back to the endpoint and installs the certificate in the certificate store of the endpoint. The process of transferring the endpoint certificate from the CA to the endpoint is generally the same as the original process of transferring the CA certificate to the endpoint and uses one of three methods:

  • Sneakernet

  • The web-based GUI provided by a CA

  • SCEP or CMP

Note

The SCEP protocol can accept a password. However, when distributing initial certificate credentials, administrators should verify the certificate thumbprint.

The previous steps reveal a sticking point when deploying a PKI infrastructure: There are no well-developed methods of installing initial credentials on a large number of endpoints in a manner that scales well; all these methods require manual intervention.

Endpoint Authentication

After the endpoint has its own signed certificate and the CA certificate, the endpoint may securely connect to other endpoints using certificate-based credentials. In the simplest case, an endpoint can trust the certificate of a remote entity if the certificate of that remote entity is signed by a CA trusted by the endpoint. In addition, endpoints often implement an authorization scheme by accessing an identifier in the certificate. The usual identifier is the Distinguished Name subfield of the Subject field of the certificate. Enterprises typically create directories that list these identifiers, along with the permissions associated with each identifier. Most commonly, enterprises store these mappings in a directory based on the Lightweight Directory Access Protocol (LDAP). The endpoint can look up an identifier in a corporate LDAP directory to determine the list of permissions authorized for that identifier. Administrators can easily use such an LDAP directory to grant fine-grained permissions for each endpoint.
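The authorization lookup can be sketched as follows. A plain dictionary stands in here for the corporate LDAP directory, and the Distinguished Names and permission names are hypothetical.

```python
# A plain dict stands in for the corporate LDAP directory; the
# Distinguished Names and permission names below are hypothetical.
PERMISSIONS_DIRECTORY = {
    "CN=conf-room-3,OU=Video,O=Example Corp": {"place_calls", "join_mcu"},
    "CN=gatekeeper-1,OU=Infra,O=Example Corp": {"route_calls", "admit_endpoints"},
}

def authorize(subject_dn: str, permission: str) -> bool:
    """Look up the Distinguished Name taken from the Subject field of a
    validated certificate; identifiers not in the directory get nothing."""
    return permission in PERMISSIONS_DIRECTORY.get(subject_dn, set())

authorize("CN=conf-room-3,OU=Video,O=Example Corp", "place_calls")   # True
authorize("CN=conf-room-3,OU=Video,O=Example Corp", "route_calls")   # False
```

The endpoint performs this lookup only after validating the certificate itself; authorization decides what an already-authenticated identity may do.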

Certificate Revocation

However, before an entity can trust a certificate from a remote endpoint, the entity must check to see whether the administrator has revoked the certificate of the remote entity. The administrator may revoke a certificate if the private key of the certificate is exposed or if the machine on which the certificate resides is stolen. An endpoint checks the revocation status of a certificate by accessing a Certificate Revocation List (CRL). The CA generates the CRL and authenticates the CRL by signing it with the CA private key. The CA often transfers the CRL to a server that publishes the CRL. This publisher is called a CRL Distribution Point (CDP). Each certificate includes fields that list one or more CDPs that other endpoints may use to download the associated CRL. A CA may publish a CRL using an HTTP server or an LDAP directory.

The CRL has an expiration time, typically on the order of six months, and the CA must push a new CRL to the CDP before the current CRL expires. In addition, the CA may push a new CRL to the CDP at any time. Endpoints should download a CRL on a regular basis and must download a new CRL before the current CRL expires. If the risk associated with using a revoked certificate is high, endpoints should download the CRL more often. The endpoints cache the CRL and may update it based on different policies:

  • For the lowest level of security, an endpoint may decide to cache the CRL and then download a new CRL shortly before the current CRL expires. If the endpoint cannot download a new CRL before the current CRL expires, the endpoint can choose to use the old (stale) CRL until a new CRL is available and continually attempt to download the fresh CRL on a best-effort basis. This level of security may suffice for closed environments in which it is unlikely for certificates or servers to be stolen or compromised.

  • The endpoint can add a level of security by refusing to trust any certificates if the endpoint does not have an unexpired CRL in the cache. This case presents a potential problem because the CDP becomes a single point of failure. If endpoints cannot access a new CRL after the current one expires, all certificate-based secure communication comes to a screeching halt. To avoid this weakness, administrators must take several precautions:

    • Administrators should install multiple CDPs for redundancy. For instance, each certificate may include links to an HTTP-based CDP and an LDAP-based CDP.

    • Administrators should deploy HTTP-based CDPs that are highly available. This level of robustness is generally easy to achieve by using the same techniques used to deploy highly available web servers. CRL distribution using a web server does not need to provide Secure HTTP (HTTPS), because the CRL is already cryptographically signed by the CA, and no confidentiality is necessary when transferring the CRL to an endpoint. Configuring LDAP deployments for high availability is a more involved process.

  • For an additional level of security, the endpoint can periodically download a new CRL on a more frequent basis. Even though the current CRL cache might have an expiration date far into the future, an administrator may revoke a certificate at any time, which means the CA may add a certificate to the published CRL at any time. By checking the CRL more frequently, endpoints can recognize revoked certificates sooner.

  • For the highest level of security, the endpoint can download the CRL each time it attempts to validate a remote certificate. In this case, downloading an entire CRL may result in a large bandwidth transaction, in which case the endpoint can use the Online Certificate Status Protocol (OCSP). OCSP allows endpoints to query the status of individual certificates in a more efficient manner.
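The caching policies above can be sketched as a small policy object. The class and its behavior are illustrative, not taken from any standard: `strict=False` models the lenient best-effort policy, and `strict=True` refuses to trust certificates without an unexpired CRL in the cache.

```python
from datetime import datetime, timedelta

class CrlCache:
    """Caching policy for a downloaded CRL. strict=False models the
    lenient best-effort policy; strict=True refuses to trust any
    certificate without an unexpired CRL in the cache."""

    def __init__(self, strict: bool, refresh_interval: timedelta):
        self.strict = strict
        self.refresh_interval = refresh_interval   # periodic re-download policy
        self.crl_expires = None                    # expiry of the cached CRL
        self.last_download = None

    def install(self, crl_expires: datetime, now: datetime) -> None:
        self.crl_expires = crl_expires
        self.last_download = now

    def needs_refresh(self, now: datetime) -> bool:
        # Re-download when the periodic interval elapses or the cached
        # CRL has expired, whichever comes first.
        if self.last_download is None:
            return True
        return (now - self.last_download >= self.refresh_interval
                or now >= self.crl_expires)

    def may_trust_certificates(self, now: datetime) -> bool:
        if not self.strict:
            return True   # keep using a stale CRL on a best-effort basis
        return self.crl_expires is not None and now < self.crl_expires
```

Shortening `refresh_interval` implements the more frequent download policy; replacing the periodic check with a per-validation OCSP query implements the highest-security policy.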

All certificates have expiration dates; if an attacker compromises the private key of a certificate, the attacker can only make use of the certificate until it expires. CAs grant certificates with lifetimes that typically vary from six months to two years. For certificates deployed in high-risk, public-facing networks, operators can configure shorter certificate lifetimes. In addition to verifying the validity of certificates from other endpoints, endpoints need to keep tabs on the expiration date of their own certificates. An endpoint must obtain a new certificate before the old certificate expires. The process of obtaining a new endpoint certificate is called reenrollment. The mechanism for reenrollment is the same as for enrollment, with one difference: The endpoint can use a certificate management protocol such as SCEP to connect over the network to a CA. SCEP allows existing endpoints to connect to a CA in a secure manner, using credentials from a current valid certificate, and obtain a new certificate without operator intervention. After the CA issues a new certificate, the CA usually revokes the old certificate to avoid having two different certificates active for the same endpoint at the same time.
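The endpoint-side expiration check can be sketched as follows, assuming a 30-day renewal window; the window length is a deployment policy choice, not part of any standard.

```python
from datetime import datetime, timedelta

def should_reenroll(not_after: datetime, now: datetime,
                    renewal_window: timedelta = timedelta(days=30)) -> bool:
    """Start reenrollment once the certificate enters its renewal window,
    so a fresh certificate is in place before the old one expires."""
    return now >= not_after - renewal_window

# An endpoint polls this check periodically and, when it returns True,
# contacts the CA (for example, via SCEP) to obtain a new certificate.
```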

Finally, like all certificates, CA certificates eventually expire. In the time period shortly before the CA certificate expires, the CA creates a new CA certificate, and endpoints must obtain this new CA certificate. Endpoints obtain the new CA certificate using the same mechanisms used to obtain the original CA certificate. When a new CA certificate is active, endpoints must get their own certificates re-signed by this new CA certificate by issuing certificate requests to the CA.

Nonrepudiation

Nonrepudiation provides a means to establish the identity of an endpoint that places a call, usually for billing purposes. If the endpoint establishes identity in a secure way, the endpoint cannot repudiate the act of placing the call.

Video conferencing infrastructure can implement nonrepudiation by requiring endpoints to use certificates for authentication and requiring those certificates to have attributes that allow the certificate to assert identity for nonrepudiation. When obtaining certificates, endpoints must specifically ask the CA to grant nonrepudiation capability for those certificates.

Key Distribution

For two endpoints to use symmetric encryption for media or signaling, the endpoints must agree to use a common key for both encryption and decryption, a process called key distribution or key agreement. As mentioned previously, one method of performing key distribution is to distribute preshared keys out-of-band in a secure manner. However, this method of key distribution does not scale well. Two other methods of key distribution include certificate-based distribution and Diffie-Hellman key exchange, as described in the next sections.

Certificates

An endpoint may send a symmetric key to a remote endpoint in a secure manner by encrypting the key with the public key listed in the certificate of the remote endpoint. This method assumes that endpoints in an enterprise participate in a PKI to obtain certificates from a CA.

Diffie-Hellman

Diffie-Hellman key exchange is a method by which two endpoints can agree on a common shared secret. Both endpoints then use the shared secret directly as a symmetric key, or they can use the shared secret to encrypt symmetric keys. Figure 8-26 shows the Diffie-Hellman mechanism.

Diffie-Hellman Key Exchange

Figure 8-26. Diffie-Hellman Key Exchange

The Diffie-Hellman key exchange has public values and private values. The endpoints first agree on the values of p and g, which are public. Each endpoint then creates a secret private value: Alice creates the secret value a, and Bob creates the secret value b. Each endpoint performs calculations using its private value and the public p and g values to create intermediate values. Then the endpoints exchange these intermediate values. Based on the exchanged values, each endpoint calculates the same common shared secret value. Third-party attackers who snoop the Diffie-Hellman exchange cannot compute the secret value, because only someone with one of the private Diffie-Hellman values a or b can compute the secret value.
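The exchange in Figure 8-26 can be demonstrated with Python's modular exponentiation. The 32-bit prime below is for illustration only; real deployments use primes of 2048 bits or more.

```python
import secrets

# Public values agreed on by both sides. This 32-bit prime is for
# illustration only; real deployments use far larger primes.
p = 0xFFFFFFFB          # a prime modulus
g = 5                   # a generator

a = secrets.randbelow(p - 2) + 2    # Alice's secret private value
b = secrets.randbelow(p - 2) + 2    # Bob's secret private value

A = pow(g, a, p)        # intermediate values: these are what the
B = pow(g, b, p)        # endpoints exchange over the network

shared_alice = pow(B, a, p)         # Alice computes (g^b)^a mod p
shared_bob = pow(A, b, p)           # Bob computes (g^a)^b mod p
assert shared_alice == shared_bob   # both sides now hold the same secret
```

A snooper sees only p, g, A, and B; recovering the shared secret from those values requires solving the discrete logarithm problem, which is computationally infeasible at real key sizes.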

The problem with Diffie-Hellman key exchange is that it is susceptible to a MitM attack. In this attack scenario, the MitM performs the Diffie-Hellman key exchange with each endpoint, creating two different Diffie-Hellman secrets. After encrypted data starts to flow in each direction, the MitM can decrypt and then re-encrypt the data, acting as a router between the two endpoints. To use Diffie-Hellman key exchange without the threat of a MitM, endpoints must also use some additional means to authenticate each other. A common way of performing this type of authentication is to use the identity established by certificates.

IPsec and TLS for Secure Signaling

Two common methods to provide security for endpoint signaling are IPsec and TLS.

IPsec

IPsec operates by applying encryption at the IP layer, below the TCP and UDP stack. Because IPsec applies to the lowest layers of the IP stack, endpoints typically implement it as part of the operating system kernel, independently of the upper-layer application. Therefore, the applications are unaware of the underlying security, but the IPsec tunnel protects the UDP and TCP packets. However, administrators and users must manually configure IPsec on the originating and terminating endpoints and distribute IPsec credentials to these endpoints.

These constraints make IPsec ideal for teleworkers with a PC-based video conferencing endpoint at home. By establishing an IPsec VPN connection from a remote site to the enterprise, the teleworker can establish a direct secure connection. At the remote site, the user can use either a software-based VPN on the PC or a hardware-based VPN on the router. The enterprise hosts a VPN concentrator to allow the teleworker to connect.

However, IPsec is impractical for endpoints other than those used by teleworkers to dial into an enterprise remotely. IPsec is generally not practical for endpoint-to-endpoint connections within an enterprise, or between an endpoint in the enterprise and a nonteleworker in the public Internet, because administrators need to manually configure the VPN credentials and IP addresses of both endpoints. In addition, only some NATs offer a pass-through or tunnel mode that allows IPsec to traverse the NAT using the NAT-T standard.

TLS

TLS is an application layer protocol, because it requires applications on the two endpoints to establish the TLS connection. Unlike IPsec, which is usually hidden in the kernel of the operating system, the endpoint application must generally support TLS to use it. One exception is Stunnel, an application that provides a TLS wrapper that transparently protects network connections created by non-TLS-aware applications.

Endpoints most often use TLS in a client/server paradigm, where the server presents a certificate to the client to establish server-side authentication. However, TLS also provides a mechanism for mutual authentication, in which both sides of the conversation exchange certificates. TLS imposes one additional restriction: It requires a TCP connection, which means that UDP-based messages, such as RAS messages, cannot make use of TLS. The IETF is working to develop secure solutions for UDP, and one of those efforts is Datagram TLS (DTLS), which runs TLS over UDP.
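Mutually authenticated TLS can be sketched with Python's ssl module. The certificate and key file names below are placeholders for the credentials installed during enrollment; the CA bundle is the CA certificate distributed earlier.

```python
import ssl

def make_server_context(cert_file: str, key_file: str, ca_file: str) -> ssl.SSLContext:
    """TLS server context that demands a client certificate, giving
    mutual authentication. File names come from the deployment."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.verify_mode = ssl.CERT_REQUIRED          # require the client certificate
    ctx.load_cert_chain(cert_file, key_file)     # this endpoint's certificate
    ctx.load_verify_locations(ca_file)           # CA certificate from enrollment
    return ctx

def make_client_context(cert_file: str, key_file: str, ca_file: str) -> ssl.SSLContext:
    """TLS client context; PROTOCOL_TLS_CLIENT verifies the server
    certificate and hostname by default."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.load_cert_chain(cert_file, key_file)     # presented when the server asks
    ctx.load_verify_locations(ca_file)
    return ctx
```

With only the first half of `make_server_context` (no `verify_mode` change), the same code models the common single-sided case in which only the server authenticates.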

H.323 endpoints can tunnel H.225 and H.245 TCP connections over TLS, but there is no widely adopted method for endpoints to negotiate TLS protection.

On the other hand, SIP provides a way of supporting TLS. Normally, SIP addresses consist of a URL that begins with the characters sip:. One example of such an address is sip:[email protected]. The SIP specification defines a sips: URL format for the destination address of TLS-protected connections. An example of a SIP address used to invoke TLS-protected SIP is sips:[email protected].

Media Encryption

Video endpoints encrypt RTP media in one of two ways:

  • Secure RTP (SRTP)—. SIP endpoints and SCCP endpoints use SRTP exclusively for media encryption. H.323 endpoints may also use SRTP, but H.323 does not provide a well-defined way of establishing SRTP, and the procedures are generally not interoperable between different vendors.

  • H.235.6—. H.235.6 is an encryption standard for H.323 endpoints, as discussed in the section “H.235.6.”

In both cases, two endpoints must exchange a symmetric key and then use that key to encrypt and decrypt the data. For both SRTP and H.235.6, only the media portion of the RTP packet gets encrypted; the RTP header remains unencrypted.

In addition to encrypting the RTP media, SRTP also adds a 4-byte value to the end of the RTP packet to provide an HMAC authentication code. This HMAC code authenticates the RTP header and the RTP payload.
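The packet layout can be illustrated as follows. This is not real SRTP: the standard library has no AES counter mode, so a hash-derived keystream stands in for the cipher, clearly labeled as a toy. The structure, however, mirrors the description above: a fixed 12-byte RTP header left in the clear (CSRC entries ignored), an encrypted payload, and a 4-byte truncated HMAC tag over header plus encrypted payload.

```python
import hashlib
import hmac

RTP_HEADER_LEN = 12   # fixed RTP header stays in the clear (no CSRC entries)

def _keystream(key: bytes, length: int) -> bytes:
    # Toy keystream for illustration; real SRTP uses AES in counter mode.
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def srtp_like_protect(packet: bytes, enc_key: bytes, auth_key: bytes) -> bytes:
    """Encrypt only the payload, leave the RTP header in the clear, and
    append a 4-byte HMAC tag over the header and encrypted payload."""
    header, payload = packet[:RTP_HEADER_LEN], packet[RTP_HEADER_LEN:]
    ks = _keystream(enc_key, len(payload))
    protected = header + bytes(p ^ k for p, k in zip(payload, ks))
    tag = hmac.new(auth_key, protected, hashlib.sha1).digest()[:4]
    return protected + tag

def srtp_like_verify(protected: bytes, auth_key: bytes) -> bool:
    """Recompute the truncated HMAC and compare it with the received tag."""
    body, tag = protected[:-4], protected[-4:]
    expected = hmac.new(auth_key, body, hashlib.sha1).digest()[:4]
    return hmac.compare_digest(tag, expected)
```

Because the tag covers the header as well as the payload, a receiver detects tampering with either part even though only the payload is encrypted.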

In practice, the complication of encrypting media is not the actual encryption process itself, but rather the mechanism of key exchange. H.235.6 specifies a built-in mechanism for key exchange. For SRTP, endpoints may use several mechanisms to perform key exchange; two examples are security-descriptions and Multimedia Internet Keying (MIKEY).

security-descriptions

SIP endpoints negotiate capabilities, media formats, and network ports by using the Session Description Protocol (SDP), defined in RFC 4566. SDP specifies the syntax for a text-based description of a session, and SIP messages include this SDP information. The SDP security descriptions specification RFC 4568 extends the SDP protocol by specifying how endpoints can include key material inside the SDP section of a SIP message. The SDP security descriptions specification is commonly referred to as security-descriptions or s-descriptions. The endpoint does not encrypt the key information or the SDP section of the message, which means that the endpoints must use encryption to secure the SIP messages. For this purpose, endpoints generally use secure SIP with TLS. By relying on encryption to protect SIP messages, s-descriptions provide a simple method of key exchange.
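Generating an s-descriptions attribute can be sketched as follows. For the AES_CM_128_HMAC_SHA1_80 crypto-suite defined in RFC 4568, the inline parameter carries the base64 encoding of a 16-byte master key concatenated with a 14-byte master salt.

```python
import base64
import secrets

def sdes_crypto_line(tag: int = 1,
                     suite: str = "AES_CM_128_HMAC_SHA1_80") -> str:
    """Generate a fresh SRTP master key and salt and format them as an
    RFC 4568 a=crypto attribute. For this suite the inline parameter is
    the base64 encoding of a 16-byte key plus a 14-byte salt."""
    key_and_salt = secrets.token_bytes(16 + 14)
    inline = base64.b64encode(key_and_salt).decode("ascii")
    return f"a=crypto:{tag} {suite} inline:{inline}"

# The attribute is placed in the SDP body of a SIP message, which must
# itself travel over TLS because the key material is not encrypted.
print(sdes_crypto_line())
```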

MIKEY

Another key exchange method is Multimedia Internet Keying (MIKEY). The base MIKEY specification is defined in RFC 3830, and RFC 4567 defines how to carry MIKEY exchanges within SDP. Like s-descriptions, MIKEY inserts the key material as a parameter entry inside the SDP section of the SIP message. However, unlike s-descriptions, MIKEY encrypts this SDP entry. One of the benefits of MIKEY is that the SDP information, and therefore the SIP messaging, can transit in the clear, without an encrypted tunnel, while keeping the key material confidential. The downside of MIKEY is that it specifies a rather complex procedure for protecting the key material. MIKEY has four modes of operation:

  • Preshared key—. In this mode, both endpoints use preshared keys to protect the key material. However, preshared key distribution does not scale well.

  • Signed public key using certificates—. Each endpoint must obtain the certificate from the other endpoint before initiating the call. However, if both endpoints have certificates, a more straightforward approach is to use mutually authenticated TLS, which protects the entire SIP message.

  • Signed Diffie-Hellman—. Endpoints exchange Diffie-Hellman parameters to derive a common secret, which the endpoints use to derive the final key material. However, this mode also requires a certificate-based mechanism to authenticate the Diffie-Hellman parameters and prevent a MitM attack.

  • Null—. Endpoints send keys in the clear. Endpoints can use this mode if the SIP messages are encrypted.

H.323 Encryption: H.235

H.235 is part of H.323v4 and is the emerging standard for authenticating signaling and encrypting media for H.323 endpoints. H.235 messages expand upon H.323 signaling by defining cryptotokens, which are data structures containing cryptographic information. H.323 signaling messages may contain one or more cryptotokens. H.235 was originally a single specification that featured three significant annexes:

  • Annex D—. Baseline security profile. It provides authentication for signaling, and encryption for media, based on preshared keys.

  • Annex E—. Signature security profile. It provides authentication for signaling based on certificates.

  • Annex F—. Hybrid security profile. A combination of annex D and E. Certificates establish initial authentication/identity, and then Diffie-Hellman-derived keys provide symmetric encryption.

In later versions of H.235, the standards committee broke the annexes into separate standards. Commercially available video endpoints use the following H.235 standards:

  • H.235.1—. Baseline security profile (previously part of annex D).

  • H.235.2—. Signature security profile (previously annex E).

  • H.235.3—. Hybrid security profile (previously annex F).

  • H.235.6—. Media encryption (previously part of annex D).

H.235 provides several cryptographic security features:

  • Confidentiality

  • Authentication

  • Integrity

  • Nonrepudiation

In addition, H.235 has modes of operation that can work with NATs that rewrite IP addresses in signaling messages.

H.235.1

H.235.1 is the baseline security profile for H.323. It uses preshared keys to provide integrity protection and authentication for H.323 signaling, using HMAC-SHA1-96, an HMAC based on the SHA-1 hash with its output truncated to 96 bits. In addition, H.235.1 allows endpoints to exchange Diffie-Hellman parameters in the H.225 setup and connect messages. The endpoints use the resulting Diffie-Hellman secret for media encryption, described in H.235.6. However, H.235.1 does not provide any type of confidentiality or encryption for H.323 signaling.

In practice, endpoints may use passwords for preshared secrets. In this case, endpoints add a level of security by performing a simple hash on the password; this hash becomes the preshared secret.

H.235.1 requires the use of gatekeeper-routed call signaling (GKRCS) and provides protection for RAS messages, H.225 messages, and H.245 messages tunneled over H.225. H.235.1 does not provide protection for directly routed H.245 messages. Endpoints generally use the same preshared key for protecting RAS and H.225 messages.

H.235.1 makes use of the H.323 cryptotoken data structure to facilitate authentication and integrity protection. The cryptotoken has the following fields:

  • SenderID, which is the identifier of the sender

  • ReceiverID, which is the identifier of the receiver

  • A time stamp and a random value, both of which prevent replay attacks

  • An HMAC, which is generated with the preshared key

H.235.1 does not provide a means of end-to-end authentication: The authentication is strictly hop by hop. At each hop, a device verifies the authentication and then re-creates authentication tags for the next hop. In this hop-by-hop scenario, all devices in the end-to-end path must trust each other. For each hop-to-hop link, H.235.1 may apply protection in both directions or just one direction. When endpoints authenticate signaling in one direction, this scenario is called single-sided authentication. Signaling authenticated in both directions is referred to as mutually authenticated.

H.235.1 avoids replay attacks by including a time stamp in the message. In case the value of the time stamp is the same for two sequential messages, the H.235.1 message also includes a random value, which differs for each message. The secure hash always includes the time stamp and the random value.
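The cryptotoken fields and the truncated HMAC can be sketched as follows. The field encoding is illustrative (real H.235.1 tokens are ASN.1 structures), and the password-hashing step mirrors the preshared-secret practice described above.

```python
import hashlib
import hmac
import os
import time

def preshared_secret(password: str) -> bytes:
    # In practice, a simple hash of the password serves as the preshared key.
    return hashlib.sha1(password.encode()).digest()

def make_token(secret, sender_id, receiver_id, timestamp=None, rand=None):
    """Sketch of an H.235.1-style cryptotoken: HMAC-SHA1 truncated to
    96 bits over the identifiers, time stamp, and random value."""
    timestamp = int(time.time()) if timestamp is None else timestamp
    rand = os.urandom(8) if rand is None else rand
    msg = (sender_id.encode() + receiver_id.encode()
           + timestamp.to_bytes(8, "big") + rand)
    tag = hmac.new(secret, msg, hashlib.sha1).digest()[:12]   # 96 bits
    return {"senderID": sender_id, "receiverID": receiver_id,
            "timestamp": timestamp, "random": rand, "hmac": tag}

def verify_token(secret, token, max_skew=30):
    """Recompute the HMAC and reject stale time stamps (anti-replay)."""
    recomputed = make_token(secret, token["senderID"], token["receiverID"],
                            token["timestamp"], token["random"])
    fresh = abs(time.time() - token["timestamp"]) <= max_skew
    return fresh and hmac.compare_digest(recomputed["hmac"], token["hmac"])
```

The time stamp check rejects replayed messages, and the random value keeps the hash distinct even when two sequential messages carry the same time stamp.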

In addition to providing authentication and identity, H.235.1 allows endpoints to transmit Diffie-Hellman parameters within H.225 setup and connect messages. The two endpoints use the Diffie-Hellman values to derive a common secret for use in media encryption, defined in H.235.6. Because the endpoints send the Diffie-Hellman parameters end to end, each hop in the end-to-end path must leave the Diffie-Hellman values untouched.

H.235.1 defines two procedures, each of which uses different structures in the cryptotokens:

  • Procedure I: authentication and integrity—. In this mode, the sender applies the HMAC hash to the entire signaling message, including any IP addresses included in the message. As a result, the hash becomes invalid if the message passes through a NAT ALG that rewrites addresses in the signaling protocol.

  • Procedure IA: authentication only—. In this mode, the hash protects a small subset of elements in the protocol message, including the following:

    • The endpoint identifiers

    • The time stamp and random number

    • The Diffie-Hellman values

    However, the hash does not protect the IP address in the message. Using this mode, the hash remains valid even if a NAT ALG rewrites the addresses in the signaling protocol. However, the message elements not protected by the hash will have no integrity protection.

Because neither procedure encrypts the signaling, a firewall can inspect the signaling and open pinholes for media ports.

With H.235.1, each pair of communicating devices must have a preshared key, which means that a central administrator must issue keys to each endpoint. To allow H.235.1 protection for calls placed between enterprises, the administrators of different enterprises must collaborate to distribute keys to all the gatekeepers so that two gatekeepers in different administrative domains have preshared keys. Because of this requirement, H.235.1 does not scale well.

H.235.1 protects against the following security threats:

  • DoS—. H.323 entities can use H.235.1 to authenticate signaling messages and thus avoid servicing bogus H.323 connection requests, which would deplete resources.

  • MitM attacks—. Assuming that all hops in the end-to-end signaling path are trusted, a potential MitM may take the form of a compromised router. Because H.235.1 signaling has no confidentiality protection, a MitM can read the packet contents. However, integrity protection (Procedure I) prevents a MitM from modifying protocol message data. In addition, authentication (Procedure I or IA) prevents a MitM from spoofing the identity of a sender.

  • Replay attacks—. The time stamp and random value prevent replay attacks.

  • Spoofing—. Authentication prevents identity spoofing.

  • Connection hijacking—. Authentication prevents connection hijacking.

H.235.2

H.235.2 is a protocol that uses certificates to provide authentication and integrity for H.323 signaling. In addition, H.235.2 can provide nonrepudiation.

When used within a single administrative domain, a certificate-based PKI provides a much more scalable way of distributing credentials than using preshared keys. H.235.2 does not specify how certificates should be distributed or how endpoints should validate certificates.

H.235.2 allows endpoints to create a digital signature for a packet by performing a hash on the data and then encrypting the hash with the private key of the certificate. The endpoint may use either MD5 or SHA-1 hashing.

Each certificate contains a field that has an identifier that names the endpoint. This name can take the form of either an H.323 alias or a username. A gatekeeper can use the gatekeeper ID. Devices should not use an IP address as an identifier because a NAT may rewrite an IP address in the signaling header, causing a mismatch between the apparent source of the packets and the identifier.

Like H.235.1, H.235.2 has several attributes:

  • H.235.2 requires the use of GKRCS and provides protection for RAS messages, H.225 messages, and H.245 messages tunneled over H.225. H.235.2 does not provide protection for directly routed H.245 messages.

  • Endpoints can use single-sided authentication or mutual authentication.

  • H.235.2 allows endpoints to exchange Diffie-Hellman parameters in the H.225 setup and connect messages for use with media encryption, described in H.235.6. The authentication mechanism of H.235.2 prevents a MitM attack on the Diffie-Hellman exchange. However, H.235.2 does not provide any type of confidentiality for the signaling.

  • H.235.2 avoids replay attacks by including a time stamp in the message; the digital signature covers this time stamp. In case the value of the time stamp is the same for two sequential messages, the H.235.2 message also includes a random value, which differs for each message.

  • Because the signaling is not encrypted, a firewall can inspect the signaling and open pinholes for media ports.

H.235.2-enabled endpoints use their certificates to sign all or part of H.323 signaling messages. Each endpoint must transmit its certificate in the first message that makes use of H.235.2, but there is no need to send the certificate in subsequent messages.

The cryptotoken has the following fields:

  • SenderID, which is the identifier of the sender

  • ReceiverID, which is the identifier of the receiver

  • A time stamp and a random value, both of which prevent replay attacks

  • A digital signature

  • A certificate

An endpoint creates the digital signature by using the private key associated with a certificate to encrypt the hash. The remote endpoint verifies the signature by using the public key in the certificate to decrypt the hash.

H.235.2 defines two procedures to create cryptotokens:

  • Procedure II: authentication + integrity, hop by hop—. In this mode, each hop in the network removes the cryptotoken and creates a new cryptotoken, containing a new certificate-based digital signature. This mode has two submodes:

    • Mode A: The endpoint uses the certificate to create a signature that covers the entire signaling protocol message.

    • Mode B: The endpoint uses the certificate to create a signature that covers a subset of the signaling protocol message. This subset includes the time stamp, random value, senderID, receiverID, Diffie-Hellman parameters, and the certificate itself. Messages using this mode of authentication can pass through a NAT that rewrites IP addresses in signaling messages.

  • Procedure III: end-to-end authentication—. In this case, the cryptotoken travels end to end, and intermediate hops do not modify or remove the token. It also has two modes of operation:

    • Mode A: The endpoint uses the certificate to create a signature that covers the entire signaling protocol message. This mode provides authentication and integrity only if intervening hops do not change any part of the signaling message.

    • Mode B: Authentication covers only a subset of the message. The endpoint uses the certificate to create a signature that covers a subset of the signaling protocol message. The subset includes the time stamp, random value, senderID, receiverID, Diffie-Hellman parameters, and the certificate itself. This mode of authentication can pass through a NAT that rewrites IP addresses in signaling messages.

An endpoint may include multiple cryptotokens in the H.323 signaling message, and the message may contain both hop-by-hop tokens and end-to-end tokens. Each hop must replace the hop-by-hop tokens with newly generated tokens but leave the end-to-end tokens untouched.

H.235.2 provides protection against the same threats listed for H.235.1 (DoS, MitM attacks, replay attacks, spoofing, and connection hijacking). In addition, endpoints can use H.235.2 to provide nonrepudiation as long as two conditions are met:

  • The original message must use Procedure III and include a cryptotoken that allows end-to-end authentication.

  • The endpoint must possess a certificate with authority to assert nonrepudiation. The CA that issues the certificate must grant this authority by setting the appropriate attributes in the certificate.

H.235.2 also provides a means for participants in a multiparty video conference to obtain the certificates of other endpoints in the conference. Typically, multiparty conferences are hosted on a multipoint control unit (MCU). In an MCU-hosted video conference, the endpoints can use H.235.2 to request the certificates of other endpoints from the MCU to create an authenticated list of participants.

H.235.3

H.235.3 is a hybrid security profile that combines the certificate method of H.235.2 with symmetric keys of H.235.1. This profile uses certificates to establish authentication for the initial connection, as defined in H.235.2. Endpoints then exchange Diffie-Hellman info and use the Diffie-Hellman secret as the key for generating HMAC authentication tags in subsequent messages, as defined in H.235.1. This scheme benefits from the scalability of certificate-based PKI to establish identity and authenticated Diffie-Hellman parameters, which avoids the need for preshared keys.

H.235.3 deviates in one aspect from H.235.2: H.235.3 specifically disallows MD5 hashing, which reflects the fact that MD5 is considered a weaker algorithm than SHA-1.

H.235.3 defines one procedure:

  • Procedure IV—Endpoints use Procedure II of H.235.2 to exchange certificates for the first message. This message includes Diffie-Hellman parameters that each side must use to derive a secret link key. Subsequent messages use the link key with Procedure I of H.235.1. Endpoints may also exchange additional Diffie-Hellman parameter sets in the setup and connect messages to establish keys for media encryption, as described in H.235.6.

Either endpoint may update the link key by sending new Diffie-Hellman parameters. H.235.3 dictates that endpoints must authenticate messages with new Diffie-Hellman parameters using certificates, as defined in Procedure II, instead of using the current link key.
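The link-key scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not the H.235 wire format: the modulus `P` and generator `G` are toy values (a real deployment would use a standardized large Diffie-Hellman group), and the SHA-1 key derivation step is an assumption made only to turn the shared secret into a fixed-length HMAC key.

```python
import hashlib
import hmac
import secrets

# Toy Diffie-Hellman group; these small illustrative values are NOT secure.
P = 0xFFFFFFFB  # assumption: a small prime stands in for a real DH modulus
G = 5

def dh_keypair():
    """Generate a private exponent x and the public value g^x mod p."""
    x = secrets.randbelow(P - 2) + 1
    return x, pow(G, x, P)

# Each endpoint generates a key pair and sends the public half in the
# certificate-authenticated first message (Procedure II).
a_priv, a_pub = dh_keypair()
b_priv, b_pub = dh_keypair()

# Both sides derive the same shared secret and hash it into a link key.
link_key_a = hashlib.sha1(str(pow(b_pub, a_priv, P)).encode()).digest()
link_key_b = hashlib.sha1(str(pow(a_pub, b_priv, P)).encode()).digest()
assert link_key_a == link_key_b

# Subsequent signaling messages carry an HMAC tag keyed with the link key
# (Procedure I style authentication from H.235.1).
message = b"H.245 TerminalCapabilitySet ..."
tag = hmac.new(link_key_a, message, hashlib.sha1).digest()

# The receiver recomputes the tag to authenticate the message.
assert hmac.compare_digest(tag, hmac.new(link_key_b, message, hashlib.sha1).digest())
```

Because the first exchange is authenticated by certificates, neither side needs a preshared key, yet every later message can be authenticated with the cheap symmetric HMAC operation.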

H.235.6

Whereas most SIP endpoints use SRTP to encrypt media, most interoperable H.323 implementations use H.235.6 for media encryption. Like SRTP, H.235.6 uses a session key to encrypt the payload section of an RTP packet. However, unlike SRTP, H.235.6 does not authenticate the entire RTP packet.

H.235.6 defines the voice encryption profile for H.235 to encrypt voice or video media. H.235.6 allows several encryption algorithms: AES, RC2, DES, and Triple DES. However, the most secure of these is AES-128, the only recommended algorithm. To support H.235.6, endpoints exchange Diffie-Hellman parameters during the setup and connect messages as part of H.235.1, H.235.2, or H.235.3. The endpoints derive a Diffie-Hellman shared secret from these parameters, which the endpoints use as a master key. Endpoints typically do not use this master key directly to encrypt media. Instead, endpoints use the master key to encrypt and exchange a session key and then use this session key to encrypt media. Endpoints should exchange a new encrypted session key periodically to reduce the possibility that an attacker can use a brute-force method to discover the session key.

Note

H.235.6 endpoints encrypt the RTP payload data only. Endpoints do not encrypt the RTP headers.
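The two-level key hierarchy (master key wraps the session key; the session key encrypts the payload) can be sketched as follows. This is only an illustration of the hierarchy under loudly stated assumptions: H.235.6 specifies block ciphers such as AES-128, and the `keystream_xor` toy cipher below (an HMAC-SHA256 keystream) merely stands in for the real algorithm.

```python
import hashlib
import hmac
import os

def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher: HMAC-SHA256 in counter mode produces a keystream
    that is XORed with the data. A stand-in for AES, NOT the real cipher."""
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        block = hmac.new(key, nonce + counter.to_bytes(4, "big"),
                         hashlib.sha256).digest()
        out.extend(block)
        counter += 1
    return bytes(x ^ k for x, k in zip(data, out))

# Master key: stands in for the Diffie-Hellman shared secret from call setup.
master_key = os.urandom(16)

# The master endpoint generates a session key, encrypts it under the master
# key, and sends it inside an H.245 OpenLogicalChannel message.
session_key = os.urandom(16)
wrapped = keystream_xor(master_key, b"key-wrap", session_key)

# The other endpoint unwraps the session key with the same master key.
unwrapped = keystream_xor(master_key, b"key-wrap", wrapped)
assert unwrapped == session_key

# Media encryption covers only the RTP payload; headers stay in the clear.
payload = b"one frame of encoded video"
cipher_payload = keystream_xor(session_key, b"pkt-0001", payload)
assert keystream_xor(session_key, b"pkt-0001", cipher_payload) == payload
```

Separating the master key from the session key is what makes periodic rekeying cheap: issuing a new session key requires only one small encrypted exchange, not a new Diffie-Hellman negotiation.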

When two endpoints connect, the H.323 protocol specifies that one of the endpoints will become the master. When an endpoint connects to an MCU, the MCU is always the master. H.235.6 specifies that after connection, the master endpoint creates a session key and encrypts it with the Diffie-Hellman master key. The endpoint then sends the encrypted session key to the other endpoint inside an H.245 OpenLogicalChannel message. The master endpoint may reissue a new session key at any time, and the slave may request a new session key from the master at any time.

When an endpoint disconnects from an MCU conference, the MCU should issue new session keys to the remaining endpoints to prevent the disconnected endpoint from listening in on the remainder of the conversation.

H.235.6 has gone through several versions. Starting with Version 3, the specification permits the use of a salt value for the encryption algorithm. A salt value gives both sides an initial starting point for the encryption procedure, which prevents precomputation attacks on the media. As a result, H.235.6v3 is considered more secure than earlier versions of H.235.6.

Much like H.460.19, H.235.6 adds a mandatory antispamming authentication tag to the media packets, which mitigates DoS attacks. To implement antispamming, the sender adds an additional authentication tag to the end of an RTP packet, which authenticates items in the RTP header. The intent of the antispamming feature is to allow receivers to quickly identify malicious RTP packets without doing extensive processing. H.235.6 antispamming specifies an HMAC-SHA1 hash, which covers the RTP time stamp and RTP sequence number in the RTP packet header. Endpoints use the current RTP session key to generate and verify the HMAC.
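The antispamming tag calculation can be sketched as below. Assumptions are marked: the 12-byte tag truncation and the exact field widths are illustrative choices, not values taken from the specification.

```python
import hashlib
import hmac

def antispam_tag(session_key: bytes, timestamp: int, seq: int) -> bytes:
    """HMAC-SHA1 over the RTP time stamp and sequence number, keyed with
    the current RTP session key (the 12-byte truncation is an assumption)."""
    data = timestamp.to_bytes(4, "big") + seq.to_bytes(2, "big")
    return hmac.new(session_key, data, hashlib.sha1).digest()[:12]

session_key = b"\x01" * 16  # placeholder for the negotiated session key

# The sender appends the tag to the end of the RTP packet.
tag = antispam_tag(session_key, timestamp=240000, seq=17)

# The receiver recomputes the tag from the header fields; a mismatch lets
# it discard a spoofed packet without decrypting the payload at all.
assert hmac.compare_digest(tag, antispam_tag(session_key, 240000, 17))
assert not hmac.compare_digest(tag, antispam_tag(session_key, 240000, 18))
```

Because the check touches only two header fields and one HMAC, a receiver under a flood of bogus RTP packets spends almost no CPU rejecting them, which is exactly the DoS mitigation the feature targets.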

H.235.6 provides protection against several media-related threats:

  • Antispamming prevents DoS and replay attacks.

  • Encryption prevents MitM attacks.

  • Encryption prevents confidentiality attacks.

SIP Encryption

The SIP standard defines a method of establishing a secure SIP signaling connection by using TLS on port 5061. In this case, endpoints use a sips: URL rather than the usual sip: URL. TLS offers either single-sided authentication or mutual authentication, and it provides encryption and integrity for data flow in both directions. The downside of TLS is that it is hop by hop: For the end-to-end connection to be secure, devices at all hops in the end-to-end path must trust each other. An example of a hop is a connection between an endpoint and a transcoder.

SIP may also make use of an end-to-end encryption scheme called Secure/Multipurpose Internet Mail Extensions (S/MIME). S/MIME encrypts SIP signaling end to end using a PKI and requires both sides in the conversation to use certificate-based encryption.

SIP signaling messages may specify Secure RTP (SRTP) for media encryption.

SIP-Digest

SIP-Digest is a password-based mechanism that allows SIP endpoints to authenticate to SIP proxies or SIP servers. In a SIP-Digest exchange, the endpoint always authenticates to the server. Optionally, the server may authenticate to the client. SIP-Digest also supports optional integrity protection, but few endpoints use this capability. SIP-Digest does not provide any sort of confidentiality protection via encryption.

SIP-Digest is almost identical to HTTP-Digest, which is a password-based protocol used to grant users access to websites. When a user accesses a password-protected directory on a web server that is protected with HTTP-Digest, the web server challenges the web browser, and in turn, the web browser pops up a small window that asks the user for credentials. The window displays a text string showing the name of the protected resource. This name is called the realm. The window typically has entry boxes for a username and password that the user must enter to gain access to resources associated with that realm. When the user enters the correct username and password, the browser automatically supplies the same username and password for all further HTTP messages that request access to directories under the same realm.

The operation of SIP-Digest is basically the same: When a SIP endpoint attempts to connect to a SIP server protected by a realm, the SIP server challenges the endpoint for a username and password associated with that realm, and the end user supplies these credentials. The username and password comprise a preshared secret known to both the client and the server.

Figure 8-27 shows the challenge-response call flow of SIP-Digest.

Figure 8-27. SIP-Digest

The SIP client issues an INVITE to the server, attempting to connect to a protected resource. The server rejects this initial request and issues a challenge to the client. In the case of a SIP server, this challenge is a 401 Unauthorized response that carries a WWW-Authenticate header. The following shows some of the information contained in the challenge message:

realm="bigdatabase.com",
nonce="9dfe919a99345037d9f9b8c999263d9ef9"
...

The message contains several parameters, and included in this parameter list are the name of the realm and a nonce value. The nonce is a randomly generated value that the client includes in a secure hash calculation.

The client responds by resending the SIP INVITE message, this time with an Authorization header that contains the client credentials. The following shows an example of part of the response message:

username="bob",
realm="bigdatabase.com",
nonce-count="00000001",
response="6629fae49393a05397450978507c4ef1",
...

In addition to the username and the realm, the message includes a response value and a nonce count. The client creates the response value by applying a hash to a series of values. Included with these values are the shared password, the nonce from the server, and other values from the SIP-Digest protocol. A nonce count is a value that counts how many times the client has used the currently active nonce.

When the server receives this response, it recalculates the secure hash using the preshared password, the nonce, and other values. If the calculated value matches the response entry, the server has authenticated the client.
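The hash calculation described above can be sketched in Python. This follows the qop="auth" form of the RFC 2617 digest calculation and reuses the nonce and username from the example messages; the password, request URI, and cnonce values are hypothetical, and the exact fields hashed depend on the options the two sides negotiate.

```python
import hashlib

def md5_hex(s: str) -> str:
    return hashlib.md5(s.encode()).hexdigest()

def digest_response(username, realm, password, method, uri, nonce, nc, cnonce):
    """Digest response in the qop="auth" form of RFC 2617."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")   # credential hash
    ha2 = md5_hex(f"{method}:{uri}")                  # request hash
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:auth:{ha2}")

# The client computes the response from the challenge parameters.
resp = digest_response("bob", "bigdatabase.com", "s3cret",
                       "INVITE", "sip:bob@bigdatabase.com",
                       nonce="9dfe919a99345037d9f9b8c999263d9ef9",
                       nc="00000001", cnonce="0a4f113b")

# The server repeats the calculation with its stored credentials; a match
# proves the client knows the password without the password ever being sent.
expected = digest_response("bob", "bigdatabase.com", "s3cret",
                           "INVITE", "sip:bob@bigdatabase.com",
                           nonce="9dfe919a99345037d9f9b8c999263d9ef9",
                           nc="00000001", cnonce="0a4f113b")
assert resp == expected
```

Note that the server-chosen nonce sits in the middle of the final hash, which is what defeats the precomputed dictionary tables discussed next.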

The server nonce value is required to prevent precomputation dictionary attacks. Without the nonce value, an attacker can prepare for an attack by sequencing through a dictionary of likely passwords and calculating the hash value corresponding to each password. Over time, the attacker can create a large table of hash/password pairs. Armed with this table, the attacker can then snoop the signaling, extract the hash, and attempt to look up this hash value in the table, revealing the password. Because the server randomly selects the nonce at the time of the connection, however, the attacker cannot know this value in advance and has no time to precompute the table. To provide a greater level of security, the server may use the same nonce for several transactions and then change the nonce to minimize the time that any one nonce is in effect. The server also uses the nonce count as a sequence number to prevent replay attacks.

Note

The server sends the nonce to the client in the clear. The nonce does not have to be secret; it only has to be unknown beforehand.

Even though an attacker cannot make use of a precomputed dictionary, the attacker can still snoop the signaling and then attempt to derive the password using an offline dictionary attack that incorporates the observed nonce. If the attacker can derive the password in this manner before the client and server change the password, the attacker can access all resources protected by the realm. This weakness is one of the downsides of SIP-Digest. A way to thwart this attack is to enforce strong passwords that are unlikely to be found in password dictionaries.

One scenario in which the server nonce fails to prevent a precomputation attack arises if an attacker can operate as a MitM by spoofing the server to the client. In this case, the client unwittingly performs a SIP-Digest exchange with the MitM, and the MitM returns a bogus challenge that contains a nonce previously used to create a precomputed table of password/hash values. When the MitM receives the response from the client, the MitM can then make use of the precomputation attack.

To thwart this attack, an optional mode of SIP-Digest allows the client to send a response that includes yet another nonce, called the client nonce or cnonce. The client calculates the hash as before but also includes the cnonce as one of the inputs to the hash. In addition, the client adds the cnonce as one of the parameters included in the message. The MitM cannot know the value of the cnonce in advance and has no time to precompute a password/hash table. Of course, the attacker can still use an offline dictionary attack after the exchange.

After the challenge/response, SIP-Digest allows for a third exchange, consisting of an HTTP AuthenticationInfo message from the server to the client, to allow the server to acknowledge the receipt of the client response. Similar to the response from the client, this message contains a hash that includes a series of values, among them the password, the nonce, and the cnonce. By including the password in the hash, the server proves that it knows the password and therefore authenticates to the client. In addition, the server can include a new nonce value that will be active for future handshakes; this value is referred to as the nextnonce. The following shows some of the values in this message:

nextnonce="49d28ef84022ab38153859d28ef8402102",
response-auth="6629fae49393a05397450978507c4ef1",
cnonce="0a4f113b"
...

The response-auth entry is the hash from the server. The server also includes the value of the cnonce.

SIP-Digest optionally provides integrity protection of SIP messages. In this mode, the input to the hash function includes the contents of the HTTP entity-body, which is the actual payload that includes the SIP message. This integrity protection is available for both the Authorization message from the client and the AuthenticationInfo message from the server.

As noted earlier, system administrators who use SIP-Digest must enforce strong passwords to thwart offline dictionary attacks.

One benefit of SIP-Digest is that the server and client need not store the password in the clear. Instead, both sides can store a hash of the username, realm, and password and then use this hashed value along with any values for the nonce and cnonce.
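The storage benefit described above can be sketched as follows: the server provisions only the credential hash (often called HA1), and verification starts from that stored hash rather than the password. The account values are hypothetical, and the simplified final hash here omits the optional qop fields for brevity.

```python
import hashlib

def ha1(username: str, realm: str, password: str) -> str:
    """The stored credential: a hash of username, realm, and password.
    Neither side needs to keep the plaintext password on disk."""
    return hashlib.md5(f"{username}:{realm}:{password}".encode()).hexdigest()

# The server provisions the account by storing only HA1.
stored = ha1("bob", "bigdatabase.com", "s3cret")

# At verification time the server starts from the stored HA1; the rest of
# the digest calculation proceeds exactly as if it held the password.
ha2 = hashlib.md5(b"INVITE:sip:bob@bigdatabase.com").hexdigest()
nonce = "9dfe919a99345037d9f9b8c999263d9ef9"
response = hashlib.md5(f"{stored}:{nonce}:{ha2}".encode()).hexdigest()

# The same calculation from the raw password yields an identical result.
assert response == hashlib.md5(
    f"{ha1('bob', 'bigdatabase.com', 's3cret')}:{nonce}:{ha2}".encode()
).hexdigest()
```

A database compromise then exposes only realm-specific hashes rather than reusable plaintext passwords, although those hashes remain subject to the same offline dictionary attacks, so strong passwords still matter.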

SCCP Encryption

The Cisco SCCP VoIP scheme is similar to SIP in its use of secure protocols. SCCP specifies the use of TLS for signaling encryption over port 2443. This use of TLS is similar to the secure SIP protocol. The CallManager distributes key material over this encrypted link, similar to the SIP methodology of using s-descriptions to send keying material in an SDP message. SCCP uses SRTP for media encryption in a way that is identical to secure SIP endpoints.

Summary

This chapter shows that security is a complex topic and that it requires protection at several layers of the network: Layer 2, Layer 3, and the stateful session layer. In addition, the security methods vary depending on the protocol: SIP, H.323, or SCCP. The challenge is to deploy secure protection of voice and video, while at the same time using techniques that allow the voice and video protocols to work in the presence of NATs and firewalls. One area where video conferencing will see significant progress is interoperability. As SIP endpoints adopt STUN/TURN/ICE, and as H.323 endpoints adopt H.460, connections between endpoints in the enterprise and endpoints in the public Internet will get easier. As SIP endpoints adopt TLS and SRTP, and as H.323 endpoints adopt H.235, more video calls will be encrypted. With this additional level of interoperability, video conferencing has the potential for accelerated future growth.

References

H.235.1: H.323 security: Baseline security profile. ITU-T Recommendation H.235.1. September 2005.

H.235.2: H.323 security: Signature security profile. ITU-T Recommendation H.235.2. September 2005.

H.235.3: H.323 security: Hybrid security profile. ITU-T Recommendation H.235.3. September 2005.

H.235.6: H.323 security: Voice encryption profile with native H.235/H.245 key management. ITU-T Recommendation H.235.6. September 2005.

H.323v5: Packet-based multimedia communications systems. ITU-T Recommendation H.323. July 2003.

H.460.17: Using H.225.0 call signaling connection as transport for H.323 RAS messages. ITU-T Recommendation H.460.17. September 2005.

H.460.18: Traversal of H.323 signaling across network address translators and firewalls. ITU-T Recommendation H.460.18. September 2005.

H.460.19: Traversal of H.323 media across network address translators and firewalls. ITU-T Recommendation H.460.19. September 2005.

UPnP Forum: http://www.upnp.org/

Dierks, T., and C. Allen. The TLS Protocol Version 1.0. IETF RFC 2246. 1999.

Franks, J., P. Hallam-Baker, J. Hostetler, S. Lawrence, P. Leach, A. Luotonen, and L. Stewart. HTTP Authentication: Basic and Digest Access Authentication. IETF RFC 2617. 1999.

Krawczyk, H., M. Bellare, and R. Canetti. HMAC: Keyed-Hashing for Message Authentication. IETF RFC 2104. 1997.

Postel, J., ed. Transmission Control Protocol. IETF RFC 793. 1981.

Rescorla, E. Diffie-Hellman Key Agreement Method. IETF RFC 2631. 1999.

Rosenberg, J., H. Schulzrinne, G. Camarillo, A. Johnston, J. Peterson, R. Sparks, M. Handley, and E. Schooler. SIP: Session Initiation Protocol. IETF RFC 3261. 2002.

Santesson, S., and R. Housley. Internet X.509 Public Key Infrastructure Authority Information Access Certificate Revocation List (CRL) Extension. IETF RFC 4325. 2005.
