After the exposure of certain secret operations carried out by the National Security Agency (NSA) of the United States, by its former contractor, Edward Snowden, most of the governments, corporations, and even individuals started to think more about security. Edward Snowden is a traitor for some while a whistle-blower for others. The Washington Post newspaper published details from a document revealed by Edward Snowden on October 30, 2013. This was a disturbing news for two Silicon Valley tech giants, Google and Yahoo!. This highly confidential document revealed how NSA intercepted communication links between data centers of Google and Yahoo! to carry out a massive surveillance on its hundreds of millions of users. Further, according to the document, NSA sends millions of records every day from the Yahoo! and Google internal networks to data warehouses at the agency’s headquarters in Fort Meade, Md. After that, field collectors process these data records to extract out metadata, which indicate who sent or received emails and when, as well as the content such as text, audio, and video.1
How is this possible? How come an intruder (in this case, it’s the government) intercepts the communication channels between two data centers and gets access to the data? Even though Google used a secured communication channel from the user’s browser to the Google front-end servers, from there onward, and between the data centers, the communication was in cleartext. As a response to this incident, Google started securing all its communication links between data centers with encryption. Transport Layer Security (TLS) plays a major role in securing data transferred over communication links. In fact, Google is one of the first out of all tech giants to realize the value of TLS. Google made TLS the default setting in Gmail in January 2010 to secure all Gmail communications and four months later introduced an encrypted search service located at https://encrypted.google.com. In October 2011, Google further enhanced its encrypted search and made google.com available on HTTPS, and all Google search queries and the result pages were delivered over HTTPS. HTTPS is in fact the HTTP over TLS.
In addition to establishing a protected communication channel between the client and the server, TLS also allows both the parties to identify each other. In the most popular form of TLS, which everyone knows and uses in day-to-day life on the Internet, only the server authenticates to the client—this is also known as one-way TLS. In other words, the client can identify exactly the server it communicates with. This is done by observing and matching the server’s certificate with the server URL, which the user hits on the browser. As we proceed in this appendix, we will further discuss how exactly this is done in detail. In contrast to one-way TLS, mutual authentication identifies both the parties—the client and the server. The client knows exactly the server it communicates with, and the server knows who the client is.
The Evolution of Transport Layer Security (TLS)
TLS has its roots in SSL (Secure Sockets Layer). Netscape Communications (then Mosaic Communications) introduced SSL in 1994 to build a secured channel between the Netscape browser and the web server it connects to. This was an important need at that time, just prior to the dot-com bubble.2 The SSL 1.0 specification was never released to the public, because it was heavily criticized for the weak cryptographic algorithms that were used. In February 1995, Netscape released the SSL 2.0 specification with many improvements.3 Most of its design was done by Kipp Hickman, with much less participation from the public community. Even though it had its own vulnerabilities, it earned the trust and respect of the public as a strong protocol. The very first deployment of SSL 2.0 was in Netscape Navigator 1.1. In late 1995, Ian Goldberg and David Wagner discovered a vulnerability in the random number generation logic in SSL 2.0.4 Mostly due to US export regulations, Netscape had to weaken its encryption scheme to use 40-bit long keys. This limited all possible key combinations to a million million, which were tried by a set of researchers in 30 hours with many spare CPU cycles; they were able to recover the encrypted data.
SSL 2.0 was completely under the control of Netscape and was developed with no or minimal inputs from others. This encouraged many other vendors including Microsoft to come up with their own security implementations. As a result, Microsoft developed its own variant of SSL in 1995, called Private Communication Technology (PCT).5 PCT fixed many security vulnerabilities uncovered in SSL 2.0 and simplified the SSL handshake with fewer round trips required in establishing a connection. Among the differences between SSL 2.0 and PCT, the non-encrypted operational mode introduced in PCT was quite prominent. With non-encrypted operational mode, PCT only provides authentication—no data encryption. As discussed before, due to the US export regulation laws, SSL 2.0 had to use weak cryptographic keys for encryption. Even though the regulations did not mandate to use weak cryptographic keys for authentication, SSL 2.0 used the same weak cryptographic keys used for encryption, also for authentication. PCT fixed this limitation in SSL 2.0 by introducing a separate strong key for authentication.
Netscape released SSL 3.0 in 1996 having Paul Kocher as the key architect. This was after an attempt to introduce SSL 2.1 as a fix for the SSL 2.0. But it never went pass the draft stage, and Netscape decided it was the time to design everything from ground up. In fact, Netscape hired Paul Kocher to work with its own Phil Karlton and Allan Freier to build SSL 3.0 from scratch. SSL 3.0 introduced a new specification language as well as a new record type and a new data encoding technique, which made it incompatible with the SSL 2.0. It fixed issues in its predecessor, introduced due to MD5 hashing. The new version used a combination of the MD5 and SHA-1 algorithms to build a hybrid hash. SSL 3.0 was the most stable of all. Even some of the issues found in Microsoft PCT were fixed in SSL 3.0, and it further added a set of new features that were not in PCT. In 1996, Microsoft came up with a new proposal to merge SSL 3.0 and its own SSL variant PCT 2.0 to build a new standard called Secure Transport Layer Protocol (STLP).6
Due to the interest shown by many vendors in solving the same problem in different ways, in 1996 the IETF initiated the Transport Layer Security working group to standardize all vendor-specific implementations. All the major vendors, including Netscape and Microsoft, met under the chairmanship of Bruce Schneier in a series of IETF meetings to decide the future of TLS. TLS 1.0 (RFC 2246) was the result; it was released by the IETF in January 1999. The differences between TLS 1.0 and SSL 3.0 aren’t dramatic, but they’re significant enough that TLS 1.0 and SSL 3.0 don’t interoperate. TLS 1.0 was quite stable and stayed unchanged for seven years, until 2006. In April 2006, RFC 4346 introduced TLS 1.1, which made few major changes to TLS 1.0. Two years later, RFC 5246 introduced TLS 1.2, and in August 2018, almost 10 years after TLS 1.2, RFC 8446 introduced TLS 1.3.
Transmission Control Protocol (TCP)
Understanding how Transmission Control Protocol (TCP) works provides a good background to understand how TLS works. TCP is a layer of abstraction of a reliable network running over an unreliable channel. IP (Internet Protocol) provides host-to-host routing and addressing. TCP/IP is collectively known as the Internet Protocol Suite, which was initially proposed by Vint Cerf and Bob Kahn.7 The original proposal became the RFC 675 under the network working group of IETF in December 1974. After a series of refinements, the version 4 of this specification was published as two RFCs: RFC 791 and RFC 793. The former talks about the Internet Protocol (IP), while the latter is about the Transmission Control Protocol (TCP).
The TCP/IP protocol suite presents a four-layered model for network communication as shown in Figure C-1. Each layer has its own responsibilities and communicates with each other using a well-defined interface. For example, the Hypertext Transfer Protocol (HTTP) is an application layer protocol, which is transport layer protocol agnostic. HTTP does not care how the packets are transported from one host to another. It can be over TCP or UDP (User Datagram Protocol), which are defined at the transport layer. But in practice, most of the HTTP traffic goes over TCP. This is mostly due to the inherent characteristics of TCP. During the data transmission, TCP takes care of retransmission of lost data, ordered delivery of packets, congestion control and avoidance, data integrity, and many more. Almost all the HTTP traffic is benefitted from these characteristics of TCP. Neither the TCP nor the UDP takes care of how the Internet layer operates. The Internet Protocol (IP) functions at the Internet layer. Its responsibility is to provide a hardware-independent addressing scheme to the messages pass-through. Finally, it becomes the responsibility of the network access layer to transport the messages via the physical network. The network access layer interacts directly with the physical network and provides an addressing scheme to identify each device the messages pass through. The Ethernet protocol operates at the network access layer.
To complete the handshake, the client will once again send a TCP packet to the server to acknowledge the SYN ACK packet it received from the server. This is known as the ACK packet. Figure C-4 shows a sample TCP ACK packet captured by Wireshark. This includes the source (client) port, destination (server) port, initial client sequence number + 1 as the new sequence number, and the acknowledgement number. Adding one to the server sequence number found in the SYN ACK packet derives the acknowledgement number. Since we are still in the three-way handshake, the value of the TCP Segment Len field is zero.
How Does TCP Sequence Numbering Work?
Whenever either of the two parties at either end of the communication channel wants to send a message to the other, it sends a packet with the ACK flag as an acknowledgement to the last received sequence number from that party. If you look at the very first SYN packet (Figure C-2) sent from the client to the server, it does not have an ACK flag, because prior to the SYN packet, the client didn’t receive anything from the server. From there onward, every packet sent either by the server or the client has the ACK flag and the Acknowledgement Number field in the TCP packet.
In the SYN ACK packet (Figure C-3) from the server to the client, the value of the Acknowledgement Number is derived by adding one to the sequence number of the last packet received by the server (from the client). In other words, the Acknowledgement Number field here, from the server to the client, represents the sequence number of the next expected packet. Also if you closely look at the TCP Segment Len field in each TCP packet of the three-way handshake, the value of it is set to zero. Even though we mentioned before that the Acknowledgement Number field in SYN ACK is derived by adding one to the sequence number found in the SYN packet from the client, precisely what happens is the server adds 1 + the value of the TCP Segment Len field from the client to the current sequence number to derive the value of the Acknowledgement Number field. The same applies to the ACK packet (Figure C-4) sent from the client to the server. Adding 1 + the value of the TCP Segment Len field from the server to the sequence number of the last packet received by the client (from the server) derives the Acknowledgement Number field there. The value of the sequence number in the ACK packet is the same as the value of the Acknowledgement Number in the SYN ACK packet from the server.
The client starts sending real application data only after the three-way handshake is completed. Figure C-5 shows the first TCP packet, which carries application data from the client to the server. If you look at the sequence number in that TCP packet, it’s the same from the previous packet (ACK packet as shown in Figure C-4) sent from the client to the server. After client sends the ACK packet to the server, it receives nothing from the server. That implies the server still expects a packet with a sequence number, which matches the value of the Acknowledgement Number in the last packet it sent to the client. If you look at Figure C-5, which is the first TCP packet with application data, the value of the TCP Segment Len field is set to a nonzero value, and as per Figure C-6, which is the ACK to the first packet with the application data sent by the client, the value of Acknowledgement Number is set correctly to the value of the TCP Segment Len field + 1 + the current sequence number from the client.
How Transport Layer Security (TLS) Works
Transport Layer Security (TLS) Handshake
Note
TLS session resumption has a direct impact on performance. The master key generation process in the TLS handshake is extremely costly. With session resumption, the same master secret from the previous session is reused. It has been proven through several academic studies that the performance enhancement resulting from TLS session resumption can be up to 20%. Session resumption also has a cost, which is mostly handled by servers. Each server has to maintain the TLS state of all its clients and also to address high-availability aspects; it needs to replicate this state across different nodes in the cluster.
One key field in the Client Hello message is the Cipher Suites. Figure C-12 expands the Cipher Suites field of Figure C-10. The Cipher Suites field in the Client Hello message carries all the cryptographic algorithms supported by the client. The message captured in Figure C-12 shows the cryptographic capabilities of the Firefox browser version 43.0.2 (64-bit). A given cipher suite defines the server authentication algorithm, the key exchange algorithm, the bulk encryption algorithm, and the message integrity algorithm. For example, in TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 cipher suite, RSA is the authentication algorithm, ECDHE is the key exchange algorithm, AES_128_GCM is the bulk encryption algorithm, and SHA256 is the message integrity algorithm. Any cipher suite that starts with TLS is only supported by the TLS protocols. As we proceed in this appendix, we will learn the purpose of each algorithm.
Note
In the history of TLS, several attacks have been reported against the TLS handshake. Cipher suite rollback and version rollback are a couple of them. This could be a result of a man-in-the-middle attack, where the attacker intercepts the TLS handshake and downgrades either the cipher suite or the TLS version, or both. The problem was fixed from SSL 3.0 onward with the introduction of the Change Cipher Spec message. This requires both parties to share the hash of all TLS handshake messages up to the Change Cipher Spec message exactly as each party read them. Each has to confirm that they read the messages from each other in the same way.
After the Server Hello message is sent to the client, the server sends its public certificate, along with other certificates, up to the root certificate authority (CA) in the certificate chain (see Figure C-14). The client must validate these certificates to accept the identity of the server. It uses the public key from the server certificate to encrypt the premaster secret key later. The premaster key is a shared secret between the client and the server to generate the master secret. If the public key in the server certificate isn’t capable of encrypting the premaster secret key, then the TLS protocol mandates another extra step, known as the Server Key Exchange (see Figure C-14). During this step, the server has to create a new key and send it to the client. Later the client will use it to encrypt its premaster secret key.
If the server demands TLS mutual authentication, then the next step is for the server to request the client certificate. The client certificate request message from the server includes a list of certificate authorities trusted by the server and the type of the certificate. After the last two optional steps, the server sends the Server Hello Done message to the client (see Figure C-14). This is an empty message that only indicates to the client that the server has completed its initial phase in the handshake.
If the server demands the client certificate, now the client sends its public certificate along with all other certificates in the chain up to the root certificate authority (CA) required to validate the client certificate. Next is the Client Key Exchange message, which includes the TLS protocol version as well as the premaster secret key (see Figure C-15). The TLS protocol version must be the same as specified in the initial Client Hello message. This is a guard against any rollback attacks to force the server to use an unsecured TLS/SSL version. The premaster secret key included in the message should be encrypted with the server’s public key obtained from the server certificate or with the key passed in the Server Key Exchange message.
At this point, the client and the server have exchanged all the required materials to generate the master secret. The master secret is generated using the client random number, the server random number, and the premaster secret. The client now sends the Change Cipher Spec message to the server to indicate that all messages generated from here onward are protected with the keys already established (see Figure C-15).
TLS vs. HTTPS
HTTP operates at the application layer of the TCP/IP stack, while the TLS operates between the application layer and the transport layer (see Figure C-1). The agent (e.g., the browser) acting as the HTTP client should also act as the TLS client to initiate the TLS handshake, by opening a connection to a specific port (default 443) at the server. Only after the TLS handshake is completed, the agent should initiate the application data exchange. All HTTP data are sent as TLS application data. HTTP over TLS was initially defined by the RFC 2818, under the IETF network working group. The RFC 2818 further defines a URI format for HTTP over TLS traffic, to differentiate it from plain HTTP traffic. HTTP over TLS is differentiated from HTTP URIs by using the https protocol identifier in place of the http protocol identifier. The RFC 2818 was later updated by two RFCs: RFC 5785 and RFC 7230.
Application Data Transfer
Reverse-Engineering TLS
For each session, TLS creates a master secret and derives four keys from it for hashing and encryption. What if the private key of the server leaked out? If all the data transferred between clients and the server is being recorded, can it be decrypted? Yes, it can. If the TLS handshake is recorded, you can decrypt the premaster secret if you know the server’s private key. Then, using the client-generated random number and the server-generated random number, you can derive the master secret—and then the other four keys. Using these keys, you can decrypt the entire set of recorded conversations.
Using perfect forward secrecy (PFS) can prevent this. With PFS, just as in TLS, a session key is generated, but the session key can’t later be derived back from the server’s master secret. This eliminates the risk of losing the confidentiality of the data if a private key leaks out. To add support for PFS, both the server and the client participating in the TLS handshake should support a cipher suite with Ephemeral Diffie-Hellman (DHE) or the elliptic-curve variant (ECDHE).
Note
Google enabled forward secrecy for Gmail, Google+, and Search in November 2011.