Chapter 9. Foreign Accent

Passive fingerprinting: subtle differences in how we behave can help others tell who we are

On the Internet, the network of networks, information sent to a remote party is beyond the sender’s control and supervision. Unlike on a local Ethernet, which is usually a safe harbor for packets until a stranger wanders in, once data is out in the wild it is no longer possible to estimate and effectively manage threats that it is likely to face, as no single person can control the data’s path or determine the intentions of all parties involved in communications, let alone determine how they approach security. On such a complex network, the likelihood of a middle party becoming malicious is neither negligible nor easy to assess. In fact, even the person with whom you are establishing legitimate communications may have a hidden agenda or simply be a bit curious.

Unsolicited data acquisition attempts, so to speak, are also different when carried out over the Internet for a couple other reasons. Most important, they do not have to be targeted, and they are not limited to a specific segment of physical infrastructure. Because they require so little effort on the part of an attacker, they become a viable route for acquiring potentially interesting data even prior to determining a precise way to profit or otherwise benefit from this knowledge. Too, the line between good and bad becomes even more fuzzy: the attacker can be your best friend. The profitability of general espionage and surveillance for the purposes of marketing reconnaissance and profiling is too tempting for many to resist; the world of service provisioning is not black and white, and flexible ethics is simply a viable business model for many people.

This part of the book focuses on the threats inherent in the open design of the Internet and on the ability of others to obtain way more information about you than you might expect—and more than would ever be needed in order for them to provide you a service such as an interesting website or an enjoyable network-based game. Once on the Internet, the enemy is no longer a lone madman sitting across the street, watching LEDs on the switch through a high-tech telezoom lens. The exposures covered here make it possible to carry out massive profiling, tracking, information gathering, industrial espionage, network reconnaissance, and preattack analysis—and are far more real than the scenarios described previously.

You need to understand the threats in order to maintain an informed level of privacy protection or perhaps to deploy effective monitoring whether of your users or of complete strangers, as they approach your systems. Understanding is also the key to maintaining sanity in a world where the line between being concerned about privacy and becoming clinically paranoid is fairly thin.

I’ll begin with an examination of a set of core network protocols used over the Internet and their privacy implications. Shall we?

The Language of the Internet

The official language of the Internet is called the Internet Protocol, and the most popular dialect is labeled version 4. The protocol, specified in RFC793,[74] provides a way to implement a standardized method for transmitting data over vast distances and a variety of networks with as little effort as possible. IP packets constitute the third layer in the OSI model discussed previously and consist of a header that contains the information necessary to deliver a portion of data to its ultimate destination—the remote endpoint—and a payload constructed of higher-layer information that immediately follows the header data.

The routing information furnished by the sender within the IP packet prior to sending it out consists of the source and destination address and a set of parameters that simplifies the process of data transfer or improves its reliability and performance. When a machine on the local network wants to communicate with a remote party that is not directly reachable over the wire—at least not according to the host’s knowledge—it forwards an IP packet with the ultimate recipient’s destination address, encapsulated in a lower-layer frame addressed to a local machine that is believed to be a gateway to and of the network the sender resides at. The gateway machine is nothing more than a multihomed device—one that has a presence in more than one network, serving as a connection point between them. The gateway is expected to know how to route the packet to the outside world, what to do with the packet, and who should get the data next if there must be more parties involved before the data reaches the recipient.

Systems involved in routing traffic, from the local gateway through to the destination network, read the information provided on the IP layer to decide how to relay the data farther down its path, based on their knowledge of how to reach certain networks. (In this context, a network is defined as a pool of network addresses residing at a specific location.)

Naive Routing

In its basic form, a router uses a fixed routing table with which it distinguishes between a set of local networks (to which it can deliver traffic directly) and the outside world, which is unknown. Thus, all traffic destined for outside the local network must be relayed to a higher-order router that presumably has a better idea of where to deliver the data.

Figure 9-1 shows an example routing structure. The sender (shown at left) attempts to send a packet to a system whose address belongs to network C, a network that the sender knows nothing about. To facilitate delivery, the guy sends the traffic to the local gateway, hoping that it will know where to look for the recipient. However, this system, router 1, can only reach the sender’s own network and network A, another network that has nothing to do with C. Because the target is not on their local network, the router decides it would be best to just send the packet to a higher-rank WAN router (router 2), which it happens to be able to reach locally.

A naive wide area network routing scheme

Figure 9-1. A naive wide area network routing scheme

This device also has no immediate connection with network C; it can only directly reach hosts on networks B and D. However, it knows that router 3 is serving the destination address and thus would surely know what to do. Therefore, the packet is forwarded there, and router 3 can now deliver the traffic locally to the ultimate recipient, at which point all can rejoice and celebrate another success.

Routing in the Real World

In practice, networks are often highly redundant and do not have a strictly linear architecture. They have a complex treelike structure that makes selecting the optimal and most economical path difficult if we were to use a static configuration. (Never mind the challenge of staying up-to-date with all the infrastructure changes as the network grows.)

As such, a more reasonable routing strategy is implemented once the traffic reaches a backbone router. Run by a network operator, a backbone router is a dedicated WAN device that binds many networks controlled by a particular provider into a complex being called an autonomous system. Back-bone routers are typically equipped with interfaces to other large routers and use an advanced path-discovery algorithm and a sizable “phone book” of network blocks and their whereabouts, controlled dynamically by a Boundary Gateway Protocol, to find the best way to route the data to the destination system, without blindly handing out the job of delivering the traffic to some system in hopes that it will be able to relay it properly.

The Address Space

This process would, of course, be quite impractical if destination networks consisted simply of a set of addresses arbitrarily assigned to devices around the world. A definition of an autonomous system would have to list all the addresses and might easily grow to enormous size. To solve this problem, continuous blocks of address space are assigned to backbone service providers instead; providers later lease smaller blocks to end users or lesser service providers. Routing to the provider’s network is based on a lookup of the destination IP within the address ranges assigned to this entity and then within the network based on additional lookup in more detailed routing tables. An autonomous system can thus be defined as a range of IPv4 addresses (or a set of such ranges), using a netmask method.

The single IPv4 address used to uniquely identify an endpoint system in all Internet Protocol communications has a fairly simple structure, consisting of 32 bits, divided for convenience into 4 bytes, a total of 4,294,967,296 possible addresses. The address is traditionally written as four 8-bit values between 0 and 255, with each value separated by dots. For example, 195.117.3.59 corresponds to a 32-bit integer value of 3241036664.

Continuous IP address blocks are the basis for packet routing. They are defined on top of IPv4 addressing by defining the part of the IP address that is fixed and constant for all systems belonging to an autonomous system, as well as the part of the address that will be set to various values by the owner of a network in order to give computers unique identifiers.

When defining a network, a set of more significant bits of an IP—theoretically, anywhere from 1 to 31; practically, 8 to 24—is reserved as a network address. The fixed part of this address is shared by all addresses belonging to (and presumably routed to) this particular network. The less significant remainder bits can be set at will to assign addresses to systems within the network.

Historically (per RFC796[75]), the size of a network or the number of significant locked bits was a function of the address and could be determined from the network address itself. Based on the most important bits of each address alone, addresses were grouped to constitute class A networks (in which the 8 most significant bits are fixed, yielding more than 16 million possible user addresses), class B networks (in which 16 bits are fixed, yielding more than 65,000 hosts), or class C networks (with 24 bits fixed, and 256 possible hosts). Therefore, if your system has an IP address beginning with the number 1, you can tell that yours is a class A network and that all other systems with this prefix are next to your box.

Although this seemed handy at the time, the IPv4 address space shrank significantly once the initial implementers (the U.S. Army, Xerox, IBM, and other behemoths) were assigned a handful of class A network addresses in the early days of the Internet, and seemed not to be very keen on giving them up, despite not using even a fraction of the space they got for public infra-structure. Too, once the Internet became commercial, and IP addresses became a resource that users had to pay for, users demanded chunks of address space that would better fit their requirements; some folks only wanted four addresses, whereas others wanted a continuous space of 8,000. Users began to resell or otherwise partition their Internet space.

The result is that the current address space is partitioned in bizarre ways, often with tiny bits of address space excluded and rerouted from larger, otherwise continuous blocks, with general disregard for the original partitioning scheme. Each network address is now accompanied by a net-mask specification, because it is no longer possible to tell which network a system is on based merely by the IP itself. The netmask has its bits set at positions that should be fixed in the network address and zeroed for positions that can be freely manipulated within a network.

As shown in Figure 9-2, by fixing 24 bits on 195.117.3.0 network, we end up with 8 trailing bits that can be changed. This allows us to create 256 addresses between 195.117.3.0 and 195.117.3.255 that belong to this network (albeit some implementations would force the first and the last address to be reserved for special purposes, leaving only 254 possible hosts). With such a relatively simple specification of a network of addresses, it is easy to determine which addresses belong to this network and thus which should be delivered to a system that is its gateway (and which should not).

Although this addressing scheme may appear confusing and needlessly complicated, it is successful: it lets us associate pools of addresses with specific systems and differentiate between systems with minimum computational effort. The Internet, in all its complexity, usually succeeds in finding a system in a really short period of time, without much maintenance.

Network addressing rules

Figure 9-2. Network addressing rules

Fingerprints on the Envelope

We know how the data makes it from point A to point B—but what happens on the way is more interesting than how the path is determined. Let’s then look more closely at what is being exchanged between the routers and our endpoint systems. Although you might think that the actual data payload inside the packets sent over the Internet contains the most interesting information (considering all the private email and bizarre contents being exchanged around the world every second), there is more than meets the eye.

The format of IP packets used for routing the data, and the layer four information used to encapsulate the actual application-level data, is defined by the RFCs fairly strictly and with surprisingly little ambiguity. However, even with a competent TCP stack implementation, the underlying information can provide considerably and consistently more value to the recipient than the actual payload data it receives. The disclosure on this level is inadvertent and unexpected, but to learn more about it we need to take a closer look at the design of the underlying protocols.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.12.155.100