Chapter 3

How the Internet Works

This chapter describes the Internet, its components, and how they all interact. It details how the various protocols work and how they relate to investigations of Internet crime.

Key words

World Wide Web; DARPA; FTP; IRC; Internet Protocol address; uniform resource locators; Domain Name System; DNS; IPv4; IPv6; DHCP; DUID

I must confess that I’ve never trusted the Web. I’ve always seen it as a coward’s tool. Where does it live? How do you hold it personally responsible? Can you put a distributed network of fiber-optic cable “on notice”? And is it male or female? In other words, can I challenge it to a fight?

Stephen Colbert, comedian

To many, the Internet is fundamentally a confusing and mystical thing, one that touches our lives in ways few could have imagined. Every day, millions of people connect to the Internet to be informed, maintain relationships, find new ones, and speak their minds. The Internet is unique in history in its ability to connect people. It is much more than a communication method. The Internet has enabled its users to unite with others in ways that previous generations would never have understood. Communication more than 30 years ago meant the disconnected use of stationary technology. If you didn’t want to be found, you didn’t answer the landline telephone. If you wanted to know the news of the world, you picked up a newspaper or watched the 6 o’clock news. The Internet, with its ability to communicate anything from complete novels to 140-character messages, has transformed what we think of as communication. Today we are attached to our technology. Cell phones are more pervasive than computers. We are connected to the ones we love and to those we have never met. The Internet intrudes into our lives at every level. Work hours are spent updating Facebook and seeing what’s tweeted by Lindsay Lohan or Madonna. Home hours are spent watching movies on Netflix and surfing for deals on eBay or Craigslist. So what does this all mean for the Internet investigator? It means that everyone we know, everyone we don’t know in our communities of interest, and everyone else in the connected world is online. Everyone who is online is now a potential victim or suspect. To start to understand how to deal with this, we have to understand the basis of the Internet and its foundations.

A short history of the Internet

The Internet before the World Wide Web (WWW or just the web) was a much different place. The Internet as we know it is based on the innovations of a few bright individuals who thought that connecting data on the Internet through a browsing concept was a simple change. Fundamentally, browsing was a huge change. In 1989, Sir Tim Berners-Lee wrote a proposal for what would eventually become the WWW. He later helped to develop the concept of using “hypertext” to connect information. Hypertext today is as ubiquitous as car travel. Both are used every day, with few considering their profound societal impact. Hypertext has made the Internet available to the masses.

Prior to today’s Internet there already existed a useful and diverse communication medium. Almost everyone knows that the Internet’s beginnings were formed through funding of US projects under the Defense Advanced Research Projects Agency (DARPA) umbrella. The 1960s were a turbulent and dangerous time. Development of a communication method that would survive a nuclear attack by the then Soviet Union was high on the list of military projects. The projects expanded on the already suggested concept of sending packets of information between computers. This research would build the first networks and ultimately the technology foundation we know today as the Internet. The Internet technology prior to the WWW included a variety of communications and data transfer tools that seem normal to us today. The “Cloud” is an overused term describing the use of the Internet for storage and access to technology. This storage and other technology existed in the form of File Transfer Protocol (FTP), Gopher, Simple Mail Transfer Protocol (SMTP), Internet Relay Chat (IRC), and many others. Each was defined through a common set of protocols. Standard protocols are the basis for the Internet and its function. These protocols serve the Internet community as a common method of understanding and communication. Without standard protocols the Internet would not work or exist. These protocols are published as Requests for Comments (RFCs), a name that derives from the Internet’s beginning as a collaborative group across numerous disciplines. Standards bodies such as the Internet Engineering Task Force (IETF), organized under the Internet Society, have been effective in directing the growth of the Internet’s technology. The advantage for the online investigator is that the protocols are published and available to us to review and understand.

First and of foremost importance to the online investigator is the Internet Protocol (IP) addressing scheme.

The importance of IP addresses

The basis for Internet communication is a simple process of assigning each device attached to the Internet an address. This address allows that device to connect with and communicate with any other device connected to the Internet using this same addressing scheme. This addressing scheme is commonly referred to as the IP address. The most commonly used version of IP addressing is version 4, commonly referred to as IPv4. The IPv4 address is made up of four numbers of up to three digits each. These numbers are referred to as “octets” because each is stored in 8 bits, so each octet can take one of 256 values, 0–255. Together these 32 bits (4 bytes) of information allow for roughly 4.3 billion unique addresses on the Internet. The format of the IPv4 address that is commonly used is called dotted decimal, e.g., 123.122.213.012. A publicly routed IPv4 address is globally unique.
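The relationship between the dotted decimal form and the underlying 32-bit number can be checked with a few lines of Python. This is only a minimal sketch using the standard library's ipaddress module. The address is the example from the text written without the leading zero in its last octet, because recent Python versions reject leading zeros.

```python
import ipaddress

addr = ipaddress.IPv4Address("123.122.213.12")

# Each octet is one byte (8 bits), so it holds a value from 0 to 255.
octets = list(addr.packed)

# The four octets together form a single 32-bit number.
as_int = int(addr)
rebuilt = (octets[0] << 24) | (octets[1] << 16) | (octets[2] << 8) | octets[3]

print(octets)             # the four octet values
print(as_int == rebuilt)  # the dotted form and the integer are the same address
print(2 ** 32)            # total number of possible IPv4 addresses
```

The last line prints 4294967296, which is where the figure of roughly 4.3 billion addresses comes from.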

Let’s take a look at how the IPv4 address looks and is translated. In Figure 3.1, we have a comparison of the IPv4 address and the traditional telephone numbering scheme. The traditional telephone number is similar in format (although not exactly) and an example that will help the investigator new to IPv4 understand how it functions.

Figure 3.1 IP address translation compared to a telephone number.

As in the telephone number example there are four number sets. There is one set each for the International calling code, area code, local prefix, and the local number associated with that house. We all know now that the traditional telephone numbers are no longer just to a house. They can be to a business, mobile phone, and may even be to a fax machine. They are more in line with the IPv4 scheme, where a number is to a particular device. But for our example let’s just say our number goes to a house. In Figure 3.2, we can find a location using a state, city, street, and house address as well as through latitude and longitude. On the Internet we can find a location through an IP address or a uniform resource locator (URL). Similarly in this example, the IPv4 address is broken down into those four octets that identify different parts of the address. Ultimately this leads to the individual device associated with that IPv4 address.

Figure 3.2 How we find a location.

DHCP and assigning addresses

To further confuse the situation of identifying what device is associated with an IP address, there are two ways addresses get assigned. The first is dynamically, which refers to the use of a pool of IP addresses that get “dynamically” assigned to a device when it requests Internet access. This is commonly done through the Dynamic Host Configuration Protocol (DHCP). DHCP is implemented by software running on a server, router, or other device that determines the IP address assignment for other devices in the network requesting access to the Internet. RFC 2131 describes this as “…automatic allocation of reusable network addresses…” Effectively for the investigator, DHCP assigns an address out of a pool of addresses to each device that connects to the network. This becomes part of the investigation trail that needs to be followed. The server or router assigning the addresses is another link in the chain of locations from which logs may need to be requested to prove what device was assigned a particular IP address at a given time (Figure 3.3).

Figure 3.3 DHCP assignment of IP addresses.

Through the DHCP process, the addresses assigned to each device allow them individually to access the network without conflicting with another device. If two devices were assigned the same address, they would not function properly on the network, because the network could not tell which of the two devices should receive data sent to that address. Uniquely assigned addresses allow the data to flow to a device without this conflict. From an investigative viewpoint, this also identifies a specific device to which the address was assigned.
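Because the same address can be handed to different devices at different times, the question an investigator puts to a DHCP log is always "which device held this address at this moment?" The sketch below illustrates that lookup in Python. The log layout, the addresses, and the function name device_for are all hypothetical. Real DHCP servers and routers each record leases in their own format.

```python
from datetime import datetime, timezone

# Hypothetical lease log: (ip, mac, lease_start, lease_end), all times in UTC.
LEASES = [
    ("192.168.1.20", "76:e5:43:77:64:86",
     datetime(2013, 5, 1, 8, 0, tzinfo=timezone.utc),
     datetime(2013, 5, 1, 16, 0, tzinfo=timezone.utc)),
    ("192.168.1.20", "00:11:22:33:44:55",
     datetime(2013, 5, 1, 18, 0, tzinfo=timezone.utc),
     datetime(2013, 5, 2, 2, 0, tzinfo=timezone.utc)),
]

def device_for(ip, when):
    """Return the MAC address that held `ip` at time `when`, or None."""
    for lease_ip, mac, start, end in LEASES:
        if lease_ip == ip and start <= when < end:
            return mac
    return None

# The same IP address points to different devices depending on the time.
print(device_for("192.168.1.20", datetime(2013, 5, 1, 9, 0, tzinfo=timezone.utc)))
print(device_for("192.168.1.20", datetime(2013, 5, 1, 19, 0, tzinfo=timezone.utc)))
```

This is why an accurate date, time, and time zone matter as much as the IP address itself when requesting records.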

The assignment of an address specifically for the use of that device and that device only is called a “static” IP address. This allows the device to always have the same address when it connects to the network or the Internet. The advantage is that the device can always be found easily by other devices on the Internet looking for that device. As an example, a server providing a service such as a webpage or an FTP will also want to be found by its users. Assignment of a “static” or permanent address helps facilitate their return to that same location on the Internet. A dynamically assigned IP address would make this reconnection more difficult.

Dynamic DNS Services

There are programs that facilitate the use of dynamically assigned addresses by an ISP to allow for public Internet resources to find a resource with a dynamic IP address. These services act as a Domain Name Server that constantly updates the DNS system with the new address for your Internet resource assigned the dynamic address. The investigator should be aware that the term “DNS” has more than one common meaning. It is used both to refer to a Domain Name Server as well as a reference to the overall Domain Name System.

Tracing the IP Address to a Device

Tracing an individual IP address to a device over the Internet and through a network requires several steps. Keep in mind that this is only a general guide to tracing an IP address. If the criminal is using any tools to obfuscate or hide his real IP address, the end results might require additional investigative actions. The steps are:

1. Identify the correct IP address: This can be found potentially in the header of an email, a posting on a blog, or through a direct connection with the target by trapping the IP address through a tool like Netstat.

2. Identify the owner of the IP address: Identifying the owner of the IP address is usually done through doing a domain registration lookup or Whois lookup. This can be done through numerous online tools or through the Internet Investigators Toolkit (see Chapter 6 for further details).

3. Contact the IP address owner: Provide the IP address owner with the date and time, including the time zone and Coordinated Universal Time (UTC) the IP address was used in your investigation. The IP address owner will most likely require legal service of a subpoena or search warrant.

4. Research the information from the IP address owner: A general investigative background on the name and any address information needs to be done on the information provided by the IP owner. This information may be correct to the device used to connect to the Internet, but it may not be the target of your investigation. For instance, the device may be in a residence or business with multiple users. This information could also provide you with the wireless router that was used by another device to access the Internet.

5. Contact the owner of the device identified as accessing the Internet: The owner of the IP address will provide you with the next step in the chain of the investigation. Ensure that you have the required legal service ready when taking this next step. A simple “Knock and Talk” could also serve the purpose of identifying who accessed the Internet from the identified device at a specific location. The wireless router in question could contain logs of access and IP address assignment that may prove useful to your investigation. The logs, if turned on, may provide the investigator with device-specific information such as the unique identifier of the device’s network interface card (NIC), called the media access control (MAC) address. The MAC address is used to identify specific devices attached to networks. However, this address is not passed through the router and will only be found at this level of the investigation (Figure 3.4).

Figure 3.4 Device identification by IP address assignment.
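Step 3 above asks for the time of the activity in both local time and UTC. A minimal Python sketch of that conversion follows. The date, time, and UTC-05:00 offset are invented for illustration, and the offset of the log source must be established by the investigator, not assumed.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical log entry: 14 March 2013, 9:42:10 p.m., recorded at UTC-05:00.
local_zone = timezone(timedelta(hours=-5))
seen_local = datetime(2013, 3, 14, 21, 42, 10, tzinfo=local_zone)

# Convert the same instant to Coordinated Universal Time.
seen_utc = seen_local.astimezone(timezone.utc)

print(seen_local.isoformat())  # 2013-03-14T21:42:10-05:00
print(seen_utc.isoformat())    # 2013-03-15T02:42:10+00:00
```

Note that the UTC date falls on the following day. Giving the IP address owner both forms removes any doubt about which lease record is being requested.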

MAC address

The last stop on the investigative journey to identifying the user of an IP address is the last router in the chain. This is most often at the business network the user is connected to or the home router used to access the Internet. This last router in the chain may contain logs of those devices connected to it that accessed the Internet. Most routers have logging but may not have the logging turned on to record the access. If the logging is turned on, the router will record the access of a device through its MAC address. The MAC address is a unique identifier assigned by the manufacturer of NICs (either the Ethernet connection or a wireless connection). The MAC address is used by the router to differentiate between devices attached to the router (Figure 3.5).

Figure 3.5 Example of router details of connected device.

The MAC address is six pairs of hexadecimal digits separated by colons, broken into two sections. An example of a MAC address is 76:e5:43:77:64:86. The hexadecimal digits used in the MAC address include only the numbers 0–9 and letters A–F (see the Hexadecimal to ASCII chart in Appendix A for more details). The first section is the first three pairs of digits. This part of the MAC address is the Organizationally Unique Identifier (OUI) or Vendor ID (IEEE-SA Registration Authority, 2012), and investigators can use it to identify the manufacturer of the network interface. The second section contains the last three digit pairs, which the manufacturer assigns to the individual device. The MAC address is useful during the investigation to identify the device that was attached to the router and assigned a specific IP address. Locally at the target machine, the MAC address can be confirmed by opening a command prompt and running the command “ipconfig /all.” This command will provide the investigator with confirmation of the target machine as the device that was connected to the router. In the Windows IP Configuration information under the header “Physical Address,” the investigator will find the MAC address.
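The two sections of a MAC address can be separated mechanically. The following Python sketch splits the example address into its first three pairs (the OUI, which identifies the manufacturer) and its last three pairs (the device-specific part). The function name split_mac is hypothetical. Matching an OUI to a vendor name still requires a lookup against the IEEE registry, which is not shown here.

```python
HEX_DIGITS = set("0123456789abcdef")

def split_mac(mac):
    """Split a MAC address into (OUI, device-specific part)."""
    # Accept colons or hyphens as separators and normalise to lowercase.
    pairs = mac.replace("-", ":").lower().split(":")
    if len(pairs) != 6 or any(len(p) != 2 for p in pairs):
        raise ValueError(f"not a six-pair MAC address: {mac!r}")
    if any(ch not in HEX_DIGITS for p in pairs for ch in p):
        raise ValueError(f"non-hexadecimal digit in MAC address: {mac!r}")
    return ":".join(pairs[:3]), ":".join(pairs[3:])

oui, device = split_mac("76:E5:43:77:64:86")
print(oui)     # first section: manufacturer identifier
print(device)  # second section: assigned by the manufacturer to the device
```

Windows displays the same address with hyphens (for example under “Physical Address”), which is why the sketch accepts both separators.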

Accessing the Windows IP Configuration in Vista and Windows 7

1. Click on the Windows Start button.

2. Click in the “Search programs and files” box.

3. Type “cmd” and press the “Enter” key.

4. A black console window will open. In the console window, type “ipconfig” and press Enter. You will now see the IP address, Subnet Mask, and Default Gateway for each active network connection in your computer. If you type “ipconfig /all,” additional information about each connection will be presented, including the connection’s DNS servers and the network card’s MAC address (Figure 3.6).

Figure 3.6 Identification of MAC address on target machine.

Investigators need to be aware that the MAC address can be spoofed through various tools. This is a common technique used by criminals to connect to a router without being tracked back to the specific target device by that unique number (Figure 3.7).

Figure 3.7 Identification of MAC address from a small office/home office router.

Domain Name System

Between the browser’s request and the web server that holds the webpage to be viewed is a process that identifies where in the world that webpage exists. The Domain Name System (DNS), sometimes referred to as the Domain Name Servers, is a large distributed database that maps domain names to IP addresses. The DNS is something similar to a large phone directory. The browser makes a request through the DNS to identify the IP address of the domain name in a URL. The DNS server looks up the name in its database and, if it knows the IP address, passes the address back to the browser. If it does not know the address, it passes the request up to the next higher DNS server to assist in the identification of the IP address. The IP address, when identified, is passed back to the browser. The browser then makes a request to the IP address for the webpage at that address. The web server at that IP address then sends the requested webpage to the browser, which displays it (Figure 3.8).

Figure 3.8 Domain Name System IP address lookup.
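The lookup-then-pass-upward behavior described above can be modeled with a toy example in Python. This is a deliberately simplified sketch, not the real DNS protocol. The class name ToyResolver, the host names under example.com, and the 192.0.2.x addresses (a range reserved for documentation) are all illustrative assumptions.

```python
class ToyResolver:
    """A toy name server: a lookup table plus an optional higher-level server."""

    def __init__(self, records, upstream=None):
        self.records = dict(records)
        self.upstream = upstream

    def resolve(self, name):
        name = name.lower().rstrip(".")
        # Answer from the local database if the name is known.
        if name in self.records:
            return self.records[name]
        # Otherwise pass the request up to the next server, if there is one.
        if self.upstream is not None:
            answer = self.upstream.resolve(name)
            if answer is not None:
                self.records[name] = answer  # remember the answer for next time
            return answer
        return None

upstream = ToyResolver({"www.example.com": "192.0.2.10"})
local = ToyResolver({"intranet.example.com": "192.0.2.50"}, upstream=upstream)

print(local.resolve("intranet.example.com"))  # answered locally
print(local.resolve("www.example.com"))       # answered by the upstream server
print(local.resolve("missing.example.com"))   # unknown everywhere: None
```

The remembered answer in the local table is the reason DNS caches, as well as authoritative servers, can hold information of interest to an investigation.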

DNS records

Each DNS server contains a series of records or “resource records” that describe each domain. These records hold information about the domain so that when a request about the domain is made the correct information can be provided to the requestor. The information contained in the records includes the assigned IP address, any aliases used, which DNS server holds the authoritative record for the domain, mail server records, and other domain records that guide the request to the right location. Getting these records used to be fairly simple with a command line lookup tool found in most operating systems called NSLookup. This tool provided the list of available records for the domain. However, because of hacking attempts against DNS servers, most reputable DNS server administrators now refuse those requests for information. An available free tool to view the available DNS records in a single view is DNSDataView from Nirsoft (Figure 3.9).¹

Figure 3.9 DNS records using DNSDataView.

The records here provide the investigator with information about ownership and about the locations from which further information may be obtained, some through further online research and some through legal service on the IP address owner. Always keep in mind that all this work may bring you to a false address or an uninvolved party. Criminals can and do use methods to hide themselves. However, all of this still needs to be done to track down the possible leads in the investigation. In the example in Figure 3.9, we have found the following records for the domain veresoftware.com: NS, MX, A, SOA, and PTR. We can identify each of these Record Types using Table 3.1. Also identified with each Record Type is the Host Name and IP address associated with that record. This information provides the investigator with a more complete picture of the domain and its associated connections to other servers on the Internet.

Table 3.1

Domain Record Types

^aA detailed explanation of each record can be found on the Microsoft Technet Library (http://technet.microsoft.com/en-us/library/dd197499(v=WS.10).aspx).

In RFC 1035—Domain Names—Implementation And Specification,² the document lists the various record types listed for a domain. Each of these various record types contains specific information on that aspect of the domain.

A general description of each of these records can be found in PC magazine’s Encyclopedia of IT terms.³ The following explanation of some of these records is from their website:

Forward DNS and Reverse DNS (A and PTR): The Address (A) record associates a domain name with an IP address, which is the primary purpose of the DNS system. The Pointer (PTR) record provides data for reverse DNS, which is used for logging the domain name and verification purposes. Also called “inverse DNS,” the PTR record is an option.

Aliasing Names (CNAME): The Canonical Name (CNAME) record is used to create aliases that point to other names. It is commonly used to map WWW, FTP, and MAIL subdomains to a domain name; for example, a CNAME record can associate the subdomain FTP.COMPUTERLANGUAGE.COM with COMPUTERLANGUAGE.COM.

DNS Name Servers (NS): The name server (NS) record identifies the authoritative DNS servers for a domain. A second NS is required for redundancy and two NS records must be in the zone file (one for the primary; one for the secondary). The secondary server queries the primary server for changes.

Mail Servers (MX): The mail exchange (MX) record identifies the server to which email is directed. It also contains a priority field so that mail can be directed to multiple servers in a prescribed order.

Text Record (TXT): A TXT record can be used for any kind of documentation. It is also used to provide information to the Sender Policy Framework (SPF) email authentication system.

First Record in File (SOA): Start of authority (SOA) is the first record in the zone file. It contains the name of the primary DNS server, which must correspond to an NS record in the file, the administrator’s email address and the length of time records can be cached before going back to the authoritative DNS server.

Domain Name Service

In his testimony before the Senate Committee on Commerce, Science and Transportation, Subcommittee on Communications, on February 14, 2001, Michael Roberts, President and CEO of ICANN, said “In recent years, the domain name system (DNS) has become a vital part of the Internet. The function of the domain name system is to provide a means for converting easy to remember mnemonic domain names into the numeric addresses that are required for sending and receiving information on the Internet. The DNS provides a translation service that permits Internet users to locate Internet sites by convenient names (e.g., http://www.senate.gov) rather than being required to use the unique numbers (e.g., 156.33.195.33) that are assigned to each computer on the Internet.” Today the Internet would not work without the DNS.

Internet Protocol Version 6

An updated version of the Internet Protocol, IPv6 (Internet Protocol Version 6), is slowly being implemented and will eventually replace the IPv4 system. IPv6 is the next version of the protocol that is the basis for most communications on the Internet. IPv4 addresses started to run out in 2011, so the move to a new system of addressing devices on the Internet is imperative. The effect this has on Internet investigations is significant. Investigators have had a general understanding of the IPv4 system and how to trace IPv4 addresses. IPv6 addresses are very different and require a new understanding of the IPv6 protocol. The immediate issue is that the two protocols are not compatible. IPv6 is coming to a crime near you. The official World IPv6 Launch took place on June 6, 2012.

Defining IPv6

What the investigator will immediately notice is that IPv6 addresses are much more complex and harder to remember than their IPv4 cousins. IPv6 uses 128-bit addresses. Like IPv4, IPv6 addresses are broken into groups. IPv6 has eight groups of four digits separated by colons (:), not periods (.) as in the IPv4 design. The four digits in each of the eight groups are hexadecimal, not decimal as in IPv4. The larger address space under IPv6 provides a number of addresses that is never expected to run out. As an example, the following is an IPv6 address: 2001:0db8:85a3:0042:0000:8a2e:0370:7334.

Translating IPv6

A single IPv6 address is defined under RFC 4291 as eight sets of four hexadecimal digits, such as ABCD:EF01:2345:6789:ABCD:EF01:2345:6789. However, the standard allows for a variety of representations of the IPv6 address, and the investigator should know these various forms in order to recognize them in an investigation. Table 3.2, taken from RFC 5952, shows how a single IPv6 address can be represented.

Table 3.2

Examples of Various IPv6 Representations

2001:db8:0:0:1:0:0:1	2001:0db8::1:0:0:1
2001:0db8:0:0:1:0:0:1	2001:db8:0:0:1::1
2001:db8::1:0:0:1	2001:db8:0000:0:1::1
2001:db8::0:1:0:0:1	2001:DB8:0:0:1::1

In the IPv6 examples, letters are not case-sensitive, leading zeroes in a segment can be dropped, and a run of consecutive all-zero segments can be eliminated and represented only by a double colon (::), which may appear once in an address. For the investigator this means that the same address can appear in very different looking formats, all of which need to be recognized as IPv6 addresses. This IPv6 example would fully be represented as 2001:0db8:0000:0000:0001:0000:0000:0001 or 2001:db8:0:0:1:0:0:1.
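One practical way to compare differently written IPv6 addresses is to parse them and compare the parsed values, not the text. The Python sketch below feeds the eight representations from Table 3.2 to the standard library's ipaddress module. It is an illustration only, and the variable names are arbitrary.

```python
import ipaddress

# The eight representations listed in Table 3.2.
table_3_2 = [
    "2001:db8:0:0:1:0:0:1", "2001:0db8::1:0:0:1",
    "2001:0db8:0:0:1:0:0:1", "2001:db8:0:0:1::1",
    "2001:db8::1:0:0:1", "2001:db8:0000:0:1::1",
    "2001:db8::0:1:0:0:1", "2001:DB8:0:0:1::1",
]

parsed = [ipaddress.IPv6Address(text) for text in table_3_2]

# Every spelling parses to the same 128-bit address.
all_same = len(set(parsed)) == 1

full_form = parsed[0].exploded     # all eight groups, four digits each
short_form = parsed[0].compressed  # shortest conventional form

print(all_same)
print(full_form)
print(short_form)
```

Normalizing addresses to one form before searching logs or evidence files avoids missing a match simply because two tools wrote the same address differently.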

IPv6 has three types of addresses, which can be categorized by type and scope (Technet Microsoft). As shown in Table 3.3, the types are Unicast (with its link-local and site-local variations), Multicast, and Anycast. From an investigative point of view, these address types are not initially remarkable. However, being able to identify where an IPv6 address is used can help the investigator determine its relevance.

Table 3.3

Types of IPv6 Addresses

The IPv6 address for your computer can be found using the command prompt and running the “ipconfig /all” command. In Figure 3.10, you can see the identified IPv6 address assigned to the system and the “link-local” IPv6 address.

Figure 3.10 IPv6 address from ipconfig command.

IPv4-Mapped IPv6 addresses

For the investigator’s purposes, IPv4 can be mapped to IPv6 addresses under certain circumstances. This mapping is intended to aid in the migration from the IPv4 protocol to IPv6. However, this is not a direct translation of the IPv4 address to an IPv6 address. The IPv6 RFCs allow for mapping IPv4 address in the IPv6 addressing scheme. In two circumstances, RFC 4291 describes these implementations. Typically the IPv4 address is embedded in the IPv6 address. An example could be IPv4 address 97.74.74.204 mapped to an IPv6 address: 0:0:0:0:0:ffff:614a:4acc.

In hexadecimal 614a:4acc translates to the IPv4 address 97.74.74.204.

When mapping an IPv4 address to an IPv6 address, the 128 bits of the IPv6 address are broken into three parts. The first 80 bits, or the first five segments of the IPv6 address, are zeros. The next 16 bits, or the next segment in the IPv6 address, are either zeros or hexadecimal FFFF. The last 32 bits, or two segments of the IPv6 address, are the IPv4 address (Table 3.4; Figure 3.11).
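The three-part layout can be verified with a short Python sketch using the standard ipaddress module. It rebuilds the mapped address from the text's example, 97.74.74.204, using the FFFF form of the middle segment. The variable names are arbitrary.

```python
import ipaddress

v4 = ipaddress.IPv4Address("97.74.74.204")

# 80 zero bits, then 16 one bits (ffff), then the 32-bit IPv4 address.
mapped = ipaddress.IPv6Address((0xFFFF << 32) | int(v4))

# The last two segments are simply the IPv4 address written in hexadecimal.
last_two = f"{int(v4) >> 16:x}:{int(v4) & 0xFFFF:x}"

print(last_two)            # 614a:4acc
print(mapped.ipv4_mapped)  # the embedded IPv4 address, recovered
```

Each IPv4 octet becomes one pair of hexadecimal digits: 97 is 61, 74 is 4a, and 204 is cc.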

Table 3.4

IPv6 Address Space Assignment

IPv6 Prefix	Allocation
0000::/8	Reserved by IETF
0100::/8	Reserved by IETF
0200::/7	Reserved by IETF
0400::/6	Reserved by IETF
0800::/5	Reserved by IETF
1000::/4	Reserved by IETF
2000::/3	Global Unicast
4000::/3	Reserved by IETF
6000::/3	Reserved by IETF
8000::/3	Reserved by IETF
A000::/3	Reserved by IETF
C000::/3	Reserved by IETF
E000::/4	Reserved by IETF
F000::/5	Reserved by IETF
F800::/6	Reserved by IETF
FC00::/7	Unique Local Unicast
FE00::/9	Reserved by IETF
FE80::/10	Link Local Unicast
FEC0::/10	Reserved by IETF
FF00::/8	Multicast

http://www.iana.org/assignments/ipv6-address-space/ipv6-address-space.xml.
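The prefixes in Table 3.4 can be used to sort an address into its allocation. The Python sketch below checks an address against the four prefixes in the table that are not reserved. The function name allocation_for is hypothetical, and the sketch covers only the top-level blocks shown in the table, not the special-purpose ranges inside them.

```python
import ipaddress

# The four allocations in Table 3.4 that are not "Reserved by IETF".
ALLOCATIONS = [
    (ipaddress.ip_network("2000::/3"), "Global Unicast"),
    (ipaddress.ip_network("fc00::/7"), "Unique Local Unicast"),
    (ipaddress.ip_network("fe80::/10"), "Link Local Unicast"),
    (ipaddress.ip_network("ff00::/8"), "Multicast"),
]

def allocation_for(text):
    """Return the Table 3.4 allocation that contains the given IPv6 address."""
    addr = ipaddress.IPv6Address(text)
    for network, label in ALLOCATIONS:
        if addr in network:
            return label
    return "Reserved by IETF"

for sample in ["2001:db8::1:0:0:1", "fe80::1", "ff02::1", "fd00::1"]:
    print(sample, "->", allocation_for(sample))
```

An address that falls in the link-local or unique local blocks was used inside a local network and will not, by itself, lead to an Internet service provider.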

Figure 3.11 Example of an IPv4 address mapped to IPv6.

IPv6 DUID

For the investigator the DHCP Unique Identifier (DUID) is the last stop in the trail of identifying a device. DUIDs are used in an IPv6-addressed network to uniquely identify devices connected to the system. This is similar to the use of MAC addresses by an IPv4 router to identify individual devices. There are four types of DUIDs found within the IPv6 DHCP system to identify devices associated with the system (RFC 3315 and 6355). DUIDs are intended to remain constant over time, so that they can be used as permanent identifiers for a device. The four types are found in Table 3.5.

Table 3.5

Types of DUIDs Found Within the IPv6 DHCP System

Type	Name	Description
DUID-LLT	Link-layer address plus time	The link-layer address of one of the device’s network interfaces, concatenated with a timestamp
DUID-EN	Vendor based on enterprise number	An enterprise number plus additional information specific to the enterprise
DUID-LL	Link-layer address	The link-layer address of one of the device’s network interfaces
DUID-UUID	Universally unique identifier	Derived from standardized Universally Unique IDentifier (UUID) format

An example of a DHCPv6 (DHCP for IPv6) client DUID is 00-01-00-01-17-96-F9-3A-28-92-4A-3F-6C-47.

It can be broken down as in the example below:

Global Identifier	MAC Address from Ethernet Adapter
00-01-00-01-17-96-F9-3A	28-92-4A-3F-6C-47
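The first eight bytes of this example can be broken down further. Under RFC 3315, a DUID-LLT consists of a 2-byte DUID type (1 for DUID-LLT), a 2-byte hardware type (1 for Ethernet), a 4-byte time counted in seconds from midnight UTC on January 1, 2000, and then the link-layer address. The Python sketch below applies that layout to the example. The function name parse_duid_llt and the dictionary keys are illustrative choices.

```python
from datetime import datetime, timedelta, timezone

DUID_EPOCH = datetime(2000, 1, 1, tzinfo=timezone.utc)

def parse_duid_llt(text):
    """Split a hyphen-separated DUID-LLT string into its RFC 3315 fields."""
    raw = bytes.fromhex(text.replace("-", ""))
    duid_type = int.from_bytes(raw[0:2], "big")
    if duid_type != 1:
        raise ValueError("not a DUID-LLT (type 1)")
    seconds = int.from_bytes(raw[4:8], "big")
    return {
        "duid_type": duid_type,
        "hardware_type": int.from_bytes(raw[2:4], "big"),
        # Time the DUID was generated, in seconds since 1 January 2000 UTC.
        "created": DUID_EPOCH + timedelta(seconds=seconds),
        "link_layer": ":".join(f"{b:02x}" for b in raw[8:]),
    }

info = parse_duid_llt("00-01-00-01-17-96-F9-3A-28-92-4A-3F-6C-47")
print(info["hardware_type"])
print(info["created"].isoformat())
print(info["link_layer"])
```

The decoded time is when the DUID was generated, not when the device last connected, so it should be read as an installation-era artifact and not as evidence of recent activity.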

Each DUID variation produces a unique identifier. With this one can potentially obtain the MAC address of a given device located on the machine. However, Windows appears to be maintaining the DUID over time and not reassembling a unique identifier based on hardware changes. So a direct connection to a MAC address on a hardware device might not be possible. However, from an investigative viewpoint, maintaining a unique identifier on the machine even when hardware changes are made could be extremely valuable to the investigator (Figure 3.12).

Where Is the DUID in the Windows O/S?

For reference, the online investigator can verify the DUID with a request to their digital forensic examiner. The digital forensic examiner can find the DUID on the target computer, once it is secured, by looking in the following Windows registry location on the machine (the registry is a hierarchical database that stores configuration settings for Windows): HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Services\Tcpip6\Parameters\Dhcpv6DUID.

Figure 3.12 Windows IP configuration showing DUID and MAC address.

The World Wide Web

The WWW or “web” is the basis for what most people think is the Internet. However, the WWW is just one of the services on the Internet. It started after Sir Tim Berners-Lee’s concept of “hypertext” in a browser was adopted as the preferred method of moving through the WWW. The web is a collection of publicly accessible documents (text, images, audio, video, etc.). Users view the pages via a browser (Internet Explorer, Chrome, Firefox, Opera, etc.) running on the local machine. The webpages often contain hypertext links referring to other webpages or documents. Clicking the mouse pointer on these links calls up the referenced document or webpage. See Figure 3.13.

Figure 3.13 How a web browser actually gets a webpage to display.

Uniform resource locators

The URL has become the most recognizable part of the web. An example of a fully qualified domain name is www.veresoftware.com. A domain name is commonly used now to identify companies, market to individuals, and find your favorite site on the web. A URL for a webpage starts with “http://,” which identifies the protocol to be used on the Internet and stands for hypertext transfer protocol. After the protocol usually comes the designator WWW, which we all know now stands for World Wide Web. Adding the WWW can be optional today because most browsers will add it. Additionally, you may encounter WWW2 or WWW3 prior to a web address. These and other prefixes can be used by an organization to identify other web content or websites, but they don’t refer to any standards or Internet protocols. After the protocol, the domain name is the next significant part of the URL. The domain name is the registered name that identifies the location the browser will request information from on the Internet. A domain is formatted like veresoftware. At the end of the domain is the top level domain (TLD). TLDs are what the user commonly identifies as the end of the domain registration. The TLD identifies the highest level of the hierarchical structure of the URL. TLDs historically included the commonly recognized .com, .org, .mil, and .edu. Over the past several years, mainly due to the diminishing availability of English language domains, new TLDs have been added to the list (Table 3.6).

Table 3.6

List of Current TLDs

AERO	JOBS
AR	MIL
ARPA	MOBI
ASIA	MUSEUM
BIZ	NAME
CAT	NET
COM	ORG
COOP	POST
EDU	TEL
GOV	TRAVEL
INFO	XXX

Country codes appear at the end of the domain name, after any other TLD-style label, as a designator of the country to which the domain is registered. By standard, country codes are two letters long. Figure 3.14 provides an example of a properly formatted URL.

Figure 3.14 Description of the parts of a URL.
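As a rough illustration of the URL parts described above, the following Python sketch splits a URL into its protocol, prefix, domain, and TLD using the standard library's urllib.parse module. The helper name describe_url and the /index.html path are hypothetical, and treating the last dot-separated label as the TLD is a simplification that ignores country-code arrangements.

```python
from urllib.parse import urlsplit

def describe_url(url):
    """Break a URL into the parts discussed in this section (simplified)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".")
    return {
        "protocol": parts.scheme,                         # e.g. "http"
        "host": host,                                     # fully qualified domain name
        "prefix": labels[0] if len(labels) > 2 else "",   # e.g. "www"
        "domain": labels[-2] if len(labels) >= 2 else host,
        "tld": labels[-1] if len(labels) >= 2 else "",
        "path": parts.path,
    }

# The path below is made up for illustration.
info = describe_url("http://www.veresoftware.com/index.html")
for name, value in info.items():
    print(f"{name}: {value}")
```

An investigator could use a breakdown like this to pull the registered domain out of a longer link before looking up its registration details.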

Domain name registration

So, how does one get a domain registered in the name of their choice? Today it is fairly simple to do. Hundreds of domain registrars are available on the Internet. A simple Google search for the term “domain registrar” will bring up hundreds of results, such as d1.com, GoDaddy.com, Network Solutions, and many others. With each of these, a credit card number and basic name and address information get you the domain of your choice (provided the domain you select is available). A domain name is registered for a specified period of time, generally 1 year or more. The domain registrar submits the names to the Internet Corporation for Assigned Names and Numbers (ICANN), which is responsible for the actual assignment of Internet addresses. The investigator should be aware that any or all of the registration information can be falsified by the person registering a domain.

ICANN is a nonprofit organization formed under the direction of the US Department of Commerce in 1998 (ICANN 1998) to administer the domain name registration process and the DNS. ICANN has since entered into agreements with other authorities designed to assist in domain registrations for various areas around the world. The following are the five regional Internet registry (RIR) service regions:

• RIPE, the Réseaux IP Européens (European IP Networks)

• AFRINIC, the African Network Information Centre

• APNIC, the Asia Pacific Network Information Center

• ARIN, the American Registry for Internet Numbers

• LACNIC, the Latin American and Caribbean Internet Addresses Registry.

In 2000, ICANN entered into another agreement with the US Government to operate the Internet Assigned Numbers Authority (IANA). At the time, the University of Southern California had been operating the functions of the IANA through a contract with DARPA (Figure 3.15).

Figure 3.15 ICANN structure for assignment of domain names.

Internationalized domain names

Until 2009, domain names could be registered only in the characters of the English language or Latin alphabet. These conformed with the American Standard Code for Information Interchange (ASCII). After 2009, ICANN allowed the introduction of domain names in different languages. From an investigative viewpoint, identifying the users of internationalized domain names (IDN) becomes increasingly difficult if the investigator cannot read the domain name.
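Internationalized names are carried in the DNS in an ASCII-compatible form that begins with “xn--”, so a name that cannot be read can still be converted to and from its ASCII form. The sketch below uses Python's built-in idna codec, which follows an older version of the IDN standard, so treat it as an illustration rather than a complete tool. The function names and the sample domain are hypothetical.

```python
def to_ascii(domain):
    """Return the ASCII-compatible form of an internationalized domain name."""
    return domain.encode("idna").decode("ascii")

def to_unicode(domain):
    """Return the readable Unicode form of an ASCII-compatible domain name."""
    return domain.encode("ascii").decode("idna")

# "bücher.example" is an illustrative name, not a real site.
ascii_name = to_ascii("bücher.example")
print(ascii_name)               # the form stored in the DNS
print(to_unicode(ascii_name))   # back to the form a user sees
```

Recording both forms in case notes helps when one tool reports the “xn--” form and another shows the native-language characters.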

Autonomous system number

An autonomous system number (ASN) is a public, globally unique number used to exchange routing information between networks with assigned ASNs. These numbers are assigned to ISPs whose networks are connected to the Internet.

Other services on the Internet

As noted, the Internet is not just the WWW. There are many other potential areas for an investigator to be concerned about when it comes to investigating crimes on the Internet. Located on the Internet are a variety of services not accessed through an Internet browser. Each protocol listed has its own RFC describing its use, and they all predate the WWW. From an investigative point of view, as discussed in the following chapters, each presents very different approaches and problems for the investigator. Our discussion here is an introduction to several of the more commonly identified Internet protocols. These protocols can be used in an investigation as a source for identifying criminal use or as intelligence on criminal behavior.

File transfer protocol

FTP stands for file transfer protocol. As a protocol, FTP predates the public release of the Internet by decades. Prior to the hyperlinks present in our current WWW, FTP was the predominant method of transferring files from the server where they were stored to a user’s computer. In fact, FTP was designed prior to the current design of IP addresses as we know them. FTP is still in use as a method of transferring large files, and the concept of FTP file transfer is in use in various Cloud services throughout the Internet. The FTP protocol lets a client connect directly with an FTP server, using port 21 for control commands and, in active mode, port 20 for the data itself.⁴ The transfer of files through this connection is made directly through the IP address and/or domain (Figure 3.16).

Figure 3.16 FTP communications between client and server.
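A minimal sketch of an FTP client session, using Python's standard ftplib module, is shown below. The function name list_ftp_directory is hypothetical, the anonymous login is an assumption that many servers will reject, and the function is only defined here, not run against any real server.

```python
from ftplib import FTP, FTP_PORT

def list_ftp_directory(host, user="anonymous", password="", port=FTP_PORT):
    """Connect to an FTP server and return the names in its starting directory."""
    ftp = FTP()
    ftp.connect(host, port)        # control connection (FTP_PORT is 21)
    ftp.login(user, password)      # anonymous login is an assumption
    try:
        return ftp.nlst()          # the listing arrives over a separate data connection
    finally:
        ftp.quit()                 # politely end the session

print("Default FTP control port:", FTP_PORT)
```

The host passed to the function can be either a domain name or an IP address, which matches the direct client-to-server connection described above.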

Email or the SMTP

SMTP, the Simple Mail Transfer Protocol, is the protocol for transferring electronic mail. RFC 5321 describes its use for sending mail between mail servers, also referred to as mail transfer agents. SMTP has a dedicated well-known port number, 25. It is not the protocol for collecting mail by a user. Users typically employ one of two protocols to download their email: Post Office Protocol (POP) and Internet Message Access Protocol (IMAP). Both allow users to collect their email from a mail server and view it locally, but they do so in slightly different ways (Table 3.7).

Command Line Use of SMTP Protocol

SMTP is a simple protocol. Using Telnet⁵ to connect to port 25 on a remote host, you can type an email from the command line using the SMTP commands. This technique is usually blocked today due to hacking/phishing misuse, but in the past it used to be a common way to illustrate the use of SMTP commands. The example below shows an email sent by command line from Samuel on yourmail.123.com to Lindsey on mymail.xyz.com.

% telnet mymail.xyz.com 25

Trying 162.21.50.4…

Connected to mymail.xyz.com

Escape character is ‘^]’

220 mymail Sendmail 4.1/1.41 ready at Tue, 29 Dec 2012 19:23:01 PST

helo yourmail.123.com

250 mymail Hello yourmail.123.com, pleased to meet you

mail from:<samuel@yourmail.123.com>

250 <samuel@yourmail.123.com>… Sender ok

rcpt to:<lindsey@mymail.xyz.com>

250 <lindsey@mymail.xyz.com>… Recipient ok

data

354 Enter mail, end with “.” on a line by itself

Hello Lindsey, how are you?

.

250 Mail accepted

quit

221 mymail delivering mail

⁵ Telnet is a program that allows a computer user to log into another computer via a text-based interface.

Table 3.7

Basic SMTP Commands

Command	Syntax	Function
Hello	HELO <sending-host>	Identify sending SMTP
From	MAIL FROM:<from-address>	Sender address
Recipient	RCPT TO:<to-address>	Recipient address
Data	DATA	Begin a message
Quit	QUIT	End the SMTP session
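The commands in Table 3.7 are always sent in the same order, which the following Python sketch makes explicit. It only builds the client-side lines of a session and does not contact any server. The function name smtp_commands is hypothetical, and Lindsey's address is inferred from the host names given in the example above.

```python
def smtp_commands(sending_host, from_address, to_address, message):
    """Return the client-side lines of a minimal SMTP session, in order."""
    return [
        f"HELO {sending_host}",           # identify the sending host
        f"MAIL FROM:<{from_address}>",    # sender address
        f"RCPT TO:<{to_address}>",        # recipient address
        "DATA",                           # begin the message
        message,
        ".",                              # a lone period ends the message
        "QUIT",                           # end the SMTP session
    ]

session = smtp_commands(
    "yourmail.123.com",
    "samuel@yourmail.123.com",
    "lindsey@mymail.xyz.com",             # assumed address for Lindsey
    "Hello Lindsey, how are you?",
)
for line in session:
    print(line)
```

Note that the server accepts whatever sender address the client supplies, which is one reason the sender shown in an email cannot be taken at face value.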

Post Office Protocol

POP allows a user’s mail client to connect to a mail server that holds electronic mail items. It is the simplest of the mail protocols, using only a few commands to connect to and accept emails from a mail server. The commands allow the user’s email program to download email and delete email from the server. No other manipulation occurs between the email program and the email server.
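The download-then-delete pattern described above can be sketched with Python's standard poplib module. The function name download_and_delete is hypothetical, the connection is unencrypted POP3 on its default port, and the function is only defined here, not run against a real mailbox.

```python
import poplib

def download_and_delete(host, username, password):
    """Fetch every message from a POP3 mailbox, then delete it from the server."""
    mailbox = poplib.POP3(host)              # plain-text POP3 on the default port
    mailbox.user(username)
    mailbox.pass_(password)
    messages = []
    count, _size = mailbox.stat()            # message count and mailbox size
    for number in range(1, count + 1):
        _response, lines, _octets = mailbox.retr(number)   # download one message
        messages.append(b"\r\n".join(lines))
        mailbox.dele(number)                 # mark it for deletion
    mailbox.quit()                           # deletions take effect on quit
    return messages

print("Default POP3 port:", poplib.POP3_PORT)
```

Because messages are removed from the server once downloaded, mail collected this way may exist only on the user's own computer.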

Internet Message Access Protocol

IMAP also allows access to and the downloading of emails from a mail server. However, IMAP is more complex in that it allows the user’s email client to access the emails on another server as if they were stored locally. This allows the user’s email client to manipulate emails stored on the server without transferring the messages between computers (Figure 3.17).

Figure 3.17 SMTP communications between clients.

News groups, Usenet, or the Network News Transfer Protocol

The historical framework of the Network News Transfer Protocol (NNTP) comes from Usenet, an early message transfer system. The Usenet system originally communicated over telephone connections between the servers and ultimately transformed into the Internet protocol known as NNTP. Usenet is a network of servers without a central server. Usenet has historically been a popular way to anonymously post and transfer files for exchange between users. Usenet messages look and act much like email in that the message format is based originally on the email protocol. However, a message is posted globally to the system and is viewable by everyone on the network instead of being directed to a single user. Usenet messages are accessed using a “newsreader” that functions as a message reader and a tool to post messages to the system. The benefit of Usenet is the ability to read any message and post a message back to the public network. Any user connected to the Usenet system can read and post a response to the same message. The concept is like a large bulletin board where everyone can see and post a note in response to the message.

The Usenet network stores large amounts of data, and this data may not remain for extended periods of time. Usenet servers are designed to store data until they run out of disk space. This retention process causes data to drop off the Usenet system and become potentially irretrievable. Also, a file posted to one Usenet server may not be seen on other servers until the file is shared or “propagates” across the NNTP network. Usenet uses a hierarchy for the groups from which users can download messages and to which they can post. The hierarchy is given in Table 3.8.

Table 3.8

Usenet Hierarchy

Hierarchy	Description
comp.	Newsgroups discussing computer-related topics
humanities.	Groups discussing the humanities, such as literature and art
misc.	Miscellaneous topics that don’t fit other hierarchies
news.	Groups discussing Usenet itself and its administration
rec.	Recreation topics, such as games, sports, and activities
sci.	Science newsgroups, covering specific areas
soc.	Society and social discussions
talk.	Groups discussing current events
alt.	Groups discussing any topic not defined above

An example of a Usenet group is alt.sex.bondage, which discusses bondage and sadomasochism. Other private hierarchy listings can occur depending on the company or geographic location.
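Since the hierarchy is simply the first dot-separated part of a group name, a newsgroup can be sorted into the categories of Table 3.8 with a few lines of Python. The function name classify_newsgroup and the group comp.example.group are hypothetical, and any prefix not in the table is assumed here to be a private or regional hierarchy.

```python
# Top-level hierarchies taken from Table 3.8.
USENET_HIERARCHIES = {
    "comp": "computer-related topics",
    "humanities": "the humanities, such as literature and art",
    "misc": "miscellaneous topics",
    "news": "Usenet itself and its administration",
    "rec": "recreation topics",
    "sci": "science",
    "soc": "society and social discussions",
    "talk": "current events",
    "alt": "any topic not defined in the other hierarchies",
}

def classify_newsgroup(name):
    """Return (top-level hierarchy, description) for a newsgroup name."""
    top_level = name.split(".", 1)[0].lower()
    description = USENET_HIERARCHIES.get(top_level, "private or regional hierarchy")
    return top_level, description

print(classify_newsgroup("alt.sex.bondage"))
print(classify_newsgroup("comp.example.group"))   # hypothetical group name
```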

Chatting with IRC

Internet Relay Chat, more commonly referred to as IRC, gives the user the ability to communicate through real-time text messaging. IRC is accessed through a client that gives the user access to the IRC hierarchy of servers and “channels.” In these channels, the user can “chat” through written text messages with other users accessing the channel or directly with an individual member of the channel. Users can join existing channels for these communications or make their own.
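Behind the client, joining a channel and chatting comes down to a handful of plain-text lines defined in the IRC client protocol (RFC 2812). The Python sketch below only builds those lines and does not connect to any server. The function name irc_lines, the nickname, and the channel #example are hypothetical.

```python
def irc_lines(nickname, real_name, channel, text):
    """Return the raw lines a client might send to register, join, and chat."""
    commands = [
        f"NICK {nickname}",                      # choose a nickname
        f"USER {nickname} 0 * :{real_name}",     # register the user
        f"JOIN {channel}",                       # join (or create) a channel
        f"PRIVMSG {channel} :{text}",            # send a message to the channel
    ]
    # Each IRC message ends with a carriage return and line feed.
    return [command + "\r\n" for command in commands]

lines = irc_lines("investigator1", "Example User", "#example", "Hello, channel")
for line in lines:
    print(line, end="")
```

Replacing the channel in the PRIVMSG line with another user's nickname sends the text directly to that individual instead of to the whole channel.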

Relevant RFCs

The following RFCs form the basis for the design and control of the Internet. Each RFC addresses a specific topic related to governance of the various features of the Internet. This is not a complete list of RFCs governing the makeup of the various parts of the Internet. These can all be found at the website of the IETF at www.ietf.org.

RFC: 2131 DHCP, http://www.ietf.org/rfc/rfc2131.txt

RFC: 3315 DHCPv6, http://www.ietf.org/rfc/rfc3315.txt

RFC: 4291 IP Version 6 Addressing Architecture, http://tools.ietf.org/html/rfc4291

Guide to Mapping IPv4 to IPv6 Subnets, http://tools.ietf.org/html/draft-schild-v6ops-guide-v4mapping-00

DHCPv6, https://tools.ietf.org/html/rfc3315#page-19

RFC: 6355, Definition of the UUID-based DHCPv6 Unique Identifier (DUID-UUID), http://www.ietf.org/rfc/rfc6355.txt.pdf

RFC: 1036, Standard for interchange of USENET messages, http://tools.ietf.org/html/rfc1036

RFC: 5321, SMTP, http://tools.ietf.org/html/rfc5321

RFC: 1939, POP Version 3, http://www.ietf.org/rfc/rfc1939.txt

RFC: 3501, IMAP—Version 4rev1, http://tools.ietf.org/html/rfc3501

RFC: 3977, NNTP, https://tools.ietf.org/html/rfc3977

RFC: 1459, IRC Protocol, http://www.ietf.org/rfc/rfc1459.txt

RFC: 2812, IRC: Client Protocol, http://tools.ietf.org/html/rfc2812

RFC: 2810, IRC: Architecture, http://tools.ietf.org/html/rfc2810

Conclusion

This chapter described numerous topics related to the construction of the Internet and its various parts. We discussed how IP addresses affect the process of communication between computers and how IP addresses can be used effectively to further an investigation of crimes on the Internet. We also described how various protocols have been established to describe and control the various functions of the Internet. These protocols include Internet functions for sending email, exchanging files, and using newsgroups.
