Chapter 10. Reference and Lookup: Tools for Figuring Out Who Someone Is

Each alert or logfile line that reports an event provides some basic information about the source of the event. Just from the IP address, you can derive information about geographic location and do a reverse DNS lookup. This chapter covers tools that help you track the identity of a host.

This chapter is focused on the idea of “walking up” the OSI stack, mentioned in “The Basics of Network Layering”. I like to view the OSI model as a sequence of lookup processes. Each layer offers a different piece of addressing information, such as the MAC address at layer 2, the IP address at layer 3, and the ports at layer 4. This information is moved between layers through the agency of various referencing systems: ARP maps IP addresses to MAC addresses, DNS maps domain names to IP addresses, and so on. Again, the abstraction isn’t perfect (DNS translation doesn’t move us up or down the OSI stack), but by walking up each layer, we can describe what the addresses mean and when they are relevant to investigation.

The remainder of this chapter is structured as follows: a section on MAC addresses, then IPv4 and IPv6, followed by internet-layer information, then DNS, then higher-level protocols. Finally comes a discussion of other important tools that don’t fit into the layering model—in particular, reputation databases and malware repositories.

A general comment on the data discussed in this chapter: much of what is referenced here is maintained by a crazy quilt of entities with differing concepts of the information they should provide. Some do good jobs, some do bad jobs, some intentionally obfuscate everything they provide. In many cases, you will want to pull the same data from multiple sources to validate it, and take everything you read with a grain of salt.

MAC and Hardware Addresses

Chapter 2 discusses the basics of a media access control (MAC) address. MAC addresses are defined in the network hardware to provide a locally unique address for each host within a single layer 2 network. The majority of MAC addresses follow the 48-bit Extended Unique Identifier (EUI) standard: 6 bytes expressed hexadecimally (e.g., 08-21-23-41-FA-BB). More modern network hardware may use EUI-64, which adds an additional 16 bits. When a frame goes from a 48-bit system to a 64-bit system, the 48-bit address is padded to 64 bits.

Figure 10-1 shows how the EUI-48 and EUI-64 break down.

Figure 10-1. The EUI-48 and EUI-64 standards

Note two things in particular. First, if an EUI-48 is converted to an EUI-64, you can tell this by looking at bytes 3 and 4, which will be FFFE. More important is that the first 3 bytes are the organizationally unique identifier (OUI), which is a 24-bit value assigned by the IEEE to the hardware manufacturer. OUIs are fixed serial numbers, and if you know the OUI, you can find out who manufactured the card. The IEEE maintains a list of OUI assignments, where you can use a search engine to find OUIs by company, or companies by OUI.

For example, consider the following packet from a pcap:

$ tcpdump -c 1 -e -n -r web.pcap
reading from file web.pcap, link-type EN10MB (Ethernet)
00:37:56.480768 8c:2d:aa:46:f9:71 > 00:1f:90:92:70:5a, ethertype IPv4 (0x0800),
		length 78: 192.168.1.12.50300 > 157.166.241.11.80: Flags [S],
		seq 4157917085, win 65535, options [mss 1460,nop,wscale 4,nop,
		nop,TS val 560054289 ecr 0,sackOK,eol], length 0

The communication goes from 8c:2d:aa:46:f9:71 to 00:1f:90:92:70:5a. Looking these up tells us that 8c:2d:aa belongs to Apple, and 00:1f:90 belongs to Actiontec Electronics, which makes Verizon’s FIOS routers.
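To make the lookup concrete, here is a minimal Python 3 sketch of pulling the OUI out of a MAC address. The function names `normalize_mac` and `oui` are my own, chosen for illustration; the sketch only extracts the prefix, and you would still need the IEEE's OUI list to map that prefix to a manufacturer.

```python
def normalize_mac(mac):
    """Return a MAC address as uppercase, zero-padded, hyphen-separated octets."""
    # Accept either ':' or '-' as the separator.
    octets = mac.replace("-", ":").split(":")
    if len(octets) not in (6, 8):
        raise ValueError("expected an EUI-48 or EUI-64 address: %r" % mac)
    # int(..., 16) validates each octet as hexadecimal.
    values = [int(octet, 16) for octet in octets]
    if any(value > 0xFF for value in values):
        raise ValueError("octet out of range in %r" % mac)
    # "%02X" restores leading zeros that some tools drop (e.g., 0:1f:90).
    return "-".join("%02X" % value for value in values)


def oui(mac):
    """Return the first three octets (the OUI) of a MAC address."""
    return normalize_mac(mac)[:8]


for mac in ("8c:2d:aa:46:f9:71", "0:1f:90:92:70:5a"):
    print(mac, "->", oui(mac))
```

Normalizing first matters because tools disagree on formatting: tcpdump pads each octet to two digits, while some ARP implementations print 0:1f:90 rather than 00:1f:90.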

MAC addresses operate entirely within the scope of the local network. To communicate beyond the borders of a router, the host must have an IP address. The relationship between a local MAC and an IP address is managed through the Address Resolution Protocol (ARP). Individual hosts maintain ARP tables that contain mappings between IP addresses and MAC addresses on a network. For example, on my local host, I can query the ARP table using arp -a:

$ arp -a
wireless_broadband_router.home (192.168.1.1) at 0:1f:90:92:70:5a on en1 ifscope [ethernet]
new-host-2.home (192.168.1.3) at 0:1e:c2:a6:17:fb on en1 ifscope [ethernet]
new-host.home (192.168.1.4) at cc:8:e0:68:b8:a4 on en1 ifscope [ethernet]
apple-tv-3.home (192.168.1.9) at 7c:d1:c3:26:35:bf on en1 ifscope [ethernet]
? (192.168.1.255) at ff:ff:ff:ff:ff:ff on en1 ifscope [ethernet]

Do the lookups and you’ll find that I really like Apple hardware.1

Analytically, MAC addresses (when you can get them; as already explained, you’ll normally have them only for your local network) are most useful for identifying and differentiating hardware, especially networking hardware such as routers. IP addresses are considerably more fungible than MAC addresses, and if you need to track a mobile asset like a laptop or anything moderated through DHCP, the MAC address will be your best tool for doing so.

IP Addressing

IP addresses are the most commonly accessed piece of information about a host, and often the only piece of data you will have about a host.

IP is slowly transitioning from IPv4 to IPv6. IPv6 corrects a number of design errors in IPv4, the most notable being IP address exhaustion. An IPv4 address is a 32-bit value, conventionally written in “dotted quad” format: four bytes, written decimally, separated by periods (like 192.168.1.1). At the time of IPv4’s original design, nobody seriously expected that the 4 billion addresses provided would ever be exhausted, and many of the early allocations of IPv4 addresses are comically generous, as you can see from the master list of /8 allocations. A /8 is a collection of 16 million+ addresses (2^24), all of which have the same first octet, so 9.0.0.0 to 9.255.255.255 are all owned by IBM, for example. Looking at the list, you’ll see that several of the blocks were assigned large and early to companies such as Xerox and Ford, which don’t really use the space they have. The situation has actually improved over the past few years, as several drug companies that owned nearly empty /8s have returned them to IANA.

The majority of the English-speaking internet still runs on IPv4, while in Asia and elsewhere, IPv6 is increasingly prevalent. The uneven allocation of IPv4 addresses forces countries that came to the internet later to build IPv6 infrastructure.

IPv4 Addresses, Their Structure, and Significant Addresses

IPv4 addresses can be expressed using a number of different notations. The most common is the dotted quad format discussed earlier: four integer values between 0 and 255, separated by periods. An address can also be referred to directly as a value, usually in hexadecimal. Consequently, the IP address 0xA1010203 is 161.1.2.3 as a dotted quad, and 2701197827 as a decimal integer.
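These notations are interchangeable, which a short Python 3 sketch using the standard library's `ipaddress` module can show. The variable names are illustrative only.

```python
import ipaddress

# Start from the dotted quad form used in the text.
addr = ipaddress.IPv4Address("161.1.2.3")

# The same address as a plain integer and as a hexadecimal value.
as_int = int(addr)
as_hex = "0x%08X" % as_int

print(addr, as_int, as_hex)

# Going the other way: an integer converts back to a dotted quad.
round_trip = ipaddress.IPv4Address(0xA1010203)
print(round_trip)
```

When you grep logs, keep in mind that the same host may appear in any of these forms depending on the tool that wrote the record.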

Groups of IP addresses are usually described linearly (e.g., 128.2.11.3–128.2.11.14), or using a Classless Inter-Domain Routing (CIDR) block. CIDR blocks are a mechanism for describing the addresses reachable by picking a particular route. Addresses in CIDR notation are represented by a prefix,2 which is a dotted quad representation of the significant bits of an address, and then a mask, which indicates how many bits make up the prefix.

For example, the CIDR block 128.2.11.0/24 consists of all addresses whose first 24 bits are 128.2.11, so any address from 128.2.11.0 to 128.2.11.255 is in that block.
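As a rough illustration, the block above can be tested for membership with Python 3's `ipaddress` module. The helper name `in_block` and the test addresses are my own choices for the example.

```python
import ipaddress

# The CIDR block from the text: a 24-bit prefix, leaving 8 host bits.
block = ipaddress.ip_network("128.2.11.0/24")


def in_block(address, network=block):
    """Return True if the dotted quad address falls inside the network."""
    return ipaddress.IPv4Address(address) in network


# First and last addresses covered by the block, and how many there are.
first, last = block[0], block[-1]
print(first, last, block.num_addresses)

print(in_block("128.2.11.214"))   # inside the /24
print(in_block("128.2.12.1"))     # differs within the first 24 bits
```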

A number of IP addresses are either reserved or fixed by convention in network configuration. For an individual host on a network, the most important are the broadcast address, gateway, and netmask:

  • IP networks are logically divided into subnets, collections of contiguous addresses that can all communicate with each other without the need for internal routing. When configuring an IP address, this range is specified using a netmask, which is a 32-bit value written like an IP address, with the network bits set high and the host bits zeroed out (e.g., 255.255.255.0 for a /24).

  • To communicate outside its subnet, a host will have to talk to a router, and does so using a preconfigured gateway address. The gateway address is simply the IP address of the router’s interface to the subnet. Gateway addresses are customarily assigned the lowest value in the subnet, but this is not a requirement.

  • A network’s broadcast address is the network address with all the host bits set high (e.g., for the network 192.168.1.0/24, the broadcast address is 192.168.1.255). Messages sent to the broadcast address are sent to every target within the network. The broadcast address is one of a number of addresses you should never see outside of local network traffic. Addresses ending in .255, for lack of a better term, smell funny.
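The relationship between a host address, its netmask, and the broadcast address can be sketched in Python 3 with the `ipaddress` module. The host address 192.168.1.12 with netmask 255.255.255.0 and the gateway 192.168.1.1 are assumed values, picked to match the home network used in this chapter's examples.

```python
import ipaddress

# A host address plus netmask, as you might see in an interface configuration.
iface = ipaddress.ip_interface("192.168.1.12/255.255.255.0")

# The subnet is what remains after zeroing out the host bits.
subnet = iface.network
print("subnet:   ", subnet)
print("netmask:  ", subnet.netmask)

# The broadcast address is the subnet with all host bits set high.
print("broadcast:", subnet.broadcast_address)

# A gateway must be reachable without routing, so it has to be in the subnet.
gateway = ipaddress.IPv4Address("192.168.1.1")
print("gateway in subnet:", gateway in subnet)
```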

A number of IPv4 addresses are reserved for specific networking functions. These addresses are specifically intended for local use and consequently should not be seen crossing networks. The most significant are:

Local identification addresses

These belong to the 0.0.0.0/8 CIDR block (0.0.0.0–0.255.255.255). Local identification addresses are used during the startup sequence for a host that doesn’t have an IP address yet.

Loopback address

The loopback address of a host is 127.0.0.1. Traffic sent to the loopback address is sent back to the host without entering the network. IANA has reserved the entire 127.0.0.0/8 CIDR block (127.0.0.0–127.255.255.255) for loopback, so as with local identification, nothing from the 127.0.0.0/8 CIDR block should be seen crossing network boundaries.3

RFC 1918 netblocks

RFC 1918 defines a number of netblocks for private use. These addresses can be used within local networks with the intent that they never communicate directly with the global internet. The RFC netblocks are 10.0.0.0/8, 192.168.0.0/16, and 172.16.0.0/12. Addresses within these blocks are often assigned automatically by local routing tools or DHCP.

Multicast addresses

Multicast addresses are used to classify specific groups of hosts within a subnet. For example, multicast address 224.0.0.2 is the “all routers” multicast address, and all routers within the subnet will receive traffic sent there. Multicast traffic is primarily the focus of routing and other internet control protocols.
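When triaging logs, it helps to flag these addresses automatically. The following Python 3 sketch checks an address against the blocks named in this section; the `RESERVED` table and the `classify` function are illustrative names of my own, and the multicast range 224.0.0.0/4 is taken from the standard IPv4 multicast allocation.

```python
import ipaddress

# (label, block) pairs for the reserved ranges discussed in this section.
RESERVED = [
    ("local identification", ipaddress.ip_network("0.0.0.0/8")),
    ("loopback", ipaddress.ip_network("127.0.0.0/8")),
    ("RFC 1918", ipaddress.ip_network("10.0.0.0/8")),
    ("RFC 1918", ipaddress.ip_network("172.16.0.0/12")),
    ("RFC 1918", ipaddress.ip_network("192.168.0.0/16")),
    ("multicast", ipaddress.ip_network("224.0.0.0/4")),
]


def classify(address):
    """Return the label of the reserved block holding address, or None."""
    ip = ipaddress.IPv4Address(address)
    for label, block in RESERVED:
        if ip in block:
            return label
    return None


for candidate in ("192.168.1.12", "127.0.0.1", "224.0.0.2", "157.166.241.11"):
    print(candidate, classify(candidate))
```

An address that returns a label here and shows up in traffic crossing your network border deserves a second look.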

IPv6 Addresses, Their Structure, and Significant Addresses

One of the most significant changes between IPv4 and IPv6 is the number of addresses they make available. IPv6 assigns 128 bits to each address; this ensures plenty of addresses, but introduces some problems in notation.

The default format for an address is eight 16-bit hexadecimal values separated by colons, such as 2001:0010:AF3A:FB31:09A8:08A1:1098:1101. Given that this is a long and clumsy representation, addresses are usually represented using a number of shorthand conventions. When writing IPv6 addresses, apply these rules:

  • Leading zeros in any group are omitted, so 01AA:0002 can be written as 1AA:2.

  • Consecutive groups of zeros may be replaced with a single pair of colons, so 2001:0:0:0:0:0:0:1 is written as 2001::1. The double-colon reduction can be used only once, so 2001:0:0:0:11:0:0:1 is written as 2001::11:0:0:1.
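Python 3's `ipaddress` module applies the same two rules, which makes it a convenient way to normalize IPv6 addresses before comparing them. This is a small sketch; `shorten` and `expand` are names I chose for illustration. Note that the module prints hexadecimal digits in lowercase.

```python
import ipaddress


def shorten(address):
    """Return the shorthand form: leading zeros dropped, one run of zeros as '::'."""
    return ipaddress.IPv6Address(address).compressed


def expand(address):
    """Return the full form: eight groups of four hexadecimal digits."""
    return ipaddress.IPv6Address(address).exploded


print(shorten("2001:0:0:0:0:0:0:1"))
print(shorten("2001:0:0:0:11:0:0:1"))
print(expand("2001::1"))
```

Because many different strings describe the same address, compare parsed addresses (or their expanded forms) rather than the raw text found in a log.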

As with IPv4, multiple IPv6 blocks are reserved for specific functions. The most important reservation at this point is 2000::/3 (as with IPv4, CIDR block notation can be used with IPv6 addresses, and the mask can extend up to 128 bits). IPv6 space is huge, and to help keep routes reasonably close together, all routable traffic in IPv6 should be in the 2000::/3 block. IANA maintains further divisions within the 2000::/3 block, as it does with the /8 registry for IPv4. The master reference is available on the IPv6 Global Unicast Address Assignments page.
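A quick way to apply this rule is to test whether an address falls inside 2000::/3. The following Python 3 sketch does so with the `ipaddress` module; the function name `is_global_unicast` is hypothetical, and the check covers only this one block rather than any finer IANA subdivision.

```python
import ipaddress

# The block that all routable IPv6 traffic is expected to come from.
GLOBAL_UNICAST = ipaddress.ip_network("2000::/3")


def is_global_unicast(address):
    """Return True if the IPv6 address falls inside 2000::/3."""
    return ipaddress.IPv6Address(address) in GLOBAL_UNICAST


for candidate in ("2607:f8b0:4004:802::1014", "::1", "fc00::1", "ff02::1"):
    print(candidate, is_global_unicast(candidate))
```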

Additional address blocks of note include the ::/128 and ::1/128 blocks, which are the unspecified and loopback addresses (the equivalent of 0.0.0.0 and 127.0.0.1 for IPv4).

Of particular interest are the utility address blocks 2001:758::/29 and 2001:678::/29. 2001:758::/29 is specifically assigned to internet exchange points (IXPs); an IXP is a physical location where multiple ISPs interconnect with each other. 2001:678::/29 represents a block of provider-independent addresses; users can contact their RIRs directly for these addresses.

For clarity, a summary of local and unroutable addresses is provided in Table 10-2.

Table 10-2. Notable addresses

IPv4 block       IPv6 block   Description
0.0.0.0/0        ::/0         Default route; addresses from this block shouldn’t be seen
0.0.0.0/32       ::/128       Unspecified address
127.0.0.0/8      ::1/128      Loopback
192.168.0.0/16   fc00::/7     Reserved for local traffic
10.0.0.0/8       fc00::/7     Reserved for local traffic
172.16.0.0/12    fc00::/7     Reserved for local traffic
224.0.0.0/4      ff00::/8     Multicast addresses

IP Intelligence: Geolocation and Demographics

A number of database and intelligence services provide further information about an IP address. This type of augmentation data includes ownership, geolocation, and demographic information.

It’s important to distinguish this augmentation data from information such as autonomous system, domain name, and WHOIS data. The latter is necessary for the upkeep of the network, and is maintained by internet organizations related to ICANN. Geolocation, demographic data, and ownership are intelligence products. The companies that produce them use a variety of mechanisms including network scanning as well as shoe-leather investigation. This leads to several important qualities:

  • The intelligence updates slowly, whereas DNS names can change very rapidly. Intelligence updates require calling up entities, checking public records, and other physical efforts to find out that, say, 128.2.11.214 is no longer involved in selling car parts and is now hosting malware.

  • There is always some degree of approximation. As a rule of thumb, intelligence data gets less accurate as you delve down into finer detail. Country information is usually good, but I’m moderately skeptical about city information outside of the US and Western Europe, and I never trust physical location.

  • You get what you pay for. The companies that produce this data have customers who need it. Most of the companies started out providing demographic data for large websites, and it’s still common to find limits on the number of queries you can conduct per license. You pay for accuracy and you pay for precision. There are free intelligence databases, but if you want to get finer detail than country codes, prepare to crack open your wallet.

The most commonly used open source reference is MaxMind’s GeoIP, which provides a number of databases for city, country, region, organization, ISP, and network speed. MaxMind also provides free services in the form of “lite” databases for identifying the city and country associated with an IP address. All of its products are downloadable databases and are updated regularly. MaxMind has been providing this service for years, along with a number of APIs in Python and other scripting languages that are available to access the database.

Applied Security has produced a good GeoIP library in Python (pygeoip, also available in pip). pygeoip works with MaxMind’s commercial and free database instances. The following sample script, pygeoip_lookup.py, shows how the API works:

#!/usr/bin/env python
#
# pygeoip_lookup.py
#
# Takes any IP addresses passed to it as input,
# runs them through the MaxMind GeoIP database, and
# returns the country code.
#
# Command-line arguments:
# argv[1]: Filename for a GeoIP database from MaxMind.

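import sys
import pygeoip

# Open the database named on the command line; bail out if it is
# missing or cannot be read.
try:
    geoip_dbfn = sys.argv[1]
    gi_handle = pygeoip.GeoIP(geoip_dbfn, pygeoip.MEMORY_CACHE)
except Exception:
    sys.stderr.write("Specify a database\n")
    sys.exit(-1)

# Read one IP address per line from standard input and print it
# alongside its country code.
for line in sys.stdin:
    ip = line.strip()
    if not ip:
        continue
    cc = gi_handle.country_code_by_addr(ip)
    print("%s %s" % (ip, cc))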

For more extensive information, options include Neustar and Digital Envoy’s Digital Element. Both provide more precise measurement, as well as additional demographic data such as MSA (metropolitan statistical areas, contiguous areas of high population density used by the US government for statistical analysis) and NAICS (North American Industry Classification System, a numerical identifier akin to a Dewey Decimal number for business type) codes. These services are not cheap, however.

DNS

In a just world, each IP address would have a single DNS name, and finding the DNS name associated with an IP address would be a simple matter of consulting a database. This world is not just.

DNS is the glue that makes the internet usable by human beings. As one of the older services making the internet work, DNS overlaps with a couple of other services (particularly mail). The Domain Name System is, at this point, a distributed database that provides lookup information for a number of different relationships: DNS name to IP address, DNS name to DNS name, email address to mail server, and so on.

DNS Name Structure

A domain name consists of a hierarchical sequence of labels separated by periods, such as www.oreilly.com. Domain names become more general as you read from left to right, ending at the root domain (the root domain is ., but it’s almost always implicit). Domain names do have limits. The total length of a name cannot exceed 253 characters, and individual labels are limited to 63. Finally, domain names are limited to 127 distinct labels, although the character limit should affect that far earlier.
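These limits are easy to check mechanically, and names that push against them are worth noticing in DNS logs. Here is a minimal Python 3 sketch; the function name `check_name_length` is my own, and the check covers only the length limits described above, not which characters are legal.

```python
MAX_NAME_LENGTH = 253   # characters in the whole name
MAX_LABEL_LENGTH = 63   # characters in a single label
MAX_LABELS = 127        # labels in a name


def check_name_length(name):
    """Return a list of length problems found in a domain name (empty if none)."""
    # Drop the trailing dot that marks the (usually implicit) root domain.
    if name.endswith("."):
        name = name[:-1]
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append("name is %d characters long" % len(name))
    labels = name.split(".")
    if len(labels) > MAX_LABELS:
        problems.append("name has %d labels" % len(labels))
    for label in labels:
        if not label:
            problems.append("empty label")
        elif len(label) > MAX_LABEL_LENGTH:
            problems.append("label of %d characters" % len(label))
    return problems


print(check_name_length("www.oreilly.com"))
print(check_name_length("a" * 64 + ".com"))
```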

Historically, labels were limited to a restricted subset of ASCII characters. Since 2009, it has been possible to acquire internationalized domain names, which are encoded using character systems such as Chinese, Greek, and so on.4 The mechanical limits of 253 characters per name still hold, though the encoding is more complex, as discussed in Chapter 12.

Forward DNS Querying Using dig

The basic DNS query tool is the domain information groper (dig), a command-line DNS client that enables you to query DNS servers for all of the major records. We’ll begin by conducting a simple dig query:

$ dig oreilly.com

; <<>> DiG 9.8.3-P1 <<>> oreilly.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 29081
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;oreilly.com.			IN	A

;; ANSWER SECTION:
oreilly.com.		383	IN	A	208.201.239.101
oreilly.com.		383	IN	A	208.201.239.100

;; Query time: 10 msec
;; SERVER: 192.168.1.1#53(192.168.1.1)
;; WHEN: Sat Jul 20 19:11:17 2013
;; MSG SIZE  rcvd: 61
$ dig +short oreilly.com
208.201.239.101
208.201.239.100

Let’s examine dig’s display options, and then the structure of the DNS response. As seen in the previous example, the basic dig command provides extensive information about the query, beginning with a list of options invoked, then a DNS header, and then several sections corresponding to the query. Note the QUERY, ANSWER, AUTHORITY, and ADDITIONAL fields in the header line, and how those correspond to the lines in the corresponding sections. Because this domain returned no AUTHORITY or ADDITIONAL records, none are shown in the output. The query is followed by a set of statistics about the query: the server, the time it took, and the size of the message.

dig provides an enormous number of output options; the previous example showed the default display. Individual sections of that display can be turned off using +nocomments (which kills all the comments beginning with a double semicolon), +nostats (killing the statistics at the end), and +noquestion and +noanswer (which suppress the question and answer sections, respectively). +short, as illustrated at the end of the previous example, will simply remove all the cruft and show the responses.

dig is a DNS client, so the majority of information seen is from the DNS server itself. dig enables you to query different servers by using @ in the command line. For example:

# 8.8.8.8 is Google's public DNS server; let's query a content
# distribution network using it
$ dig @8.8.8.8 www.foxnews.com
; <<>> DiG 9.8.3-P1 <<>> @8.8.8.8 www.foxnews.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 18702
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;www.foxnews.com.		IN	A

;; ANSWER SECTION:
www.foxnews.com.	282	IN	CNAME	www.foxnews.com.edgesuite.net.
www.foxnews.com.edgesuite.net. 21582 IN	CNAME	a20.g.akamai.net.
a20.g.akamai.net.	2	IN	A	204.245.190.42
a20.g.akamai.net.	2	IN	A	204.245.190.8

;; Query time: 141 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Sat Jul 20 19:48:01 2013
;; MSG SIZE  rcvd: 135

# Query using my default server
$ dig www.foxnews.com

; <<>> DiG 9.8.3-P1 <<>> www.foxnews.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 47098
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;www.foxnews.com.		IN	A

;; ANSWER SECTION:
www.foxnews.com.	189	IN	CNAME	www.foxnews.com.edgesuite.net.
www.foxnews.com.edgesuite.net. 9699 IN	CNAME	a20.g.akamai.net.
a20.g.akamai.net.	9	IN	A	23.66.230.160
a20.g.akamai.net.	9	IN	A	23.66.230.106

;; Query time: 97 msec
;; SERVER: 192.168.1.1#53(192.168.1.1)
;; WHEN: Sat Jul 20 19:48:09 2013
;; MSG SIZE  rcvd: 135

As you can see, querying a CDN-moderated site (Fox News uses Akamai) results in radically different IP addresses for the same name. CDNs manipulate the DNS to ensure that caches of published data are geographically close to their target. If you don’t specify the server using @, dig will default to whatever server the system is configured to use (for example, in Unix systems this is maintained in /etc/resolv.conf).

A CDN is a caching network that makes the internet viable. Before the web, a user might visit four to five hosts in an hour; after the web, a request to a web page might launch a hundred different HTTP requests. The majority of these requests are redirected via DNS to caching servers that are located geographically nearby. CDNs add an annoying wrinkle to web analysis, because a single CDN server may host multiple websites—if a host is identified as part of a CDN, the only organization that can tell you what’s on that host is the CDN provider.

Now, let’s look at the DNS data. DNS is a federated database system, so queries go first to a local DNS server, which sends a response if it possesses the answer to the query. If the server doesn’t have the information, it uses the hierarchical structure of the name to figure out where to send the request, waits for a response, and sends the response back. DNS supports a number of different queries, termed resource records (RRs), and the options sent as part of the query specify the resource record requested as well as options for querying additional servers. The values with As or CNAMEs in the preceding response are resource records.

Note that the DNS header lists eight fields:

opcode

This field was intended to specify a number of different actions, such as queries, inverse queries, and server status. In practice, it should always be set to QUERY. A number of other opcodes exist, but they are used to communicate information between servers.

status

The status of the response. Three messages appear most often: NOERROR indicates that the query was successful, NXDOMAIN indicates that no domain was available, and SERVFAIL indicates that authoritative servers for the domain were unreachable.

id

The message ID. DNS is a UDP-moderated protocol and uses message IDs to track queries and responses.

flags

These provide information on the response; they include qr (set high for a response), aa (set high when the answer is from an authoritative server), ra (recursion available), and rd (recursion desired).

QUERY

This field indicates that the record is simply a copy of the original request; you can see in this case that the query is echoed in what dig refers to as the QUESTION section.

ANSWER

Contains the response.

AUTHORITY

Reserved for records that identify other servers.

ADDITIONAL

Provides additional information, such as the expected responses to future queries.
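If you work from raw packets rather than dig output, the opcode, flags, and status all come out of a single 16-bit word in the DNS header. The following Python 3 sketch decodes that word, assuming the bit layout given in RFC 1035; the names `decode_flags`, `FLAG_BITS`, and `RCODES` are my own, and the sketch ignores the later DNSSEC bits.

```python
# Response codes for the three statuses seen most often.
RCODES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN"}

# Bit positions of the single-bit flags, counting the least significant bit as 0.
FLAG_BITS = [("qr", 15), ("aa", 10), ("tc", 9), ("rd", 8), ("ra", 7)]


def decode_flags(word):
    """Split the 16-bit DNS header flags word into opcode, flags, and status."""
    flags = [name for name, bit in FLAG_BITS if word & (1 << bit)]
    opcode = (word >> 11) & 0xF   # four bits just below qr
    rcode = word & 0xF            # lowest four bits
    status = RCODES.get(rcode, "RCODE%d" % rcode)
    return {"opcode": opcode, "flags": flags, "status": status}


# 0x8180 is a typical successful recursive response.
print(decode_flags(0x8180))
# 0x8183 is the same response shape for a name that does not exist.
print(decode_flags(0x8183))
```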

Additional information is very much a function of the nameserver’s administrators. A common example of its use follows, where the information provides a name lookup for the mail server identified by an MX query:

$ dig +nostats +nocmd mx cmu.edu
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 30852
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 3

;; QUESTION SECTION:
;cmu.edu.			IN	MX

;; ANSWER SECTION:
cmu.edu.		20051	IN	MX	10 CMU-MX-02.ANDREW.cmu.edu.
cmu.edu.		20051	IN	MX	10 CMU-MX-03.ANDREW.cmu.edu.
cmu.edu.		20051	IN	MX	10 CMU-MX-04.ANDREW.cmu.edu.
cmu.edu.		20051	IN	MX	10 CMU-MX-01.ANDREW.cmu.edu.

;; ADDITIONAL SECTION:
CMU-MX-03.ANDREW.cmu.edu. 20412	IN	A	128.2.155.68
CMU-MX-01.ANDREW.cmu.edu. 20232	IN	A	128.2.11.59
CMU-MX-02.ANDREW.cmu.edu. 20051	IN	A	128.2.11.60

Now, let’s discuss what those resource records actually mean. DNS has upward of 20 resource records for different functions. The major ones are:

A

An address record, providing the IPv4 address associated with a particular name.

AAAA

Like A, but provides an IPv6 address for a name.

CNAME

Relates two names, a canonical name and an alias.

MX

Returns the mail server for a domain.

PTR

Points to a canonical name; mostly used for DNS reverse lookups.

TXT

Contains arbitrary text data.

NS

Describes the nameserver for a zone.

SOA

Provides information about the authoritative nameserver for a zone.

dig starts all resource records with the same four values: a name, a time to live (TTL), a class, and an identifier for the RR (for example: cmu.edu, 20051, IN, MX). The name is passed with the query. The TTL indicates for how long (in seconds) the value of the name can be trusted; DNS relies heavily on caching and the TTL provides instructions on when to refresh the cache. The class will almost invariably be IN (internet); other class names are possible, but outside the scope of this book.
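Because every record line shares this layout, dig output is easy to split into fields for further analysis. Here is a small Python 3 sketch; the `ResourceRecord` tuple and `parse_record` function are hypothetical names, and the parser assumes the single-line record format shown in this chapter (not +multiline output).

```python
from collections import namedtuple

ResourceRecord = namedtuple("ResourceRecord", "name ttl rr_class rr_type rdata")


def parse_record(line):
    """Parse one resource record line from dig; return None for other lines."""
    line = line.strip()
    # Skip blank lines and the comments dig prefixes with a semicolon.
    if not line or line.startswith(";"):
        return None
    # The first four fields never contain whitespace; the data may.
    name, ttl, rr_class, rr_type, rdata = line.split(None, 4)
    return ResourceRecord(name, int(ttl), rr_class, rr_type, rdata)


record = parse_record("cmu.edu.\t\t20051\tIN\tMX\t10 CMU-MX-02.ANDREW.cmu.edu.")
print(record)
```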

A and AAAA (address) provide basic DNS functionality: they associate the queried name with an IP address. A records provide IPv4 addresses, and AAAA records provide IPv6 addresses. By default, dig queries for A records, while other record types are specified by adding them to the command line, as seen here:

$ dig +nocomment +noquestion +nostats +nocmd www.google.com
www.google.com.		55	IN	A	74.125.228.81
www.google.com.		55	IN	A	74.125.228.83
www.google.com.		55	IN	A	74.125.228.84
www.google.com.		55	IN	A	74.125.228.80
www.google.com.		55	IN	A	74.125.228.82
$ dig +nocomment +noquestion +nostats +nocmd aaaa www.google.com
www.google.com.		18	IN	AAAA	2607:f8b0:4004:802::1014

Note that the query to Google responds with five A records. This is an example of round robin DNS allocation, a common load balancing technique. In round robin allocation, the same domain name is assigned to multiple IP addresses. Consequently, when a client chooses an IP address to contact for the name, it effectively picks the address at random from the set of targets. Round robin DNS allocation is one of many DNS hacks that make reverse lookups (names from IP addresses) incredibly annoying.

Note also the short TTL values in the response. If a particular Google server goes down, the TTL guarantees that in 55 seconds, the user has good odds of contacting another server.

Canonical name (CNAME) records are used to associate an alias to a canonical name. For example, consider lookups for www.oreilly.com:

$ dig +nocomment +noquestion +nostats +nocmd www.oreilly.com
www.oreilly.com.	3563	IN	CNAME	oreilly.com.
oreilly.com.		506	IN	A	208.201.239.101
oreilly.com.		506	IN	A	208.201.239.100

As this shows, the name www.oreilly.com actually points to oreilly.com. www.oreilly.com does not have an IP address; it points to oreilly.com, and that name has an IP address. Canonical names are used for shortcuts (as in the previous example), and also to manage content distribution. The example using Fox News showed how Akamai first aliases all of Fox News’s sites into its own network names using CNAME.

DNS provides lookup functions for email through the agency of the mail exchange (MX) record. MX records record the addresses of mail servers for a particular domain. For example, if I want to send mail to an address at cmu.edu, I can find the mail server for doing so by looking up the MX records for cmu.edu:

$ dig  +noquestion +nostats +nocmd mx cmu.edu
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 49880
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 2

;; ANSWER SECTION:
cmu.edu.		21560	IN	MX	10 CMU-MX-03.ANDREW.cmu.edu.
cmu.edu.		21560	IN	MX	10 CMU-MX-04.ANDREW.cmu.edu.
cmu.edu.		21560	IN	MX	10 CMU-MX-01.ANDREW.cmu.edu.
cmu.edu.		21560	IN	MX	10 CMU-MX-02.ANDREW.cmu.edu.

;; ADDITIONAL SECTION:
CMU-MX-01.ANDREW.cmu.edu. 21519	IN	A	128.2.11.59
CMU-MX-02.ANDREW.cmu.edu. 21159	IN	A	128.2.11.60

MX records include a server name (such as CMU-MX-03.ANDREW.cmu.edu), as well as a priority value for the email server. The priority value is used to choose a mail server: mail clients should try mail servers in ascending order of priority value (i.e., 1 should be chosen before 10).
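The selection rule can be sketched in a few lines of Python 3. The function names and the example.com records are hypothetical, made up so that the priorities differ (CMU's servers all share priority 10).

```python
def parse_mx(rdata):
    """Split MX record data such as '10 mail.example.com.' into (priority, host)."""
    priority, host = rdata.split()
    return int(priority), host


def order_mail_servers(rdatas):
    """Return mail server names in the order a client should try them."""
    # Sorting the (priority, host) tuples puts the lowest priority value first.
    # Ties fall back to alphabetical order here; a real client may pick
    # among equal-priority servers at random instead.
    return [host for _, host in sorted(parse_mx(rdata) for rdata in rdatas)]


# Hypothetical records, in the order a server happened to return them.
records = ["20 backup.example.com.", "10 primary.example.com.", "1 first.example.com."]
print(order_mail_servers(records))
```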

Of note in this example are the A records shoved into the additional section. These records resolve the CMU-MX-01 and CMU-MX-02 addresses. This reflects a conscious decision by CMU’s DNS administrators to include this information and reduce the number of lookups done.

Nameserver (NS) records are used to find the authoritative nameserver for a zone. For example, for O’Reilly Media:

$ dig +nostat ns oreilly.com

; <<>> DiG 9.8.3-P1 <<>> +nostat ns oreilly.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 32310
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;oreilly.com.			IN	NS

;; ANSWER SECTION:
oreilly.com.		3600	IN	NS	nsautha.oreilly.com.
oreilly.com.		3600	IN	NS	nsauthb.oreilly.com.

Now look at the NS record for a site managed by a CDN, such as Fox News again:

$ dig +nostat ns foxnews.com

; <<>> DiG 9.8.3-P1 <<>> +nostat ns foxnews.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 38538
;; flags: qr rd ra; QUERY: 1, ANSWER: 8, AUTHORITY: 0, ADDITIONAL: 5

;; QUESTION SECTION:
;foxnews.com.			IN	NS

;; ANSWER SECTION:
foxnews.com.		300	IN	NS	usc2.akam.net.
foxnews.com.		300	IN	NS	ns1.chi.foxnews.com.
foxnews.com.		300	IN	NS	ns1-253.akam.net.
foxnews.com.		300	IN	NS	dns.tpa.foxnews.com.
foxnews.com.		300	IN	NS	usw1.akam.net.
foxnews.com.		300	IN	NS	usw3.akam.net.
foxnews.com.		300	IN	NS	asia3.akam.net.
foxnews.com.		300	IN	NS	usc4.akam.net.

;; ADDITIONAL SECTION:
usw1.akam.net.		28264	IN	A	96.17.144.195
usw3.akam.net.		50954	IN	A	69.31.59.199
asia3.akam.net.		28264	IN	A	222.122.64.134
usc4.akam.net.		28264	IN	A	96.6.112.196
usc2.akam.net.		88188	IN	A	69.31.59.199

Note that in this case, the authoritative nameservers are largely owned by akam.net (Akamai). Fox News is hosted by Akamai’s CDN, and Akamai modifies the names of the hosts as necessary in order to boost performance.

Start of Authority (SOA) records contain summary information about the authoritative server for a domain. These records are most commonly encountered during failed lookups. When a name isn’t found, the SOA information for that zone’s server is returned instead:

$ dig @8.8.4.4 +multiline +nostat zlkoriongomk.com

; <<>> DiG 9.8.3-P1 <<>> @8.8.4.4 +multiline +nostat zlkoriongomk.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 11857
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;zlkoriongomk.com.	IN A

;; AUTHORITY SECTION:
com.			899 IN SOA a.gtld-servers.net. nstld.verisign-grs.com. (
				1374373035 ; serial
				1800       ; refresh (30 minutes)
				900        ; retry (15 minutes)
				604800     ; expire (1 week)
				86400      ; minimum (1 day)
				)

The SOA record begins with the source host (the primary nameserver for the zone), followed by a contact email address (note that the email address uses a dot rather than an at-sign as a separator). After this address comes a serial number, which is increased each time the zone data is modified, and then the timing values: refresh, retry, expire, and minimum. Note the +multiline option for dig; it produces multiple-line, more human-readable output for the SOA record.
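As a rough sketch of how those fields line up, the following Python splits the record data of an SOA into named fields and rewrites the contact name as a conventional email address. The class and function names are illustrative assumptions, not part of dig or any DNS library, and the sketch ignores the rare case of an escaped dot inside the mailbox name.

```python
from typing import NamedTuple


class SOA(NamedTuple):
    mname: str      # source host (primary nameserver)
    contact: str    # contact address, rewritten with an at-sign
    serial: int
    refresh: int    # seconds
    retry: int      # seconds
    expire: int     # seconds
    minimum: int    # seconds


def parse_soa_rdata(rdata: str) -> SOA:
    """Split single-line SOA record data into its seven fields."""
    mname, rname, *numbers = rdata.split()
    serial, refresh, retry, expire, minimum = (int(n) for n in numbers)
    # The first dot in the contact name stands in for the at-sign.
    local, _, domain = rname.rstrip(".").partition(".")
    return SOA(mname, f"{local}@{domain}", serial, refresh, retry, expire, minimum)


# Values taken from the com. SOA record in the dig output above.
soa = parse_soa_rdata(
    "a.gtld-servers.net. nstld.verisign-grs.com. 1374373035 1800 900 604800 86400"
)
print(soa.contact, soa.serial, soa.refresh)
```

Keeping the timing values as integers in seconds makes it easy to compare them across zones or convert them to the human-readable form that +multiline prints.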

The TXT field is a free-form field used for any text that the server administrator wants to publish. For example, Google uses a TXT record to publish its Sender Policy Framework (SPF) policy, which lists the hosts permitted to send mail for the domain:

$ dig +short txt google.com
"v=spf1 include:_spf.google.com ip4:216.73.93.70/31 ip4:216.73.93.72/31 ~all"

The DNS Reverse Lookup

A reverse lookup is the process of reconstructing a DNS name from an IP address. For example, if I want to find out who owns 208.201.139.101, I do so using dig -x:

$ dig +nostat -x 208.201.139.101

; <<>> DiG 9.8.3-P1 <<>> +nostat -x 208.201.139.101
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 7519
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;101.139.201.208.in-addr.arpa.	IN	PTR

;; ANSWER SECTION:
101.139.201.208.in-addr.arpa. 21600 IN	PTR	host-d101.studley.com.

Note that the question section does not request the IP address, 208.201.139.101, but 101.139.201.208.in-addr.arpa, which lists the octets of the IP address in reverse order. When DNS does a reverse lookup, it creates a special domain name to query under the in-addr.arpa domain.5 The octets are reversed because DNS names and IP addresses are defined in a contradictory fashion. A DNS name becomes more finely defined (from TLD to domain to individual host) by reading from right to left, while IP addresses are more finely defined reading from left to right.
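The name construction can be sketched in a few lines of Python. The function names here are my own; the standard library's ipaddress module exposes the same result through its reverse_pointer attribute, so the handwritten version exists only to show the octet reversal explicitly.

```python
import ipaddress


def reverse_name_manual(address: str) -> str:
    """Reverse the IPv4 octets and append the in-addr.arpa suffix."""
    octets = address.split(".")
    return ".".join(reversed(octets)) + ".in-addr.arpa."


def reverse_name(address: str) -> str:
    """Same result via the standard library (also handles IPv6 / ip6.arpa)."""
    return ipaddress.ip_address(address).reverse_pointer + "."


print(reverse_name_manual("208.201.139.101"))
print(reverse_name("208.201.139.101"))
```

Both calls print the name that appears in the question section of the dig output above.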

Reverse lookups are a kludge. Note that the record returned in the answer is a pointer (PTR) record. PTR records are not automatically created with the canonical A records, but are instead registered separately by the NIC. More important, there’s no requirement that a PTR record be registered, and the relationships between names and IP addresses are tenuous at best.

For example, consider a CDN. If I look up one of Fox News’s IP addresses, such as 23.66.230.66, I get this:

$ dig +nostat +nocmd -x 23.66.230.66
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 56379
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;66.230.66.23.in-addr.arpa.	IN	PTR

;; ANSWER SECTION:
66.230.66.23.in-addr.arpa. 290	IN	PTR	a23-66-230-66.deploy.static.akamaitechnologies.com.

The CDN becomes an informational dead end; the answer from the reverse lookup has no meaningful relation to the names in the original query.

In general, DNS information is best collected at the time of the original query. The uncertainty of reverse lookups is part of the reason for this. However, even if reverse lookups worked perfectly, attackers often use very short-lived names. Where possible, record domain names as they’re used (such as the URL in HTTP logs) rather than trying to reconstruct them after the fact.

Using whois to Find Ownership

While DNS can provide information on a domain’s name, the meat of ownership information is provided by WHOIS. This is a federated protocol (defined in RFC 3912) that lists the putative owners of DNS names. The standard whois query on a domain will return ownership and contact information for a domain, as seen in Example 10-1.

Example 10-1. A whois query for oreilly.com
$ whois oreilly.com

<boilerplate>

   Domain Name: OREILLY.COM
   Registrar: GODADDY.COM, LLC
   Whois Server: whois.godaddy.com
   Referral URL: http://registrar.godaddy.com
   Name Server: NSAUTHA.OREILLY.COM
   Name Server: NSAUTHB.OREILLY.COM
   Status: clientDeleteProhibited
   Status: clientRenewProhibited
   Status: clientTransferProhibited
   Status: clientUpdateProhibited
   Updated Date: 26-may-2012
   Creation Date: 27-may-1997
   Expiration Date: 26-may-2013

<more boilerplate>

   Registered through: GoDaddy.com, LLC (http://www.godaddy.com)
   Domain Name: OREILLY.COM
      Created on: 26-May-97
      Expires on: 25-May-13
      Last Updated on: 26-May-12

   Registrant:
   O'Reilly Media, Inc.
   1005 Gravenstein Highway North
   Sebastopol, California 95472
   United States

   Administrative Contact:
      Contact, Admin  [email protected]
      O'Reilly Media, Inc.
      1005 Gravenstein Highway North
      Sebastopol, California 95472
      United States
      +1.7078277000      Fax -- +1.7078290104

   Technical Contact:
      Contact, Tech  [email protected]
      O'Reilly Media, Inc.
      1005 Gravenstein Highway North
      Sebastopol, California 95472
      United States
      +1.7078277000      Fax -- +1.7078290104

   Domain servers in listed order:
      NSAUTHA.OREILLY.COM
      NSAUTHB.OREILLY.COM

You’ll note that a WHOIS entry for a domain returns an enormous amount of boilerplate information. You will also find that the information returned has no particular fixed format—WHOIS information is the electronic equivalent of 3×5 index cards. Depending on who owns the card and how they decide to administer it, you may get phone numbers and biographies, or nothing at all.
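Because there is no fixed format, any parsing has to be forgiving. The following Python is a minimal sketch of that idea: it collects anything that looks like a "Key: value" line and lets repeated keys accumulate. The function name, the lowercase key normalization, and the set of prefixes treated as boilerplate are my assumptions; real WHOIS output will routinely break them.

```python
from collections import defaultdict


def parse_whois(text: str) -> dict:
    """Collect 'Key: value' lines; repeated keys accumulate in order."""
    fields = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("%", "#", "<")):
            continue  # skip blanks, comment lines, and boilerplate markers
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip().lower()].append(value.strip())
    return dict(fields)


# A few lines copied from Example 10-1.
sample = """
   Domain Name: OREILLY.COM
   Registrar: GODADDY.COM, LLC
   Name Server: NSAUTHA.OREILLY.COM
   Name Server: NSAUTHB.OREILLY.COM
   Status: clientDeleteProhibited
   Expiration Date: 26-may-2013
"""
record = parse_whois(sample)
print(record["name server"])
```

Headings such as "Registrant:" carry no value on the same line and are dropped by this sketch, so free-form address blocks still need to be read by eye.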

A good way to get a feel for the differences in registration is to take a look at the registration files for different countries. There is no central WHOIS database—instead, depending on the top-level domain, WHOIS information may be maintained by any of a number of WHOIS servers. For example, Russian WHOIS data (the .ru domain) is maintained by whois.ripn.net, French by lvs-vip.nic.fr, and Brazilian by registro.br. Fortunately, the good folks at whois-servers.net provide aliases for every country and TLD, and depending on your whois implementation, the information may be baked into the executable for you already.

At the minimum, any whois implementation will provide the ability to specify a lookup server using the -h switch. So, whois -h whois.ripn.net will query that server directly. Several whois implementations offer a country-specific -c option, making whois -c RU identical to querying whois.ripn.net.
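If no whois binary is handy, the protocol is simple enough to speak directly: RFC 3912 specifies a TCP connection to port 43, a single query line terminated by CRLF, and a reply that ends when the server closes the connection. The following Python is a minimal sketch of that exchange. The function names are hypothetical, the alias helper assumes the whois-servers.net naming described above, and the sketch handles ASCII names only.

```python
import socket


def whois_server_for(tld: str) -> str:
    """Alias under whois-servers.net for a TLD or country code."""
    return f"{tld.lower().lstrip('.')}.whois-servers.net"


def build_query(name: str) -> bytes:
    """RFC 3912: the request is the query text terminated by CRLF."""
    return name.strip().encode("ascii") + b"\r\n"


def whois(name: str, server: str, port: int = 43, timeout: float = 10.0) -> str:
    """Send one query and read until the server closes the connection."""
    chunks = []
    with socket.create_connection((server, port), timeout=timeout) as conn:
        conn.sendall(build_query(name))
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

Calling whois("193.252.148.80", "whois.ripe.net") should be roughly equivalent to whois -h whois.ripe.net 193.252.148.80, although individual servers may expect additional flags in the query line.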

In addition to providing information on domain names, whois is also useful for providing information on address allocation and ownership. If whois is called with an IP address rather than a name, as in Example 10-2, it will provide information on the organization that owns that address, often in the form of a netblock. For example, if I look up the whois information for Voila, a French search engine, I get different information based on whether I look at RIPE (the European regional registry) or the French NIC. RIPE is informative; the French NIC is considerably less so.

Example 10-2. Using whois with an IP address
$ dig +short voila.fr
193.252.148.80

$ whois -h whois.ripe.net 193.252.148.80
% This is the RIPE Database query service.
% The objects are in RPSL format.
%
% The RIPE Database is subject to Terms and Conditions.
% See http://www.ripe.net/db/support/db-terms-conditions.pdf

% Note: this output has been filtered.
%       To receive output for a database update, use the "-B" flag.

% Information related to '193.252.148.0 - 193.252.148.255'

% Abuse contact for '193.252.148.0 - 193.252.148.255' is '[email protected]'

inetnum:        193.252.148.0 - 193.252.148.255
netname:        ORANGE-PORTAILS
descr:          France Telecom
descr:          internet portals for multiple services
country:        FR
admin-c:        WPTR1-RIPE
tech-c:         WPTR1-RIPE
status:         ASSIGNED PA
remarks:        for hacking, spamming or security problems send mail to
remarks:        [email protected]
mnt-by:         FT-BRX
source:         RIPE # Filtered

role:           Wanadoo Portails Technical Role
address:        France Telecom - OPF/Portail/DOP/Hebex
address:        48, rue Camille Desmoulins
address:        92791 Issy Les Moulineaux Cedex 9
address:        FR
phone:          +33 1 5888 6500
fax-no:         +33 1 5888 6680
admin-c:        WPTR1-RIPE
tech-c:         WPTR1-RIPE
nic-hdl:        WPTR1-RIPE
mnt-by:         FT-BRX
source:         RIPE # Filtered

% This query was served by the RIPE Database Query Service version 1.60.2 (WHOIS4)

$ whois -h fr.whois-servers.net 195.152.120.129
%%
%% This is the AFNIC Whois server.
%%
%% complete date format : DD/MM/YYYY
%% short date format    : DD/MM
%% version              : FRNIC-2.5
%%
%% Rights restricted by copyright.
%% See http://www.afnic.fr/afnic/web/mentions-legales-whois_en
%%
%% Use '-h' option to obtain more information about this service.
%%
%% [96.255.98.126 REQUEST] >> 195.152.120.129
%%
%% RL Net [##########] - RL IP [#########.]

You will find that the situation is reversed with Asian information. The APNIC WHOIS is often fairly sparse, but the WHOIS entries at the country level are usually informative.

WHOIS information is particularly useful when you can’t get much useful data out of a DNS reverse lookup. If you can’t find the specific domain name, you can use whois to at least find the block of addresses that host the domain.
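Once you have a netblock, it helps to turn it into something you can match other addresses against. The following Python is a small sketch that converts a RIPE-style inetnum range into CIDR networks and tests membership; the function name is my own, and it assumes the "first - last" layout shown in Example 10-2.

```python
import ipaddress


def inetnum_to_networks(inetnum: str) -> list:
    """Convert 'first - last' (RIPE inetnum style) to CIDR networks."""
    first, last = (ipaddress.ip_address(p.strip()) for p in inetnum.split("-"))
    return list(ipaddress.summarize_address_range(first, last))


# The inetnum value from the RIPE answer in Example 10-2.
nets = inetnum_to_networks("193.252.148.0 - 193.252.148.255")
print(nets)

# Any other address from the same block can now be tied to the same owner.
same_owner = any(ipaddress.ip_address("193.252.148.80") in n for n in nets)
print(same_owner)
```

A range that does not fall on a CIDR boundary comes back as several networks, which is why the function returns a list rather than a single value.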

DNS Blackhole Lists

Reputation information such as DNS blackhole lists (DNSBLs) is generated by a number of organizations as a form of threat intelligence. A DNSBL is a DNS-based IP address database used primarily as an antispam technique. The first DNSBLs were actually implemented using the Border Gateway Protocol (BGP, see Chapter 19 for more information) and were intended to actively drop routes associated with spammer IP addresses. Modern DNSBLs are instead queried through DNS, and serve as reputation databases for email software. For example, a mail transfer agent can consult a DNSBL to determine if the sending IP is a spammer and react accordingly.

DNSBLs work by providing a reverse lookup–style functionality on their DNS servers. For example, I can look up an echo address on a DNSBL using dig:

$ dig 2.0.0.127.sbl.spamhaus.org

; <<>> DiG 9.8.3-P1 <<>> 2.0.0.127.sbl.spamhaus.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45434
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;2.0.0.127.sbl.spamhaus.org.	IN	A

;; ANSWER SECTION:
2.0.0.127.sbl.spamhaus.org. 300	IN	A	127.0.0.2

;; Query time: 39 msec
;; SERVER: 192.168.1.1#53(192.168.1.1)
;; WHEN: Sun Jul 28 15:10:23 2013
;; MSG SIZE  rcvd: 60

The address I intended to query was 127.0.0.2. Note that, as with a reverse lookup, I reversed the IP address. After reversing the address, I attached it to the name of the list and queried the result. This process is effectively a reverse lookup without relying on the hardcoded .arpa TLD. Instead, the response is an A record served by Spamhaus's SBL server.
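The same construction is easy to script. The following Python is a minimal sketch, with hypothetical function names, that builds the query name and asks the system resolver for an A record. Treat a None result with caution: this sketch cannot distinguish "not listed" from a resolver failure or from a list operator refusing the query.

```python
import ipaddress
import socket


def dnsbl_name(address: str, zone: str) -> str:
    """Reverse the IPv4 octets and append the list's zone."""
    octets = str(ipaddress.IPv4Address(address)).split(".")
    return ".".join(reversed(octets)) + "." + zone.strip(".")


def dnsbl_lookup(address: str, zone: str):
    """Return the A record the list answers with, or None if no answer."""
    try:
        return socket.gethostbyname(dnsbl_name(address, zone))
    except socket.gaierror:
        return None  # NXDOMAIN or a resolver failure; the two look alike here


print(dnsbl_name("127.0.0.2", "sbl.spamhaus.org"))
```

The printed name matches the one passed to dig above; dnsbl_lookup performs the actual query and needs network access.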

DNSBLs differ depending on the list and provider. A provider may publish several lists for different categories of traffic, and each provider sets its own policy for adding and removing addresses. How an organization handles delisting (address removal) radically impacts the character of the list. Most automatically drop an address a fixed number of days after the last abuse; others require manual intervention.

Some notable DNSBLs include:

Spam and Open Relay Blocking System (SORBS)

SORBS provides over 15 different DNSBLs that categorize hosts into a number of different behaviors. It’s particularly useful for categorizing dynamic addresses such as dialup and DSL addresses through a specialized list, the Dynamic User and Host List (DUHL).

Spamhaus

A nonprofit private company that produces a number of distinct blacklists and whitelists, Spamhaus’s most commonly used lists are the Policy Block List (PBL, for end-user addresses), Spamhaus Block List (SBL, for known spam addresses), and Exploits Block List (XBL, for hijacked IP addresses and bots). These lists are accessible as a single combined service, ZEN.

SpamCop

Currently owned by Cisco Systems, SpamCop began as a private effort and eventually became part of IronPort’s email reputation system. Currently, SpamCop provides one public list, the SpamCop Block List (SCBL).

DNSBLs are useful for identifying hostile activity. Using a DNSBL, an analyst can determine whether a particular address has been doing something hostile elsewhere on the internet and possibly what kind of activity it was. They supplement the more basic lookup information discussed earlier by providing some idea of a site’s past history.

DNSBLs are designed to be real-time tools that work primarily with mail agents, not to support forensic analysis. Records will change quickly and unpredictably, so an address may be recognized by the DNSBL as hostile at the time of an event, but be delisted when an analyst examines it later. Most of the blacklists sell some kind of feed or data dump that, for forensic purposes, is preferable.

Search Engines

Never underestimate the value of just Googling something. A good hunk of internet traffic consists of people mapping out said traffic, and it’s obviously of value for the rest of us to take advantage of it. Search engines, whether universal ones such as Google or specialized ones such as Shodan, can provide you with additional contextual information about an IP address.

General Search Engines

The two most useful general-purpose search engines are Google and the Internet Archive. In the case of Google, it’s omnipresent, there are a number of powerful search predicates, and you have access to cached sites. If you’re actively engaged in work outside of the English language/Roman alphabet, then it helps to be familiar with international search engines such as Naver (Korea), Yandex (Russia), or Baidu (China), as well as the various language-specific Googles.

Regardless of the search engine, you want to identify predicates that will help you refine the search to find specific sites or technical terms. For example, in Google, you can literalize a search by using quotes (e.g., Googling “the google” returns the exact phrase “the google” from the Google). Other predicates of note include site: (which will search only for a specific domain name—handy for identifying subdomains of the same domain), inurl: (which looks for a string in a URL), and cache: (which returns the latest version of a URL from Google’s cache, avoiding directly contacting the site).

After Google, the Internet Archive is useful when you’re looking for context or the history of a website. If a particular domain name appears that you’ve never seen before, it’s useful to check the archive for a history of that domain. If a site changes radically, the Internet Archive has a reasonable chance of maintaining a pre-change version. The Alrwais paper mentioned at the end of this chapter is a good example of using the Internet Archive to track changes.

Scanning Repositories, Shodan et al.

Not all scanners are malicious. A number of threat intelligence groups scan the internet on a regular basis, providing information on vulnerabilities. Notable repositories include:

Censys

Censys is a scanning team that provides a search engine hosted at the University of Michigan. Censys regularly scans the entire IPv4 address space, Alexa’s million busiest hosts, and other hosts for a constantly updated list of vulnerabilities.

Shodan

Shodan is the oldest internet-wide vulnerability scanning system. Currently, it markets itself as the search engine for IoT, but historically it has scanned for everything.

Both of these engines scan for a specific set of vulnerabilities, and are pretty good about listing what they look for—be aware that, particularly as a defender, their primary value is in scanning you rather than looking at what other sites host.

Further Reading

  1. S. Alrwais et al., “Catching Predators at Watering Holes: Finding and Understanding Strategically Compromised Websites,” Proceedings of the 2016 Annual Computer Security Applications Conference (ACSAC), Los Angeles, CA, 2016.

  2. J. Long, B. Gardner, and J. Brown, Google Hacking for Penetration Testers, 3rd ed. (Rockland, MA: Syngress Publishing, 2015).

  3. Bishop Fox Google Hacking Diggity Project.

  4. ICANNWiki.

1 And I prefer to keep my Windows and Linux boxes physically wired.

2 Note that the prefix is the equivalent to a subnet’s netmask.

3 That doesn’t mean you won’t see it, just that you shouldn’t, and if you do, you should figure out a way to stop it. The internet is weird.

4 Internationalized domain names raise the risk of homoglyphic attacks, such as creating a domain name that looks like oreilly.com but uses a Cyrillic O; see Chapter 12 for more information on this.

5 .arpa officially stands for Address and Routing Parameter Area. This name is a backronym, because the abbreviation initially meant Advanced Research Projects Agency, the DoD agency that originally funded internet development.
