Chapter 2. Names and Numbers

Hostnames, and the numeric addresses they correspond to, are the way to identify computers on the Internet. Understanding how these names and numbers are managed is therefore a fundamental aspect of Internet forensics. This chapter describes the types of information you can obtain from public databases of Internet addresses and discusses three essential tools that can help you identify machines and the people behind them. I’ll start with a short review of how computers are identified on the Internet.

Addresses on the Internet

Each computer on the Internet has a unique identifier in the form of its Internet Protocol (IP) address. This is a 32-bit integer, which we normally represent as four 8-bit integers separated by periods, such as 208.12.16.5.
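To make the relationship between the dotted form and the underlying 32-bit integer concrete, here is a minimal Python sketch. The function names are my own, chosen for illustration; they do not come from any standard tool.

```python
def dotted_quad_to_int(address: str) -> int:
    """Pack four 8-bit fields into one 32-bit integer."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not a dotted-quad address: {address!r}")
    value = 0
    for octet in octets:
        # Shift what we have so far left by 8 bits and add the next field.
        value = (value << 8) | octet
    return value


def int_to_dotted_quad(value: int) -> str:
    """Unpack a 32-bit integer back into dotted-quad notation."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError(f"not a 32-bit value: {value!r}")
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))
```

With these definitions, 208.12.16.5 corresponds to the single integer 3490451461, and converting that integer back gives the original dotted form.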

Numeric addresses are fine for systems administrators who need to set up networks and who like that sort of thing. But for most people, they are impossible to remember, and so we have real names for computers, the hostnames that we are all familiar with, such as www.oreilly.com.

The translation between hostnames and IP addresses is handled by the Domain Name System (DNS). For example, when you type a hostname into a browser as part of a URL, the browser converts the name into the corresponding IP address and then uses that to communicate with the web server. The browser queries a DNS server on the network, which looks up the name in its database and returns the numeric address to the browser.

In its simplest form, a DNS server consists of two tables of data and the software necessary to interrogate them. The first table is a list of hostnames and the IP addresses to which they correspond. The second is a list of IP addresses and the hostnames to which they map. Storing the addresses of all computers on the Internet on every server is not practical, so DNS distributes the data across many thousands of servers around the world. If a DNS server receives a query for a hostname that it does not carry data for, it forwards the query to other servers until it finds one that can answer the request. Certain servers are authoritative for particular domains, meaning that they are the ultimate reference for mappings between certain sets of names and numbers. What goes on behind the scenes of DNS can become very complex, especially where the networks of large companies are involved.

Tip

I can only scratch the surface of the topic here, but for more information you might consider the books DNS and BIND by Paul Albitz and Cricket Liu and DNS & BIND Cookbook by Cricket Liu, both published by O’Reilly.

IP Addresses

To ensure that computers are uniquely identified, the IP addresses need to be carefully assigned to groups and individuals. This is done in a hierarchical manner across the entire Internet. At the highest level, the Internet Assigned Numbers Authority (IANA) assigns large blocks of addresses to Regional Internet Registries (RIRs). There are four RIRs at present that together cover the entire world. Each of these assigns sub-blocks of addresses to national registries, large network operators, and Internet Service Providers (ISPs). They assign yet smaller address blocks to smaller ISPs, and ultimately your ISP assigns a small address block for your business or a single address for your personal computer.

You can think of these assignments as starting with the high order bits of the 32-bit address and working down. For example, IANA assigned the block 208.0.0.0 through 208.255.255.255, among others, to the RIR responsible for North America. They in turn allocated 208.0.0.0 through 208.35.255.255 to Sprint, one of the large network operators. Sprint assigned 208.12.0.0 through 208.12.31.255 to Seanet, a regional ISP in Seattle, and they in turn assigned 208.12.16.0 through 208.12.16.7 to me.

The usual representation of an IP address, for example 208.12.16.5, is called dotted-quad, dotted-octet, or dotted-decimal, depending on where you look. I’ll use the first of these throughout the book. Sometimes it is useful to think of addresses as 32-bit binary words and occasionally as single integers. We’ll also encounter a related notation for blocks of IP addresses. 208.12.16.x, for example, is shorthand for the block of 256 addresses that starts with 208.12.16.0. A more flexible notation looks like this: 208.12.16.0/29. This has an IP address that marks the start of the block followed by a slash and a number called the prefix-length. This is the number of bits, starting at the high end, that are predefined in this block. The number of low order bits that are available for allocation is 32 minus this number. So in this example there are 3 bits available, which means this subnet has 8 addresses (2 to the power 3).
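The prefix-length arithmetic can be checked with Python's standard ipaddress module. This is only a sketch; the variable names and the in_block helper are my own and are there purely for illustration.

```python
import ipaddress

# The block assigned to me, written in prefix-length notation.
block = ipaddress.ip_network("208.12.16.0/29")

host_bits = 32 - block.prefixlen      # low order bits left for allocation
block_size = block.num_addresses      # 2 ** host_bits
first, last = block[0], block[-1]     # boundaries of the block


def in_block(address: str, network: ipaddress.IPv4Network = block) -> bool:
    """Return True if the dotted-quad address falls inside the block."""
    return ipaddress.ip_address(address) in network
```

Here host_bits is 3 and block_size is 8, and the block runs from 208.12.16.0 through 208.12.16.7, so 208.12.16.5 is inside it and 208.12.16.8 is not.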

Databases of IP address blocks

One of the fundamental tasks you will face is figuring out where in the world a particular server is located. An easy way to do this is to look at the IP address. As I have described, large blocks of addresses are assigned to the four RIRs around the world. Their areas of responsibility are as follows:

American Registry for Internet Numbers

ARIN (http://www.arin.net) is responsible for North America, part of the Caribbean, and Sub-Equatorial Africa.

Asia Pacific Network Information Centre

APNIC (http://www.apnic.net) is responsible for countries in Asia and the Pacific Rim, including China, Korea, India, Japan, and Australia.

RIPE Network Coordination Centre

RIPE NCC (http://www.ripe.net) covers Europe, the Middle East, Northern Africa, and parts of Asia. RIPE stands for Réseaux IP Européens, which translates into European IP Networks.

Latin American and Caribbean IP Address Regional Registry

LACNIC (http://www.lacnic.net) has responsibility for Latin America and the Caribbean.

The list of top-level assignments of IP addresses can be found here:

http://www.iana.org/assignments/ipv4-address-space

By top-level, I mean the address blocks defined by the leftmost integer in a dotted quad IP address, each of which contains 16,777,216 (256 × 256 × 256) addresses. The list makes interesting reading. Starting in September 1981, many of the initial assignments were to large U.S. corporations such as Ford Motor Company (019.x.x.x) and IBM (009.x.x.x). The RIRs were a later development in the history of the Internet, but once established, they were assigned discrete address blocks. The entire list is too large to include, but here are the main blocks that are directly assigned to each RIR:

ARIN (North America, Southern Africa)
    063.x.x.x–072.x.x.x
    199.x.x.x
    204.x.x.x–209.x.x.x
    216.x.x.x
APNIC (Asia, Australasia)
    058.x.x.x–061.x.x.x
    202.x.x.x–203.x.x.x
    210.x.x.x–211.x.x.x
    218.x.x.x–222.x.x.x
RIPE NCC (Europe, Middle East, Northern Africa)
    062.x.x.x
    081.x.x.x–088.x.x.x
    193.x.x.x–195.x.x.x
    212.x.x.x–213.x.x.x
    217.x.x.x
LACNIC (South America)
    200.x.x.x–201.x.x.x

You can use this as a quick reference to see that, for example, 208.12.16.5 falls under the control of ARIN and so is likely to be in North America or Southern Africa. Not very specific, I’ll admit, but it can come in quite handy.
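If you find yourself doing this lookup often, the quick-reference list is easy to turn into a small table. The sketch below copies only the first-octet ranges listed above, so it is incomplete by design and will go out of date as IANA makes further assignments. The names RIR_BLOCKS and registry_for are my own.

```python
from typing import Optional

# First-octet ranges copied from the quick-reference list in the text.
# This covers only the main blocks, not the full IANA list.
RIR_BLOCKS = {
    "ARIN": [(63, 72), (199, 199), (204, 209), (216, 216)],
    "APNIC": [(58, 61), (202, 203), (210, 211), (218, 222)],
    "RIPE NCC": [(62, 62), (81, 88), (193, 195), (212, 213), (217, 217)],
    "LACNIC": [(200, 201)],
}


def registry_for(address: str) -> Optional[str]:
    """Return the registry whose listed blocks contain the address, if any."""
    first_octet = int(address.split(".")[0])
    for registry, ranges in RIR_BLOCKS.items():
        if any(low <= first_octet <= high for low, high in ranges):
            return registry
    return None  # not in the abbreviated table; consult the full IANA list
```

A result of None does not mean the address is unassigned, only that its top-level block is not in this abbreviated table.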

Domain Names

The IP address system is clean, elegant, and works very well. But things are less tidy when we look at hostnames and domains. Nobody assigns me the domain craic.com or tells me what hostnames to give my servers. Instead I get to think up a clever domain name, register it so that no one else can use it, and then pick arbitrary names for the computers that reside under that domain name. There is, however, some control over domains.

The Internet Corporation for Assigned Names and Numbers (ICANN) is the body responsible for assigning the top-level domains, such as .com, .org, and .biz, and for controlling the domain name registries. They are also responsible for the IANA, which I discussed in the previous section. Importantly, ICANN is the arbiter of disputes concerning domain names, usually involving trademark infringement.

ICANN gives its blessing to a large number of domain name registrars around the world, allowing them to accept requests from you and me to register our domain names. Those registrars maintain databases of contact information for domain owners. Many of the smaller registrars use the services of the larger companies to manage their records, effectively acting as retailers in a relationship with a wholesaler. These are the records that you will query when you want to learn who is responsible for a particular web site.

The specific information these registrars make available to the public includes the domain name itself, contact information, the date the domain was created, when it will expire, and when it was last updated. They also include the names of the DNS servers that are authoritative for each domain. But registrars do not tell us anything about the actual hostnames that exist within each domain. That is handled by DNS and, although many registrars also provide that service, it is a completely separate system. It is usually most efficient if your ISP manages your DNS records, as they are responsible for actually assigning the IP addresses.

The contact information for the owners of each domain is potentially the most useful piece of information. Unfortunately, when it comes to those that are involved in Internet scams, we can be pretty confident that their information is bogus. Some domain registrars make an attempt to verify the data, but with most, the effort is half-hearted at best. This lack of verification is a major reason why seemingly blatant fraud can flourish on the Net.

Identifying domain owners has become even more difficult of late due to new privacy services that registrars will provide for an additional fee. These services are intended to protect your privacy and prevent your information from being harvested by spammers. Your postal address, for example, will be replaced by a post office box that is managed by the registrar. They know your real address and will forward only certain types of documents, discarding any junk mail. Similarly, your contact email is replaced with an address at the registrar, which changes periodically. Any mail to that address is filtered for spam and then forwarded on to your real email address.

Individual users might want to use this service to protect their personal information. But for a legitimate business like mine, I don’t see the point. I want people to know my contact information, and the domain record is just one of several ways that you can find me. If I check on a business and find their information is blocked, then I am suspicious. Of course, spam is a huge problem, but this is not a solution to it. The people that really benefit from these services are the bad guys who can add one more layer of disguise between them and us.

Internet Address Tools

Three tools play essential roles in helping us query the databases of names and numbers as well as explore the structure of the network around those machines. dig, whois, and traceroute are all included in standard Unix and Mac OS X distributions. Windows users will find variants of all of these, available for free or as shareware. Unfortunately there are so many of these that it is hard to make any specific recommendations. Look them up on your favorite search engine and try a few of them out. Web page interfaces to the tools can also be found on a number of sites.

dig

dig (domain information groper) is a DNS lookup utility that I will use extensively in the course of this book. dig can help you find the IP address for a given hostname and the hostname, if any, for a given IP address.

You may already be familiar with a similar tool called nslookup. A precursor of dig, its use is now discouraged, even though it is still included in most Unix distributions. The same applies to host, which is also widely available. You may find that you prefer the command syntax or output format of one tool over another. I am only going to describe dig in detail here.

Hostname lookups

In its simplest form, dig will get the IP address for the supplied hostname. Here is a typical example:

  1      % dig www.craic.com
  2      ; <<>> DiG 9.2.3 <<>> www.craic.com
  3      ;; global options:  printcmd
  4      ;; Got answer:
  5      ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 57325
  6      ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 3, ADDITIONAL: 1
  7
  8      ;; QUESTION SECTION:
  9      ;www.craic.com.                 IN      A
 10
 11      ;; ANSWER SECTION:
 12      www.craic.com.          600     IN      A       208.12.16.5
 13
 14      ;; AUTHORITY SECTION:
 15      craic.com.              600     IN      NS      dns3.seanet.com.
 16      craic.com.              600     IN      NS      dns1.seanet.com.
 17      craic.com.              600     IN      NS      dns2.seanet.com.
 18
 19      ;; ADDITIONAL SECTION:
 20      dns3.seanet.com.        82411   IN      A       199.181.164.3
 21
 22      ;; Query time: 98 msec
 23      ;; SERVER: 192.168.2.18#53(192.168.2.18)
 24      ;; WHEN: Fri Jan  7 14:16:07 2005
 25      ;; MSG SIZE  rcvd: 127

The format of the output is pretty cryptic, with lots of extraneous text that tends to bury the useful content.

Lines 2 through 6 are status and version information. Lines 8 and 9 are the Question Section, which merely reiterates the query we gave on the command line. Lines 11 and 12 are what we care about. In this case, we see that the hostname www.craic.com maps to the IP address 208.12.16.5. Bear in mind that there may not be an Answer Section. That usually means that there is no host of that name in any public DNS server on the Internet. Unfortunately, rather than just telling us “host not found,” dig does so indirectly by not giving us an answer; the status field in the header will typically read NXDOMAIN rather than NOERROR. This takes a bit of getting used to.

Lines 14 through 17 are the Authority Section. This lists the NS records that tell us which DNS servers are authoritative for the target’s domain. In most cases, the authoritative server(s) will be based at the host’s ISP or the site at which that host’s domain was registered. Lines 19 through 25 are largely irrelevant for our purposes but can be valuable in debugging DNS problems.
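Because the useful lines are buried in the output, it can help to pull out just the Answer Section with a script. The following Python sketch assumes the raw output of dig, without the line numbers I added to the listing. Record, answer_records, and SAMPLE are names I made up for this illustration, and SAMPLE is a trimmed copy of the output shown earlier.

```python
from typing import List, NamedTuple


class Record(NamedTuple):
    name: str
    ttl: int
    rclass: str
    rtype: str
    data: str


def answer_records(dig_output: str) -> List[Record]:
    """Collect the records listed under ';; ANSWER SECTION:'."""
    records = []
    in_answer = False
    for line in dig_output.splitlines():
        line = line.strip()
        if line.startswith(";; ANSWER SECTION:"):
            in_answer = True
        elif in_answer and (not line or line.startswith(";")):
            break  # a blank line or the next header ends the section
        elif in_answer:
            # Fields are: name, TTL, class, type, and the record data.
            name, ttl, rclass, rtype, data = line.split(None, 4)
            records.append(Record(name, int(ttl), rclass, rtype, data))
    return records


SAMPLE = """\
;; QUESTION SECTION:
;www.craic.com.                 IN      A

;; ANSWER SECTION:
www.craic.com.          600     IN      A       208.12.16.5

;; AUTHORITY SECTION:
craic.com.              600     IN      NS      dns3.seanet.com.
"""
```

Calling answer_records(SAMPLE) returns a single A record for www.craic.com. with the address 208.12.16.5. An empty list corresponds to the missing Answer Section described above.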

If the default output is too verbose, you can use the +short option, thus:

    % dig +short www.craic.com
    208.12.16.5

This form is almost too terse. In fact, if the hostname cannot be found, it returns with no output at all. This is useful if you want to embed the command in shell scripts.

Reverse lookups

Supplied with the -x option and an IP address, dig will find the corresponding hostname. This is called a reverse lookup. Here is an example:

    % dig -x 208.12.16.5
    ; <<>> DiG 9.2.3 <<>> -x 208.12.16.5
    ;; global options:  printcmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 48532
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 3, ADDITIONAL: 1

    ;; QUESTION SECTION:
    ;5.16.12.208.in-addr.arpa.      IN      PTR

    ;; ANSWER SECTION:
    5.16.12.208.in-addr.arpa. 84600 IN      PTR     gateway.craic.com.

    ;; AUTHORITY SECTION:
    16.12.208.in-addr.arpa. 84600   IN      NS      dns2.seanet.com.
    16.12.208.in-addr.arpa. 84600   IN      NS      dns3.seanet.com.
    16.12.208.in-addr.arpa. 84600   IN      NS      dns1.seanet.com.

    ;; ADDITIONAL SECTION:
    dns3.seanet.com.        82813   IN      A       199.181.164.3

    ;; Query time: 358 msec
    ;; SERVER: 192.168.2.18#53(192.168.2.18)
    ;; WHEN: Fri Jan  7 14:09:25 2005
    ;; MSG SIZE  rcvd: 153

The line returned in the answer section tells us the hostname that we are seeking. Before we had the hostname on the left side and the IP address on the right. Here we have the IP address in reverse on the left and a hostname on the right.
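The name on the left side of that answer is built mechanically from the address: reverse the four fields and append in-addr.arpa. Here is a small Python sketch of that transformation. The function reverse_name is my own; reverse_pointer is an attribute provided by the standard ipaddress module.

```python
import ipaddress


def reverse_name(address: str) -> str:
    """Build the in-addr.arpa name that a reverse lookup queries."""
    octets = address.split(".")
    return ".".join(reversed(octets)) + ".in-addr.arpa"


# The standard library offers the same transformation.
stdlib_name = ipaddress.ip_address("208.12.16.5").reverse_pointer
```

Both approaches turn 208.12.16.5 into 5.16.12.208.in-addr.arpa, which matches the Question Section in the output above apart from the trailing dot.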

Notice something interesting in the results that dig has returned? We first asked for the IP address corresponding to www.craic.com and got 208.12.16.5. Then we asked for the hostname corresponding to 208.12.16.5 and got gateway.craic.com instead of www.craic.com. This is because the name gateway is the canonical, or primary, name for this host and www is an alias that points to the same machine.

Within DNS you can map many names to a single IP address either directly, using what are called A records, or indirectly, using CNAME records that map one name to another, which in turn maps to a numeric address. The reverse mapping, however, should only contain a single record for each IP address, containing the canonical hostname.

In addition, a single hostname can map to multiple IP addresses. This is how large sites distribute their load across multiple servers.

Back and forth

Using dig in this forward and reverse manner can reveal interesting things about a site. Here is an example using one of the O’Reilly web sites, www.macdevcenter.com. I have edited the output heavily to save space. Going forward, dig tells us that www.macdevcenter.com is a CNAME alias of macdevcenter.com and that the hostname maps to two IP addresses.

    % dig www.macdevcenter.com

    [...]
    ;; ANSWER SECTION:
    www.macdevcenter.com.   6426    IN      CNAME   macdevcenter.com.
    macdevcenter.com.       4812    IN      A       208.201.239.36
    macdevcenter.com.       4812    IN      A       208.201.239.37

Taking one of those addresses and running a reverse lookup returns this output:

    % dig -x 208.201.239.36

    [...]
    ;; ANSWER SECTION:
    36.239.201.208.in-addr.arpa. 86371 IN   PTR     www.oreillynet.com.

This shows that the canonical name for this server is www.oreillynet.com. From this asymmetry, we could infer either that macdevcenter.com is a subdivision of oreillynet.com (which happens to be true) or that the latter is a web-hosting company that manages macdevcenter.com for a subscriber.
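The back-and-forth check is easy to automate. The sketch below uses the resolver functions in Python's standard socket module rather than dig itself, so its results come from whatever resolver your system is configured to use. The function names and the idea of passing the lookup functions in as parameters are my own choices, made so the logic can be exercised without a network connection.

```python
import socket
from typing import Callable, Dict, Optional


def forward_and_reverse(
    hostname: str,
    forward: Callable = socket.gethostbyname_ex,
    reverse: Callable = socket.gethostbyaddr,
) -> Dict[str, Optional[str]]:
    """Map each address of hostname to the name its reverse lookup returns."""
    _canonical, _aliases, addresses = forward(hostname)
    results = {}
    for address in addresses:
        try:
            results[address] = reverse(address)[0]
        except OSError:
            results[address] = None  # no PTR record, or the lookup failed
    return results


def mismatches(hostname: str,
               results: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Keep only the addresses whose reverse name differs from hostname."""
    wanted = hostname.rstrip(".").lower()
    return {
        address: name
        for address, name in results.items()
        if name is None or name.rstrip(".").lower() != wanted
    }
```

Passing the output of forward_and_reverse to mismatches leaves only the asymmetric cases, which are the ones worth a closer look. A mismatch is not evidence of wrongdoing in itself, as the O’Reilly example shows.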

In many cases like this, in which you think the target site is up to no good, what you really want is the reverse lookup to list all the hostnames that map to a single address. Unfortunately DNS won’t give that to us. In principle it can, in response to a zone transfer request using the AXFR type, but most DNS servers have this feature disabled.

Warning

You should be aware that DNS lookups do not always work as advertised. In particular, DNS tables may not be properly configured for reverse lookups. Whether this is by accident or design is sometimes open to question.

whois

whois is the primary tool for querying the domain registration databases. It is available as a standard command on Unix and Mac OS X systems, and most domain registry web sites include a web interface to the command.

The basic way to use whois is to enter a domain name or an IP address after the command, for example whois craic.com or whois 208.12.16.5. The command syntax can be a lot more involved than this, but we don’t need any fancy options here. The manpage for your implementation will tell you more.

Warning

An important point here is that, even though the basic syntax for whois is essentially the same as dig, whois tells us about domains and networks whereas dig tells us about individual hosts. Their roles are complementary.

Dissecting a whois report

Consider a basic listing in detail. The following is the output of a query on my domain name. The real thing contains a load of disclaimers and “terms of use” statements that have been replaced with [...] for readability. I’ve also added line numbers to help refer to specific items.

 1   % whois craic.com
 2   [whois.crsnic.net]
 3   Whois Server Version 1.3
 4   [...]
 5      Domain Name: CRAIC.COM
 6      Registrar: NETWORK SOLUTIONS, INC.
 7      Whois Server: whois.networksolutions.com
 8      Referral URL: http://www.networksolutions.com
 9      Name Server: DNS1.SEANET.COM
10       Name Server: DNS2.SEANET.COM
11       Status: ACTIVE
12       Updated Date: 05-nov-2001
13       Creation Date: 22-may-1997
14       Expiration Date: 23-may-2006
15
16    >>> Last update of whois database: Tue, 17 Feb 2004 06:50:46 EST <<<
17    [...]
18    [whois.networksolutions.com]
19    [...]
20    Registrant:
21    Jones, Robert (CRAIC-DOM)
22       Robert Jones
23       Craic Computing
24       911 East Pike Street #231
25       SEATTLE, WA 98122
26       US
27       Domain Name: CRAIC.COM
28
29       Administrative Contact, Technical Contact:
30          Jones, Robert  (RJ1571)
31          Robert Jones
32          Craic Computing
33          911 East Pike St #231
34          SEATTLE, WA 98122
35          US
36          <phone number>
37
38       Record expires on 23-May-2006.
39       Record created on 22-May-1997.
40       Database last updated on 17-Feb-2004 16:12:04 EST.
41
42       Domain servers in listed order:
43       DNS1.SEANET.COM              199.181.164.1
44       DNS2.SEANET.COM              199.181.164.2

When you submit a query like this, whois sends it out to the whois server that is the default for your specific implementation of the command. In this case, according to line 2, the server used was whois.crsnic.net. That server looks up the domain in its local database to see where it is registered, and then it queries that registrar for additional information. This two-tiered approach results in some duplication of information and usually major differences in the display format.
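Underneath, the exchange with each server is very simple: the client opens a TCP connection to port 43, sends the search term followed by a carriage return and line feed, and reads text until the server closes the connection. Here is a minimal Python sketch of that single step. It does not follow referrals from one server to the next as the whois command does, and the function names are my own.

```python
import socket

WHOIS_PORT = 43  # the well-known port for the whois protocol


def build_query(term: str) -> bytes:
    """A whois request is just the search term followed by CRLF."""
    return term.strip().encode("ascii") + b"\r\n"


def whois_query(term: str, server: str, timeout: float = 10.0) -> str:
    """Send one query to a whois server and return its raw text reply."""
    chunks = []
    with socket.create_connection((server, WHOIS_PORT), timeout=timeout) as conn:
        conn.sendall(build_query(term))
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

A call such as whois_query("craic.com", "whois.crsnic.net") would repeat the first tier of the lookup shown above, assuming that server still answers at that name. Individual servers may also accept extra keywords or flags in the query, which this sketch does not attempt to handle.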

Line 6 tells us that the domain is registered with Network Solutions and line 18 shows that their database was queried for the second part of the response.

Lines 13 and 39 tell us the database record was created on May 22, 1997. Similarly, lines 14 and 38 tell us when the registration expires: May 23, 2006.

Sites of dubious intent will typically have been registered just a few days or weeks before you receive any email from them, and the length of the registration will invariably be the minimum term of one year. In the case of craic.com, you can see that the business has been around for several years and expects to continue for several more. These dates can serve as a useful background check on any company that you might want to do business with.
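This date check is simple enough to script. The sketch below parses dates in the day-month-year style used in the listing above and reports how old a registration was on a given day and how long its term runs. The function names are my own, and other registries use date formats that this parser does not handle.

```python
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}


def parse_whois_date(text: str) -> date:
    """Parse dates written like 22-may-1997, ignoring any trailing time."""
    day, month, year = text.split()[0].split("-")
    return date(int(year), MONTHS[month.lower()], int(day))


def registration_summary(created: str, expires: str, seen: date) -> dict:
    """Report the age of a domain on the day it was seen, and its term."""
    created_on = parse_whois_date(created)
    expires_on = parse_whois_date(expires)
    return {
        "age_days": (seen - created_on).days,
        "term_years": round((expires_on - created_on).days / 365.25),
    }
```

For craic.com you would pass "22-may-1997" and "23-may-2006" along with the date on which you first encountered the site. A very small age_days combined with a term_years of 1 matches the pattern described above, although plenty of legitimate new domains look the same way.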

There is a discrepancy between the update dates given in lines 16 and 40, illustrating the fact that two databases have been queried to produce this output.

The DNS servers listed in lines 9 and 10 and again in lines 43 and 44 show that a relationship exists between craic.com and seanet.com. In the majority of cases, the authoritative DNS servers for a domain will either be at the domain registry or at the ISP used by that domain. In this case, Seanet is the ISP that I use and they manage those DNS records on my behalf.

Lines 20 through 36 represent contact information for the person or persons responsible for this domain. In the case of my domain, you can see that I serve as both the registrant and the administrative and technical contacts. You can see my name, business address, email, and phone number. This information is supposed to be accurate and kept up to date so that anyone can contact the owner in case of problems accessing the site or in case the site is up to no good.

Privacy blocks on domain information

We mentioned the introduction of privacy proxies by the registrars a little earlier. Here is a section from a domain record that uses this service:

    Domain Name: GREENTREEPROMOS.COM

    Administrative Contact:
        Media, LLC, Revolution
        <email address>
        ATTN: GREENTREEPROMOS.COM
        c/o Network Solutions
        P.O. Box 447
        Herndon, VA 20172-0447
        570-708-8780

The email address is a random string of characters that changes on a regular basis.

Diversity in whois output

As soon as you start to work with whois, you will become aware of the variation in the way the results are presented. In fact it’s a real mess. It seems like every domain registry has its own format, and the real information is buried in the middle of verbose legal disclaimers and warnings.

This can be a real nuisance for people like us who want to process these records. What we would prefer is a standard format, preferably in XML, that would make it easy for us to pipe the results into scripts that parse out the relevant data. The registrars have intentionally not provided us with this. The problem is that, in addition to people making legitimate requests, spammers have used whois to trawl registry databases in order to build up lists of email addresses. I get a huge amount of spam, which is undoubtedly due to my email address having been included in a domain registry since 1997. It can be really frustrating working with these records but, at least for now, there is not a lot we can do about it.
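In the absence of a standard format, the best we can do is a tolerant parser that picks up lines of the form "Key: value" and ignores everything else. The Python sketch below takes that approach. The name parse_fields and the list of prefixes it skips are my own guesses based on the records shown in this chapter, not a rule that any registry publishes.

```python
from collections import defaultdict
from typing import Dict, List


def parse_fields(whois_text: str) -> Dict[str, List[str]]:
    """Collect 'Key: value' lines, keeping repeated keys in order."""
    fields = defaultdict(list)
    for line in whois_text.splitlines():
        line = line.strip()
        if not line or line.startswith(("%", "#", "[", ">>>")):
            continue  # comments, server banners, and elisions
        key, sep, value = line.partition(":")
        if sep and value.strip():
            # Keys are folded to lower case so that formats can be compared.
            fields[key.strip().lower()].append(value.strip())
    return dict(fields)


SAMPLE = """\
[whois.crsnic.net]
   Domain Name: CRAIC.COM
   Registrar: NETWORK SOLUTIONS, INC.
   Name Server: DNS1.SEANET.COM
   Name Server: DNS2.SEANET.COM
"""
```

Running parse_fields(SAMPLE) gives a dictionary in which "name server" maps to both DNS servers. Free-form blocks, such as the postal addresses in the Network Solutions part of the report, have no key and are simply dropped, so a real script would need per-registry handling on top of this.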

On top of this, you should be aware that not all Unix whois clients are the same. RedHat Fedora Core 2, for example, included jwhois v3.2.2, whereas Mac OS X has a version from BSD Unix with a different set of options. We don’t need to use any of those here but check the manpage for your version to learn more.

RedHat 7.3 included yet another variant with an interesting feature. That version would interpret a domain name not only in the literal way it was written but also as a prefix on other domains. In this form it would search and return all hostnames that matched the supplied string. This behavior led certain miscreants to create hostnames that are very rude about our friends at Microsoft and that are only revealed through whois.

If you have access to this particular version and are not easily offended by bad language, then try the following simple query. It returns a large number of matching hostnames, of which a few of the tamer ones are shown.

    % whois microsoft.com
    [...]
    Microsoft.com.fills.me.with.belligerence.net
    Microsoft.com.zzz.is.owned.and.haxored.by.sub7.net
    Microsoft.com.should.give.up.because.linuxisgod.com

This and other, more useful, features have been disappearing from both domain and DNS lookup tools over the past few years. The main motivation has been security, as certain features were felt to reveal a bit too much about networks. In the past you could find out all the domains owned by an individual and all the DNS records for a given domain. Sadly those days are gone.

Bogus information from whois

Many of the domains that are associated with Internet fraud contain false contact information. ICANN and the registries make all the right noises about ensuring this information is correct, but they seem unable or unwilling to control the problem. So we just have to live with bad data—which is not to say that domain records are useless. Let’s look at an example of a bogus record and see what can be salvaged from it.

    % whois mycitibank.org
    [whois.publicinterestregistry.net]
    [...]
    Domain ID:D104488069-LROR
    Domain Name:MYCITIBANK.ORG
    Created On:02-Jun-2004 18:53:15 UTC
    Expiration Date:02-Jun-2005 18:53:15 UTC
    Sponsoring Registrar:R51-LROR
    Status:TRANSFER PROHIBITED
    Registrant ID:P-BTP31-449435
    Registrant Name:Benjamin A Perowsky
    Registrant Organization:Benjamin A Perowsky
    Registrant Street1:173 Dean St.#3
    Registrant City:Brooklyn
    Registrant Postal Code:11217
    Registrant Country:US
    Registrant Phone: <phone number>
    Registrant Email:<email address>
    [...]
    Tech ID:P-NCT21-63
    Tech Name:Hostmaster Hostmaster
    Tech Organization:united-domains AG
    Tech Street1:Gautinger Strasse 10
    Tech City:Starnberg
    Tech Postal Code:82319
    Tech Country:DE
    Tech Phone:<phone number>
    Tech Email:<email address>
    Name Server:SERVER1-NS1.UDAGDNS.NET
    Name Server:SERVER1-NS2.UDAGDNS.NET
    Name Server:SERVER1-NS3.UDAGDNS.NET

This is the record for mycitibank.org, used at one time for a phishing site that pretended to be Citibank. It is safe to assume that Mr. Perowsky of Brooklyn, if he exists, did not register this domain. The fact that the email address is in Russia is a clue. That address may be correct. The registry needs a way to communicate with registrants in order to bill them, but this may not do us any good as we can’t tell who really receives the email. The information about the registry is going to be correct as they created this record. The same goes for the creation and expiration dates and the authoritative DNS servers. These are all useful snippets of information.

Even if we know the contact information is bad, we can use it if we are looking at a number of domains we think might be related. That’s because people tend to be lazy. If you are registering several bogus domains, are you really going to think up different and convincing fake contact information for each of them? We can use similar or identical fake addresses to build links between apparently unconnected domains, as we do in the worked example at the end of this chapter. They serve as a type of fingerprint of the people involved.
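One simple way to look for such fingerprints is to normalize the contact fields from several records and see which values turn up more than once. The Python sketch below does this for records that have already been reduced to dictionaries. The field names, the function names, and the three example domains are all hypothetical and exist only to show the idea.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def normalize(value: str) -> str:
    """Fold case and drop punctuation and spacing so near-copies match."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def shared_contacts(
    records: Dict[str, Dict[str, str]],
    keys: Tuple[str, ...] = ("street", "phone", "email"),
) -> Dict[Tuple[str, str], List[str]]:
    """Return each contact value that appears in more than one record."""
    seen = defaultdict(list)
    for domain, contact in records.items():
        for key in keys:
            value = normalize(contact.get(key, ""))
            if value:
                seen[(key, value)].append(domain)
    return {k: sorted(v) for k, v in seen.items() if len(v) > 1}


# Hypothetical records, invented for this illustration.
EXAMPLE = {
    "one.example": {"street": "12 Elm St. #3", "phone": "555-0100"},
    "two.example": {"street": "12 elm st #3", "phone": "555-0199"},
    "three.example": {"street": "99 Oak Ave", "phone": "555-0142"},
}
```

With the invented data above, shared_contacts(EXAMPLE) links one.example and two.example through their street address, even though the two entries differ in case and punctuation. Exact matching after normalization is a crude test; it will miss deliberate variations and should be treated as a lead rather than proof.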

Using whois to query IP address blocks

We can also use whois to look up an IP address. While this may look like the reverse DNS lookups we used earlier, it is a different function that will turn out to be very useful.

    % whois 208.12.16.5
    Sprint SPRINTLINK-BLKS (NET-208-0-0-0-1)
                                      208.0.0.0 - 208.35.255.255
    Seanet Corporation FON-34904473604317 (NET-208-12-0-0-1)
                                      208.12.0.0 - 208.12.31.255

    # ARIN WHOIS database, last updated 2005-01-06 19:10
    # Enter ? for additional hints on searching ARIN's WHOIS database.

Nowhere in the output is there any mention of 208.12.16.5 or craic.com, so what’s going on here? These are the subnets of IP addresses that our address is part of. First off, our target address is located in the United States, so the database that answered the query is at ARIN. They are telling us that Seanet Corporation controls addresses 208.12.0.0 through 208.12.31.255 and that Sprint controls the even larger network, of which Seanet is a part.

We can reasonably infer that Seanet is my ISP or that my ISP has its addresses allocated to them by Seanet. That is important information. If we find the IP address of a site that is up to no good, we may want to ask their ISP to shut them down. This form of whois query can quickly help us find out who we need to talk to.

As I say, the form of report you get depends on the regional registry that manages that block of IP addresses. Here are examples of addresses in the other three regions. Unimportant text has been edited out for the sake of readability.

Here is what the output of APNIC looks like for an address in its region of control:

    % whois 211.144.162.160

    [Querying whois.apnic.net]
    [...]
    inetnum:      211.144.160.0 - 211.144.175.255
    netname:      LIANFENGMAN
    country:      CN
    descr:        CHONGQING LIANFENG COMMUNICATION Co.,Ltd
    descr:        18F, BUIDING-A, CITY PLAZA, 39-WUSI ROAD,YUZHONG
                  DISTRICT, CHONG QING,PRC.
    admin-c:      DC278-AP
    tech-c:       ZL153-AP
    status:       ALLOCATED PORTABLE
    changed:      <email address> 20041102
    mnt-by:       MAINT-CNNIC-AP
    source:       APNIC

    person:       DUAN CHUNYAN
    nic-hdl:      DC278-AP
    e-mail:       <email address>
    address:      18F, BUIDING-A, CITY PLAZA, 39-WUSI ROAD,
                  YUZHONG DISTRICT, CHONG QING,PRC.
    phone:        <phone number>
    fax-no:       <phone number>
    country:      CN
    changed:      <email address> 20041102
    mnt-by:       MAINT-CNNIC-AP
    source:       APNIC
    [...]

Here is a query for an address in the United Kingdom that gets handled by the RIPE NCC server, responsible for Europe and the Middle East:

    % whois 212.20.227.174
    [...]
    [whois.ripe.net]
    [...]
    inetnum:      212.20.227.128 - 212.20.227.255
    netname:      EDNET-COLO-1
    descr:        edNET Internet Limited
    country:      GB
    admin-c:      NS1518-RIPE
    tech-c:       RM7978-RIPE
    status:       ASSIGNED PA
    mnt-by:       EDNET-RIPE-MNT
    changed:      [email protected] 20030716
    remarks:      INFRA-AW
    source:       RIPE

    route:        212.20.224.0/22
    descr:        edNET UK
    origin:       AS12703
    remarks:      removed cross-mnt:    EDNET-RIPE-MNT
    mnt-by:       EDNET-RIPE-MNT
    changed:      [email protected] 20031119
    source:       RIPE
    [...]

The output here describes a block of 128 addresses (212.20.227.128 - 212.20.227.255) assigned to EDNET-COLO-1, which is probably a subnet of EDNET used for colocation of web servers. The line at the start of the second record (route: 212.20.224.0/22) tells us this is itself part of a larger block, also assigned to EDNET. A /22 block fixes the highest 22 bits, so it covers the 1,024 addresses from 212.20.224.0 through 212.20.227.255.

Finally, here is the format of report returned by LACNIC for an address in Chile:

    % whois 146.83.12.32
    [whois.lacnic.net]
    [...]
    inetnum:     146.83/16
    status:      assigned
    owner:       Red Universitaria Nacional
    ownerid:     CL-RUNA1-LACNIC
    responsible: Claudia Inostroza
    address:     Canada, 239, Providencia
    address:     6640806 - Santiago -
    country:     CL
    phone:       <phone number>
    owner-c:     CIM2
    tech-c:      CIM2
    inetrev:     146.83/16
    nserver:     TERMINUS.REUNA.CL
    nsstat:      20050103 AA
    nslastaa:    20050103
    nserver:     NS.REUNA.CL
    nsstat:      20050103 AA
    nslastaa:    20050103
    created:     19910128
    changed:     20010222
    [...]

In this version, the IP address block is given in the alternate format we mentioned earlier. 146.83/16 means that the starting address is 146.83.0.0, that the highest 16 bits are fixed, and that the remaining 16 bits are available for allocation. This translates into the address range 146.83.0.0 through 146.83.255.255, a total of 65,536 addresses.
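The arithmetic behind this notation is easy to script. The following is a minimal sketch, not part of any whois client, that expands a block in slash notation into its first and last addresses. The function names cidr_to_range and to_dotted_quad are my own, and the sketch assumes that missing trailing octets (as in 146.83/16) stand for zeros.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Expand a block such as "146.83/16" into its first address, last
# address, and size. Missing trailing octets are assumed to be zero.
sub cidr_to_range {
    my ($cidr) = @_;
    my ($prefix, $bits) = split m{/}, $cidr;
    my @octets = split /\./, $prefix;
    push @octets, 0 while @octets < 4;

    # Combine the four octets into a single 32-bit integer
    my $start = 0;
    $start = $start * 256 + $_ for @octets;

    # The block holds 2 to the power of the number of free bits
    my $size = 2 ** (32 - $bits);
    $start -= $start % $size;      # clear any host bits that were set
    my $end = $start + $size - 1;

    return (to_dotted_quad($start), to_dotted_quad($end), $size);
}

# Convert a 32-bit integer back to dotted-quad form
sub to_dotted_quad {
    my ($n) = @_;
    return join '.', map { ($n >> $_) & 255 } (24, 16, 8, 0);
}

for my $block ('146.83/16', '66.111.192.0/18') {
    my ($first, $last, $size) = cidr_to_range($block);
    print "$block  $first - $last  ($size addresses)\n";
}
```

The two blocks used here both appear in whois reports in this chapter, so the results can be checked against the ranges that the registries themselves report.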

I need to stress, once again, that different versions of whois may behave differently. The Mac OS X version will query ARIN first regardless of the IP address. If ARIN says the address is outside its range, the client follows ARIN’s referral to the correct registry. You end up with the correct information buried in reams of irrelevant verbiage. The version that ships with Linux (Red Hat Fedora Core 2) figures out the correct registry without this intermediate step, probably through a simple lookup table, and returns its results quickly and cleanly. Bear this in mind if you want to write scripts that parse whois output.
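As an illustration of what such a script has to cope with, here is a small sketch that pulls the address block out of reports in the different registry formats. It works on text copied from the sample reports in this chapter rather than on live queries. The function name extract_block and the list of field names are my own assumptions, based only on those samples.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Field names that carry the address block in the sample reports:
# ARIN uses NetRange, while APNIC, RIPE, and LACNIC use inetnum
my @range_fields = qw(NetRange inetnum);

# Return the first address block found in a whois report, or nothing
# if none of the known field names is present
sub extract_block {
    my ($text) = @_;
    for my $line (split /\n/, $text) {
        for my $field (@range_fields) {
            return $1 if $line =~ /^\s*\Q$field\E:\s*(.+?)\s*$/i;
        }
    }
    return;
}

my $arin = <<'END';
NetRange:   66.111.192.0 - 66.111.255.255
CIDR:       66.111.192.0/18
END

my $ripe = <<'END';
inetnum:      212.20.227.128 - 212.20.227.255
netname:      EDNET-COLO-1
END

my $lacnic = <<'END';
inetnum:     146.83/16
status:      assigned
END

for my $sample ([ARIN => $arin], [RIPE => $ripe], [LACNIC => $lacnic]) {
    my ($registry, $text) = @$sample;
    my $block = extract_block($text);
    printf "%-7s %s\n", $registry, defined $block ? $block : '(not found)';
}
```

Note that this returns the first match only. With a client that queries ARIN and then follows a referral, the first block in the output may belong to the referral text rather than to the registry that actually manages the address, so a real script would need to decide which match to trust.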

whois on the Web

You can also access whois through a variety of web interfaces, in particular at domain registries. Here are several examples:

Spammers have used domain records as a source of email addresses for some time now. A standard tactic has been to use a script to make thousands of requests to web-based whois clients. These days most of the sites will either prevent you from making more than a certain number of requests in a period of time, or they will display an image of a number on the query form that you will need to type into the form along with the domain name. That can get tedious, but there are times when a web-based client comes in handy.

These may not provide the full functionality of the Unix clients. Some will only respond to domain name queries, whereas the clients at the four RIRs, shown in Table 2-1, seem to respond only to IP address queries.

Two web-based clients are worthy of special mention. Netcraft is a company in the U.K. that tracks various aspects of technology on the Internet. They have a large database of domain names, web sites, and ancillary data. Their whois-like client (http://searchdns.netcraft.com/?host) lets you search this resource and offers a number of features not available from standard whois. In particular you can search on domain names using substrings and wildcards. A simple query like craic will return all domains that contain that string. This can be very useful when you want to find sites that might be involved in phishing. Try searching on PayPal or eBay and see how many domains show up. sqlwhois.com provides a similar service with its client (http://www.sqlwhois.com/en/index.html). Here you have even more control over your query terms, but their database is limited to the .com and .net registries.

traceroute

dig and whois tell you about specific addresses on the Internet and who controls them. traceroute tells you about the path between two addresses—how to get there from here. Run on host A, with host B as its target, traceroute fires off packets that are passed through a series of intervening gateways or routers as determined by the Internet protocol and the topology of the Internet.

Normal network transactions, like a request for a web page, do not report the path they take from A to B. traceroute, on the other hand, triggers a response from every router along the way. It does this by using the time-to-live (TTL) field of the IP header. Every router that forwards a packet decrements this field, and a router that reduces it to zero discards the packet and is expected to send an ICMP TIME_EXCEEDED message back to the sender. traceroute sends packets with a TTL of 1, then 2, then 3, and so on, so that each router along the path reveals itself in turn. For each response, it captures the IP address of the machine and the time at which the response was received, and it performs a reverse lookup on the IP address in the hope of getting a hostname. It doesn’t always work as well as we’d like. Not all machines provide the ICMP TIME_EXCEEDED response, and many routers do not have corresponding hostnames, so its output can be very cryptic at times. But in many cases it provides a very useful perspective on the network connectivity of the target host and its ISP.

You can infer a lot from the output of traceroute on a particular address. It can provide clues about the type of network the target machine is part of, it can reveal their ISP, and it may be able to tell you something about how the ISP is connected to the rest of the Net.

Here is the output of the command run from a machine in Australia (http://looking-glass.uecomm.net.au/) pointed at one of my servers. I have deleted some timing information from each step to improve readability.

    traceroute to 208.12.16.5 from looking-glass.uecomm.net.au,
     30 hops max, 38 byte packets
     1  vl2021.agg1.cit190.uecomm.net.au (203.94.128.105)
     2  180.gi1.br1.que31.uecomm.net.au (218.185.31.122)
     3  sl-gw1-mel-6-0-0.sprintlink.net (203.222.35.229)
     4  sl-bb21-syd-1-0.sprintlink.net (203.222.33.18)
     5  sl-bb21-syd-14-1.sprintlink.net (203.222.32.49)
     6  sl-bb21-sj-3-2.sprintlink.net (144.232.8.130)
     7  sl-bb23-tac-14-0.sprintlink.net (144.232.20.9)
     8  sl-bb20-tac-5-0.sprintlink.net (144.232.17.173)
     9  144.232.17.54 (144.232.17.54)
    10  sl-seane-2-0-0.sprintlink.net (160.81.116.34)
    11  fermat.seanet.com (199.181.164.164)
    12  208.12.16.1 (208.12.16.1)
    13  gateway.craic.com (208.12.16.5)

The first two lines show how the source machine connects to the Internet backbone. The next seven lines show the path taken through the SprintLink backbone to Seattle. The last four tell about the network near my server. Let’s work back from the last line. The next-to-last step (line 12) has only an IP address, and it is very similar to that of the target machine. The difference in numbers is so small that it is reasonable to assume they are both on the same network.

Businesses often have a range of IP addresses for their various publicly accessible servers. These typically form a subnet that is connected to the ISP by way of a router. Simple routers are not usually given hostnames and are also usually given the first usable IP address in a subnet. So we can make an educated guess that 208.12.16.1 is a router that controls access to a small subnet on which the target is located.

Line 11 shows a machine at seanet.com. This might well be the ISP that the target connects to. Looking up Seanet on the Web shows it to be based in Seattle. It appears to serve a regional market, so the target machine may well be located in the Seattle area.

Line 10 tells us that Seanet connects to the rest of the world via sprintlink.net.

By looking at some of the sprintlink.net lines and using some creative reasoning, we can even figure out the path taken between Australia and Seattle. Those SprintLink routers have hostnames and it looks like the location of each is embedded in the name. So my guess is that the path taken was from Melbourne (mel) to Sydney (syd), over to the United States to San Jose (sj), up the West Coast to Tacoma (tac) and finally to Seattle. Okay, so maybe the San Jose step is a bit of a stretch, but you get the idea.
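That kind of reading can be partly automated. The sketch below pulls the apparent location codes out of the SprintLink hostnames in the trace above. The naming pattern it relies on (sl, then a role and number, then a location code) is purely my guess from these few hostnames, not a documented convention, and the function name location_path is my own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the sequence of location codes seen along a traceroute,
# collapsing consecutive hops that share the same code
sub location_path {
    my (@lines) = @_;
    my @path;
    for my $line (@lines) {
        # Hop number, hostname, then the address in parentheses
        next unless $line =~ /^\s*\d+\s+(\S+)\s+\([\d.]+\)/;
        my $host = $1;
        # Assumed pattern: sl-<role><number>-<location>-<interface>
        next unless $host =~ /^sl-[a-z]+\d+-([a-z]+)-/;
        my $code = $1;
        push @path, $code unless @path && $path[-1] eq $code;
    }
    return @path;
}

my @trace = (
    ' 3  sl-gw1-mel-6-0-0.sprintlink.net (203.222.35.229)',
    ' 4  sl-bb21-syd-1-0.sprintlink.net (203.222.33.18)',
    ' 5  sl-bb21-syd-14-1.sprintlink.net (203.222.32.49)',
    ' 6  sl-bb21-sj-3-2.sprintlink.net (144.232.8.130)',
    ' 7  sl-bb23-tac-14-0.sprintlink.net (144.232.20.9)',
    ' 8  sl-bb20-tac-5-0.sprintlink.net (144.232.17.173)',
    ' 9  144.232.17.54 (144.232.17.54)',
    '10  sl-seane-2-0-0.sprintlink.net (160.81.116.34)',
);

print join(' -> ', location_path(@trace)), "\n";
```

For these lines the script prints mel -> syd -> sj -> tac. Hops without a hostname, and names such as sl-seane-2-0-0 that do not fit the assumed pattern, are simply skipped, so treat the result as a hint rather than a map.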

If you are interested in the topology of the network and the connectivity of an ISP then you can repeat the same analysis using traceroute from other locations. Here is the output from the command run on a server in Vienna, Austria (http://www.vix.at/cgi-bin/lg.cgi).

    Tracing the route to gateway.craic.com (208.12.16.5)
     1  vix2.core01.vie01.atlas.cogentco.com (193.203.0.113)
     2  p6-0.muc01.atlas.cogentco.com (130.117.1.150)
     3  p14-0.core01.fra03.atlas.cogentco.com (130.117.1.198)
     4  p12-0.core01.dca01.atlas.cogentco.com (154.54.1.17)
     5  p6-0.core01.jfk02.atlas.cogentco.com (66.28.4.82)
     6  p15-0.core02.jfk02.atlas.cogentco.com (66.28.4.14)
     7  p14-0.core02.ord01.atlas.cogentco.com (66.28.4.86)
     8  p12-0.core01.mci01.atlas.cogentco.com (66.28.4.33)
     9  p5-0.core01.den01.atlas.cogentco.com (66.28.4.29)
    10  p5-0.core01.sea01.atlas.cogentco.com (66.28.4.101)
    11  g49.ba01.b001696-0.sea01.atlas.cogentco.com (66.250.9.98)
    12  Seanet.demarc.cogentco.com (66.28.31.98)
    13  fermat.seanet.com (199.181.164.164)
    14  208.12.16.1
    15  gateway.craic.com (208.12.16.5)

Here we see that a different backbone has been used to connect from Europe. The router locations are more cryptic, but I would guess that jfk (lines 5 and 6) refers to New York and den refers to Denver (line 9). Line 12 shows the end of the path via cogentco.com in Seattle, followed by the same server as before at Seanet. This implies that Seanet has direct connections to both SprintLink and Cogent. Experimentation with traceroute from a number of other sites may turn up the same or additional connections and can suggest how large that ISP is.
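When you collect traces from several starting points, the hops they share at the end of the path are the ones that belong to the target’s own ISP. Here is a rough sketch of that comparison, using the last five addresses from each of the two traces above. The function name common_tail is my own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the hops that two paths have in common at their far end
sub common_tail {
    my ($first, $second) = @_;
    my @one = @$first;      # work on copies so the callers' lists survive
    my @two = @$second;
    my @shared;
    while (@one && @two && $one[-1] eq $two[-1]) {
        unshift @shared, pop @one;
        pop @two;
    }
    return @shared;
}

# Final hops of the trace from Australia (via SprintLink)
my @via_sprint = qw(144.232.17.54 160.81.116.34 199.181.164.164
                    208.12.16.1 208.12.16.5);
# Final hops of the trace from Vienna (via Cogent)
my @via_cogent = qw(66.250.9.98 66.28.31.98 199.181.164.164
                    208.12.16.1 208.12.16.5);

print "$_\n" for common_tail(\@via_sprint, \@via_cogent);
```

The shared hops start at 199.181.164.164 (fermat.seanet.com), which is consistent with Seanet being the point where the two backbones meet.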

There are a number of sites out there that are kind enough to provide web interfaces to traceroute and several other tools related to routing and connectivity. These are referred to as “Looking Glass” servers, since they are typically used to probe your own site. geektools.com provides a list of these at http://www.geektools.com/traceroute.php, but not all those listed are operational. Table 2-2 lists a few around the world that work at the time of this writing.

DNS Record Manipulation

The DNS infrastructure of the Internet plays a critical role in resolving host and domain names into IP addresses. A great deal of effort has gone into ensuring that DNS works efficiently and is resilient in the face of server failures, incorrect data, or malicious attempts to disrupt the system. But even with these safeguards in place, the system is still subject to attack.

The potential benefit for someone involved in Internet fraud is huge. If you can change the DNS records for a major bank so that they point to your fake site, then you can potentially capture the account numbers and passwords of anyone who logs into the system. This approach sidesteps the need to send out email messages that try to get users to log in, but it does require a high level of technical sophistication. Two approaches have been used: DNS Poisoning and Pharming.

DNS servers around the Internet keep their tables updated by querying other more authoritative servers. The structure is a hierarchy with the network root servers at its origin. In a DNS poisoning attack, DNS servers are manipulated to fetch updated, incorrect DNS records from a server that has been set up by the attacker. This is a sophisticated type of attack to which modern DNS servers are largely immune. But successful attacks do still take place, usually by exploiting bugs in the server software. In March 2005, the SANS Internet Storm Center reported one such attack in which users were redirected to sites that contained spyware, which was then downloaded to users’ computers. A detailed report on this attack can be found at http://isc.sans.org/presentations/dnspoisoning.php.

Pharming is somewhat of an umbrella term for several different approaches to manipulating DNS records. Rather than going after DNS servers directly, an attacker may try to con a domain registrar into changing the authoritative DNS record for a domain to point to their fake site. Examples of this form of social engineering have included someone simply calling a registrar on the phone and persuading them that they represent the owner of the target domain.

One example of this involved the New York-based Internet service provider Panix. In January 2005, an attacker was able to transfer control of its DNS records to a server in the United Kingdom, with all company email being redirected to a server in Canada. Even though the problem was spotted quickly, the impact on the company and its customers was substantial.

Another form of attack takes advantage of the fact that most operating systems have a local file of hostname-to-IP-address mappings, the hosts file, that will be queried before making a remote DNS query. If this file contains a match, then that address will be used without any further lookups. This has been exploited by a computer virus called the Banker Trojan. In addition to logging user keystrokes, it adds lines to the end of the hosts file on a Windows system that will redirect users to fake bank sites. Many variants of this trojan have been found.
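Checking a hosts file for this kind of tampering is straightforward. The following sketch scans the text of a hosts file for entries that map a watched domain to anything other than the loopback address. The sample file contents, the watched domain (example.com), and the function name redirected_hosts are all invented for illustration and are not taken from any real infection.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return "name -> address" for every hosts-file entry that maps a
# watched domain, or a host within it, to a non-loopback address
sub redirected_hosts {
    my ($hosts_text, @watched_domains) = @_;
    my @hits;
    for my $line (split /\n/, $hosts_text) {
        $line =~ s/#.*$//;                        # drop comments
        my ($ip, @names) = split ' ', $line;      # address, then names
        next unless defined $ip && @names;
        next if $ip =~ /^127\./ || $ip eq '::1';  # loopback is normal
        for my $name (@names) {
            for my $domain (@watched_domains) {
                if (lc($name) eq lc($domain) || $name =~ /\.\Q$domain\E$/i) {
                    push @hits, "$name -> $ip";
                    last;
                }
            }
        }
    }
    return @hits;
}

# Hypothetical hosts file with two lines appended by an attacker
my $sample = <<'END';
127.0.0.1       localhost
192.0.2.10      www.example.com
192.0.2.10      login.example.com   # fake site
END

print "$_\n" for redirected_hosts($sample, 'example.com');
```

To check a real machine you would read the contents of the system’s own hosts file into $sample and pass in the domains of the banks or other sites you care about.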

DNS is fundamental to the operation of the Internet and usually works so well that people take it for granted. Attacks like these are a reminder that all components of the Internet are vulnerable.

An Example—Dissecting a Spam Network

Now let’s see how these tools can be used in the real world. This section shows how you can figure out the structure of a sophisticated spam operation. A point that I will stress here and throughout the book is how valuable it can be to have multiple examples of an email or a web site. Even though the details may differ, the similarities between them can be very revealing.

For a while last year I was getting a lot of spam emails that all had a similar underlying appearance. The products being offered varied, as did the name of the Sender, but they clearly had a common origin. The From addresses all followed the same pattern, and the messages all had the same mechanism for unsubscribing from their mailing list. So I collected a bunch of messages that fit this pattern and made a list of the web sites they were directing me to. At first glance these seemed to be a diverse group but as I added more examples the domain names started to take on a similar form. That was my motivation to investigate further and start to run dig on the hostnames. Table 2-3 shows a small sample of the results from that survey, sorted by IP address.

Table 2-3. Hostnames with similar IP addresses

Warning

Web sites come and go. The dodgy ones, in particular, often have a very short life. So don’t be surprised if the specific IP addresses and hostnames given here no longer give the same results. Instead, let the examples illustrate the underlying techniques and use them to explore sites that you come across in your own email.

First, look at the hostnames. You can see a common pattern in the domain names with two or three words joined together that almost make sense. Likewise, the first part of each hostname has the form of a name and a number, and there are two groups that are arranged sequentially. Now look at the IP addresses. The pattern is glaringly obvious. The people behind this operation would appear to have a bank of servers covering a significant block of IP addresses. These are organized very logically such that, for example, servers in the dynamicrhythms.com block have consecutive IP addresses.

It’s a safe bet that other servers occupy the gaps in the IP address range. We can even predict some of the hostnames. The next step was to figure out just how large this network was. I couldn’t get that information directly, but by calling dig systematically across a range of addresses, I thought I might be able to define its limits. Doing this one address at a time became tedious, so I wrote a small Perl script that takes a range of numeric addresses and performs a reverse lookup on each of them. This can be useful in other scenarios, so I’ve included it here as Example 2-1. Note that you need to switch between the dotted-quad notation that dig expects and the decimal form you need to step through sequentially.

Example 2-1. scan_ip_range.pl
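#!/usr/bin/perl -w
# Runs dig on all IP addresses in the specified range
use strict;

die "Usage: $0 <start IP addr> <end IP addr>\n" unless @ARGV == 2;
my $start_dec = dotted_quad_to_decimal($ARGV[0]);
my $end_dec   = dotted_quad_to_decimal($ARGV[1]);

# Step through the range as plain integers, converting each one back
# to dotted-quad form for the reverse lookup
for(my $i=$start_dec; $i<=$end_dec; $i++) {
    my $i_ip = decimal_to_dotted_quad($i);
    my $hostname = `dig +short -x $i_ip`;
    # dig prints nothing when there is no reverse entry, so make sure
    # the line for this address is still terminated
    $hostname = "\n" unless defined $hostname && length $hostname;
    printf "%-15s %s", $i_ip, $hostname;
}

# Convert a dotted-quad address such as 66.111.233.0 to an integer
sub dotted_quad_to_decimal {
   my @fields = split /\./, shift;
   ($fields[0] * 16777216) + ($fields[1] * 65536) +
   ($fields[2] * 256)     +  $fields[3];
}

# Convert an integer back to a dotted-quad address
sub decimal_to_dotted_quad {
    my $decimal = shift;
    my $factor = 16777216;
    my @quad = ();
    for(my $i=0; $i<4; $i++) {
       $quad[$i] = int($decimal / $factor);
       $decimal -= $quad[$i] * $factor;
       $factor /= 256;
    }
    join ".", @quad;
}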

Running this over the 66.111.233.x and 66.111.234.x blocks (of 256 addresses each) uncovered 211 hostnames similar to those above, which fell into 60 groups of related names. I didn’t bother to scan adjacent blocks, but I know from other sources on the Web that the network extends even further than this. Here is a sample of the scan output:

    66.111.233.168  233-111-66.ftl-nj.webhostplus.com.
    66.111.233.169  233-111-66.ftl-nj.webhostplus.com.
    66.111.233.170  dyna1.dynamicrhythms.com.
    66.111.233.171  dyna2.dynamicrhythms.com.
    66.111.233.172  dyna3.dynamicrhythms.com.
    66.111.233.173  dyna4.dynamicrhythms.com.
    66.111.233.174  dyna5.dynamicrhythms.com.
    66.111.233.175  spec1.greenplanetspecials.com.
    66.111.233.176  spec2.greenplanetspecials.com.

One other thing to note from these scans was the mapping of a significant number of the IP addresses in the 66.111.233.x block to a single host called 233-111-66.ftl-nj.webhostplus.com and to 234-111-66.ftl-nj.webhostplus.com in the other block. We’ll return to this shortly.
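Sorting a long scan into groups of related names by eye gets tedious. Here is a minimal sketch of one way to do it, grouping the scan output by the last two labels of each hostname. That rule is my own simplification and only suits domains such as .com, and the function name group_by_domain is mine as well.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Group lines of scan output ("address hostname.") by domain name.
# Returns a hash mapping each domain to a list of its addresses.
sub group_by_domain {
    my (@lines) = @_;
    my %groups;
    for my $line (@lines) {
        my ($ip, $host) = split ' ', $line;
        next unless defined $host;          # no reverse entry found
        $host =~ s/\.$//;                   # drop dig's trailing dot
        my @labels = split /\./, $host;
        next if @labels < 2;
        # Assumption: the domain is the last two labels of the name
        my $domain = join '.', @labels[-2, -1];
        push @{ $groups{$domain} }, $ip;
    }
    return %groups;
}

my @scan = (
    '66.111.233.168  233-111-66.ftl-nj.webhostplus.com.',
    '66.111.233.169  233-111-66.ftl-nj.webhostplus.com.',
    '66.111.233.170  dyna1.dynamicrhythms.com.',
    '66.111.233.171  dyna2.dynamicrhythms.com.',
    '66.111.233.172  dyna3.dynamicrhythms.com.',
    '66.111.233.173  dyna4.dynamicrhythms.com.',
    '66.111.233.174  dyna5.dynamicrhythms.com.',
    '66.111.233.175  spec1.greenplanetspecials.com.',
    '66.111.233.176  spec2.greenplanetspecials.com.',
);

my %groups = group_by_domain(@scan);
for my $domain (sort keys %groups) {
    printf "%-28s %d\n", $domain, scalar @{ $groups{$domain} };
}
```

Run on the sample above, this reports five addresses for dynamicrhythms.com and two each for greenplanetspecials.com and webhostplus.com. The webhostplus.com group is the set of addresses still mapped to the hosting company’s own placeholder name.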

So far we’ve used dig for reverse lookups. Using it with the reported hostnames would not be expected to add much information in this case. In fact, a sampling of such queries as I write this, some months after that period of spam, shows that many do not return IP addresses. That tells me that not only have these sites been taken down but also that the DNS entries have been removed. Fortunately for us, someone slipped up and left the reverse entries in the tables. The management of DNS records can be surprisingly sloppy and still work just fine. Sometimes that works to your advantage.

Now let’s see what whois can contribute to this story. Running it on a sample of the domain names turns up a mixed bag of names and addresses in the contact information. Most of the domains appear linked to three addresses in the towns of Sunny Isles Beach, Aventura, and Hollywood, which are all in Florida. I don’t know if these are real addresses or not, but they serve as a type of signature or fingerprint for the people behind these sites. We’ll talk more about making these kinds of connections later in the book.

Warning

Note that you should NOT write scripts that attempt to step through whois records the way I did with the DNS lookups. This is exactly how spammers have built up their mailing lists in the past, and the domain registries will likely detect your script and block any further whois queries coming from your computer. Modest numbers of queries submitted manually should not get you into trouble.

Using whois with any of the IP addresses revealed something about the network these servers reside in:

    [whois.arin.net]
    OrgName:    WebHostPlus Inc
    OrgID:      WEBHO-3
    Address:    100 Plaza drive
    City:       Secaucus
    StateProv:  NJ
    PostalCode: 07094
    Country:    US

    NetRange:   66.111.192.0 - 66.111.255.255
    CIDR:       66.111.192.0/18
    NetName:    WEBHOSTPLUS-INC
    NetHandle:  NET-66-111-192-0-1
    Parent:     NET-66-0-0-0-0
    NetType:    Direct Allocation
    NameServer: NS.WHP-SERVER.COM

WebHost Plus is a well-established company in New Jersey that provides web hosting and other services to a large number of clients. Our friends sending out the emails are simply using them to host their web sites. But with over 200 web sites, each with a unique IP address, this looks like a big operation. Are they really running that many different web servers and physical computers?

No, what they are doing is configuring their servers with multiple IP addresses. Even with a single Ethernet card, you can configure Linux, for example, to act as though it has 256 IP addresses. Then you configure the Apache web server to respond to each address with a different web site. That’s what was going on with the 66.111.233.x addresses handled by one machine (233-111-66.ftl-nj.webhostplus.com) and the 66.111.234.x block handled by another. In their DNS tables, all the addresses were mapped to the canonical names of those machines until they were allocated to a client’s site. This is how companies such as WebHost Plus can afford to offer web sites for just a few dollars a month. You are sharing the server with other people and, as long as no one site hogs all the CPU cycles, it will appear as though you have your own dedicated server.

It seems like our friends are giving themselves a lot of extra work creating and managing all these distinct web sites. Why go to all that trouble? It’s all an attempt to evade the spam filters that are becoming ever more sophisticated. By generating emails with continually evolving content and including links to web sites with different hostnames they can avoid—or at least delay—being detected by the spam filters and being blacklisted by mail relays. They can run one web site for a week or two, shut it down, and then reappear under a totally different name.

This example has shown how much can be learned about an operation simply using dig and whois. By looking at similar emails, I found a set of hostnames that resembled each other. dig revealed that these all had similar IP addresses. Reverse lookups across a wider range of addresses turned up a lot more domains and hostnames, and whois showed that the same company hosted all of these. Unallocated addresses from the reverse lookup scan suggested that two physical servers were being used to host all these web sites. Running whois on the domain names turned up a confused mass of contact information that, in isolation, was not that useful. But even untrustworthy contact information can be useful as a signature or fingerprint for this operation.
