Hostnames, and the numeric addresses they correspond to, are the way to identify computers on the Internet. Understanding how these names and numbers are managed is therefore a fundamental aspect of Internet forensics. This chapter describes the types of information you can obtain from public databases of Internet addresses and discusses three essential tools that can help you identify machines and the people behind them. I’ll start with a short review of how computers are identified on the Internet.
Each computer on the Internet has a unique identifier in the form of its Internet Protocol (IP) address. This is a 32-bit integer, which we normally represent as four 8-bit integers separated by periods, such as 208.12.16.5.
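To make the relationship between the two forms concrete, here is a minimal Python sketch that converts a dotted address to its single-integer form and back. It relies only on the standard ipaddress module; the function names are my own for illustration.

```python
import ipaddress


def quad_to_int(dotted_quad: str) -> int:
    """Return the 32-bit integer form of a dotted IPv4 address."""
    return int(ipaddress.IPv4Address(dotted_quad))


def int_to_quad(value: int) -> str:
    """Return the dotted form of a 32-bit integer address."""
    return str(ipaddress.IPv4Address(value))


def quad_to_bits(dotted_quad: str) -> str:
    """Show each of the four 8-bit integers as binary digits."""
    return ".".join(f"{int(octet):08b}" for octet in dotted_quad.split("."))


if __name__ == "__main__":
    address = "208.12.16.5"
    print(quad_to_int(address))
    print(quad_to_bits(address))
    print(int_to_quad(quad_to_int(address)))
```

Each of the four numbers occupies 8 bits, so the integer is simply 208 × 2^24 + 12 × 2^16 + 16 × 2^8 + 5.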
Numeric addresses are fine for systems administrators who need to set up networks and who like that sort of thing. But for most people, they are impossible to remember and so we have real names for computers, the hostnames that we are all familiar with, such as www.oreilly.com.
The translation between hostnames and IP addresses is handled by the Domain Name System (DNS). For example, when you type a hostname into a browser as part of a URL, the browser converts the name into the corresponding IP address and then uses that to communicate with the web server. The browser queries a DNS server on the network, which looks up the name in its database and returns the numeric address to the browser.
In its simplest form, a DNS server consists of two tables of data and the software necessary to interrogate them. The first table is a list of hostnames and the IP addresses to which they correspond. The second is a list of IP addresses and the hostnames to which they map. Storing the addresses of all computers on the Internet on every server is not practical, so DNS distributes the data across many thousands of servers around the world. If a DNS server receives a query for a hostname that it does not carry data for, it forwards the query to other servers until it finds one that can answer the request. Certain servers are authoritative for particular domains, meaning that they are the ultimate reference for mappings between certain sets of names and numbers. What goes on behind the scenes of DNS can become very complex, especially where the networks of large companies are involved.
I can only scratch the surface of the topic here, but for more information you might consider the books DNS and BIND by Paul Albitz and Cricket Liu and DNS & BIND Cookbook by Cricket Liu, both published by O’Reilly.
To ensure that computers are uniquely identified, the IP addresses need to be carefully assigned to groups and individuals. This is done in a hierarchical manner across the entire Internet. At the highest level, the Internet Assigned Numbers Authority (IANA) assigns large blocks of addresses to Regional Internet Registries (RIRs). There are four RIRs at present that together cover the entire world. Each of these assigns sub-blocks of addresses to national registries, large network operators, and Internet Service Providers (ISPs). They assign yet smaller address blocks to smaller ISPs, and ultimately your ISP assigns a small address block for your business or a single address for your personal computer.
You can think of these assignments as starting with the high order bits of the 32-bit address and working down. For example, IANA assigned the block 208.0.0.0 through 208.255.255.255, among others, to the RIR responsible for North America. They in turn allocated 208.0.0.0 through 208.35.255.255 to Sprint, one of the large network operators. Sprint assigned 208.12.0.0 through 208.12.31.255 to Seanet, a regional ISP in Seattle, and they in turn assigned 208.12.16.0 through 208.12.16.7 to me.
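The nesting of these assignments can be checked mechanically. The following Python sketch, which is only an illustration, takes the four ranges quoted above and confirms that each one sits entirely inside the one before it. The labels and function names are my own.

```python
import ipaddress

# The allocation chain quoted in the text, from largest block to smallest.
ALLOCATIONS = [
    ("IANA to the North American RIR", "208.0.0.0", "208.255.255.255"),
    ("RIR to Sprint", "208.0.0.0", "208.35.255.255"),
    ("Sprint to Seanet", "208.12.0.0", "208.12.31.255"),
    ("Seanet to the author", "208.12.16.0", "208.12.16.7"),
]


def to_range(first, last):
    """Convert a pair of dotted addresses to a pair of integers."""
    return int(ipaddress.IPv4Address(first)), int(ipaddress.IPv4Address(last))


def is_nested(chain):
    """True if every block lies entirely inside the block before it."""
    ranges = [to_range(first, last) for _, first, last in chain]
    return all(
        outer[0] <= inner[0] and inner[1] <= outer[1]
        for outer, inner in zip(ranges, ranges[1:])
    )


def block_sizes(chain):
    """Return the number of addresses in each block, in order."""
    sizes = []
    for _, first, last in chain:
        low, high = to_range(first, last)
        sizes.append(high - low + 1)
    return sizes


if __name__ == "__main__":
    print(is_nested(ALLOCATIONS))
    for (label, _, _), size in zip(ALLOCATIONS, block_sizes(ALLOCATIONS)):
        print(f"{label}: {size} addresses")
```

The sizes shrink at every step, from more than 16 million addresses at the top to just 8 at the bottom.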
The usual representation of an IP address (for example, 208.12.16.5) is called dotted-quad, dotted-octet, or dotted-decimal, depending on where you look. I’ll use the first of these throughout the book. Sometimes it is useful to think of addresses as 32-bit binary words and occasionally as single integers. We’ll also encounter a related notation for blocks of IP addresses. 208.12.16.x, for example, is shorthand for the block of 256 addresses that start with 208.12.16.0. A more flexible notation looks like this: 208.12.16.0/29. This has an IP address that marks the start of the block followed by a slash and a number called the prefix-length. This is the number of bits, starting at the high end, that are predefined in this block. The number of low order bits that are available for allocation is 32 minus this number. So in this example there are 3 bits available, which means this subnet has 8 addresses.
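The prefix-length arithmetic is easy to verify. This short Python sketch, offered as an illustration only, uses the standard ipaddress module to report the free bits, the block size, and the first and last addresses for a block in slash notation. The function name and the dictionary keys are my own.

```python
import ipaddress


def describe_block(cidr: str) -> dict:
    """Summarize a block written as address/prefix-length."""
    network = ipaddress.ip_network(cidr)
    host_bits = 32 - network.prefixlen  # low order bits left for allocation
    return {
        "prefix_length": network.prefixlen,
        "host_bits": host_bits,
        "addresses": 2 ** host_bits,
        "first": str(network.network_address),
        "last": str(network.broadcast_address),
    }


if __name__ == "__main__":
    for block in ("208.12.16.0/29", "208.12.16.0/24"):
        print(block, describe_block(block))
```

Note that ip_network rejects a block whose starting address has any of the low order bits set, which is a useful sanity check on hand-typed ranges.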
One of the fundamental tasks you will face is figuring out where in the world a particular server is located. An easy way to do this is to look at the IP address. As I have described, large blocks of addresses are assigned to the four RIRs around the world. Their areas of responsibility are as follows:
ARIN (http://www.arin.net) is responsible for North America, part of the Caribbean, and Sub-Equatorial Africa.
APNIC (http://www.apnic.net) is responsible for countries in Asia and the Pacific Rim, including China, Korea, India, Japan, and Australia.
RIPE NCC (http://www.ripe.net) covers Europe, the Middle East, Northern Africa, and parts of Asia. RIPE stands for Réseaux IP Européens, which translates into European IP Networks.
LACNIC (http://www.lacnic.net) has responsibility for Latin America and the Caribbean.
The list of top-level assignments of IP addresses can be found here:
http://www.iana.org/assignments/ipv4-address-space
By top-level, I mean the address blocks defined by the leftmost integer in a dotted quad IP address, each of which contains 16,777,216 (256 × 256 × 256) addresses. The list makes interesting reading. Starting in September 1981, many of the initial assignments were to large U.S. corporations such as Ford Motor Company (019.x.x.x) and IBM (009.x.x.x). The RIRs were a later development in the history of the Internet, but once established, they were assigned discrete address blocks. The entire list is too large to include, but here are the main blocks that are directly assigned to each RIR:
ARIN: 063.x.x.x–072.x.x.x, 199.x.x.x, 204.x.x.x–209.x.x.x, 216.x.x.x
APNIC: 058.x.x.x–061.x.x.x, 202.x.x.x–203.x.x.x, 210.x.x.x–211.x.x.x, 218.x.x.x–222.x.x.x
RIPE NCC: 062.x.x.x, 081.x.x.x–088.x.x.x, 193.x.x.x–195.x.x.x, 212.x.x.x–213.x.x.x, 217.x.x.x
LACNIC: 200.x.x.x–201.x.x.x
You can use this as a quick reference to see that, for example, 208.12.16.5 falls under the control of ARIN and so is likely to be in North America or Southern Africa. Not very specific, I’ll admit, but it can come in quite handy.
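If you want this quick reference in a script, a small lookup table will do. The Python sketch below encodes the four rows of blocks listed above. It assumes the rows belong to ARIN, APNIC, RIPE NCC, and LACNIC in that order, which matches the order in which the registries were introduced and agrees with the sample addresses used in this chapter. It is a snapshot of the blocks as given here, not a substitute for the current IANA list, and the names are my own.

```python
# First-octet ranges per registry, copied from the list in the text.
# Assumption: the four rows map to ARIN, APNIC, RIPE NCC, LACNIC in order.
RIR_FIRST_OCTETS = {
    "ARIN": [(63, 72), (199, 199), (204, 209), (216, 216)],
    "APNIC": [(58, 61), (202, 203), (210, 211), (218, 222)],
    "RIPE NCC": [(62, 62), (81, 88), (193, 195), (212, 213), (217, 217)],
    "LACNIC": [(200, 201)],
}


def rir_for(address: str):
    """Return the registry for a dotted address, or None if not in the table."""
    first_octet = int(address.split(".")[0])
    for rir, ranges in RIR_FIRST_OCTETS.items():
        for low, high in ranges:
            if low <= first_octet <= high:
                return rir
    return None  # early corporate blocks and others are not listed here


if __name__ == "__main__":
    for sample in ("208.12.16.5", "211.144.162.160", "212.20.227.174"):
        print(sample, rir_for(sample))
```

A result of None only means the block is not in this abbreviated table. The full IANA list, or a whois query, settles those cases.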
The IP address system is clean, elegant, and works very well. But things are less tidy when we look at hostnames and domains. Nobody assigns me the domain craic.com or tells me what hostnames to give my servers. Instead I get to think up a clever domain name, register it so that no one else can use it, and then pick arbitrary names for the computers that reside under that domain name. There is, however, some control over domains.
The Internet Corporation for Assigned Names and Numbers (ICANN) is the body responsible for assigning the top-level domains, such as .com, .org, and .biz, and for controlling the domain name registries. They are also responsible for the IANA, which I discussed in the previous section. Importantly, ICANN is the arbiter of disputes concerning domain names, usually involving trademark infringement.
ICANN gives its blessing to a large number of domain name registrars around the world, allowing them to accept requests from you and me to register our domain names. Those registrars maintain databases of contact information for domain owners. Many of the smaller registrars use the services of the larger companies to manage their records, effectively acting as retailers in a relationship with a wholesaler. These are the records that you will query when you want to learn who is responsible for a particular web site.
The specific information these registrars make available to the public includes the domain name itself, contact information, the date the domain was created, when it will expire, and when it was last updated. They also include the names of the DNS servers that are authoritative for each domain. But registrars do not tell us anything about the actual hostnames that exist within each domain. That is handled by DNS and, although many registrars also provide that service, it is a completely separate system. It is usually most efficient if your ISP manages your DNS records, as they are responsible for actually assigning the IP addresses.
The contact information for the owners of each domain is potentially the most useful piece of information. Unfortunately, when it comes to those that are involved in Internet scams, we can be pretty confident that their information is bogus. Some domain registrars make an attempt to verify the data, but with most, the effort is half-hearted at best. This lack of verification is a major reason why seemingly blatant fraud can flourish on the Net.
Identifying domain owners has become even more difficult of late due to new privacy services that registrars will provide for an additional fee. These services are intended to protect your privacy and prevent your information from being harvested by spammers. Your postal address, for example, will be replaced by a post office box that is managed by the registrar. They know your real address and will forward only certain types of documents, discarding any junk mail. Similarly, your contact email is replaced with an address at the registrar, which changes periodically. Any mail to that address is filtered for spam and then forwarded on to your real email address.
Individual users might want to use this service to protect their personal information. But for a legitimate business like mine, I don’t see the point. I want people to know my contact information, and the domain record is just one of several ways that you can find me. If I check on a business and find their information is blocked, then I am suspicious. Of course, spam is a huge problem, but this is not a solution to it. The people that really benefit from these services are the bad guys who can add one more layer of disguise between them and us.
Three tools play essential roles in helping us query the databases of names and numbers as well as explore the structure of the network around those machines. dig, whois, and traceroute are all included in standard Unix and Mac OS X distributions. Windows users will find variants of all of these, available for free or as shareware. Unfortunately there are so many of these that it is hard to make any specific recommendations. Look them up on your favorite search engine and try a few of them out. Web page interfaces to the tools can also be found on a number of sites.
dig (domain information groper) is a DNS lookup utility that I will use extensively in the course of this book. dig can help you find the IP address for a given hostname and the hostname, if any, for a given IP address.
You may already be familiar with a similar tool called nslookup. A precursor of dig, its use is now discouraged, even though it is still included in most Unix distributions. The same applies to host, which is also widely available. You may find that you prefer the command syntax or output format of one tool over another. I am only going to describe dig in detail here.
In its simplest form, dig will get the IP address for the supplied hostname. Here is a typical example:
1 % dig www.craic.com
2 ; <<>> DiG 9.2.3 <<>> www.craic.com
3 ;; global options: printcmd
4 ;; Got answer:
5 ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 57325
6 ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 3, ADDITIONAL: 1
7
8 ;; QUESTION SECTION:
9 ;www.craic.com. IN A
10
11 ;; ANSWER SECTION:
12 www.craic.com. 600 IN A 208.12.16.5
13
14 ;; AUTHORITY SECTION:
15 craic.com. 600 IN NS dns3.seanet.com.
16 craic.com. 600 IN NS dns1.seanet.com.
17 craic.com. 600 IN NS dns2.seanet.com.
18
19 ;; ADDITIONAL SECTION:
20 dns3.seanet.com. 82411 IN A 199.181.164.3
21
22 ;; Query time: 98 msec
23 ;; SERVER: 192.168.2.18#53(192.168.2.18)
24 ;; WHEN: Fri Jan 7 14:16:07 2005
25 ;; MSG SIZE rcvd: 127
The format of the output is pretty cryptic, with lots of extraneous text that tends to bury the useful content.
The first five lines are status and version information. Lines 8 and 9 are the Question Section, which merely reiterates the query we gave on the command line. Lines 11 and 12 are what we care about. In this case, we see that the hostname www.craic.com maps to the IP address 208.12.16.5. Bear in mind that there may not be an Answer Section. That means that there is no host of that name in any public DNS server on the Internet. Unfortunately, rather than just telling us “host not found,” dig does so indirectly by not giving us an answer. This takes a bit of getting used to.
Lines 14 through 17 are the Authority Section. This tells us which DNS servers carry the Start of Authority (SOA) records for the target machine. In most cases, the authoritative server(s) will be based at the host’s ISP or the site at which that host’s domain was registered. Lines 19 through 25 are largely irrelevant for our purposes but can be valuable in debugging DNS problems.
If the default output is too verbose, you can use the +short option, thus:
% dig +short www.craic.com
208.12.16.5
This form is almost too terse. In fact, if the hostname cannot be found, it returns with no output at all. This is useful if you want to embed the command in shell scripts.
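Here is one way the terse form might be embedded in a script, sketched in Python rather than shell. It assumes dig is installed and on your path. The function names are my own, and the parsing step keeps only lines that are valid IPv4 addresses, so that an empty result can be treated as “not found” and any alias names that dig prints along the way are skipped.

```python
import ipaddress
import subprocess


def parse_dig_short(output: str) -> list:
    """Return the lines of `dig +short` output that are IPv4 addresses."""
    addresses = []
    for line in output.splitlines():
        candidate = line.strip()
        try:
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue  # blank line, or an alias target such as "macdevcenter.com."
        addresses.append(candidate)
    return addresses


def lookup(hostname: str) -> list:
    """Run `dig +short` (assumed to be installed) and return its addresses.

    An empty list means dig printed no address, i.e. the host was not found.
    """
    result = subprocess.run(
        ["dig", "+short", hostname],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_dig_short(result.stdout)
```

A call such as lookup("www.craic.com") returns a list of address strings that the rest of a script can test for emptiness before going any further.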
Supplied with the -x option and an IP address, dig will find the corresponding hostname. This is called a reverse lookup. Here is an example:
% dig -x 208.12.16.5
; <<>> DiG 9.2.3 <<>> -x 208.12.16.5
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 48532
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 3, ADDITIONAL: 1
;; QUESTION SECTION:
;5.16.12.208.in-addr.arpa. IN PTR
;; ANSWER SECTION:
5.16.12.208.in-addr.arpa. 84600 IN PTR gateway.craic.com.
;; AUTHORITY SECTION:
16.12.208.in-addr.arpa. 84600 IN NS dns2.seanet.com.
16.12.208.in-addr.arpa. 84600 IN NS dns3.seanet.com.
16.12.208.in-addr.arpa. 84600 IN NS dns1.seanet.com.
;; ADDITIONAL SECTION:
dns3.seanet.com. 82813 IN A 199.181.164.3
;; Query time: 358 msec
;; SERVER: 192.168.2.18#53(192.168.2.18)
;; WHEN: Fri Jan 7 14:09:25 2005
;; MSG SIZE rcvd: 153
The line returned in the answer section tells us the hostname that we are seeking. Before we had the hostname on the left side and the IP address on the right. Here we have the IP address in reverse on the left and a hostname on the right.
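The reversed form of the address follows a fixed rule: the four numbers are written in the opposite order and in-addr.arpa is appended. As a small illustration, this Python sketch builds that name two ways, once with the standard ipaddress module and once by hand. The function names are my own.

```python
import ipaddress


def reverse_name(address: str) -> str:
    """Return the in-addr.arpa name that a reverse lookup queries."""
    return ipaddress.ip_address(address).reverse_pointer


def reverse_name_by_hand(address: str) -> str:
    """The same thing spelled out: reverse the octets, add the suffix."""
    octets = address.split(".")
    return ".".join(reversed(octets)) + ".in-addr.arpa"


if __name__ == "__main__":
    print(reverse_name("208.12.16.5"))
    print(reverse_name_by_hand("208.12.16.5"))
```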
Notice something interesting in the results that dig has returned? We first asked for the IP address corresponding to www.craic.com and got 208.12.16.5. Then we asked for the hostname corresponding to 208.12.16.5 and got gateway.craic.com instead of www.craic.com. This is because the name gateway is the canonical, or primary, name for this host and www is an alias that points to the same machine.
Within DNS you can map many names to a single IP address either directly, using what are called A records, or indirectly, using CNAME records that map one name to another, which in turn maps to a numeric address. The reverse mapping, however, should only contain a single record for each IP address, containing the canonical hostname.
In addition, a single hostname can map to multiple IP addresses. This is how large sites distribute their load across multiple servers.
Using dig in this forward and reverse manner can reveal interesting things about a site. Here is an example using one of the O’Reilly web sites, www.macdevcenter.com. I have edited the output heavily to save space. Going forward, dig tells us that www.macdevcenter.com is a CNAME alias of macdevcenter.com and that the hostname maps to two IP addresses.
% dig www.macdevcenter.com
[...]
;; ANSWER SECTION:
www.macdevcenter.com. 6426 IN CNAME macdevcenter.com.
macdevcenter.com. 4812 IN A 208.201.239.36
macdevcenter.com. 4812 IN A 208.201.239.37
Taking one of those addresses and running a reverse lookup returns this output:
% dig -x 208.201.239.36
[...]
;; ANSWER SECTION:
36.239.201.208.in-addr.arpa. 86371 IN PTR www.oreillynet.com.
This shows that the canonical name for this server is www.oreillynet.com. From this asymmetry, we could infer that either macdevcenter.com is a subdivision of oreillynet.com (which happens to be true) or that perhaps the latter is a web-hosting company that manages macdevcenter.com for a subscriber.
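The comparison can be automated once the answer lines are in hand. The following Python sketch is an illustration only: it works on the answer lines from the two lookups pasted in as text rather than querying DNS itself, follows the CNAME alias to the A records, and reports whether the reverse name agrees with the forward one. The function names are my own.

```python
FORWARD_ANSWER = """\
www.macdevcenter.com. 6426 IN CNAME macdevcenter.com.
macdevcenter.com. 4812 IN A 208.201.239.36
macdevcenter.com. 4812 IN A 208.201.239.37
"""

REVERSE_ANSWER = "36.239.201.208.in-addr.arpa. 86371 IN PTR www.oreillynet.com.\n"


def parse_records(answer_text):
    """Turn answer lines into (name, type, value) tuples."""
    records = []
    for line in answer_text.splitlines():
        fields = line.split()
        if len(fields) == 5 and not line.startswith(";"):
            name, _ttl, _rr_class, rr_type, value = fields
            records.append((name.rstrip(".").lower(), rr_type, value.rstrip(".").lower()))
    return records


def resolve(hostname, records):
    """Follow CNAME records from hostname; return (final_name, addresses)."""
    name = hostname.lower()
    seen = set()
    while name not in seen:  # the seen set guards against alias loops
        seen.add(name)
        targets = [v for n, t, v in records if n == name and t == "CNAME"]
        if not targets:
            break
        name = targets[0]
    addresses = [v for n, t, v in records if n == name and t == "A"]
    return name, addresses


def ptr_names(records):
    """Return the hostnames given by PTR records."""
    return [value for _, rr_type, value in records if rr_type == "PTR"]


if __name__ == "__main__":
    final_name, addresses = resolve("www.macdevcenter.com", parse_records(FORWARD_ANSWER))
    reverse = ptr_names(parse_records(REVERSE_ANSWER))
    print(final_name, addresses, reverse)
    print("asymmetric" if final_name not in reverse else "symmetric")
```

An asymmetric result is not evidence of wrongdoing on its own, as this example shows, but it tells you there is a second name worth looking up.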
In many cases like this, in which you think the target site is up to no good, what you really want is the reverse lookup to list all the hostnames that map to a single address. Unfortunately DNS won’t give that to us. In principle it can, in response to a zone transfer request using the AXFR type, but most DNS servers have this feature disabled.
whois is the primary tool for querying the domain registration databases. It is available as a standard command on Unix and Mac OS X systems, and most domain registry web sites include a web interface to the command.
The basic way to use whois is to enter a domain name or an IP address after the command, for example, whois craic.com or whois 208.12.16.5. The command syntax can be a lot more involved than this, but we don’t need any fancy options here. The manpage for your implementation will tell you more.
An important point here is that, even though the basic syntax for whois is essentially the same as dig, whois tells us about domains and networks whereas dig tells us about individual hosts. Their roles are complementary.
Consider a basic listing in detail. The following is the output of a query on my domain name. The real thing contains a load of disclaimers and “terms of use” statements that have been replaced with [...] for readability. I’ve also added line numbers to help refer to specific items.
1 % whois craic.com
2 [whois.crsnic.net]
3 Whois Server Version 1.3
4 [...]
5 Domain Name: CRAIC.COM
6 Registrar: NETWORK SOLUTIONS, INC.
7 Whois Server: whois.networksolutions.com
8 Referral URL: http://www.networksolutions.com
9 Name Server: DNS1.SEANET.COM
10 Name Server: DNS2.SEANET.COM
11 Status: ACTIVE
12 Updated Date: 05-nov-2001
13 Creation Date: 22-may-1997
14 Expiration Date: 23-may-2006
15
16 >>> Last update of whois database: Tue, 17 Feb 2004 06:50:46 EST <<<
17 [...]
18 [whois.networksolutions.com]
19 [...]
20 Registrant:
21 Jones, Robert (CRAIC-DOM)
22 Robert Jones
23 Craic Computing
24 911 East Pike Street #231
25 SEATTLE, WA 98122
26 US
27 Domain Name: CRAIC.COM
28
29 Administrative Contact, Technical Contact:
30 Jones, Robert (RJ1571)
31 Robert Jones
32 Craic Computing
33 911 East Pike St #231
34 SEATTLE, WA 98122
35 US
36 <phone number>
37
38 Record expires on 23-May-2006.
39 Record created on 22-May-1997.
40 Database last updated on 17-Feb-2004 16:12:04 EST.
41
42 Domain servers in listed order:
43 DNS1.SEANET.COM 199.181.164.1
44 DNS2.SEANET.COM 199.181.164.2
When you submit a query like this, whois sends it out to the whois server that is the default for your specific implementation of the command. In this case, according to line 2, the server used was whois.crsnic.net. That server looks up the domain in its local database to see where it is registered, and then it queries that registrar for additional information. This two-tiered approach results in some duplication of information and usually major differences in the display format.
Line 6 tells us that the domain is registered with Network Solutions and line 18 shows that their database was queried for the second part of the response.
Lines 13 and 39 tell us the database record was created on May 22, 1997. Similarly, lines 14 and 38 tell us when the registration expires, in this case May 23, 2006, and hence how long the domain has been registered for.
Sites of dubious intent will typically have been registered just a few days or weeks before you receive any email from them, and the length of the registration will invariably be the minimum term of one year. In the case of craic.com, you can see that the business has been around for several years and expects to continue for several more. These dates can serve as a useful background check on any company that you might want to do business with.
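This background check is simple date arithmetic. The Python sketch below is a rough illustration: it parses dates written in the style of the listing above and reports the age of the domain and the length of its registration. The thresholds of 60 days and 366 days are my own assumptions, not established cutoffs, and other registries write their dates in different formats that would need their own parsing.

```python
from datetime import date, datetime


def parse_whois_date(text: str) -> date:
    """Parse dates written like 22-may-1997 or 23-May-2006.

    Relies on English month abbreviations, the default in Python's C locale.
    """
    return datetime.strptime(text.strip().title(), "%d-%b-%Y").date()


def registration_summary(created: str, expires: str, seen_on: date) -> dict:
    """Return the domain's age when seen and the full registration term."""
    created_on = parse_whois_date(created)
    expires_on = parse_whois_date(expires)
    return {
        "age_days": (seen_on - created_on).days,
        "term_days": (expires_on - created_on).days,
    }


def looks_recent_and_short(summary, max_age_days=60, max_term_days=366):
    """Flag a new domain on a minimum term. Thresholds are assumptions."""
    return (
        summary["age_days"] <= max_age_days
        and summary["term_days"] <= max_term_days
    )


if __name__ == "__main__":
    # Dates from the craic.com listing, seen on the day the database was updated.
    craic = registration_summary("22-may-1997", "23-may-2006", date(2004, 2, 17))
    print(craic, looks_recent_and_short(craic))
```

A flag from a check like this is only a prompt to look closer. Plenty of honest new businesses register a domain for a single year.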
There is a discrepancy between the update dates given in lines 16 and 40, illustrating the fact that two databases have been queried to produce this output.
The DNS servers listed in lines 9 and 10 and again in lines 43 and 44 show that a relationship exists between craic.com and seanet.com. In the majority of cases, the authoritative DNS servers for a domain will either be at the domain registry or at the ISP used by that domain. In this case, Seanet is the ISP that I use and they manage those DNS records on my behalf.
Lines 20 through 36 represent contact information for the person or persons responsible for this domain. In the case of my domain, you can see that I serve as both the registrant and the administrative and technical contacts. You can see my name, business address, email, and phone number. This information is supposed to be accurate and kept up to date so that anyone can contact the owner in case of problems accessing the site or in case the site is up to no good.
We mentioned the introduction of privacy proxies by the registrars a little earlier. Here is a section from a domain record that uses this service:
Domain Name: GREENTREEPROMOS.COM
Administrative Contact:
Media, LLC, Revolution [email protected]
ATTN: GREENTREEPROMOS.COM
c/o Network Solutions
P.O. Box 447
Herndon, VA 20172-0447
570-708-8780
The email address is a random string of characters that changes on a regular basis.
As soon as you start to work with whois, you will become aware of the variation in the way the results are presented. In fact it’s a real mess. It seems like every domain registry has its own format, and the real information is buried in the middle of verbose legal disclaimers and warnings.
This can be a real nuisance for people like us who want to process these records. What we would prefer is a standard format, preferably in XML, that would make it easy for us to pipe the results into scripts that parse out the relevant data. The registrars have intentionally not provided us with this. The problem is that, in addition to people making legitimate requests, spammers have used whois to trawl registry databases in order to build up lists of email addresses. I get a huge amount of spam, which is undoubtedly due to my email address having been included in a domain registry since 1997. It can be really frustrating working with these records but, at least for now, there is not a lot we can do about it.
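In the absence of a standard format, the best a script can do is pick out the lines that happen to look like key and value pairs. Here is a minimal Python sketch along those lines, using the first part of the craic.com record as sample input. It is only a starting point: the free-form address blocks that some registrars return have no keys at all and are ignored by this approach. The function name is my own.

```python
from collections import defaultdict

SAMPLE = """\
Domain Name: CRAIC.COM
Registrar: NETWORK SOLUTIONS, INC.
Whois Server: whois.networksolutions.com
Referral URL: http://www.networksolutions.com
Name Server: DNS1.SEANET.COM
Name Server: DNS2.SEANET.COM
Status: ACTIVE
Updated Date: 05-nov-2001
Creation Date: 22-may-1997
Expiration Date: 23-may-2006
"""


def parse_key_values(text: str) -> dict:
    """Collect 'Key: value' lines; repeated keys accumulate in a list."""
    fields = defaultdict(list)
    for line in text.splitlines():
        # Split on the first colon only, so URLs in the value survive.
        key, separator, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if separator and key and value:
            fields[key].append(value)
    return dict(fields)


if __name__ == "__main__":
    record = parse_key_values(SAMPLE)
    print(record["Registrar"])
    print(record["Name Server"])
```

Keeping every value in a list means that repeated keys, such as the name servers, are not silently overwritten.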
On top of this, you should be aware that not all Unix whois clients are the same. RedHat Fedora 2, for example, included jwhois v3.2.2, whereas Mac OS X has a version from BSD Unix with a different set of options. We don’t need to use any of those here but check the manpage for your version to learn more.
RedHat 7.3 included yet another variant with an interesting feature. That version would interpret a domain name not only in the literal way it was written but also as a prefix on other domains. In this form it would search and return all hostnames that matched the supplied string. This behavior led certain miscreants to create hostnames that are very rude about our friends at Microsoft and that are only revealed through whois.
If you have access to this particular version and are not easily offended by bad language, then try the following simple query. It returns a large number of matching hostnames, of which a few of the tamer ones are shown.
% whois microsoft.com
[...]
Microsoft.com.fills.me.with.belligerence.net
Microsoft.com.zzz.is.owned.and.haxored.by.sub7.net
Microsoft.com.should.give.up.because.linuxisgod.com
This and other, more useful, features have been disappearing from both domain and DNS lookup tools over the past few years. The main motivation has been security, as certain features were felt to reveal a bit too much about networks. In the past you could find out all the domains owned by an individual and all the DNS records for a given domain. Sadly those days are gone.
Many of the domains that are associated with Internet fraud contain false contact information. ICANN and the registries make all the right noises about ensuring this information is correct, but they seem unable or unwilling to control the problem. So we just have to live with bad data—which is not to say that domain records are useless. Let’s look at an example of a bogus record and see what can be salvaged from it.
% whois mycitibank.org
[whois.publicinterestregistry.net]
[...]
Domain ID:D104488069-LROR
Domain Name:MYCITIBANK.ORG
Created On:02-Jun-2004 18:53:15 UTC
Expiration Date:02-Jun-2005 18:53:15 UTC
Sponsoring Registrar:R51-LROR
Status:TRANSFER PROHIBITED
Registrant ID:P-BTP31-449435
Registrant Name:Benjamin A Perowsky
Registrant Organization:Benjamin A Perowsky
Registrant Street1:173 Dean St.#3
Registrant City:Brooklyn
Registrant Postal Code:11217
Registrant Country:US
Registrant Phone: <phone number>
Registrant Email:[email protected]
[...]
Tech ID:P-NCT21-63
Tech Name:Hostmaster Hostmaster
Tech Organization:united-domains AG
Tech Street1:Gautinger Strasse 10
Tech City:Starnberg
Tech Postal Code:82319
Tech Country:DE
Tech Phone:<phone number>
Tech Email:[email protected]
Name Server:SERVER1-NS1.UDAGDNS.NET
Name Server:SERVER1-NS2.UDAGDNS.NET
Name Server:SERVER1-NS3.UDAGDNS.NET
This is the record for mycitibank.org, used at one time for a phishing site that pretended to be Citibank. It is safe to assume that Mr. Perowsky of Brooklyn, if he exists, did not register this domain. The fact that the email address is in Russia is a clue. That address may be correct. The registry needs a way to communicate with registrants in order to bill them, but this may not do us any good as we can’t tell who really receives the email. The information about the registry is going to be correct as they created this record. The same goes for the creation and expiration dates and the authoritative DNS servers. These are all useful snippets of information.
Even if we know the contact information is bad, we can use it if we are looking at a number of domains we think might be related. That’s because people tend to be lazy. If you are registering several bogus domains, are you really going to think up different and convincing fake contact information for each of them? We can use similar or identical fake addresses to build links between apparently unconnected domains, as we do in the worked example at the end of this chapter. They serve as a type of fingerprint of the people involved.
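One way to build such links is to reduce each contact record to a normalized key and group the domains that share it. The Python sketch below shows the idea. The domain names and contact details in it are entirely hypothetical, chosen only to show two records that differ in case and punctuation collapsing to the same fingerprint, and the choice of name, street, and postal code as the key is my own assumption.

```python
import re
from collections import defaultdict

# Hypothetical records, invented for illustration only.
EXAMPLE_RECORDS = {
    "first-bank-login.example": {
        "name": "John Q. Public", "street": "12 Main St. #4", "postal_code": "11217",
    },
    "secure-auction-update.example": {
        "name": "john q public", "street": "12 Main St #4", "postal_code": "11217",
    },
    "unrelated-shop.example": {
        "name": "Jane Roe", "street": "9 Elm Ave", "postal_code": "98122",
    },
}


def fingerprint(contact: dict) -> tuple:
    """Collapse case, punctuation, and spacing so near-identical entries match."""
    parts = (
        contact.get("name", ""),
        contact.get("street", ""),
        contact.get("postal_code", ""),
    )
    return tuple(re.sub(r"[^a-z0-9]", "", part.lower()) for part in parts)


def group_by_contact(records: dict) -> list:
    """Return groups of two or more domains that share a fingerprint."""
    groups = defaultdict(list)
    for domain, contact in records.items():
        groups[fingerprint(contact)].append(domain)
    return [sorted(domains) for domains in groups.values() if len(domains) > 1]


if __name__ == "__main__":
    print(group_by_contact(EXAMPLE_RECORDS))
```

Exact matching after normalization will miss deliberate variations, so treat the groups it finds as leads to follow up by eye.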
We can also use whois to look up an IP address. While this may look like the reverse DNS lookups we used earlier, it is a different function that will turn out to be very useful.
% whois 208.12.16.5
Sprint SPRINTLINK-BLKS (NET-208-0-0-0-1)
208.0.0.0 - 208.35.255.255
Seanet Corporation FON-34904473604317 (NET-208-12-0-0-1)
208.12.0.0 - 208.12.31.255
# ARIN WHOIS database, last updated 2005-01-06 19:10
# Enter ? for additional hints on searching ARIN's WHOIS database.
Nowhere in the output is there any mention of 208.12.16.5 or craic.com, so what’s going on here? These are the subnets of IP addresses that our address is part of. First off, our target address is located in the United States, so the database that answered the query is at ARIN. They are telling us that Seanet Corporation controls addresses 208.12.0.0 through 208.12.31.255 and that Sprint controls the even larger network, of which Seanet is a part.
We can reasonably infer that Seanet is my ISP or that my ISP has its addresses allocated to them by Seanet. That is important information. If we find the IP address of a site that is up to no good, we may want to ask their ISP to shut them down. This form of whois query can quickly help us find out who we need to talk to.
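To pick out the organization closest to the target, a script can read the ranges from the ARIN report and choose the smallest one that contains the address. The Python sketch below does this for the output shown above, pasted in as text. It assumes the layout seen here, with each owner on one line and its range on the next, and the function names are my own. Other registries format their reports differently.

```python
import ipaddress
import re

ARIN_OUTPUT = """\
Sprint SPRINTLINK-BLKS (NET-208-0-0-0-1)
208.0.0.0 - 208.35.255.255
Seanet Corporation FON-34904473604317 (NET-208-12-0-0-1)
208.12.0.0 - 208.12.31.255
"""

RANGE = re.compile(r"^\s*(\d+\.\d+\.\d+\.\d+)\s*-\s*(\d+\.\d+\.\d+\.\d+)\s*$")


def parse_netblocks(text):
    """Pair each owner line with the address range on the line after it."""
    blocks = []
    previous = None
    for line in text.splitlines():
        match = RANGE.match(line)
        if match and previous:
            first, last = (ipaddress.IPv4Address(g) for g in match.groups())
            blocks.append((previous.strip(), first, last))
        if line.strip():
            previous = line
    return blocks


def smallest_enclosing(address, blocks):
    """Return the smallest block containing the address, or None."""
    target = ipaddress.IPv4Address(address)
    enclosing = [b for b in blocks if b[1] <= target <= b[2]]
    return min(enclosing, key=lambda b: int(b[2]) - int(b[1]), default=None)


if __name__ == "__main__":
    owner, first, last = smallest_enclosing("208.12.16.5", parse_netblocks(ARIN_OUTPUT))
    print(owner, first, last)
```

The smallest enclosing block is usually the right place to start, since its owner is the party most directly responsible for the address.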
As I say, the form of report you get depends on the regional registry that manages that block of IP addresses. Here are examples of addresses in the other three regions. Unimportant text has been edited out for the sake of readability.
Here is what the output of APNIC looks like for an address in its region of control:
% whois 211.144.162.160
[Querying whois.apnic.net]
[...]
inetnum: 211.144.160.0 - 211.144.175.255
netname: LIANFENGMAN
country: CN
descr: CHONGQING LIANFENG COMMUNICATION Co.,Ltd
descr: 18F, BUIDING-A, CITY PLAZA, 39-WUSI ROAD,YUZHONG
DISTRICT, CHONG QING,PRC.
admin-c: DC278-AP
tech-c: ZL153-AP
status: ALLOCATED PORTABLE
changed: [email protected] 20041102
mnt-by: MAINT-CNNIC-AP
source: APNIC
person: DUAN CHUNYAN
nic-hdl: DC278-AP
e-mail: [email protected]
address: 18F, BUIDING-A, CITY PLAZA, 39-WUSI ROAD,
YUZHONG DISTRICT, CHONG QING,PRC.
phone: <phone number>
fax-no: <phone number>
country: CN
changed: [email protected] 20041102
mnt-by: MAINT-CNNIC-AP
source: APNIC
[...]
Here is a query for an address in the United Kingdom that gets handled by the RIPE NCC server, responsible for Europe and the Middle East:
% whois 212.20.227.174
[...]
[whois.ripe.net]
[...]
inetnum: 212.20.227.128 - 212.20.227.255
netname: EDNET-COLO-1
descr: edNET Internet Limited
country: GB
admin-c: NS1518-RIPE
tech-c: RM7978-RIPE
status: ASSIGNED PA
mnt-by: EDNET-RIPE-MNT
changed: [email protected] 20030716
remarks: INFRA-AW
source: RIPE
route: 212.20.224.0/22
descr: edNET UK
origin: AS12703
remarks: removed cross-mnt: EDNET-RIPE-MNT
mnt-by: EDNET-RIPE-MNT
changed: [email protected] 20031119
source: RIPE
[...]
The output here tells of a block of 128 addresses (212.20.227.128-212.20.227.255) assigned to EDNET-COLO-1, which is probably a subnet of EDNET used for collocation of web servers. The line at the start of the second paragraph (route: 212.20.224.0/22) tells us this is itself part of a larger block, also assigned to EDNET. A prefix-length of 22 leaves 10 bits free, so the route as shown covers the 1,024 addresses from 212.20.224.0 through 212.20.227.255.
Finally, here is the format of report returned by LACNIC for an address in Chile:
% whois 146.83.12.32
[whois.lacnic.net]
[...]
inetnum: 146.83/16
status: assigned
owner: Red Universitaria Nacional
ownerid: CL-RUNA1-LACNIC
responsible: Claudia Inostroza
address: Canada, 239, Providencia
address: 6640806 - Santiago -
country: CL
phone: <phone number>
owner-c: CIM2
tech-c: CIM2
inetrev: 146.83/16
nserver: TERMINUS.REUNA.CL
nsstat: 20050103 AA
nslastaa: 20050103
nserver: NS.REUNA.CL
nsstat: 20050103 AA
nslastaa: 20050103
created: 19910128
changed: 20010222
[...]
In this version, the IP address block is given in the alternate format we mentioned earlier. 146.83/16 means that the starting address is 146.83.0.0 with the highest 16 bits fixed and hence the remaining 16 bits being available for allocation. This translates into the address range of 146.83.0.0 through 146.83.255.255.
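The abbreviated form used by LACNIC needs a little care in scripts, because Python’s standard ipaddress module expects all four numbers to be present. The sketch below pads the missing numbers with zeros before handing the block over. The function name is my own, and it assumes the input always includes the slash and prefix-length.

```python
import ipaddress


def expand_short_cidr(block: str) -> ipaddress.IPv4Network:
    """Expand a block such as 146.83/16 to a full network object.

    Assumes the block contains a slash followed by the prefix-length.
    """
    prefix, _, length = block.partition("/")
    octets = prefix.split(".")
    octets += ["0"] * (4 - len(octets))  # pad the omitted low order numbers
    return ipaddress.ip_network(".".join(octets) + "/" + length)


if __name__ == "__main__":
    network = expand_short_cidr("146.83/16")
    print(network, network.network_address, network.broadcast_address)
    print(ipaddress.ip_address("146.83.12.32") in network)
```

Once the block is a network object, testing whether the address you queried falls inside it is a single membership check.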
I need to stress, once again, that different versions of whois may behave differently. Mac OS X will query ARIN first regardless of the IP address. If ARIN says it is out of their range, it uses their referral to go to the correct registry. You end up with the correct information buried in reams of irrelevant verbiage. The version that ships with Linux (RedHat Fedora Core 2) figures out the correct registry without this intermediate step, probably through a simple lookup table, and returns its results quickly and cleanly. Bear this in mind if you want to write scripts that parse whois output.
You can also access whois through a variety of web interfaces, in particular at domain registries. Here are several examples:
Spammers have used domain records as a source of email addresses for some time now. A standard tactic has been to use a script to make thousands of requests to web-based whois clients. These days most of the sites will either prevent you from making more than a certain number of requests in a period of time, or they will display an image of a number on the query form that you will need to type into the form along with the domain name. That can get tedious, but there are times when a web-based client comes in handy.
These may not provide the full functionality of the Unix clients. Some will only respond to domain name queries, whereas the clients at the four RIRs, shown in Table 2-1, seem to respond only to IP address queries.
Two web-based clients are worthy of special mention. Netcraft is a company in the U.K. that tracks various aspects of technology on the Internet. They have a large database of domain names, web sites, and ancillary data. Their whois-like client (http://searchdns.netcraft.com/?host) lets you search this resource and offers a number of features not available from standard whois. In particular, you can search on domain names using substrings and wildcards. A simple query like craic will return all domains that contain that string. This can be very useful when you want to find sites that might be involved in phishing. Try searching on PayPal or eBay and see how many domains show up. http://sqlwhois.com provides a similar service with their client (http://www.sqlwhois.com/en/index.html). Here you have even more control over your query terms, but their database is limited to the .com and .net registries.
dig
and whois
tell you about specific addresses on
the Internet and who controls them. traceroute
tells you about the
path between two addresses—how to get there from
here. Run on host A, with host B as its target, traceroute
fires off packets that are passed
through a series of intervening gateways or routers as determined by
the Internet protocol and the topology of the Internet.
Normal network transactions, like a request for a web page, do not report the path they take from A to B. traceroute, on the other hand, triggers a response from every router along the way. It does this by using the time to live (TTL) field of the IP header: it sends probe packets with successively larger TTL values, each router decrements the value as it forwards a packet, and the router at which the value reaches zero discards the packet and returns an ICMP TIME_EXCEEDED response. If successful, traceroute captures the IP address of the machine and the time at which the response was received. It performs a reverse lookup on the IP address in the hope of getting a hostname. It doesn’t always work as well as we’d like. Not all machines provide the ICMP TIME_EXCEEDED response, and many routers do not have corresponding hostnames, so its output can be very cryptic at times. But in many cases it provides a very useful perspective on the network connectivity of the target host and its ISP.
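The time to live mechanism is easier to follow as a small simulation. The sketch below does not send any packets. It models a path as a list of made-up router names and shows which machine would answer each probe. A real traceroute needs raw network access and has to cope with routers that stay silent.

```python
def simulate_traceroute(path, max_hops=30):
    """Simulate TTL-based probing over a list of router names.

    `path` lists the machines between source and target, target last.
    """
    discovered = []
    for ttl in range(1, max_hops + 1):
        remaining = ttl
        for hop in path:
            remaining -= 1  # each machine decrements the TTL
            if remaining == 0:
                # This machine drops the probe and reveals itself
                # (a router would send ICMP TIME_EXCEEDED at this point)
                discovered.append((ttl, hop))
                break
        if discovered and discovered[-1][1] == path[-1]:
            break  # the target has answered, so stop probing
    return discovered

hops = simulate_traceroute(["router-a", "router-b", "target"])
for ttl, name in hops:
    print(ttl, name)
```

Each probe gets one hop further than the last, which is why the output lists the routers in path order.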
You can infer a lot from the output of traceroute
on a particular address. It can
provide clues about the type of network the target machine is part of,
it can reveal their ISP, and it may be able to tell you something
about how the ISP is connected to the rest of the Net.
Here is the output of the command run from a machine in Australia (http://looking-glass.uecomm.net.au/) pointed at one of my servers. I have deleted some timing information from each step to improve readability.
traceroute to 208.12.16.5 from looking-glass.uecomm.net.au, 30 hops max, 38 byte packets
 1 vl2021.agg1.cit190.uecomm.net.au (203.94.128.105)
 2 180.gi1.br1.que31.uecomm.net.au (218.185.31.122)
 3 sl-gw1-mel-6-0-0.sprintlink.net (203.222.35.229)
 4 sl-bb21-syd-1-0.sprintlink.net (203.222.33.18)
 5 sl-bb21-syd-14-1.sprintlink.net (203.222.32.49)
 6 sl-bb21-sj-3-2.sprintlink.net (144.232.8.130)
 7 sl-bb23-tac-14-0.sprintlink.net (144.232.20.9)
 8 sl-bb20-tac-5-0.sprintlink.net (144.232.17.173)
 9 144.232.17.54 (144.232.17.54)
10 sl-seane-2-0-0.sprintlink.net (160.81.116.34)
11 fermat.seanet.com (199.181.164.164)
12 208.12.16.1 (208.12.16.1)
13 gateway.craic.com (208.12.16.5)
The first two lines show how the source machine connects to the Internet backbone. The next eight lines show the path taken through the SprintLink backbone to Seattle. The last four tell us about the network near my server. Let’s work back from the last line. The next-to-last step (line 12) has only an IP address, and it is very similar to that of the target machine. The difference in numbers is so small that it is reasonable to assume they are both on the same network.
Businesses often have a range of IP addresses for their various publicly accessible servers. These typically form a subnet that is connected to the ISP by way of a router. Simple routers are not usually given hostnames and are also usually given the first usable IP address in a subnet. So we can make an educated guess that 208.12.16.1 is a router that controls access to a small subnet on which the target is located.
Line 11 shows a machine at http://seanet.com. This might well be the ISP that the target connects to. Looking up Seanet on the Web shows it to be based in Seattle. It appears to serve a regional market so it may locate the target machine in the Seattle area.
Line 10 tells us that Seanet connects to the rest of the world via http://sprintlink.net.
By looking at some of the http://sprintlink.net lines and using some creative reasoning, we can even figure out the path taken between Australia and Seattle. Those SprintLink routers have hostnames and it looks like the location of each is embedded in the name. So my guess is that the path taken was from Melbourne to Sydney (syd), over to the United States to San Jose (sj), up the West Coast to Tacoma (tac) and finally to Seattle. Okay, so maybe the San Jose step is a bit of a stretch, but you get the idea.
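The same guesswork can be automated. The following sketch pulls the apparent location code out of each SprintLink hostname in the trace above. It assumes the code is the third hyphen-separated field of the first label, which is only an inference from these few names and not a documented naming scheme.

```python
SPRINT_HOPS = [
    "sl-gw1-mel-6-0-0.sprintlink.net",
    "sl-bb21-syd-1-0.sprintlink.net",
    "sl-bb21-syd-14-1.sprintlink.net",
    "sl-bb21-sj-3-2.sprintlink.net",
    "sl-bb23-tac-14-0.sprintlink.net",
    "sl-bb20-tac-5-0.sprintlink.net",
    "sl-seane-2-0-0.sprintlink.net",
]

def location_codes(hostnames):
    """Return the sequence of apparent location codes, without repeats."""
    codes = []
    for name in hostnames:
        parts = name.split(".")[0].split("-")
        # Assumed layout: sl-<router>-<location>-<interface...>
        if len(parts) > 2 and parts[2].isalpha():
            code = parts[2]
            if not codes or codes[-1] != code:
                codes.append(code)
    return codes

route = location_codes(SPRINT_HOPS)
print(" -> ".join(route))
```

The last hostname, sl-seane-2-0-0, does not fit the assumed layout and is skipped, a reminder that patterns like this are hints and not rules.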
If you are interested in the topology of the network and the
connectivity of an ISP then you can repeat the same analysis using
traceroute
from other locations.
Here is the output from the command run on a server in Vienna, Austria
(http://www.vix.at/cgi-bin/lg.cgi).
Tracing the route to gateway.craic.com (208.12.16.5)
 1 vix2.core01.vie01.atlas.cogentco.com (193.203.0.113)
 2 p6-0.muc01.atlas.cogentco.com (130.117.1.150)
 3 p14-0.core01.fra03.atlas.cogentco.com (130.117.1.198)
 4 p12-0.core01.dca01.atlas.cogentco.com (154.54.1.17)
 5 p6-0.core01.jfk02.atlas.cogentco.com (66.28.4.82)
 6 p15-0.core02.jfk02.atlas.cogentco.com (66.28.4.14)
 7 p14-0.core02.ord01.atlas.cogentco.com (66.28.4.86)
 8 p12-0.core01.mci01.atlas.cogentco.com (66.28.4.33)
 9 p5-0.core01.den01.atlas.cogentco.com (66.28.4.29)
10 p5-0.core01.sea01.atlas.cogentco.com (66.28.4.101)
11 g49.ba01.b001696-0.sea01.atlas.cogentco.com (66.250.9.98)
12 Seanet.demarc.cogentco.com (66.28.31.98)
13 fermat.seanet.com (199.181.164.164)
14 208.12.16.1
15 gateway.craic.com (208.12.16.5)
Here we see that a different backbone has been used to connect
from Europe. The router locations are more cryptic, but I would guess
that jfk
(lines 5 and 6) refers to
New York and den
refers to Denver
(line 9). Line 12 shows the end of the path via http://cogentco.com in Seattle followed by the same
server as before at Seanet. This implies that Seanet has direct
connections to both SprintLink and Cogent. Experimentation with
traceroute
from a number of other
sites may turn up the same or additional connections and can suggest
how large that ISP is.
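Comparing traces from several vantage points comes down to finding where the paths converge. As a rough sketch, the function below takes the final hops of the two traces shown above and reports the tail they share, plus the hop immediately upstream of it in each trace. The function name is my own.

```python
FROM_AUSTRALIA = [
    "sl-seane-2-0-0.sprintlink.net",
    "fermat.seanet.com",
    "208.12.16.1",
    "gateway.craic.com",
]
FROM_VIENNA = [
    "Seanet.demarc.cogentco.com",
    "fermat.seanet.com",
    "208.12.16.1",
    "gateway.craic.com",
]

def shared_tail(a, b):
    """Return the hops common to the end of both paths, in path order."""
    tail = []
    for x, y in zip(reversed(a), reversed(b)):
        if x != y:
            break
        tail.append(x)
    return tail[::-1]

tail = shared_tail(FROM_AUSTRALIA, FROM_VIENNA)
# The hop just before the shared tail is each trace's entry point to the ISP
upstreams = [path[-len(tail) - 1] for path in (FROM_AUSTRALIA, FROM_VIENNA)]
print("shared:", tail)
print("upstream links:", upstreams)
```

The first shared hop marks the ISP's edge, and the distinct hops just before it are its upstream connections.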
There are a number of sites out there that are kind enough to
provide web interfaces to traceroute
and several other tools related
to routing and connectivity. These are referred to as “Looking Glass”
servers, since they are typically used to probe your own site. http://geektools.com provides a list of these at http://www.geektools.com/traceroute.php—but not all
those listed are operational. Table 2-2 lists a few
around the world that work at the time of this writing.
The DNS infrastructure of the Internet plays a critical role in resolving host and domain names into IP addresses. A great deal of effort has gone into ensuring that DNS works efficiently and is resilient in the face of server failures, incorrect data, or malicious attempts to disrupt the system. But even with these safeguards in place, the system is still subject to attack.
The potential benefit for someone involved in Internet fraud is huge. If you can change the DNS records for a major bank so that they point to your fake site, then you can potentially capture the account numbers and passwords of anyone who logs into the system. This approach sidesteps the need to send out email messages that try to get users to log in, but it does require a high level of technical sophistication. Two approaches have been used: DNS Poisoning and Pharming.
DNS servers around the Internet keep their tables updated by querying other more authoritative servers. The structure is a hierarchy with the network root servers at its origin. In a DNS poisoning attack, DNS servers are manipulated to fetch updated, incorrect DNS records from a server that has been set up by the attacker. This is a sophisticated type of attack to which modern DNS servers are largely immune. But successful attacks do still take place, usually by exploiting bugs in the server software. In March 2005, the SANS Internet Storm Center reported one such attack in which users were redirected to sites that contained spyware, which was then downloaded to users’ computers. A detailed report on this attack can be found at http://isc.sans.org/presentations/dnspoisoning.php.
Pharming is somewhat of an umbrella term for several different approaches to manipulating DNS records. Rather than going after DNS servers directly, an attacker may try to con a domain registrar into changing the authoritative DNS record for a domain to point to their fake site. Examples of this form of social engineering have included someone simply calling a registrar on the phone and persuading them that they represent the owner of the target domain.
One example of this involved the New York-based Internet service provider Panix. In January 2005, an attacker was able to transfer control of its DNS records to a server in the United Kingdom, with all company email being redirected to a server in Canada. Even though the problem was spotted quickly, the impact on the company and its customers was substantial.
Another form of attack takes advantage of the fact that most operating systems have a local file of hostname-to-IP-address mappings that will be queried before making a remote DNS query. If such a file contains a match, then that address will be used without any further lookups. This has been exploited by a piece of malware called the Banker Trojan. In addition to logging user keystrokes, it adds lines to the end of the hosts file on a Windows system that will redirect users to fake bank sites. Many variants of this trojan have been found.
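One way to look for this kind of tampering is to scan the hosts file for names that have no business being there. The sketch below works on a sample string and does not read a real file. The bank hostname is invented and the address comes from a range reserved for documentation, so treat every name here as a placeholder.

```python
SAMPLE_HOSTS = """\
# Sample hosts file; the last entry is a made-up malicious redirect
127.0.0.1   localhost
192.0.2.10  www.examplebank.com   # appended by malware
"""

def suspicious_entries(text, watched):
    """Return (hostname, address) pairs for watched names found in the file."""
    hits = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        addr, *names = line.split()
        for name in names:
            if name.lower() in watched:
                hits.append((name.lower(), addr))
    return hits

WATCHED = {"www.examplebank.com"}  # hypothetical list of sensitive sites
found = suspicious_entries(SAMPLE_HOSTS, WATCHED)
print(found)
```

Any hit is worth investigating, since a legitimate hosts file rarely needs an entry for a public banking site.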
DNS is fundamental to the operation of the Internet and usually works so well that people take it for granted. Attacks like these are a reminder that all components of the Internet are vulnerable.
Now let’s see how these tools can be used in the real world. This section shows how you can figure out the structure of a sophisticated spam operation. A point that I will stress here and throughout the book is how valuable it can be to have multiple examples of an email or a web site. Even though the details may differ, the similarities between them can be very revealing.
For a while last year I was getting a lot of spam emails that all
had a similar underlying appearance. The products being offered varied,
as did the name of the Sender, but they clearly had a common origin. The
From addresses all had the form
<somebody>@stderr.<somedomain>.com and they
all had the same mechanism for unsubscribing from their mailing list. So
I collected a bunch of messages that fit this pattern and made a list of
the web sites they were directing me to. At first glance these seemed to
be a diverse group but as I added more examples the domain names started
to take on a similar form. That was my motivation to investigate further
and start to run dig
on the
hostnames. Table
2-3 shows a small sample of the results from that survey, sorted
by IP address.
Hostname | IP address |
Web sites come and go. The dodgy ones, in particular, often have a very short life. So don’t be surprised if the specific IP addresses and hostnames given here no longer give the same results. Instead, let the examples illustrate the underlying techniques and use them to explore sites that you come across in your own email.
First, look at the hostnames. You can see a common pattern in the domain names with two or three words joined together that almost make sense. Likewise, the first part of each hostname has the form of a name and a number, and there are two groups that are arranged sequentially. Now look at the IP addresses—the pattern is glaringly obvious. The people behind this operation would appear to have a bank of servers covering a significant block of IP addresses. These are organized very logically such that, for example, servers in the http://dynamicrhythms.com block have consecutive IP addresses.
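Spotting these patterns by eye works for a handful of hosts, but a few lines of code scale better. The following sketch groups hostname and address pairs by domain and checks whether each group occupies consecutive addresses. The pairs are taken from the scan output shown later in this section, and the function names are my own.

```python
import ipaddress
from collections import defaultdict

RESULTS = [
    ("dyna1.dynamicrhythms.com", "66.111.233.170"),
    ("dyna2.dynamicrhythms.com", "66.111.233.171"),
    ("dyna3.dynamicrhythms.com", "66.111.233.172"),
    ("dyna4.dynamicrhythms.com", "66.111.233.173"),
    ("dyna5.dynamicrhythms.com", "66.111.233.174"),
    ("spec1.greenplanetspecials.com", "66.111.233.175"),
    ("spec2.greenplanetspecials.com", "66.111.233.176"),
]

def group_by_domain(results):
    """Map each domain to the integer form of its hosts' addresses."""
    groups = defaultdict(list)
    for host, ip in results:
        domain = host.split(".", 1)[1]  # strip the first label
        groups[domain].append(int(ipaddress.ip_address(ip)))
    return dict(groups)

def is_consecutive(numbers):
    ordered = sorted(numbers)
    return all(b - a == 1 for a, b in zip(ordered, ordered[1:]))

groups = group_by_domain(RESULTS)
for domain, addrs in groups.items():
    print(domain, len(addrs), "consecutive" if is_consecutive(addrs) else "scattered")
```

Stripping only the first label is a simplification that suits these hostnames but would mishandle deeper names.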
It’s a safe bet that other servers occupy the gaps in the IP
address range. We can even predict some of the hostnames. The next step
was to figure out just how large this network was. I couldn’t get that
information directly, but by calling dig
systematically across a range of
addresses, I thought I might be able to define its limits. Doing this
one address at a time became tedious, so I wrote a small Perl script
that takes a range of numeric addresses and performs a reverse lookup on
each of them. This can be useful in other scenarios, so I’ve included it
here as Example 2-1.
Note that you need to switch between the dotted-quad notation that
dig
expects and the decimal form you
need to step through sequentially.
#!/usr/bin/perl -w
# Runs dig on all IP addresses in the specified range
use strict;

die "Usage: $0 <start IP addr> <end IP addr>\n" unless @ARGV == 2;

my $start_dec = dotted_quad_to_decimal($ARGV[0]);
my $end_dec   = dotted_quad_to_decimal($ARGV[1]);

for (my $i = $start_dec; $i <= $end_dec; $i++) {
    my $i_ip = decimal_to_dotted_quad($i);
    # dig prints nothing if the address has no reverse (PTR) record
    my $hostname = `dig +short -x $i_ip`;
    chomp $hostname;
    printf "%-15s %s\n", $i_ip, $hostname;
}

# Convert a dotted-quad address such as 208.12.16.5 to a decimal integer
sub dotted_quad_to_decimal {
    my @fields = split /\./, shift;
    ($fields[0] * 16777216) + ($fields[1] * 65536) +
        ($fields[2] * 256) + $fields[3];
}

# Convert a decimal integer back to dotted-quad notation
sub decimal_to_dotted_quad {
    my $decimal = shift;
    my $factor  = 16777216;
    my @quad    = ();
    for (my $i = 0; $i < 4; $i++) {
        $quad[$i] = int($decimal / $factor);
        $decimal -= $quad[$i] * $factor;
        $factor  /= 256;
    }
    join ".", @quad;
}
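For comparison, here is a sketch of the same address arithmetic in Python, where the standard `ipaddress` module handles the conversions. It performs no lookups. The helper names are my own, and the example range is chosen only to show the step across a block boundary.

```python
import ipaddress

def to_decimal(dotted):
    """Convert a dotted-quad string to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(dotted))

def to_dotted(decimal):
    """Convert a 32-bit integer back to dotted-quad notation."""
    return str(ipaddress.IPv4Address(decimal))

def ip_range(start, end):
    """Yield every address from start to end, inclusive."""
    for n in range(to_decimal(start), to_decimal(end) + 1):
        yield to_dotted(n)

addresses = list(ip_range("66.111.233.254", "66.111.234.1"))
print(addresses)
```

Stepping through the integer form is what lets the range roll over from 66.111.233.255 to 66.111.234.0 without any special handling.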
Running this over the 66.111.233.x
and 66.111.234.x
blocks (of 256 addresses each)
uncovered 211 hostnames similar to those above, which fell into 60
groups of related names. I didn’t bother to scan adjacent blocks, but I
know from other sources on the Web that the network extends even further
than this. Here is a sample of the scan output:
66.111.233.168  233-111-66.ftl-nj.webhostplus.com.
66.111.233.169  233-111-66.ftl-nj.webhostplus.com.
66.111.233.170  dyna1.dynamicrhythms.com.
66.111.233.171  dyna2.dynamicrhythms.com.
66.111.233.172  dyna3.dynamicrhythms.com.
66.111.233.173  dyna4.dynamicrhythms.com.
66.111.233.174  dyna5.dynamicrhythms.com.
66.111.233.175  spec1.greenplanetspecials.com.
66.111.233.176  spec2.greenplanetspecials.com.
One other thing to note from these scans was the mapping of a significant number of the IP addresses in the 66.111.233.x block to a single host called http://233-111-66.ftl-nj.webhostplus.com and to http://234-111-66.ftl-nj.webhostplus.com in the other block. We’ll return to this shortly.
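Separating these generic names from the client hostnames is simple to script. The sketch below uses the sample of scan output above and assumes that a generic name starts with the first three octets of the address in reverse order, a convention inferred from this one hosting company and not a general rule.

```python
SCAN = [
    ("66.111.233.168", "233-111-66.ftl-nj.webhostplus.com."),
    ("66.111.233.169", "233-111-66.ftl-nj.webhostplus.com."),
    ("66.111.233.170", "dyna1.dynamicrhythms.com."),
    ("66.111.233.171", "dyna2.dynamicrhythms.com."),
    ("66.111.233.172", "dyna3.dynamicrhythms.com."),
    ("66.111.233.173", "dyna4.dynamicrhythms.com."),
    ("66.111.233.174", "dyna5.dynamicrhythms.com."),
    ("66.111.233.175", "spec1.greenplanetspecials.com."),
    ("66.111.233.176", "spec2.greenplanetspecials.com."),
]

def is_generic(ip, hostname):
    """True if the name looks like the host's own default for this block."""
    a, b, c, _ = ip.split(".")
    # Assumed convention: 66.111.233.x -> 233-111-66.<provider domain>
    return hostname.startswith(f"{c}-{b}-{a}.")

generic = [ip for ip, name in SCAN if is_generic(ip, name)]
assigned = [ip for ip, name in SCAN if not is_generic(ip, name)]
print(len(generic), "generic,", len(assigned), "assigned to client sites")
```

Counting the two groups across a whole block gives a quick estimate of how much of it is in use.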
So far we’ve used dig
for
reverse lookups. Using it with the reported hostnames would not be
expected to add much information in this case. In fact, a sampling of
such queries as I write this, some months after that period of spam,
shows that many do not return IP addresses. That tells me that not only
have these sites been taken down but also that the DNS entries have been
removed. Fortunately for us, someone slipped up and left the reverse
entries in the tables. The management of DNS records can be surprisingly
sloppy and still work just fine. Sometimes that works to your
advantage.
Now let’s see what whois
can
contribute to this story. Running it on a sample of the domain names
turns up a mixed bag of names and addresses in the contact information.
Most of the domains appear linked to three addresses in the towns of
Sunny Isles Beach, Aventura, and Hollywood, which are all in Florida. I
don’t know if these are real addresses or not, but they serve as a type
of signature or fingerprint for the people behind these sites. We’ll
talk more about making these kinds of connections later in the
book.
Note that you should NOT write scripts that attempt to step
through whois
records the way I did
with the DNS lookups. This is exactly how spammers have built up their
mailing lists in the past, and the domain registries will likely
detect your script and block any further whois
queries coming from your computer.
Modest numbers of queries submitted manually should not get you into
trouble.
Using whois
with any of the IP
addresses revealed something about the network these servers reside
in:
[whois.arin.net]
OrgName: WebHostPlus Inc
OrgID: WEBHO-3
Address: 100 Plaza drive
City: Secaucus
StateProv: NJ
PostalCode: 07094
Country: US
NetRange: 66.111.192.0 - 66.111.255.255
CIDR: 66.111.192.0/18
NetName: WEBHOSTPLUS-INC
NetHandle: NET-66-111-192-0-1
Parent: NET-66-0-0-0-0
NetType: Direct Allocation
NameServer: NS.WHP-SERVER.COM
WebHost Plus is a well-established company in New Jersey that provides web hosting and other services to a large number of clients. Our friends sending out the emails are simply using them to host their web sites. But with over 200 web sites, each with a unique IP address, this looks like a big operation. Are they really running that many different web servers and physical computers?
No, what they are doing is configuring their servers with multiple
IP addresses. Even with a single Ethernet card, you can configure Linux,
for example, to act as though it has 256 IP addresses. Then you
configure the Apache web server to respond to each address with a different web site.
That’s what was going on with the 66.111.233.x addresses handled by one machine
(http://233-111-66.ftl-nj.webhostplus.com) and the 66.111.234.x block handled by another.
In their DNS tables, all the addresses were mapped to the canonical
names of those machines until they were allocated to a client’s site.
This is how companies such as WebHost Plus can afford to offer web sites
for just a few dollars a month. You are sharing the server with other
people and, as long as no one site hogs all the CPU cycles, it will
appear as though you have your own dedicated server.
It seems like our friends are giving themselves a lot of extra work creating and managing all these distinct web sites. Why go to all that trouble? It’s all an attempt to evade the spam filters that are becoming ever more sophisticated. By generating emails with continually evolving content and including links to web sites with different hostnames they can avoid—or at least delay—being detected by the spam filters and being blacklisted by mail relays. They can run one web site for a week or two, shut it down, and then reappear under a totally different name.
This example has shown how much can be learned about an operation
simply using dig and whois. By looking at similar emails, I found a
set of hostnames that resembled each other. dig revealed that these all had similar IP
addresses. Reverse lookups across a wider range of addresses turned up a
lot more domains and hostnames, and whois
showed that the same company hosted all
of these. Unallocated addresses from the reverse lookup scan suggested
that two physical servers were being used to host all these web sites.
Running whois
on the domain names
turned up a confused mass of contact information that, in isolation, was
not that useful. But even untrustworthy contact information can be
useful as a signature or fingerprint for this operation.