"Every link in the chain has its own role to play." | ||
--Samir Datt |
Just as every link in a chain has its own role to play, every component in the network has a role to play and evidence to contribute to our investigation. In this chapter, we will focus exclusively on understanding web proxies, firewalls, and routers; the reasons to investigate them; and how this helps in taking the investigation forward.
In this chapter, we will cover the following topics:
Proxies are a very important component of any network. A proxy acts as an intermediary between other computers of the network and the Internet. In simple terms, this means that all the traffic entering or leaving the network should pass through the proxy server. Looking back at our previous chapter, we recall that logs can be a forensic investigator's best friend. Proxy servers can generate such logs that we can use for our investigations.
Proxy servers are usually deployed with a number of end objectives in mind. They can be used for the following:
Proxy servers come in all shapes and sizes. While some specialize in anonymizing Internet access, others focus on caching traffic to optimize the usage of Internet resources. To better our understanding of the proxy universe, let's take a quick look at the different types of proxies, as follows:
The evidence from proxies is usually in the cache and logs. If you recall, in the previous chapter, we spent a considerable amount of time understanding the logs, logging, and log management concepts. In this section, we will take a look at the evidence that we can dig out of them.
Before we begin, let's get a little familiar with some common proxy names that are available out there.
A few of the popular proxies include Squid, NetCache, ISA, BlueCoat, and so on. Proxies are available in both open source and paid varieties.
Comprehensive and voluminous books have been written about web proxies; however, as our plan is to focus on understanding their role and how to use them with our 007 hat on, we will select one and work at deepening our understanding of how it works and the kind of evidence we can get out of it.
For the purpose of this lesson, we will work with Squid. Squid is a very popular and versatile open source proxy that enjoys widespread usage worldwide. Squid is made available under the terms of the GNU General Public License. It is extremely flexible and customizable and works in both forward and reverse proxy scenarios.
Squid works by caching web objects of many different kinds. This can include frequently accessed webpages, media files, and so on, including those accessed through HTTP as well as FTP. This reduces the response time and bandwidth congestion.
A Squid proxy server is a separate server that works by tracking the use of an object over the network. At the time of the first request, Squid plays the role of an intermediary: it passes the client's request on to the server and, in reverse, passes the server's response back to the client, while also saving a local copy of the requested object. For every subsequent request for the same object, it serves the object from its local cache. In large organizations, this is the reason that system updates and patching take a lot less time and bandwidth, even when updating hundreds of machines at the same time.
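The first-request/repeat-request behavior described above can be sketched in a few lines of shell. Everything here (the cache_fetch function name, the /tmp/proxy-cache directory, and the URL) is an illustrative stand-in and not part of Squid itself:

```shell
# Start from an empty cache so the demonstration below is repeatable.
rm -rf /tmp/proxy-cache

# cache_fetch: serve a URL from a local cache directory if a copy
# exists, otherwise "fetch" it and keep a local copy (the real fetch
# from the origin server is stubbed out here).
cache_fetch() {
  url="$1"
  cache_dir="/tmp/proxy-cache"
  mkdir -p "$cache_dir"
  # Derive a filesystem-safe cache key from the URL.
  key=$(printf '%s' "$url" | md5sum | cut -d' ' -f1)
  if [ -f "$cache_dir/$key" ]; then
    echo "HIT: serving $url from cache"
  else
    echo "MISS: fetching $url from origin server"
    printf 'object for %s\n' "$url" > "$cache_dir/$key"  # stand-in for the real object
  fi
}

cache_fetch http://example.com/update.deb   # first request: MISS
cache_fetch http://example.com/update.deb   # repeat request: HIT
```

Squid's real cache is, of course, far more sophisticated (expiry, validation, storage formats), but the hit/miss decision at its core follows this pattern.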
The following graphic gives a pictorial representation of the Squid web proxy server in action. As we can see, it sits between the users and the router and plays a number of roles:
An amazing side effect of this caching behavior is the availability of (unexpired) items of evidential interest in the cache that Squid (or any proxy server, for that matter) has secreted away to improve the network performance. For us, items such as these can really help us in presenting the smoking gun.
In a case relating to sexual harassment at the workplace, an employee was identified to be downloading sexually explicit material and sending it to a female employee using a colleague's e-mail ID. The idea behind this was to implicate the colleague and harass the other employee. While the suspect's logs showed access to strangely (though seemingly innocently) named files hosted on servers accessed through IP addresses (unresolved by DNS), the cache had the actual corresponding content, which proved that the suspect was the real culprit behind both crimes.
In its role as a regular forward proxy, Squid provides the following functionalities:
In the role of a reverse proxy, Squid can perform the following functions:
Installing Squid is quite straightforward. The installation process involves three steps and, though there are just three, they may vary slightly with different flavors of Linux:
The following screenshot shows its installation in Ubuntu:
We start by running the install command as the superuser. At this point, our Ubuntu box asks for a password. Once this is provided, it goes ahead and checks whether Squid is already installed. If yes, it checks whether it is the latest version and, if not, upgrades it. In my case, as previously shown, it has done all of this and found that I already have the latest version installed. This means we are good to go to the second stage.
The next stage is the modification of our squid.conf configuration file. For Ubuntu, this file is found at the following path:
/etc/squid3/squid.conf
In other flavors of Linux, it can be found at the following:
/etc/squid/squid.conf
In the previous chapter, you may recall that we spent time modifying the splunk.conf file in order to run Splunk effectively. We need to do the same here for Squid. To edit the squid.conf file, we open it in our favorite text editor and make the necessary changes, as follows:
sam@fgwkhorse:~$ sudo vim /etc/squid3/squid.conf
While there are a large number of changes that can be made to the squid.conf file to tweak Squid to run exactly as per our needs, a lot of these options are out of the scope of this chapter. Suffice to say, our idea is to get a feel of the topic and then go on to the investigative aspects.
By default, most of the settings in the configuration file do not need to be changed. Theoretically, Squid can be run with a completely blank configuration file. In most cases, if we start with the default squid.conf file (which we usually do), at least one part will have to be changed for sure: by default, squid.conf blocks access for all client browsers. We need to change this; otherwise, Squid will cut us off from the Internet.
The first thing to do in the configuration file is to set the HTTP port(s) on which Squid will listen for incoming requests. By default, this is set to 3128.
As we are aware, network services listen on particular ports for requests directed at them. Only processes running with system administrator privileges can bind to ports below 1024; these are used by programs that provide services such as POP, SMTP, HTTP, DNS, and so on. Port numbers greater than 1024 are considered to belong to non-admin, untrusted services, as well as to transient connections such as those related to outgoing data.
The Hypertext Transfer Protocol (HTTP) typically uses port 80 to listen for incoming web requests. A lot of ISPs use port 8080 as a sort of pseudo-standard for HTTP traffic.
As you learned a bit earlier, Squid's default HTTP caching port is 3128. If we wish to make Squid listen on port 8080 as well, we can do so by adding a second http_port line in the configuration file, as follows:
http_port 3128
http_port 8080
Another aspect to consider is the storage of cached data. As we have studied, one of the main roles of a web proxy is to cache the data to speed up the access and reduce the bandwidth usage. All this data that has to be cached must be stored, therefore, there exists a need for proper high-speed storage to be available for the proxy server. Depending on the throughput requirements, the hardware available to Squid (or any other proxy server for that matter) can make or mar an installation.
As part of the configuration process, we need to guide Squid by providing it the information relating to the directories where it needs to store the cached data. This is done with the cache_dir operator. As storage requirements may vary and we may need to specify more than one directory for the cached data, Squid allows multiple uses of the cache_dir operator.
Let's look at the default values for the cache_dir operator in the standard squid.conf configuration file, as follows:
cache_dir ufs /usr/local/squid/var/cache/ 100 16 256
Let's take a quick look at what this means.
The line begins with the cache_dir operator. This lets Squid know the path and name of the directory where the cache will be stored. This information is also useful for us as investigators. Squid structures this directory by creating a layer of sub-directories, and then another below it, to enable efficient storage and retrieval without sacrificing speed. This is reflected in a line that follows this format:
cache_dir storageformat Directory-Name Mbytes L1 L2 [options]
Let's compare the two command lines.
We can see that ufs is the storage format, followed by the complete path and name of the storage directory, followed by Mbytes, which is the amount of drive space, in megabytes, to use under this directory. By default, this is 100 MB. It is usually recommended to change this to suit our specific requirements; it is not unheard of to add another zero to make the storage at least a gigabyte.
The L1 denotes the number of level-one or first-level sub-directories that Squid will create under the directory specified earlier. By default, this is 16.
The L2 is the number of level-two or second-level sub-directories that will be created under each of the previously mentioned first-level directories. In this case, the default is 256.
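As a quick sanity check on these defaults (plain shell arithmetic, not a Squid command), the fan-out multiplies out as follows:

```shell
# 16 first-level directories, each containing 256 second-level
# directories, gives the total number of leaf cache directories:
L1=16
L2=256
echo $((L1 * L2))   # prints 4096
```

Cached objects are then distributed across these 4,096 directories so that no single directory grows large enough to slow down filesystem lookups.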
The next thing to ensure is logging. We need to order Squid to log every request to the cache. This is done by ensuring the existence of the following line in the configuration file:
access_log /var/log/squid/access.log squid
All requests to the proxy server will be logged as per the path and filename specified earlier. Again, these logs are very important to us from the perspective of network forensics.
Before we move on from the configuration settings in the squid.conf file, it is very important to touch upon network access control. This is handled by Access Control Lists (ACLs).
One of the issues that the Squid proxy server is required to handle is restricting access to any IPs that are not on the network. This is to prevent some happy traveler from a nearby network dropping in to take advantage of our open hospitality. The simplest way to do this is to only allow the IP addresses that are part of your network.
This is best illustrated with the example shown in the following:
acl localnet src 192.168.1.0/255.255.255.0
http_access allow localnet
By now, we should have a fairly clear idea of managing the Squid configuration file.
Let's move on to the third and the final step of starting the Squid server to enable and activate the configurations that we have done, as follows:
service squid start
That's all it takes! Now, we have the Squid proxy server up and running with the configuration that we set up for it.
We can verify its status by typing the following command:
service squid status
That's it! We can now move on to identifying and examining the evidence that proxy servers generate.
As we saw in the earlier section, evidence exists in the cache directory and logs of the proxy server.
Some of the regular uses of a web proxy such as Squid include security, efficiency, compliance, user auditing, and monitoring. All this information is largely determined by the data present in the logs.
Logs allow us to see the following:
Let's look at the typical structure of logs generated by the proxy servers.
The access.log file basically has two possible formats, depending on the configuration. The first is the default or native log file format and the second is the Common Log Format. In the last chapter, we examined the Common Log Format in some detail; therefore, in this chapter, let's focus on Squid's native log file format. Each line represents an event or request to the server. The format is as follows:
time elapsed remotehost code/status bytes method URL rfc931 hierarchy_code type
In the preceding line, every field carries information that is of interest. Let's look at each of them one by one:
Another point to note is that, if the log_mime_headers debugging option is enabled, all the HTTP request and reply headers are logged, and there may be two additional columns in the access log as a result.
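To make the field positions concrete, here is a sketch of pulling individual fields out of a native-format entry with awk. The log line itself is fabricated for illustration:

```shell
# A fabricated entry in Squid's native access.log format
# (time, elapsed, remotehost, code/status, bytes, method,
#  URL, rfc931, hierarchy_code, type):
entry='946812345.678    110 192.168.1.5 TCP_MISS/200 4512 GET http://example.com/index.html - DIRECT/93.184.216.34 text/html'

# awk splits on runs of whitespace, so the cache result code is
# field 4, the HTTP method field 6, and the content type field 10:
echo "$entry" | awk '{ print $6, $4, $10 }'
```

This prints GET TCP_MISS/200 text/html: the method, the cache result code with the HTTP status, and the content type.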
Now that we have a good idea of the structure of web proxy logs, we need to figure out how to use them in our investigation.
Some of the common investigations they are used for are as follows:
Investigating proxy server logs can be a lot of fun. Just scanning about aimlessly through multiple logs can be quite diverting, but apart from enhancing the level of your understanding, it may not yield quick results.
Therefore, how do we go about looking for evidence on web proxies?
Let's begin with a particular scenario. We have an unknown user on our network who is uploading our company's secret files (appropriately called secrets.pdf) outside of our network.
As we are aware, HTTP methods can be used to determine whether files are being downloaded or uploaded. Whenever we send a request for a webpage or graphic, the HTTP method used is GET. Similarly, every time an upload occurs (for example, an e-mail is sent with an attachment), the HTTP method used is POST.
Just looking at all the log entries where the HTTP method = POST will give us an insight into all the data leaving the network. Similarly, filtering on content type = application/pdf in addition to the HTTP method will give us a list of the log entries where PDF files have been uploaded out of our network.
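A minimal sketch of this filter using awk follows; the log entries and the file path are fabricated for illustration:

```shell
# Build a small sample access.log in Squid's native format;
# every entry here is fabricated.
cat > /tmp/access_posts.log <<'EOF'
946812345.1    90 192.168.1.5 TCP_MISS/200 4512 GET http://example.com/page.html - DIRECT/93.184.216.34 text/html
946812399.2   310 192.168.1.9 TCP_MISS/200 88211 POST http://upload.example.net/submit - DIRECT/198.51.100.7 application/pdf
946812410.3    75 192.168.1.5 TCP_HIT/200 1024 GET http://example.com/logo.png - NONE/- image/png
EOF

# Keep only entries where the method (field 6) is POST and the
# content type (field 10) is application/pdf -- candidate uploads:
awk '$6 == "POST" && $10 == "application/pdf"' /tmp/access_posts.log
```

Only the POST entry with the application/pdf content type survives the filter, giving us a candidate list of PDF uploads to examine further.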
Additional things to look for when investigating such activities include connections to unusual destinations or non-standard ports (for example, 10.10.1.7:31333).
Similarly, a drive-by download infection would be identified by a GET; when we look at the logs, we will find that executable files have been downloaded.
Another example is identifying violations of the company's acceptable use policy, such as user downloads of video or music files, by looking at the downloaded file sizes and file types.
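A sketch of such a size-and-type filter, again over fabricated entries:

```shell
# A fabricated log with one large video download among normal traffic:
cat > /tmp/access_media.log <<'EOF'
946812501.4    60 192.168.1.7 TCP_HIT/200 2048 GET http://example.com/news.html - NONE/- text/html
946812577.8  9400 192.168.1.7 TCP_MISS/200 52428800 GET http://media.example.org/movie.mp4 - DIRECT/203.0.113.9 video/mp4
EOF

# Flag GET requests larger than ~10 MB (field 5 is the byte count)
# whose content type (field 10) is audio or video:
awk '$6 == "GET" && $5 > 10485760 && $10 ~ /^(audio|video)\//' /tmp/access_media.log
```

Here, only the 50 MB video/mp4 download is flagged, while the ordinary page view passes unremarked.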
As we have seen, once we know what we are looking for, it becomes very easy to get proxies to confess.
Let's now move on to understanding and examining firewalls and the ways in which they can contribute to our forensic examinations.