"Every link in the chain has its own role to play." | ||
--Samir Datt |
Just as every link in a chain has its own role to play, every component in the network has a role to play and evidence to contribute to our investigation. In this chapter, we will focus exclusively on understanding web proxies, firewalls, and routers; the reasons to investigate them; and how this helps in taking the investigation forward.
In this chapter, we will cover the following topics:
Proxies are a very important component of any network. A proxy acts as an intermediary between other computers of the network and the Internet. In simple terms, this means that all the traffic entering or leaving the network should pass through the proxy server. Looking back at our previous chapter, we recall that logs can be a forensic investigator's best friend. Proxy servers can generate such logs that we can use for our investigations.
Proxy servers are usually deployed with a number of end objectives in mind. They can be used for the following:
Proxy servers come in all shapes and sizes. While some specialize in anonymizing Internet access, others focus on caching traffic to optimize the usage of Internet resources. To better our understanding of the proxy universe, let's take a quick look at the different types of proxies, as follows:
The evidence from proxies is usually in the cache and logs. If you recall, in the previous chapter, we spent a considerable amount of time understanding the logs, logging, and log management concepts. In this section, we will take a look at the evidence that we can dig out of them.
Before we begin, let's get a little familiar with some common proxy names that are available out there.
A few of the popular proxies include Squid, NetCache, ISA, BlueCoat, and so on. Proxies are available in both open source and paid varieties.
Comprehensive and voluminous books have been written about web proxies; however, as our plan is to focus on understanding their role and how to use them with our 007 hat on, we will select one and work at deepening our understanding of how it works and the kind of evidence we can get out of it.
For the purpose of this lesson, we will work with Squid. Squid is a very popular and versatile open source proxy that enjoys widespread usage worldwide. Squid is made available under the terms of the GNU General Public License. It is extremely flexible and customizable and works in both forward and reverse proxy scenarios.
Squid works by caching web objects of many different kinds. This can include frequently accessed webpages, media files, and so on, including those accessed through HTTP as well as FTP. This reduces the response time and bandwidth congestion.
A Squid proxy server is a separate server that works by tracking the use of an object over the network. At the time of the first request, Squid plays the role of an intermediary: it passes the client's request on to the server and, in reverse, passes the server's response back to the client, while also saving a local copy of the requested object. For every subsequent request for the same object, it serves the object from its local cache. In large organizations, this is the reason that system updates and patching take a lot less time and bandwidth, even when updating hundreds of machines at the same time.
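The first-request/repeat-request behavior described above can be sketched in a few lines of shell. Everything here (the cache_fetch function name, the /tmp/proxy-cache directory, and the URL) is an illustrative stand-in and not part of Squid itself:

```shell
# Start from an empty cache so the demonstration below is repeatable.
rm -rf /tmp/proxy-cache

# cache_fetch: serve a URL from a local cache directory if a copy
# exists, otherwise "fetch" it and keep a local copy (the real fetch
# from the origin server is stubbed out here).
cache_fetch() {
  url="$1"
  cache_dir="/tmp/proxy-cache"
  mkdir -p "$cache_dir"
  # Derive a filesystem-safe cache key from the URL.
  key=$(printf '%s' "$url" | md5sum | cut -d' ' -f1)
  if [ -f "$cache_dir/$key" ]; then
    echo "HIT: serving $url from cache"
  else
    echo "MISS: fetching $url from origin server"
    printf 'object for %s\n' "$url" > "$cache_dir/$key"  # stand-in for the real object
  fi
}

cache_fetch http://example.com/update.deb   # first request: MISS
cache_fetch http://example.com/update.deb   # repeat request: HIT
```

Squid's real cache is, of course, far more sophisticated (expiry, validation, storage formats), but the hit/miss decision at its core follows this pattern.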
The following graphic gives a pictorial representation of the Squid web proxy server in action. As we can see, it sits between the users and the router and plays a number of roles:
An amazing side effect of this caching behavior is the availability of (unexpired) items of evidential interest in the cache that Squid (or any proxy server, for that matter) has secreted away to improve the network performance. For us, items such as these can really help us in presenting the smoking gun.
In a case relating to sexual harassment at the workplace, an employee was identified to be downloading sexually explicit material and sending it to a female employee using a colleague's e-mail ID. The idea behind this was to implicate the colleague and harass the other employee. While the suspect's logs showed access to strangely (though seemingly innocently) named files hosted on servers accessed through IP addresses (unresolved by DNS), the cache had the actual corresponding content, which proved that the suspect was the real culprit behind both crimes.
In its role as a regular forward proxy, Squid provides the following functionalities:
In the role of a reverse proxy, Squid can perform the following functions:
Installing Squid is quite straightforward. The installation process involves three steps and, though there are just three, they may vary slightly with different flavors of Linux:
The following screenshot shows its installation in Ubuntu:
We start by running the install command as the superuser. At this point, our Ubuntu box asks for a password. Once this is provided, it goes ahead and checks whether Squid is already installed. If yes, it checks whether it is the latest version and, if not, upgrades it. In my case, as previously shown, it has done all of this and found that I already have the latest version installed. This means we are good to go to the second stage.
The next stage is the modification of our squid.conf configuration file. For Ubuntu, this file is found at the following path:
/etc/squid3/squid.conf
In other flavors of Linux, it can be found at the following:
/etc/squid/squid.conf
In the previous chapter, you may recall that we spent time modifying the splunk.conf file in order to run Splunk effectively. We need to do the same here for Squid. To edit the squid.conf file, we open it in our favorite text editor and make the necessary changes, as follows:
sam@fgwkhorse:~$ sudo vim /etc/squid3/squid.conf
While there are a large number of changes that can be made to the squid.conf file to tweak Squid to run exactly as per our needs, a lot of these options are out of the scope of this chapter. Suffice to say, our idea is to get a feel of the topic and then go on to the investigative aspects.
By default, most of the settings in the configuration file do not need to be changed. Theoretically, Squid can be run with a completely blank configuration file. In most cases, if we start with the default squid.conf file (which we usually do), at least one part will have to be changed for sure: by default, squid.conf blocks access for all client browsers. We need to change this; otherwise, Squid will cut us off from the Internet.
The first thing to do in the configuration file is to set the HTTP port(s) on which Squid will listen for incoming requests. By default, this is set to 3128.
As we are aware, network services listen on particular ports for requests directed at them. Only processes running with system administrator privileges can bind to ports below 1024; these are used by programs that provide services such as POP, SMTP, HTTP, DNS, and so on. Port numbers greater than 1024 are considered to belong to non-admin, untrusted services, as well as to transient connections such as those related to outgoing data.
The Hypertext Transfer Protocol (HTTP) typically uses port 80 to listen for incoming web requests. A lot of ISPs use port 8080 as a sort of pseudo-standard for HTTP traffic.
As you learned a bit earlier, Squid's default HTTP caching port is 3128. If we wish to make Squid listen on port 8080 as well, we can do so by adding a second http_port line in the configuration file, as follows:
http_port 3128
http_port 8080
Another aspect to consider is the storage of cached data. As we have studied, one of the main roles of a web proxy is to cache the data to speed up the access and reduce the bandwidth usage. All this data that has to be cached must be stored, therefore, there exists a need for proper high-speed storage to be available for the proxy server. Depending on the throughput requirements, the hardware available to Squid (or any other proxy server for that matter) can make or mar an installation.
As part of the configuration process, we need to guide Squid by providing it the information relating to the directories where it needs to store the cached data. This is done with the cache_dir operator. As storage requirements may vary and we may need to specify more than one directory for the cached data, Squid allows multiple uses of the cache_dir operator.
Let's look at the default values for the cache_dir operator in the standard squid.conf configuration file, as follows:
cache_dir ufs /usr/local/squid/var/cache/ 100 16 256
Let's take a quick look at what this means.
The line begins with the cache_dir operator. This lets Squid know the path and name of the directory where the cache will be stored. This information is also useful for us as investigators. Squid structures this directory by creating a layer of sub-directories, and then another below it, to enable efficient storage and retrieval without sacrificing speed. This is reflected in a line that follows this format:
cache_dir storageformat Directory-Name Mbytes L1 L2 [options]
Let's compare the two command lines.
We can see that ufs is the storage format, followed by the complete path and name of the storage directory, followed by Mbytes, which is the amount of drive space, in megabytes, to use under this directory. By default, this is 100 MB. It is usually recommended to change this to suit our specific requirements; it is not unheard of to add another zero to make the storage at least a gigabyte.
The L1 denotes the number of level-one or first-level sub-directories that Squid will create under the directory specified earlier. By default, this is 16.
The L2 is the number of level-two or second-level sub-directories that will be created under each of the previously mentioned first-level directories. In this case, the default is 256.
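As a quick sanity check on these defaults (plain shell arithmetic, not a Squid command), the fan-out multiplies out as follows:

```shell
# 16 first-level directories, each containing 256 second-level
# directories, gives the total number of leaf cache directories:
L1=16
L2=256
echo $((L1 * L2))   # prints 4096
```

Cached objects are then distributed across these 4,096 directories so that no single directory grows large enough to slow down filesystem lookups.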
The next thing to ensure is logging. We need to order Squid to log every request to the cache. This is done by ensuring the existence of the following line in the configuration file:
access_log /var/log/squid/access.log squid
All requests to the proxy server will be logged as per the path and filename specified earlier. Again, these logs are very important to us from the perspective of network forensics.
Before we move on from the configuration settings in the squid.conf file, it is very important to touch upon network access control. This is handled by Access Control Lists (ACLs).
One of the issues that the Squid proxy server is required to handle is restricting access to any IPs that are not on the network. This is to prevent some happy traveler from a nearby network dropping in to take advantage of our open hospitality. The simplest way to do this is to only allow the IP addresses that are part of your network.
This is best illustrated with the example shown in the following:
acl localnet src 192.168.1.0/255.255.255.0
http_access allow localnet
By now, we should have a fairly clear idea of managing the Squid configuration file.
Let's move on to the third and the final step of starting the Squid server to enable and activate the configurations that we have done, as follows:
service squid start
That's all it takes! Now, we have the Squid proxy server up and running with the configuration that we set up for it.
We can verify its status by typing the following command:
service squid status
That's it! We can now move on to identifying and examining the evidence that proxy servers generate.
As we saw in the earlier section, evidence exists in the cache directory and logs of the proxy server.
Some of the regular uses of a web proxy such as Squid include security, efficiency, compliance, user auditing, and monitoring. All this information is largely determined by the data present in the logs.
Logs allow us to see the following:
Let's look at the typical structure of logs generated by the proxy servers.
The access.log file basically has two possible formats, depending on the configuration. The first is the default or native log file format and the second is the Common Log Format. In the last chapter, we examined the Common Log Format in some detail; therefore, in this chapter, let's focus on Squid's native log file format. Each line represents an event or request to the server. The format is as follows:
time elapsed remotehost code/status bytes method URL rfc931 hierarchy_code type
In the preceding line, every field carries information that is of interest. Let's look at each of them one by one:
Another point to note is that, if the log_mime_headers debugging option is enabled, all the HTTP request and reply headers are logged, and there may be two additional columns in the access log as a result.
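To make the field positions concrete, here is a sketch of pulling individual fields out of a native-format entry with awk. The log line itself is fabricated for illustration:

```shell
# A fabricated entry in Squid's native access.log format
# (time, elapsed, remotehost, code/status, bytes, method,
#  URL, rfc931, hierarchy_code, type):
entry='946812345.678    110 192.168.1.5 TCP_MISS/200 4512 GET http://example.com/index.html - DIRECT/93.184.216.34 text/html'

# awk splits on runs of whitespace, so the cache result code is
# field 4, the HTTP method field 6, and the content type field 10:
echo "$entry" | awk '{ print $6, $4, $10 }'
```

This prints GET TCP_MISS/200 text/html: the method, the cache result code with the HTTP status, and the content type.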
Now that we have a good idea of the structure of web proxy logs, we need to figure out how to use them in our investigation.
Some of the common investigations they are used for are as follows:
Investigating proxy server logs can be a lot of fun. Just scanning about aimlessly through multiple logs can be quite diverting, but apart from enhancing the level of your understanding, it may not yield quick results.
Therefore, how do we go about looking for evidence on web proxies?
Let's begin with a particular scenario. We have an unknown user on our network who is uploading our company's secret files (appropriately called secrets.pdf) outside of our network.
As we are aware, HTTP methods can be used to determine whether files are being downloaded or uploaded. Whenever we send a request for a webpage or graphic, the HTTP method used is GET. Similarly, every time an upload occurs (for example, an e-mail is sent with an attachment), the HTTP method used is POST.
Just looking at all the log entries where the HTTP method = POST will give us an insight into all the data leaving the network. Similarly, filtering on content type = application/pdf in addition to the HTTP method will give us a list of the log entries where PDF files have been uploaded out of our network.
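A minimal sketch of this filter using awk follows; the log entries and the file path are fabricated for illustration:

```shell
# Build a small sample access.log in Squid's native format;
# every entry here is fabricated.
cat > /tmp/access_posts.log <<'EOF'
946812345.1    90 192.168.1.5 TCP_MISS/200 4512 GET http://example.com/page.html - DIRECT/93.184.216.34 text/html
946812399.2   310 192.168.1.9 TCP_MISS/200 88211 POST http://upload.example.net/submit - DIRECT/198.51.100.7 application/pdf
946812410.3    75 192.168.1.5 TCP_HIT/200 1024 GET http://example.com/logo.png - NONE/- image/png
EOF

# Keep only entries where the method (field 6) is POST and the
# content type (field 10) is application/pdf -- candidate uploads:
awk '$6 == "POST" && $10 == "application/pdf"' /tmp/access_posts.log
```

Only the POST entry with the application/pdf content type survives the filter, giving us a candidate list of PDF uploads to examine further.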
Additional things to look for when investigating such activities include connections to unusual destinations or non-standard ports (for example, 10.10.1.7:31333).
Similarly, a drive-by download infection would be identified by a GET; when we look at the logs, we will find that executable files have been downloaded.
Another example is identifying violations of the company's acceptable use policy, such as user downloads of video or music files, by looking at the downloaded file sizes and file types.
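A sketch of such a size-and-type filter, again over fabricated entries:

```shell
# A fabricated log with one large video download among normal traffic:
cat > /tmp/access_media.log <<'EOF'
946812501.4    60 192.168.1.7 TCP_HIT/200 2048 GET http://example.com/news.html - NONE/- text/html
946812577.8  9400 192.168.1.7 TCP_MISS/200 52428800 GET http://media.example.org/movie.mp4 - DIRECT/203.0.113.9 video/mp4
EOF

# Flag GET requests larger than ~10 MB (field 5 is the byte count)
# whose content type (field 10) is audio or video:
awk '$6 == "GET" && $5 > 10485760 && $10 ~ /^(audio|video)\//' /tmp/access_media.log
```

Here, only the 50 MB video/mp4 download is flagged, while the ordinary page view passes unremarked.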
As we have seen, once we know what we are looking for, it becomes very easy to get proxies to confess.
Let's now move on to understanding and examining firewalls and the ways in which they can contribute to our forensic examinations.