9.1. Determining What to Collect

Before starting on a project like this, you need to identify what purpose gathering the statistics will serve. Why they're needed will affect what information needs to be collected. For example, a report generated for a company's marketing department might show the most popular pages for visitors to enter or exit the site on. A web development team might be more interested in seeing a report showing what browsers the visitors are using.

After you've identified what will be included in the report, check whether that information can be easily retrieved or calculated. Sometimes it can be extracted directly from an environment variable or the HTTP request, but other times you will have to derive it. Using the entrance and exit pages as an example again, page requests aren't explicitly marked as an entrance or exit, but if you collect the list of pages a user visits and sort it in chronological order, the first page is the entry and the last is the exit.
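The entry and exit derivation can be sketched in a few lines of PHP. This is only an illustration: the $requests array, its time and page keys, and the compare_by_time() function are hypothetical names made up for the example, not part of the project code.

```php
<?php
// Hypothetical sample data: one visitor's page requests, in arbitrary order.
// Each row holds a Unix timestamp and the requested page.
$requests = array(
    array('time' => 1200000300, 'page' => '/products.php'),
    array('time' => 1200000000, 'page' => '/index.php'),
    array('time' => 1200000900, 'page' => '/contact.php'),
);

// Comparison callback: order two requests by timestamp, oldest first.
function compare_by_time($a, $b)
{
    if ($a['time'] == $b['time']) {
        return 0;
    }
    return ($a['time'] < $b['time']) ? -1 : 1;
}

usort($requests, 'compare_by_time');

// After sorting, the first request is the entry page and the last is the exit page.
$entry_page = $requests[0]['page'];
$exit_page  = $requests[count($requests) - 1]['page'];

echo "Entry: $entry_page, Exit: $exit_page\n";
```

With the sample data above, the entry page is /index.php and the exit page is /contact.php.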

HTTP is a stateless protocol, so there is no foolproof way to identify users and track their sessions. You can track the IP address, but a visitor could be behind an anonymizing proxy server that presents a new address for each request, or multiple users may be behind a gateway and all share one publicly visible IP address. The page may even be retrieved from a proxy's cache, in which case the request would never hit your server to be tallied. Even cookies and sessions can be manipulated to skew tracking results. It is important for you and those reading your reports to keep in mind that only general trends can be presented. There will always be some margin of error.

So what's available? First check PHP's $_SERVER superglobal array (http://us.php.net/manual/en/reserved.variables.php). PHP_SELF, REQUEST_URI, REQUEST_TIME, HTTP_USER_AGENT, and REMOTE_ADDR may be helpful. You can also use JavaScript to determine other values, such as the client's screen resolution, and send them back to your server.
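As a rough sketch of reading those values, the function below copies the relevant $_SERVER entries into one record per request. The collect_hit() name and the array keys it returns are assumptions made for this example. Each entry is checked with isset() because not every key is guaranteed to be present; HTTP_USER_AGENT, for instance, is supplied by the client and may be missing.

```php
<?php
// Build one record for the current request from a $_SERVER-style array.
// The function name and the returned keys are illustrative only.
function collect_hit(array $server)
{
    return array(
        // Address the request came from (may be a proxy or gateway).
        'ip'         => isset($server['REMOTE_ADDR']) ? $server['REMOTE_ADDR'] : '',
        // The URI that was requested.
        'page'       => isset($server['REQUEST_URI']) ? $server['REQUEST_URI'] : '',
        // Unix timestamp of the start of the request; fall back to the current time.
        'time'       => isset($server['REQUEST_TIME']) ? (int) $server['REQUEST_TIME'] : time(),
        // Browser identification string sent by the client, if any.
        'user_agent' => isset($server['HTTP_USER_AGENT']) ? $server['HTTP_USER_AGENT'] : '',
    );
}

$hit = collect_hit($_SERVER);
```

Passing the array in as an argument, rather than reading $_SERVER inside the function, makes it easy to try the function with made-up values.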

The raw data for the report in this chapter will be the users' IP addresses, what pages they viewed, and the access time. The report will then present the following information for both the current month and the current year:

  • The total number of unique visitors accessing the site

  • The top 10 IP addresses

  • The top 5 most popular pages

  • The 5 least popular pages that have been visited

I'm not much concerned about the effects of proxies or gateways, and will consider all requests from the same IP address within the same day to be part of one user's visit.
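To make the list above concrete, here is a minimal in-memory sketch of the tallies. It assumes the raw data is already available as an array of hits with hypothetical ip, page, and time keys; a real report would more likely read this data from a log or database. Requests from the same IP address on the same day are folded into one visit, as described above.

```php
<?php
// Hypothetical raw data: IP address, page, and Unix timestamp for each request.
$hits = array(
    array('ip' => '192.0.2.1', 'page' => '/index.php', 'time' => gmmktime(9, 0, 0, 3, 1, 2008)),
    array('ip' => '192.0.2.1', 'page' => '/about.php', 'time' => gmmktime(9, 5, 0, 3, 1, 2008)),
    array('ip' => '192.0.2.1', 'page' => '/index.php', 'time' => gmmktime(8, 0, 0, 3, 2, 2008)),
    array('ip' => '192.0.2.7', 'page' => '/index.php', 'time' => gmmktime(14, 0, 0, 3, 1, 2008)),
);

$visits      = array(); // one entry per distinct IP-and-day pair
$ip_counts   = array(); // requests per IP address
$page_counts = array(); // requests per page

foreach ($hits as $hit) {
    // Same IP on the same calendar day (UTC here) counts as one visit.
    $day = gmdate('Y-m-d', $hit['time']);
    $visits[$hit['ip'] . '|' . $day] = true;

    $ip_counts[$hit['ip']] = isset($ip_counts[$hit['ip']]) ? $ip_counts[$hit['ip']] + 1 : 1;
    $page_counts[$hit['page']] = isset($page_counts[$hit['page']]) ? $page_counts[$hit['page']] + 1 : 1;
}

$total_visits = count($visits);

// Sort descending by count and keep the first 10 IPs and first 5 pages.
arsort($ip_counts);
$top_ips = array_slice($ip_counts, 0, 10, true);

arsort($page_counts);
$top_pages = array_slice($page_counts, 0, 5, true);

// Sort ascending by count for the least popular pages that were visited.
asort($page_counts);
$least_pages = array_slice($page_counts, 0, 5, true);
```

Only pages that appear in the data can show up in $least_pages, which matches the "least popular pages that have been visited" item in the list.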

Although numbers are great, sometimes it's helpful to see information presented graphically as well. You will use the GD functions to add graphs to the reports. The charts will show traffic breakdown for the month and year.
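As a taste of what drawing with GD involves, the following sketch renders a plain bar chart to a PNG file. It is not the chart code used for the report; bar_rectangles() and render_bar_chart() are hypothetical helpers written for this example, and the colors and spacing are arbitrary. The GD calls themselves (imagecreatetruecolor(), imagecolorallocate(), imagefilledrectangle(), and imagepng()) are standard, and the GD extension must be enabled for the drawing function to work.

```php
<?php
// Compute one rectangle (left, top, right, bottom) per value, scaled so that
// the largest value fills the full chart height.
function bar_rectangles(array $values, $width, $height)
{
    $rects = array();
    $count = count($values);
    $max   = $count ? max($values) : 0;
    if ($count == 0 || $max <= 0) {
        return $rects; // nothing to draw
    }

    $bar_width = (int) floor($width / $count);
    foreach (array_values($values) as $i => $value) {
        $bar_height = (int) round($value / $max * ($height - 1));
        $rects[] = array(
            $i * $bar_width + 1,        // left, with a 1-pixel gap
            $height - 1 - $bar_height,  // top (y grows downward in GD)
            ($i + 1) * $bar_width - 2,  // right
            $height - 1,                // bottom edge of the image
        );
    }
    return $rects;
}

// Draw the bars on a white background and write the image to $filename.
function render_bar_chart(array $values, $width, $height, $filename)
{
    $image = imagecreatetruecolor($width, $height);
    $white = imagecolorallocate($image, 255, 255, 255);
    $blue  = imagecolorallocate($image, 51, 102, 204);

    imagefilledrectangle($image, 0, 0, $width - 1, $height - 1, $white);
    foreach (bar_rectangles($values, $width, $height) as $r) {
        imagefilledrectangle($image, $r[0], $r[1], $r[2], $r[3], $blue);
    }

    return imagepng($image, $filename);
}
```

A call such as render_bar_chart(array(120, 340, 95), 300, 150, 'traffic.png') would write a three-bar chart, where the values and the file name are made up for illustration.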

Figure 9-1 shows this project in action as the report displayed in a web browser.

Figure 9-1
