In
this
chapter, we build on the querying techniques discussed in Chapter 4 and complete our coverage of techniques that
read data from web databases. We focus here on user-driven querying,
in which the user provides data that controls the query process. To
input parameters into the querying process, the user usually selects
or types data into an HTML <form>
environment, or clicks on links that request scripts.
We explain user-driven querying by introducing how to:
Pass data from a web browser to a web server.
Access user data in scripts.
Secure interactive query systems.
Query databases with user data.
Produce one script that contains an HTML
<form>
and the code that outputs the query
results. We call this a combined script.
Develop results pages with previous page and next page links.
Use five-step querying to produce components for user input.
Our case-study example in this chapter is the wine browsing component of the winestore. Similar to most user-driven modules, the wine browsing component has two subcomponents: first, the search bar allows the user to enter a type of wine as a criteria for a database query; and, second, the results pages show the user the wines that match the criteria entered in the search bar. The search bar is shown in Figure 5-1 at the base of the winestore search page, and the results of running the query are presented above it in a results page. The results pages allow the user to view the wines in pages of 12 wines each, move between results pages, and add wines to his shopping cart.
The querying just described is a two-component user-driven querying process. A less common type of user-driven querying describes a query that doesn’t produce output, but instead returns the user directly to the query input component. This one-component querying process is often used to add items to a shopping cart. We also explain one-component querying in this chapter.
Extended examples of user-driven querying can be found in Chapter 10 to Chapter 13.
Three techniques can be used to pass data that drives the querying process in a web database application:
Manual entry of a URL to retrieve a PHP script resource and provide parameters to the resource. For example, a user may open a URL using the Open Page option in the File menu of the Netscape web browser.
Data entry through HTML <form>
environments.
For example, <form>
environments can capture
textual input, and input is made by selecting radio buttons,
selecting one or more items from a drop-down select list, clicking on
buttons, and through other data entry widgets.
Embedded hypertext links that can be clicked to retrieve a PHP script resource and provide parameters to the script.
Using an HTML <form>
and clicking on
hypertext links are the most common techniques for providing user
input for querying in web database applications.
In practice, user data or
parameters
are passed from a web browser to a web server using HTTP; Chapter 1 contains an introduction to HTTP and more
details can be found in Appendix B. Using HTTP,
data is passed with one of two methods, GET
or
POST
. In the GET
method, data
is passed as part of the requested URL; the GET
method gets a resource with the parameters modifying how the resource
is retrieved. In the POST
method, the data is
encoded separately from the URL and forms part of the body of the
HTTP request; the POST
method is used when data is
to be posted or stored on the server. The HTML
<form>
environment can specify either the
GET
or POST
method, while an
embedded link or a manually entered URL with parameters always uses
the GET
method.
In this section, we discuss how to:
Pass parameters from a web browser to a PHP script. You will see how
HTTP requests can include user data by creating URLs, developing HTML
<form>
environments, and embedding links in
HTML documents.
Process user data to ensure it is a minimal security threat to the web server or the DBMS.
Section 5.2 introduces techniques to execute queries that include user input and to present the results.
The first technique that passes data from a web browser to a web server is manual entry of a URL in a web browser.
Consider an example user request with a parameter. In this example, the user types the following URL directly into the Location box in the Location toolbar of a Netscape browser:
http://localhost/example.5-1.php?regionName=Riverland
The URL specifies that the resource to be retrieved is
example.5-1.php
with a query string parameter of
regionName=Riverland
appended to the resource
name. The user then presses the Enter key to issue an HTTP request
for the resource and to use the GET
method that
passes the parameter to the resource. The query string parameter
consists of two parts: a parameter name regionName
and a value for that parameter of Riverland
.
The script resource example.5-1.php
is shown in
Example 5-1. Before the script is processed by the
PHP scripting engine, variables associated with any parameters to the
resource are initialized and assigned values. In this example, a
variable $regionName
, which has the same name as
the URL parameter name, is automatically initialized by the PHP
engine and assigned the value Riverland
that was
passed in the URL. This variable and its value are then accessible
from within the script, making the data passed by the user available
in the middle tier.
Example 5-1. Printing the value of a parameter passed to the script with an HTTP request
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> <html> <head> <title>Parameter</title> </head> <body> <?php include 'db.inc'; echo "RegionName is " . $regionName . " "; ?> </body> </html>
As a result of running the script, the following HTML document is created with the value of the query string parameter printed as part of the output:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> <html> <head> <title>Parameter</title> </head> <body> RegionName is Riverland </body> </html>
In practice, as discussed later in Section 5.2, this data might be used as part of a clause in an SQL query.
Automatic variable initialization from parameters is one of the best features of PHP. PHP automatically initializes each variable that has the same name as a parameter in an HTTP request, and the parameter values are automatically assigned to the variables. No additional programming is required to access query string parameters.
More than one parameter can be passed with an HTTP
GET
request by separating each parameter with an
ampersand character. For example, to pass two parameters
regionName
and Type
with the
values Yarra
and Red
,
respectively, the following URL can be created:
http://localhost/test.php?regionName=Yarra&Type=Red
The values of these parameters can then be printed in the script
test.php
using the fragment:
echo $regionName; echo $Type;
The second technique
that captures data passed from a browser to a server is the HTML
<form>
environment.
Manually entering data as part of a URL is unusual. Instead, users
typically enter data into an HTML <form>
that is then encoded by the browser as part of an HTTP request. Example 5-2 is an HTML document that contains a
<form>
in which to enter the name of a wine
region. The page, rendered with a Netscape browser, is shown in Figure 5-2.
Example 5-2. An HTML <form> for entry of a regionName
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> <html> <head> <title>Explore Wines in a Region</title> </head> <body bgcolor="white"> <form action="example.5-1.php" method="GET"> <br>Enter a region to browse : <input type="text" name="regionName" value="All"> (type All to see all regions) <br> <input type="submit" value="Show wines"> </form> <br><a href="EB0596000413_2.html">Home</a> </body> </html>
When the user presses the button labeled Show Wines, the data entered
in the <form>
is encoded in an HTTP request
for the resource example.5-1.php
. The resource
to be requested is specified in the action
attribute of the <form>
tag, as is the
method
used for the HTTP request:
<form action="example.5-1.php" method="GET">
In this <form>
, there is only one
<input>
widget with the attribute
type="text"
and
name="regionName"
. When the GET
method is used, the name of this attribute and its value result are
appended to the URL as query string parameters. If the user types
Yarra
Valley
into the text
widget and then clicks on Show Wines, the following URL is requested:
http://localhost/example.5-1.php?regionName=Yarra+Valley
Submitting the <form>
has the same result as
manually typing in the URL but the user need not understand URLs and
HTTP requests when using <form>
.
After submitting the <form>
, the script in
Example 5-1 outputs as a response an HTML document
containing the phrase “regionName is Yarra
Valley”. Note that the space character entered by
the user in the <form>
is automatically
encoded in the URL as a plus character by the web browser, then
decoded back to a space character by the PHP scripting engine.
The HTTP POST
method can be used in a
<form>
instead of the GET
method by changing the method="GET"
attribute of
the <form>
tag to
method="POST"
; the merits of
POST
versus GET
are discussed
in more detail in Appendix B. This change of method
has no effect on automatic variable initialization in PHP scripts,
and the PHP script engine initializes variables from the parameters
passed in the POST
request in the same way it does
for GET
requests. The script in Example 5-1 can be used without modification to process a
regionName
attribute that is passed with a
POST
request.
All <form>
fields—whether passed using
the GET
or POST
methods—are automatically translated into PHP variables for
direct use in scripts.
This is one of the best features of PHP, making it far simpler to write web-enabled scripts in PHP than in other languages. However, it introduces a minor security risk discussed later.
The third technique that passes data from a web browser to a web server is embedding links in an HTML document. This technique runs queries in most web database applications and is conceptually similar to manually entering a URL. We show how to create embedded links using the results of database queries in Section 5.2.
Embedded links in an HTML document can be authored in the same way a manually created URL is typed into a web browser. Consider the script shown in Example 5-3 that is rendered in a Netscape browser in Figure 5-3.
Example 5-3. HTML document containing three links that pass two different parameters
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/html401/loose.dtd"> <html> <head> <title>Explore Wines</title> </head> <body bgcolor="#ffffff"> Explore all our <a href="example.5-4.php?regionName=All&wineType=All"> wines</a> <br>Explore our <a href="example.5-4.php?regionName=All&wineType=Red"> red wines</a> <br>Explore our <a href="example.5-4.php?regionName=Riverland&wineType=Red"> premium reds from the Riverland</a> <br><a href="EB0596000413_2.html">Home</a></body> </html>
The script contains three links that can request the resource
example.5-4.php
and pass different parameters to
the resource. For example, the first link in the HTML document is:
Explore all our <a href="example.5-4.php? regionName=All&wineType=All"> wines</a>
Clicking on this link creates an HTTP request for the URL:
http://localhost/example.5-4.php? regionName=All&wineType=All
The result of the request is that the script in Example 5-4 is run, and the following HTML document is created:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> <html> <head> <title>Parameters</title> </head> <body> regionName is All <br>wineType is All </body> </html>
Example 5-4. A simple script to print out HTTP attributes and values
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> <html> <head> <title>Parameters</title> </head> <body> <?php include 'db.inc'; $regionName = clean($regionName, 30); $wineType = clean($wineType, 10); echo "regionName is " . $regionName . " "; echo "<br>wineType is " . $wineType . " "; ?> </body> </html>
Note that the ampersand characters in the URLs in the HTML document
are replaced with &
, because the ampersand
character has a special meaning in HTML and should not be included
directly in a document. When the link is clicked, the encoded
&
is translated by the browser to
&
in forming the HTTP request.
This section introduces simple techniques that preprocess user data to solve many common security holes in web database applications.
Using the techniques described here doesn’t completely secure a system. Remember that securing a web database application is important, and that the advice offered here isn’t a complete solution. A discussion of other security issues is presented in Chapter 9.
Data that is passed from a web browser to a web server should be
secured using the steps described here. For this purpose, we have
authored the clean( )
function to ensure that the data passed to a
script is of the correct length and that special characters
aren’t misused to attack the system. To understand
why the clean( )
function is needed, we describe
an example attack later in this section. The function is part the
include file db.inc
that is used in all scripts in the online winestore.
Consider the following script. It uses the PHP exec( )
library
function to run a program on the web server. The exec( )
function takes two parameters, the program to run and an
array populated with any output of the program. In this example, the
script uses exec( )
to run the Unix
cal
program and to pass the user-entered
parameter $userString
to the program. The
information in the parameter userString
can be
provided by using an HTML <form>
with a text
input widget, by manually creating a URL, or by embedding a link in
an HTML document.
<?php // DO NOT INSTALL THIS SCRIPT ON A WEB SERVER // Run "cal" with the parameter $userString // Store the results in the array $result exec("/usr/bin/cal $userString", $result); ?> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> <html> <head> <title>Calendar</title> </head> <body> <pre> <?php // Print out each line of the calendar foreach($result as $element) echo "$element "; ?> </pre> </body> </html>
Never use exec( )
or other commands to run
programs from a web script or to query a database without
securing the user data. Do not install the
calendar example on a web server.
The Unix cal
program is a useful utility that
produces monthly or yearly calendars for any date. For example, to
produce a calendar for the whole of 2003, a user could request the
URL:
http://localhost/cal.php?userString=2003
This runs the command /usr/bin/cal 2003
and
outputs the complete 2003 calendar, as shown in Figure 5-4.
To produce a calendar for February 2003, the user requests:
http://localhost/cal.php?userString=2+2003
Requesting the URL without any parameters produces the calendar for the current month:
http://localhost/cal.php
While this script might seem useful and innocuous, this script is a major security hole and should never be installed on a web server.
To illustrate why the script should never be installed, consider how it can be misused. If a user wants to enter two or more commands on a single line in a Unix shell, he can do so by separating the commands with a semicolon character. For example, to see who is logged in and then to list the files in the current directory, a user can type the following commands at the shell:
% who ; ls
Now, consider what happens if he exploits this feature by requesting the following URL:
http://localhost/cal.php?userString=2001;cat+/etc/passwd
The script produces a 2001 calendar, followed by the system password file, as shown in Figure 5-5! The script allows a creative user to do things the web server process can do. The identity of the owner of the web server process affects the severity of the actions that can be performed, but this is at best a major security hole.
Semicolons, colons, greater-than and less-than signs, and other
special characters can cause a script or a query to provide
undesirable functions, especially if the script executes the library
functions system( )
or exec( )
to run server
commands. Even if a <form>
makes it
difficult for a user to enter undesirable data, he can manually
create his own request by entering a URL and authoring a query
string.
Never trust anything you don’t have control of, that is, anything not in middle or database tiers.
SQL querying also has problems. For example, a user can guess the
structure of database tables and how a query is formed from user
input. A user might guess that a query uses an AND
clause and that a particular <form>
text
widget provides one of the values to the query. The user might then
add additional AND
and OR
clauses to the query by entering a partial SQL query in the text
widget. While such tricks may expose data that should remain hidden
from the user, problems compound if the user inserts or deletes data
with the techniques discussed in Chapter 6.
However, many problems can be solved with careful server-side
validation, as discussed in Chapter 7 and the
approach described next.
To improve security and prevent special-character attacks, user data
should be processed with the clean( )
function:
function clean($input, $maxlength) { $input = substr($input, 0, $maxlength); $input = EscapeShellCmd($input); return ($input); }
The first line uses the substr( )
function to reduce the variable $input
to a
maximum length of $maxlength
by taking a substring
beginning at the first character. You can use 30 as a maximum
$regionName
length for Example 5-1, and the calendar example might use a maximum
length of 7. The second line calls the library function
EscapeShellCmd( )
, which escapes any special-purpose
characters—such as semicolons, colons, greater-than and
less-than signs, and so on—by replacing the character with a
single backslash and then the character.
For many purposes, the clean
steps are sufficient
to ensure data is safe. As an example, if the parameter
userString
has a value of:
2001;cat /etc/passwd
then a call of:
clean($userString, 7)
produces the harmless string 2001;ca
.
This string has no detrimental effect and provides the user with no
hidden data. clean( )
is used to preprocess all user data for
the winestore.
User data that has not been preprocessed or cleaned is often known as tainted data, a term originating from the Perl scripting language. Rectifying this through the processing we have described untaints user data.
As we have discussed throughout this section, variables are automatically initialized by PHP from the parameters passed in an HTTP request.
Automatic initialization of variables is an excellent feature for
simple scripts, but it has security and processing implications. If
required, the automatic initialization can be turned off by setting
register_globals=false
in the
php.ini
configuration file, usually found in the
directory /usr/local/lib/
. The
php.ini
file was copied to this location as part
of the PHP installation instructions in Appendix A.
When the PHP script engine is invoked, the engine declares and initializes variables in a predefined order. The automatic initialization feature works in this order:
By default, environment variables are initialized first.
Variables are initialized from query string parameters passed with
the GET
method.
POST
method parameters are initialized.
Variables from cookies are initialized.
The Apache server internal variables are initialized.
The initialization order can be changed from the default by adjusting
the variables_order
setting in
php.ini
. The security problem occurs when a user
knowingly or inadvertently overrides a previously initialized
variable. For example, the PATH
environment
variable is one of the first initialized when the script engine is
invoked. If a GET
request contains an attribute
named PATH
, this overrides the environment
variable of the same name, because GET
variables
are initialized after environment variables. By understanding the
initialization process, the user can override previously set
variables by passing through parameters. This can change script
behavior and possibly lead to a security problem.
If the register_globals
feature is turned off, a
PHP script must use a different method to access user data. This
method is more secure and requires that arrays be accessed to
retrieve specific user parameters. For example, the
GET
variables are stored in an associative array,
$HTTP_GET_VARS
.
Consider the following URL that is requested by a user:
http://localhost/test.php?varname=value
The variable $varname
can be printed in a PHP
script by accessing the associative array
$HTTP_GET_VARS
using:
echo $HTTP_GET_VARS["varname"];
The only disadvantage of this approach is that the script is tailored
for the GET
method. Changing the
<form>
submission method from
GET
to POST
requires modifying
the script. All references to $HTTP_GET_VARS
must
be replaced with references to $HTTP_POST_VARS
,
because the array $HTTP_POST_VARS
stores all
variables passed using the POST
method. However,
the use of associative arrays is more secure because the script
doesn’t function if the user maliciously changes the
<form>
submission method in an attempt to
compromise the system.
We often initialize local variables at the beginning of a script from
the contents of the $HTTP_GET_VARS
or
$HTTP_POST_VARS
arrays. This emulates the
register_globals
feature of PHP but without the
security issues. Local variables make the code more attractive and
readable. For example, the following code fragment initializes three
variables from the contents of the $HTTP_GET_VARS
array:
$surname = $HTTP_GET_VARS["surname"]; $firstname = $HTTP_GET_VARS["firstname"]; $title = $HTTP_GET_VARS["title"];
The result is that the script behaves the same as if
register_globals
is on. This also has the
advantage that if the <form>
submission
method is changed from GET
to
POST
, the code need be modified only in one place.
Other external variables can be accessed similarly:
POST
variables can found in the array
$HTTP_POST_VARS
.
Cookie variables can be found in the array
$HTTP_COOKIE_VARS
.
Environment variables can be found in the array
$HTTP_ENV_VARS
.
Session
variables can be found in the array
$HTTP_SESSION_VARS
.
Server
variables can be found in the array
$HTTP_SERVER_VARS
.
Cookies and sessions are discussed in Chapter 8.
We have set register_globals=true
in the online
winestore application. However, the security implications of
automatic initialization should be considered when designing any
application.
18.191.67.40