Pass data from a form to a PHP script
Validate form data
Work with multivalued form components
Before jumping into any examples, let’s begin with an introduction to how PHP is able to accept and process data submitted through a web form.
PHP and Web Forms
What makes the Web so interesting and useful is its ability to disseminate information as well as collect it, the latter of which is accomplished primarily through an HTML-based form. These forms are used to encourage site feedback, facilitate forum conversations, collect mailing and billing addresses for online orders, and much more. But coding the HTML form is only part of what’s required to effectively accept user input; a server-side component must be ready to process the input. Using PHP for this purpose is the subject of this section.
Because you’ve used forms hundreds if not thousands of times, this chapter won’t introduce form syntax. If you require a primer or a refresher course on how to create basic forms, consider reviewing any of the many tutorials available on the Web.
Instead, this chapter reviews how you can use web forms in conjunction with PHP to gather and process user data.
The first thing to think about when sending data to and from a web server is security. HTTP, the protocol used by browsers, is a plain-text protocol, so any system between the server and the browser can read along and possibly modify the content. If you are creating a form to gather credit card information or other sensitive data, you should use a more secure means of communication. It is relatively easy to add an SSL/TLS certificate to the server, and it can be done at no cost by using services like Let’s Encrypt (https://letsencrypt.org). When the server has a certificate installed, communication takes place over HTTPS: the server sends the browser its certificate, which contains a public key. The browser uses it to verify the server’s identity and to negotiate a shared session key with the server, which holds the matching private key. That session key is then used to encrypt and decrypt the data traveling in both directions.
Keep in mind that other than the odd syntax, $_POST variables are just like any other variable that can be accessed and modified by the PHP script. They’re simply referenced in this fashion in an effort to definitively compartmentalize an external variable’s origination. As you learned in Chapter 3, such a convention is available for variables originating from the GET method, cookies, sessions, the server, and uploaded files.
Let’s take a look at a simple example demonstrating PHP’s ability to accept and process form data.
A Simple Example
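The listing below is a minimal sketch of such a form handler, not a definitive implementation. The file name subscribe.php, the field names name and email, and the helper function greeting() are assumptions chosen for illustration.

```php
<?php
// subscribe.php (hypothetical file name): the form posts back to this same script.

// Build the response for a submitted form; returns '' if nothing usable was posted.
// greeting() is an illustrative helper, not a built-in PHP function.
function greeting(array $post): string
{
    $name  = $post['name'] ?? null;
    $email = $post['email'] ?? null;
    if (!is_string($name) || !is_string($email)) {
        return '';
    }
    // Escape the values before echoing them back into HTML.
    $name  = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');
    $email = htmlspecialchars($email, ENT_QUOTES, 'UTF-8');

    return "<p>Hi $name!<br>The address $email will receive our newsletter.</p>";
}

// The echo only happens when the form has been submitted (posted).
if (($_SERVER['REQUEST_METHOD'] ?? '') === 'POST') {
    echo greeting($_POST);
}

// The form itself, with its action pointing back at this script.
echo <<<HTML
<form action="subscribe.php" method="post">
    <p>Name: <input type="text" name="name" size="20" maxlength="40"></p>
    <p>E-mail: <input type="email" name="email" size="20" maxlength="40"></p>
    <input type="submit" value="Go!">
</form>
HTML;
```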
In this example, the form refers to the script in which it is found, rather than another script. Although both practices are regularly employed, it’s quite commonplace to refer to the originating document and use conditional logic to determine which actions should be performed. In this case, the conditional logic dictates that the echo statements will only occur if the user has submitted (posted) the form.
HTML used to be limited to a few basic input types, but HTML5 added support for color, date, datetime-local, email, month, number, range, search, tel, time, url, and week. These are all values that can be used with the type attribute of the input tag. Browsers apply type-specific logic to them, which allows for localized input controls and basic validation.
Just because browsers now support some input validation does not mean you can skip that part in the PHP script that is used to receive the input. There is no guarantee that the client is a browser. It is best practice to never trust the input that is coming into a PHP script.
Validating Form Data
In a perfect world, the preceding example would be perfectly sufficient for accepting and processing form data. The reality is that websites are under constant attack by malicious third parties from around the globe, poking and prodding the external interfaces for ways to gain access to, steal, or even destroy the website and its accompanying data. As a result, you need to take great care to thoroughly validate all user input to ensure not only that it’s provided in the desired format (for instance, if you expect the user to provide an e-mail address, then the address should be syntactically valid), but also that it is incapable of doing any harm to the website or underlying operating system.
This section shows you just how significant this danger is by demonstrating two common attacks experienced by websites whose developers have chosen to ignore this necessary safeguard. The first attack results in the deletion of valuable site files, and the second attack results in the hijacking of a random user’s identity through an attack technique known as cross-site scripting. This section concludes with an introduction to a few easy data validation solutions that will help remedy this situation.
File Deletion
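As a hedged illustration of the scenario, assume a script passes two form values to a command-line program named inventory_manager. The path /usr/bin/inventory_manager, the variable names, and the sample values are assumptions. The sketch only builds and prints the command string and deliberately never executes it.

```php
<?php
// Values as they might arrive from a form (hard-coded here for illustration).
$sku       = '100AB';
$inventory = '50; rm -rf *';   // the attacker appends a second shell command

// UNSAFE: user input is concatenated straight into a shell command.
$command = '/usr/bin/inventory_manager ' . $sku . ' ' . $inventory;

// Printed rather than passed to exec(), so running this sketch is harmless.
echo $command, PHP_EOL;
```

If this string were handed to exec(), the shell would see two commands separated by the semicolon.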
The inventory_manager application would indeed execute as intended but would be immediately followed by an attempt to recursively delete every file residing in the directory where the executing PHP script resides.
Cross-Site Scripting
The previous scenario demonstrates just how easily valuable site files could be deleted should user data not be filtered. Damage from such an attack could perhaps be minimized by restoring a recent backup of the site and its data, but it would be much better to prevent the attack from happening in the first place.
There’s another type of attack that is considerably more difficult to recover from—because it involves the betrayal of users who have placed trust in the security of your website. Known as cross-site scripting, this attack involves the insertion of malicious code into a page frequented by other users (e.g., an online bulletin board). Merely visiting this page can result in the transmission of data to a third-party’s site, which could allow the attacker to later return and impersonate the unwitting visitor. To demonstrate the severity of this situation, let’s configure an environment that welcomes such an attack.
Suppose that an online clothing retailer offers registered customers the opportunity to discuss the latest fashion trends in an electronic forum. In the company’s haste to bring the custom-built forum online, it decided to skip sanitization of user input, figuring it could take care of such matters at a later point in time. Because HTTP is a stateless protocol, it’s common to store values in the browser (as cookies) and use that data when the user interacts with the site. It is also common to store most of the data on the server side and only store a key as a cookie in the browser. This key is commonly referred to as a session ID. If an attacker can gain access to the session IDs of other users, the attacker can impersonate those users.
If the e-commerce site isn’t comparing cookie information to a specific IP address (a safeguard that would likely be uncommon on a site that has decided to ignore data sanitization), all the attacker has to do is assemble the cookie data into a format supported by the browser, and then return to the site from which the information was culled. Chances are the attacker is now masquerading as the innocent user, potentially making unauthorized purchases, defacing the forums, and wreaking other havoc.
Modern browsers support both session (in-memory) cookies and HttpOnly cookies. An HttpOnly cookie is not exposed to JavaScript, which makes it more difficult for an attacker to read the cookie value from injected script. To mark the session cookie as HttpOnly, add session.cookie_httponly = 1 to the php.ini file.
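The same setting can also be applied at runtime. The following is a minimal sketch, assuming PHP 7.3 or later (where session_set_cookie_params() accepts an options array) and that it runs before any output is sent.

```php
<?php
// Mark the session cookie as HttpOnly so that injected JavaScript
// cannot read it through document.cookie.
session_set_cookie_params(['httponly' => true]);

// Must be called before any output is sent to the browser.
session_start();
```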
Sanitizing User Input
Given the frightening effects that unchecked user input can have on a website and its users, one would think that carrying out the necessary safeguards must be a particularly complex task. After all, the problem is so prevalent within web applications of all types, so prevention must be quite difficult, right? Ironically, preventing these types of attacks is really a trivial affair, accomplished by first passing the input through one of several functions before performing any subsequent task with it. It is important to consider what you do with input provided by a user. If it is passed on as part of a database query, you should ensure that the content is treated as text or numbers and not as a database command. If it is handed back to the same user or to different users, you should make sure that no JavaScript is included with the content, as this could be executed by the browser.
Four standard functions are available for doing so: escapeshellarg(), escapeshellcmd(), htmlentities(), and strip_tags(). You also have access to the native Filter extension, which offers a wide variety of validation and sanitization filters. The remainder of this section is devoted to an overview of these sanitization features.
Note
Keep in mind that the safeguards described in this section (and throughout the chapter), while effective in many situations, offer only a few of the many possible solutions at your disposal. Therefore, although you should pay close attention to what’s discussed in this chapter, you should also be sure to read as many other security-minded resources as possible to obtain a comprehensive understanding of the topic.
Websites are built with two distinct components: the server side that generates output and handles input from the user and the client side that renders HTML and other content as well as JavaScript code provided by the server. This two-tier model is the root of the security challenges. Even if all the client side code is provided by the server, there is no way to ensure that it is executed or that it is not tampered with. A user might not use a browser to interact with the server. For this reason, it is recommended to never trust any input from a client, even if you spend the time to create nice validation functions in JavaScript to make a better experience for the user that follows all your rules.
Escaping Shell Arguments
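The escapeshellarg() function wraps a string in single quotes and escapes any single quotes already inside it, so the shell treats the whole value as one argument. The sketch below reuses the hypothetical inventory_manager scenario (the path and sample values are assumptions) and prints the command instead of executing it. The quoting shown in the comment is what Unix-like systems produce.

```php
<?php
$sku       = '100AB';
$inventory = '50; rm -rf *';

// Each user-supplied value is quoted as a single shell argument.
$command = '/usr/bin/inventory_manager '
         . escapeshellarg($sku) . ' '
         . escapeshellarg($inventory);

// On Unix-like systems this prints:
// /usr/bin/inventory_manager '100AB' '50; rm -rf *'
echo $command, PHP_EOL;
```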
Attempting to execute this would mean 50; rm -rf * would be treated by inventory_manager as the requested inventory count. Presuming inventory_manager is validating this value to ensure that it’s an integer, the call will fail and no harm will be done.
Escaping Shell Metacharacters
This function operates by escaping any shell metacharacters found in the command. These metacharacters include # & ; ` | * ? ~ < > ^ ( ) [ ] { } $ \ as well as the characters \x0A and \xFF.
This means exec() would attempt to execute the command /usr/bin/blah rm -rf, which of course doesn’t exist.
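A short sketch of that behavior, assuming a Unix-like system and a hypothetical user-supplied program name. The command is printed rather than executed.

```php
<?php
// Hypothetical user input that tries to smuggle in a second command.
$vendor = 'blah; rm -rf *';

// escapeshellcmd() puts a backslash in front of the ; and * metacharacters.
$safe = escapeshellcmd($vendor);

// Prints: /usr/bin/blah\; rm -rf \*
echo '/usr/bin/' . $safe, PHP_EOL;
```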
Converting Input into HTML Entities
& will be translated to &amp;
" will be translated to &quot; (when quote_style is not set to ENT_NOQUOTES)
> will be translated to &gt;
< will be translated to &lt;
' will be translated to &#039; (when quote_style is set to ENT_QUOTES)
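A brief sketch of these translations in practice, using a made-up forum message. Passing ENT_QUOTES and the character set explicitly avoids relying on version-specific defaults.

```php
<?php
// Hypothetical forum post containing an injected script.
$message = "<script>alert('xss');</script>";

// Convert the markup characters and both kinds of quotes to entities.
$safe = htmlentities($message, ENT_QUOTES, 'UTF-8');

// Prints: &lt;script&gt;alert(&#039;xss&#039;);&lt;/script&gt;
echo $safe, PHP_EOL;
```

The browser now displays the script as text instead of executing it.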
Stripping Tags from User Input
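The strip_tags() function removes HTML and PHP tags from a string, optionally leaving a list of allowed tags in place. The input below is invented for illustration.

```php
<?php
$input = '<p>I <b>love</b> <a href="http://example.com" onclick="steal()">PHP</a>!</p>';

// Remove every tag.
$plain = strip_tags($input);
echo $plain, PHP_EOL;   // I love PHP!

// Remove every tag except <b>.
$bold = strip_tags($input, '<b>');
echo $bold, PHP_EOL;    // I <b>love</b> PHP!
```

Attributes on allowed tags are left untouched, so allowing a tag such as <a> would also keep any onclick handler it carries.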
Validating and Sanitizing Data with the Filter Extension
Because data validation is such a commonplace task, the PHP development team added native validation features to the language in version 5.2. Known as the Filter extension, these features can be used not only to validate data, such as checking that an e-mail address meets stringent requirements, but also to sanitize data, altering it to fit specific criteria without requiring the user to take further action.
The Filter Extension’s Validation Capabilities
Target Data | Identifier |
---|---|
Boolean values | FILTER_VALIDATE_BOOLEAN |
E-mail addresses | FILTER_VALIDATE_EMAIL |
Floating-point numbers | FILTER_VALIDATE_FLOAT |
Integers | FILTER_VALIDATE_INT |
IP addresses | FILTER_VALIDATE_IP |
MAC Address | FILTER_VALIDATE_MAC |
Regular Expressions | FILTER_VALIDATE_REGEXP |
URLs | FILTER_VALIDATE_URL |
Consult the PHP documentation for a complete list of available flags.
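The following sketch shows a few of these validation identifiers used with filter_var(). The sample values and the range limits are assumptions made for illustration.

```php
<?php
// filter_var() returns the (converted) value on success and false on failure.
$email = filter_var('john@@example.com', FILTER_VALIDATE_EMAIL);
var_dump($email);   // bool(false): the address contains two @ signs

// Options narrow a filter; here the integer must fall within a range.
$age = filter_var('42', FILTER_VALIDATE_INT, [
    'options' => ['min_range' => 1, 'max_range' => 120],
]);
var_dump($age);     // int(42)

// FILTER_NULL_ON_FAILURE lets a boolean check tell "no" apart from "invalid".
$optIn = filter_var('maybe', FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);
var_dump($optIn);   // NULL
```

For real form data, filter_input(INPUT_POST, 'email', FILTER_VALIDATE_EMAIL) applies the same filter directly to the submitted value.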
Sanitizing Data with the Filter Extension
The Filter Extension’s Sanitization Capabilities
Identifier | Purpose |
---|---|
FILTER_SANITIZE_EMAIL | Removes all characters from a string except those allowable within an e-mail address as defined within RFC 822 (https://www.w3.org/Protocols/rfc822/). |
FILTER_SANITIZE_ENCODED | URL encodes a string, producing output identical to that returned by the urlencode() function. |
FILTER_SANITIZE_MAGIC_QUOTES | Escapes potentially dangerous characters with a backslash using the addslashes() function. Removed in PHP 8.0; use FILTER_SANITIZE_ADD_SLASHES instead. |
FILTER_SANITIZE_NUMBER_FLOAT | Removes any characters that would result in a floating-point value not recognized by PHP. |
FILTER_SANITIZE_NUMBER_INT | Removes any characters that would result in an integer value not recognized by PHP. |
FILTER_SANITIZE_SPECIAL_CHARS | HTML encodes the ', ", <, >, and & characters, in addition to any character having an ASCII value less than 32 (this includes characters such as a tab and backspace). |
FILTER_SANITIZE_STRING | Strips all tags such as <p> and <b>. Deprecated as of PHP 8.1. |
FILTER_SANITIZE_STRIPPED | An alias of the “string” filter. Deprecated as of PHP 8.1. |
FILTER_SANITIZE_URL | Removes all characters from a string except for those allowable within a URL as defined within RFC 3986 (https://tools.ietf.org/html/rfc3986). |
FILTER_UNSAFE_RAW | Used in conjunction with various optional flags, FILTER_UNSAFE_RAW can strip and encode characters in a variety of ways. |
As it does with the validation features, the Filter extension also supports a variety of flags that can be used to tweak the behavior of many sanitization identifiers. Consult the PHP documentation for a complete list of supported flags.
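A minimal sketch of two of these sanitization identifiers, using invented input values:

```php
<?php
// Characters that are not allowed in an e-mail address are removed.
$email = filter_var('(bogus@example.org)', FILTER_SANITIZE_EMAIL);
echo $email, PHP_EOL;      // bogus@example.org

// Everything except digits, + and - is removed.
$quantity = filter_var('1,299 items', FILTER_SANITIZE_NUMBER_INT);
echo $quantity, PHP_EOL;   // 1299
```

Sanitizing only removes characters; it does not guarantee that the result is valid, so a sanitized value is often passed through the matching validation filter afterward.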
Working with Multivalued Form Components
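Form components such as checkbox groups and multiple-select boxes can submit more than one value under a single name. Appending square brackets to the component name tells PHP to collect the submitted values into an array. The sketch below is a hedged illustration: the file name languages.php, the field name languages, the list of options, and the helper selected_languages() are all assumptions.

```php
<?php
// The form component: note the [] appended to the name attribute.
$form = <<<HTML
<form action="languages.php" method="post">
    <select name="languages[]" multiple>
        <option value="csharp">C#</option>
        <option value="javascript">JavaScript</option>
        <option value="perl">Perl</option>
        <option value="php">PHP</option>
    </select>
    <input type="submit" value="Submit">
</form>
HTML;

// Return only the submitted values that match a known option.
// selected_languages() is an illustrative helper, not a built-in function.
function selected_languages(array $post): array
{
    $allowed = ['csharp', 'javascript', 'perl', 'php'];
    $chosen  = $post['languages'] ?? [];
    if (!is_array($chosen)) {
        return [];
    }
    // Ignore anything that is not a plain string (e.g., nested arrays).
    $chosen = array_filter($chosen, 'is_string');

    return array_values(array_intersect($allowed, $chosen));
}

// PHP delivers the selections as an array in $_POST['languages'].
if (($_SERVER['REQUEST_METHOD'] ?? '') === 'POST') {
    foreach (selected_languages($_POST) as $language) {
        echo htmlspecialchars($language, ENT_QUOTES, 'UTF-8'), "<br>";
    }
}

echo $form;
```

Checking the submitted values against a list of known options follows the chapter’s advice never to trust client input, since a client can submit values that the form never offered.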
Summary
One of the Web’s great strengths is the ease with which it enables us to not only disseminate but also compile and aggregate user information. However, as developers, this means that we must spend an enormous amount of time building and maintaining a multitude of user interfaces, many of which are complex HTML forms. The concepts described in this chapter should enable you to decrease that time a tad.
In addition, this chapter offered a few commonplace strategies for improving your application’s general user experience. Although not an exhaustive list, perhaps the material presented in this chapter will act as a springboard for you to conduct further experimentation while decreasing the time that you invest in what is surely one of the more time-consuming aspects of web development: improving the user experience.
The next chapter shows you how to protect the sensitive areas of your website by forcing users to supply a username and password prior to entry.