1. Manipulating Strings

Of all data types PHP supports, string data is probably the one most often used. One of the reasons for this is that, at some point, a string representation of something is needed when sending out something to the client.

PHP offers a vast number of functions suitable for strings, almost 100. In addition, regular expressions come in handy when looking for certain patterns in strings. In real life, however, only a fraction of these functions are actually used. Most of them deserve their loyal fan base, but some underestimated functions should get more attention. The phrases in this chapter offer a good mix of both: standard applications and rather unusual but very useful ways to work with strings.

Comparing Strings

strcmp($a, $b)
strcasecmp($a, $b)

<?php
  $a = 'PHP';
  $b = 'php';
  echo 'strcmp(): ' . strcmp($a, $b) . '<br />';
  echo 'strcasecmp(): ' . strcasecmp($a, $b);
?>

Comparing Strings (compare.php)

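Which outputs the following in PHP versions before 8.2 (newer versions print -1 instead of -32, because strcmp() now returns only -1, 0, or 1):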

strcmp(): -32
strcasecmp(): 0

Comparing strings seems like an easy task: use the == operator for implicit type conversion (so '1' == 1 returns true) or the === operator for type checking (so '1' === 1 returns false). However, the first method is rather flawed because the operands are not always converted into strings. For instance, in PHP versions before 8, 1 == '1twothree' returns true, too; both values are converted into integers. Therefore, === is the way to go.

However, PHP also offers functions that offer a bit more than just comparing strings and returning true or false. Instead, strcmp() returns a positive value when the string passed as the first parameter is greater than the second parameter and a negative value when it is smaller. If both strings are equal, strcmp() returns 0. If no case sensitivity is required, strcasecmp() comes into play. It works as strcmp(), but it does not distinguish between uppercase and lowercase letters.

These two functions can be used to sort arrays. You can find more information about custom array sorting in Chapter 2, “Working with Arrays.”
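
As a minimal sketch of that idea (the sample names are made up for illustration), strcasecmp() can be passed directly to usort() as the comparison callback, because it already returns a negative value, zero, or a positive value:

```php
<?php
// Hypothetical sample data; the names are illustrative only.
$names = ['shelley', 'Christian', 'damon'];

// usort() expects a callback that returns a negative, zero, or positive
// integer for each pair of elements, which is what strcasecmp() does.
usort($names, 'strcasecmp');

echo implode(', ', $names); // Christian, damon, shelley
```

With strcmp() instead, the uppercase C would still sort first, but only because uppercase letters have lower character codes than lowercase ones.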

Checking Usernames and Passwords

<?php
  $user = (isset($_GET['user'])) ? $_GET['user'] : '';
  $pass = (isset($_GET['pass'])) ? $_GET['pass'] : '';

  if (
    (strtolower($user) === 'damon' && $pass === 'secret') ||
    (strtoupper($user) === 'SHELLEY' && $pass === 'verysecret') ||
    (strcasecmp($user, 'Christian') == 0 && strcmp($pass, 'topsecret') == 0)
  ) {
    echo 'Login successful.';
  } else {
    echo 'Login failed.';
  }
?>

Validating Logins by Comparing Strings (comparelogin.php)

When validating a username and a password (for example, in a script that backs an HTML login form), two things seem to form a de facto standard on the Web:

• The password is always case sensitive. It has to be provided exactly the same way it was set.

• The username, however, is not case sensitive.

Therefore, a username has to be compared without considering case sensitivity. This can be done either by using strcasecmp() (see the previous phrase) or by first converting both the provided username and the real username into lowercase letters (or uppercase letters). This is done by the functions strtolower() or strtoupper(). The preceding code shows an example, using strcmp()/strcasecmp() and also the comparison operator ===.

Depending on the data provided in the uniform resource locator (URL) of the call to the script, the login either fails or succeeds. For instance, the following URL successfully logs in the user. (You have to change the servername portion.)

http://servername/comparelogin.php?user=cHRISTIAN&pass=topsecret

In contrast, the following login does fail:

http://servername/comparelogin.php?user=Christian&pass=TopSecret


Note

Of course, providing usernames and passwords via GET is a very bad idea; POST is preferred. (See Chapter 4, “Interacting with Web Forms,” for more details on form data.) However, for testing purposes, this chapter’s code uses GET.


Converting Strings into HTML

htmlspecialchars($input)
htmlentities($input)

<?php
  $input = '<script>alert("I have a bad \'Föhnwelle\', ' .
           'therefore I crack websites.");</script>';

  echo htmlspecialchars($input, ENT_QUOTES) . '<br />';
  echo htmlentities($input);
?>

Escaping Strings for HTML (htmlescape.php)

A commonly used Web attack is called Cross-Site Scripting (XSS). For example, a user enters some malicious data, such as JavaScript code, into a Web form; the Web page then at some point outputs this information verbatim, without proper escaping. Standard examples of this are Web guest books or discussion forums. People enter text for others to see it.

Here, it is important to remove certain HTML markup. To make a long story short: It is almost impossible to really catch all attempts to inject JavaScript into data. It is not always done using the <script> tag; other HTML elements can carry code as well, such as <img onabort="badCode()" />. Therefore, in most cases, all HTML must be removed.

The easiest way to do so is to call htmlspecialchars(); this converts the string into HTML, including replacement of all < and > characters with &lt; and &gt;. One notable exception is single quotes, which were not converted by default before PHP 8.1. However, when using the ENT_QUOTES constant as a second argument, single quotes are properly escaped as well, regardless of the PHP version:

htmlspecialchars($input, ENT_QUOTES)

Another option is to call htmlentities(). This uses HTML entities for characters, if available. The preceding code shows the differences between these two methods. The German ö (o umlaut) is not converted by htmlspecialchars(); however, htmlentities() replaces it with its entity &ouml;.

The use of htmlspecialchars() and htmlentities() just outputs what the user entered in the browser. So if the user enters HTML markup, this very markup is shown. Thus, htmlspecialchars() and htmlentities() please the browser but might not please the user.
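
The difference between the two functions can be sketched with a shorter input. The sample string is made up for illustration, the script is assumed to be saved as UTF-8, and the charset is passed explicitly so that the result does not depend on the server configuration:

```php
<?php
// Illustrative input; this file is assumed to be saved as UTF-8.
$input = "<b>'Föhn'</b>";

// Both calls escape markup and quotes; only htmlentities() also
// replaces the umlaut with a named entity.
$special  = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
$entities = htmlentities($input, ENT_QUOTES, 'UTF-8');

echo $special, "\n";  // &lt;b&gt;&#039;Föhn&#039;&lt;/b&gt;
echo $entities, "\n"; // &lt;b&gt;&#039;F&ouml;hn&#039;&lt;/b&gt;
```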


Note

If you want to prepare strings to be used within URLs, you have to use urlencode() to properly encode special characters, such as the space character, that cannot be used verbatim in URLs.
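
A minimal sketch of such a call follows; the script name search.php and the parameter name q are hypothetical and only serve the illustration:

```php
<?php
// Hypothetical search term with characters that are unsafe in a URL.
$term = 'PHP & MySQL';

// urlencode() turns spaces into + and other special characters
// into %XX sequences.
$url = 'http://servername/search.php?q=' . urlencode($term);

echo $url; // http://servername/search.php?q=PHP+%26+MySQL
```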


However, the function strip_tags() does completely get rid of all HTML elements. If you just want to keep some elements (for example, some limited formatting functionalities with <b> and <i> and <br /> tags), you provide a list of allowed values in the second parameter for strip_tags(). The following script shows this; Figure 1.1 depicts its output. As you can see, all unwanted HTML tags have been removed; however, their contents are still there:

<?php
  $input = 'My parents <i>hate</i> me, <br />' .
    'therefore I <b>crack</b> websites. ' .
    '<script>alert("Nice try!");</script>' .
    '<img src="explicit.jpg" />';

  echo strip_tags($input, '<b><br><i>');
?>

Figure 1.1. Some HTML tags were stripped, but not all.

Using Line Breaks

<?php
  $input = "One\nTwo\nThree";
  echo nl2br($input);
?>

Adding <br /> Elements at Every Line Break (nl2br.php)

How can a line break be used within HTML? That’s easy: with the <br /> HTML element. However, what if there is data with \n or \r\n line breaks? Search and replace comes to mind; however, it is much easier to use a built-in PHP function: nl2br(). This parses a string and converts all line breaks to <br /> elements, as the preceding script shows.

As you can see, the line breaks are still there, but the <br /> elements were added.
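
A short sketch makes this visible by comparing the result with the expected string: nl2br() inserts the <br /> element in front of each line break and keeps the line break itself.

```php
<?php
$input = "One\nTwo\nThree";
$html  = nl2br($input);

// The <br /> elements are added, the \n characters stay in place.
var_dump($html === "One<br />\nTwo<br />\nThree"); // bool(true)
```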

Encrypting Strings

$encpass = '$1$FK3.qn2.$Si5KhnprsRb.N.SEF4GMW0';

<?php
  $pass = (isset($_GET['pass'])) ? $_GET['pass'] : '';
  $encpass = '$1$FK3.qn2.$Si5KhnprsRb.N.SEF4GMW0';

  if (crypt($pass, $encpass) === $encpass) {
    echo 'Login successful.';
  } else {
    echo 'Login failed.';
  }
?>

Checking Logins Using an Encrypted Password (crypt.php)

Passwords should never be stored verbatim in a database, but should instead be stored as a one-way hash. Some databases internally offer such functions; for all the others, PHP is there to help. The crypt() function hashes a string. Traditionally this was done with an algorithm based on the Data Encryption Standard (DES); the $1$ prefix of the value used in this phrase indicates the MD5-based variant. The operation is one-way, so there is no way back. Also, because each result depends on a salt, hashing the same password with different salts yields different results.

For instance, the string 'TopSecret' is encrypted into $1$FK3.qn2.$Si5KhnprsRb.N.SEF4GMW0 (and also $1$m61.1i2.$OplJ3EHwkIxycnyePplFz0 and $1$9S3.c/3.$51O1Bm4v3cnBNOb1AECil., but this example sticks with the first one). Checking whether a value corresponds to a result from calling crypt() can be done by calling crypt() again: crypt($value, $encryptedValue) must return $encryptedValue.

The preceding script checks whether a password provided via the URL matches the previous result of crypt(). Calling this script with the GET parameter pass=TopSecret succeeds in logging in; all other passwords fail.


Note

To provide more details: The second parameter to crypt() is the salt (initialization value) for hashing the data. In the verification call, the stored hash itself serves as the salt. You can also supply a salt when hashing the original password; since PHP 8.0 this parameter is mandatory, whereas older versions generated a salt when it was omitted. However, you do have to make sure that the salt values are unique; otherwise, the result is not secure.

Be advised, though, that DES encryption can be cracked in about 24 hours, so it’s not bulletproof anymore. A more recent alternative is Advanced Encryption Standard (AES).
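
As an alternative to calling crypt() directly, PHP 5.5 and later ship password_hash() and password_verify(), which choose the algorithm and a random salt on their own. The following is only a sketch of that swapped-in technique, not the approach used in crypt.php; the password is hard-coded here instead of being read from $_GET:

```php
<?php
// password_hash() stores the algorithm, the cost, and a random salt
// inside the returned string, so no separate salt handling is needed.
$hash = password_hash('TopSecret', PASSWORD_DEFAULT);

// Hypothetical user input; in crypt.php it would come from $_GET['pass'].
$pass = 'TopSecret';

if (password_verify($pass, $hash)) {
    echo 'Login successful.';
} else {
    echo 'Login failed.';
}
```

Because of the random salt, $hash differs on every run; only password_verify() can tell whether a given password matches it.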


Checksumming Strings

md5()
sha1()

<?php
  $pass = (isset($_GET['pass'])) ? $_GET['pass'] : '';

  $md5pass = '6958b43cb096e036f872d65d6a4dc01b';
  $sha1pass = '61c2feed11e0e53eb8e295ab8da78150be12f301';

  if (sha1($pass) === $sha1pass) {
    echo 'Login successful.';
  } else {
    echo 'Login failed.';
  }

// Alternatively, using MD5:
//  if (md5($pass) === $md5pass) {
//    echo 'Login successful.';
//  } else {
//    echo 'Login failed.';
//  }
?>

Checking Logins Using SHA1 and MD5 Hashes (checksum.php)

PHP offers these two main functions for creating checksums:

md5() calculates the MD5 hash of a string.

sha1() calculates the SHA1 hash of a string.

Using crypt() with strings is similar to creating a checksum of something: It can be easily determined whether a string matches the checksum; however, it is not (easily) possible to re-create the original string from the checksum.

Two algorithms whose purpose is to do exactly this checksumming are Secure Hash Algorithm 1 (SHA1) and Message Digest Algorithm 5 (MD5). They create such a checksum, or hash. The main difference between these two algorithms and the one used in DES/crypt() is that the SHA1 or MD5 checksum of a string is always the same, so it is very easy to verify data. As Figure 1.2 shows, even the PHP distributions have an MD5 checksum mentioned on the Web site to validate the downloads.

Figure 1.2. The PHP downloads page shows MD5 hashes of the PHP distributions.

Again, the goal is to validate a password the user provides using GET (which, as mentioned previously, is bad practice and only used for the sake of demonstration). The correct password is, once again, 'TopSecret' with the following hashes:

6958b43cb096e036f872d65d6a4dc01b is the MD5 hash.

61c2feed11e0e53eb8e295ab8da78150be12f301 is the SHA1 hash.

From a security perspective, you should not rely on MD5 (because it can be broken with modern computers). SHA1 is not completely secure anymore, either, given the advances in hardware performance. You should consider additional safeguards:

• Before hashing a value, salt it; that is, prepend a known string, making it harder for attackers to crack it. However, that does not help against an attacker who uses brute force and tries almost all possible input combinations.

• Use crypt() with a more secure algorithm such as Blowfish.


Tip

When calculating the MD5 or SHA1 hash of a file, no call to file_get_contents() or other file functions is required; PHP offers two functions that calculate the hashes of a file (and take care of opening and reading in the file data):

md5_file()

sha1_file()
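
A minimal sketch of both functions follows. It writes sample data to a temporary file first (an assumption made only to keep the example self-contained) and then checks that the file hashes equal the hashes of the same data as a string:

```php
<?php
// Create a temporary file with known content for the demonstration.
$file = tempnam(sys_get_temp_dir(), 'sum');
file_put_contents($file, 'TopSecret');

// Both functions open and read the file on their own.
$md5  = md5_file($file);
$sha1 = sha1_file($file);

var_dump($md5 === md5('TopSecret'));   // bool(true)
var_dump($sha1 === sha1('TopSecret')); // bool(true)

// Clean up the temporary file.
unlink($file);
```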


Extracting Substrings

substr()

<?php
  $php = "PHP: Hypertext Preprocessor";
  echo substr($php, 15, 4); //returns "Prep"
?>

Extracting a Substring Using substr() (substr.php; excerpt)

The substr() function returns a part of a string. You provide a string and the position of the first character to be extracted. (Keep in mind that the first character has the index 0.) From this character on, the rest of the string is returned. If you only want to return a part of it, provide the length in the third parameter. The preceding code shows substr() in action and extracts Prep from PHP: Hypertext Preprocessor.


Tip

If you want to count from the end of the string, use a negative value as the second parameter of substr():

echo substr($php, -12, 4);

If you provide a negative value for the third parameter of substr() (for example, -n), the last n characters are not part of the substring.

echo substr($php, -12, -8);

All of these calls to substr() return Prep and are included in the complete code.


Protecting Email Addresses Using ASCII Codes

protectMail('email@address.xy')

<?php
  function protectMail($s) {
    $result = '';
    $s = 'mailto:' . $s;
    for ($i = 0; $i < strlen($s); $i++) {
      $result .= '&#' . ord(substr($s, $i, 1)) . ';';
    }
    return $result;
  }

  echo '<a href="' .
    protectMail('email@address.xy') .
    '">Send mail</a>';
?>

Protecting Email Addresses (protectMail.php)

In the browser, you just see an email link, but the underlying HTML markup is indecipherable:

<a
href="&#109;&#97;&#105;&#108;&#116;&#111;&#58;
&#101;&#109;&#97;&#105;&#108;&#64;&#97;&#100;
&#100;&#114;&#101;&#115;&#115;&#46;&#120;&#121;">Send
mail</a>

However, take a look at Figure 1.3: The Web browser correctly decoded the email address, as shown in the status bar.

Figure 1.3. Machine beats man (when deciphering the email address).

Some special characters are difficult to use in strings because they are hard to enter using a keyboard. However, they all have an ASCII value. PHP offers two functions to deal with this:

chr() converts the ASCII code into the corresponding character.

ord() returns the ASCII code for a character.

This can be used to protect email addresses, for instance. Because spammers are writing software to search for certain patterns (email addresses) on Web pages, this might help keep spam low. The trick is to use HTML character codes for email addresses, making it much harder for spambots to find email data.

The preceding code takes an email address (in the format email@address.xy) as a parameter and returns mailto:email@address.xy, but converted into HTML entities. For instance, the m of mailto: has the ASCII code 109; therefore, &#109; stands for m. To do so, a for loop iterates through all characters in the string. For this, the length of the string has to be determined, which can be done using strlen(). Then, a call to ord() calculates the ASCII code of each character, which is then used for the resulting HTML.

Of course, this does not offer bulletproof protection; you might consider using alternative ways to obscure the email address, including a syntax such as email at address dot xy.
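
How little effort the decoding takes can be sketched in a few lines. The helper is repeated from protectMail.php so that the example runs on its own; html_entity_decode() then plays the role of the browser (or of a slightly smarter spambot):

```php
<?php
// Same helper as in protectMail.php, repeated to keep the sketch
// self-contained.
function protectMail($s) {
    $result = '';
    $s = 'mailto:' . $s;
    for ($i = 0; $i < strlen($s); $i++) {
        $result .= '&#' . ord(substr($s, $i, 1)) . ';';
    }
    return $result;
}

// chr() and ord() are inverse operations for single-byte characters.
var_dump(chr(109)); // string(1) "m"
var_dump(ord('m')); // int(109)

// html_entity_decode() reverses the numeric character references.
$encoded = protectMail('email@address.xy');
$decoded = html_entity_decode($encoded);
echo $decoded; // mailto:email@address.xy
```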

Scanning Formatted Strings

sscanf($date, '%d/%d/%d')

<?php
  $date = '02/01/06';
  $values = sscanf($date, '%d/%d/%d');
  vprintf('Month: %d; Day: %d; Year: %d.', $values);
?>

Scanning Formatted Strings (sscanf.php)

Another function related to printf() is sscanf(). This one parses a string and tries to match it with a pattern that contains placeholders. The $date string contains a date and is scanned using the string '%d/%d/%d' with several placeholders, as shown in the preceding phrase. The function returns an array with all values for the matched placeholders. Then this array is passed to vprintf() to print it.

Alternatively, you can provide a list of variable names as additional parameters to sscanf(). Then the function writes the substrings that match the placeholders into these variables. The following code shows this:

<?php
  $date = '02/01/06';
  $values = sscanf($date, '%d/%d/%d', $m, $d, $y);
  echo "Month: $m; Day: $d; Year: $y.";
?>

Scanning Formatted Strings (sscanf-alternative.php)
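
Because sscanf() delivers integers, the scanned values can be fed straight back into sprintf() to build a different format. The following sketch converts the date into year-month-day notation; treating two-digit years as belonging to the 2000s is an assumption made only for this example:

```php
<?php
$date = '02/01/06';

// The returned array holds the three integers 2, 1, and 6.
list($m, $d, $y) = sscanf($date, '%d/%d/%d');

// Assumption for this sketch: two-digit years belong to the 2000s.
$iso = sprintf('20%02d-%02d-%02d', $y, $m, $d);

echo $iso; // 2006-02-01
```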

Getting Detailed Information about Variables

var_dump(false);

The values of variables can be sent to the client using print() or echo(); however, this is sometimes problematic. Take Booleans, for instance. echo(true) prints 1, but echo(false) prints nothing. A much better way is to use var_dump(), a function that also prints the type of the variable. Therefore, this code prints the string bool(false).

This also works for objects and arrays, making var_dump() a must-have option for developers who like to debug without a debugger.


Note

A function related to var_dump() is var_export(). It works similarly; however, there are two differences:

• The output of var_export() is valid PHP code; for instance, var_export(false) prints false.

• If the second parameter provided to var_export() is the Boolean true, the function does not print anything, but returns a string.
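
Both points can be sketched in a few lines; the sample values are arbitrary:

```php
<?php
// With true as the second parameter, nothing is printed; the PHP code
// is returned as a string instead.
$code = var_export(false, true);
echo $code, "\n"; // false

// Strings are returned as single-quoted PHP literals, with embedded
// single quotes escaped.
$literal = var_export("It's PHP", true);
echo $literal, "\n"; // 'It\'s PHP'
```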


Searching in Strings

strpos()

if (!strpos($string, $substring)) {
  echo 'No match found.';
}

When looking for substrings in strings, strpos() is used (and its counterpart strrpos(), which searches from the end of the string). The tricky thing about this function is that it returns the index of the first occurrence of the substring, or false otherwise.

That means that the preceding code snippet is incorrect: if $string happens to start with $substring, strpos() returns 0, which evaluates to false. Therefore, a comparison using === or !== must be used to take the data type into account. The following code shows how to correctly use strpos():

if (strpos($string, $substring) === false) {
  echo 'No match found.';
} else {
  echo 'Match found.';
}
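
The pitfall is easy to reproduce. In this sketch (the sample strings are chosen for illustration), the substring sits at the very beginning of the string, so the loose check gives the wrong answer:

```php
<?php
$string = 'PHP: Hypertext Preprocessor';

$pos = strpos($string, 'PHP');      // int(0): match at the first character
$missing = strpos($string, 'Perl'); // false: no match at all

var_dump(!$pos);               // bool(true), wrongly suggests "no match"
var_dump($pos === false);      // bool(false), correct: there is a match
var_dump($missing === false);  // bool(true), correct: no match
```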

Using Perl-Compatible Regular Expressions

preg_match()

<?php
  $string = 'This site runs on PHP ' . phpversion();
  preg_match('/php ((\d)\.\d\.\d+)/i',
    $string, $matches);
  vprintf('Match: %s<br /> Version: %s; Major: %d.',
    $matches);
?>

Searching in Strings Using PCRE (preg_match.php)

Matching patterns in PCRE is done using preg_match() if only one occurrence is searched for, or preg_match_all() if multiple occurrences may exist. The syntax is as follows: first the pattern, then the string, and then the resulting array. However, for the pattern you need delimiters; most of the time slashes are used. After the closing delimiter, you can provide modifiers. For instance, the modifier i deactivates case sensitivity. Unlike Perl, PHP has no g modifier for global searches; preg_match_all() takes over that role.

The function preg_match_all() takes the same parameters; however, the resulting array is a multidimensional one. When the PREG_SET_ORDER flag is passed as the fourth parameter, each entry in this array is an array of matches as it would have been returned by preg_match(). The following code shows this:

<?php
  $string = 'This site runs on PHP ' . phpversion();
  preg_match_all('/php ((\d)\.\d\.\d+)/i',
    $string, $matches, PREG_SET_ORDER);
  // Each $match has the same layout as the result of preg_match().
  foreach ($matches as $match) {
    vprintf('Match: %s<br /> Version: %s; Major: %d.',
      $match);
  }
?>

Finding Multiple Matches in Strings Using PCRE (preg_match_all.php)

Finding Tags with Regular Expressions

preg_match_all('/<.*?>/', $string, $matches);

<?php
  $string = '<p>Sex, drugs and <b>PHP</b>.</p>';
  preg_match_all('/<.*?>/', $string, $matches);
  foreach ($matches[0] as $match) {
    echo htmlspecialchars("$match ");
  }
?>

Finding All Tags Using Nongreedy PCRE (non-greedy.php)

Which outputs:

<p> <b> </b> </p>

One advantage of PCRE over POSIX is that some special constructs are supported. For instance, usually regular expressions are matched greedily. Take, for instance, this regular expression:

<.*>

When trying to match this in the following string

<p>Sex, drugs and <b>PHP</b>.</p>

what do you get? You get the complete string. Of course, the pattern also matches on <p>, but regular expressions try to match as much as possible. Therefore, you usually have to do a clumsy workaround, such as <[^>]*>. However, it can be done more easily. You can use the ? modifier after the * quantifier to activate nongreedy matching.
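
The three variants discussed here can be compared side by side. This sketch uses preg_match(), so each call reports only the first match:

```php
<?php
$string = '<p>Sex, drugs and <b>PHP</b>.</p>';

preg_match('/<.*>/', $string, $greedy);     // greedy: as much as possible
preg_match('/<.*?>/', $string, $lazy);      // nongreedy: as little as possible
preg_match('/<[^>]*>/', $string, $negated); // workaround with a negated class

echo htmlspecialchars($greedy[0]), "\n";  // the complete string
echo htmlspecialchars($lazy[0]), "\n";    // <p>
echo htmlspecialchars($negated[0]), "\n"; // <p>
```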

Validating Mandatory Input

function checkNotEmpty($s) {
  return (trim($s) !== '');
}

When validating form fields (see Chapter 4 for more about HTML forms), several checks can be done. However, you should test as little as possible. For instance, when recently trying to order concert tickets for a U.S. concert, I failed to complete the order because the form expected a U.S. telephone number, which I could not provide.

The best check is to check whether there is any input at all. However, what is considered to be any input? If someone enters just whitespace (that is, space characters and other nontext characters), is the form field filled out correctly?

The best way is to use trim() before checking whether there is anything inside the variable or expression. The function trim() removes all kinds of whitespace characters, including the space character, horizontal and vertical tabs, carriage returns, and line feeds. If, after that, the string is not equal to an empty string, the (mandatory) field has been filled out.


Note

The file check.php contains sample calls for this and all following validation functions, which are defined in the file check.inc.php.


It is to be noted, however, that the numeric functions—is_float(), is_int(), and is_numeric()—also try to convert the data from their original type to the numeric type.

Another approach to convert data types is something borrowed from Java and other strongly typed C-style languages. Prefix the variable or expression with the desired data type in parentheses:

$numericVar = (int)$originalVar;

In this case, however, PHP really tries to convert at any cost. Therefore, (int)'3DoorsDown' returns 3, whereas is_numeric('3DoorsDown') returns false. In contrast, (int)'ThreeDoorsDown' returns 0.

Generally, is_numeric() (and is_int()/is_float()) seems to be the better alternative, whereas (int) returns an integer value even for illegal input. So, it’s really a matter of the specific application at hand which method to choose.

The following code offers the best of both worlds. A given input is checked whether it is numeric with is_numeric(), and if so, it is converted into an integer using (int). Adaptions to support other (numeric) data types are trivial:

function getIntValue($s) {
  if (!is_numeric($s)) {
    return false;
  } else {
    return (int)$s;
  }
}

Generating Integer Values (check.inc.php; excerpt)
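
A few sample calls illustrate the behavior. The function is repeated here so that the sketch runs on its own, and the inputs are made up for illustration; note that a numeric string with decimal places is accepted and then truncated:

```php
<?php
// Same function as in check.inc.php, repeated for a self-contained sketch.
function getIntValue($s) {
    if (!is_numeric($s)) {
        return false;
    } else {
        return (int)$s;
    }
}

var_dump(getIntValue('42'));         // int(42)
var_dump(getIntValue('4.7'));        // int(4): numeric, decimals are cut off
var_dump(getIntValue('3DoorsDown')); // bool(false): not numeric
```

Callers have to compare the result with === false, because 0 is a legitimate return value.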

Validating Email Addresses

function checkEmail($s) {
  $lastDot = strrpos($s, '.');
  $ampersat = strrpos($s, '@');
  $length = strlen($s);
  return !(
    $lastDot === false ||
    $ampersat === false ||
    $length === false ||
    $lastDot - $ampersat < 3 ||
    $length - $lastDot < 3
  );
}

Validating Email Addresses (check.inc.php; excerpt)

Checking whether a string contains a valid email address is two things at once: very common and very complicated. The aforementioned book on regular expressions uses several pages to create a set of regular expressions to perform this task. If you are interested in this, take a look at http://examples.oreilly.com/regex/readme.html.

Validating email addresses is difficult because the rules for valid domain names differ drastically between countries. For instance, bn.com is valid, whereas bn.de is not (but db.de is). Also, did you know that username@[127.0.0.1] is a valid email address (if 127.0.0.1 is the IP address of the mail server)?

Therefore, the recommendation is to only check for the major elements of an email address: valid characters (an @ character) and a dot somewhere after that. It is impossible to be 100 percent sure with checking email addresses; if the test is too strict, the user just provides a fantasy email address. The only purpose of email checking is to provide assistance when an (unintentional) typo occurs.

Of course, this is also possible using regular expressions, but this is probably just slower. You should also be aware that the aforementioned code cannot detect every email address that is incorrect. Also watch out for the new international domains with special characters such as ä or é in them. Most regular expressions omit these, so you are much better off with the preceding code.

Search and Replace

preg_replace()

<?php
  $string = '02/01/13';
  echo preg_replace(
    '#(\d{1,2})/(\d{1,2})/(\d{1,2})#',
    '$2/$1/$3',
    $string
  );
?>

Replacing Matches Using PCRE (preg_replace.php)

Searching within strings is one thing; replacing all occurrences with something else is completely different. This is relatively easy, though, when using regular expressions; you just have to know the function name: preg_replace().

Within the replacement string, you can use references to subpatterns. The complete match is referred to by $0. Then count opening parentheses from left to right (so an outer group is numbered before the groups nested inside it): The contents of the first parentheses are referenced by $1, the second parentheses are accessed using $2, and so on.

With this in mind, the replacement can be done. In the example, a U.S. date (month/day/year) is converted to a U.K. date (day/month/year).
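
To make the references visible, this sketch uses a replacement string that contains $0 next to the reordered subpatterns; the wording of the replacement is made up for illustration:

```php
<?php
$string = '02/01/13';

// $0 is the complete match; $1, $2, and $3 are month, day, and year.
$result = preg_replace(
    '#(\d{1,2})/(\d{1,2})/(\d{1,2})#',
    '$0 becomes $2/$1/$3',
    $string
);

echo $result; // 02/01/13 becomes 01/02/13
```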


Tip

The regular expression in the preceding code does not use the / delimiter for the regular expression because the regular expression itself contains slashes that would then need escaping. However, by choosing another delimiter (for example, #), you can avoid the escaping of slashes.


If you have a static pattern, without any quantifiers or special characters, using str_replace() is almost always faster. The parameter order for this is: first the strings to search for; then the replacements; and, finally, the string where the search and replace will take place. You can provide strings or arrays of strings in the first two parameters. The following code removes all punctuation from the text:

<?php
  $string = 'To be, or not to be; that\'s the question?!';
  echo str_replace(
    array('.', ',', ':', ';', '!', '?'),
    '',
    $string
  );
?>

Replacing without Regular Expressions (str_replace.php)

The following code shows how to use an array with replacement characters.

<?php
  $string = '<p>This is <span class="acronym">HTML</span>!</p>';
  echo str_replace(
    // '&' must come first; otherwise the & of earlier replacements is escaped again.
    array('&', '<', '>', '"', '\''),
    array('&amp;', '&lt;', '&gt;', '&quot;', '&apos;'),
    $string
  );
?>

Replacing without Regular Expressions (str_replace_multiple.php)
