Chapter 14: Content Formatting with Regular Expressions

We’re almost there! We’ve designed a database to store jokes, organized them into categories, and tracked their authors. We’ve learned how to create a web page that displays this library of jokes to site visitors. We’ve even developed a set of web pages that a site administrator can use to manage the joke library without knowing anything about databases.

In so doing, we’ve built a site that frees the resident webmaster from continually having to plug new content into tired HTML page templates, and from maintaining an unmanageable mass of HTML files. The HTML is now kept completely separate from the data it displays. If you want to redesign the site, you simply have to make the changes to the HTML contained in the PHP templates that you’ve constructed. A change to one file (for example, modifying the footer) is immediately reflected in the page layouts of all pages in the site. Only one task still requires knowledge of HTML: content formatting.

On any but the simplest of websites, it will be necessary to allow content (in our case, jokes) to include some sort of formatting. In a simple case, this might merely be the ability to break text into paragraphs. Often, however, content providers will expect facilities such as bold or italic text, hyperlinks, and so on.

As it stands, we’ve stripped out any formatting from text entered by users using the htmlspecialchars function.

If, instead, we just echo out the raw content pulled from the database, we can enable administrators to include formatting in the form of HTML code in the joke text:

<?php echo $joke->joketext; ?>
                

Following this simple change, a site administrator could include HTML tags that would have their usual effect on the joke text when inserted into a page.

But is this really what we want? Left unchecked, content providers can do a lot of damage by including HTML code in the content they add to your site’s database. Particularly if your system will be enabling nontechnical users to submit content, you’ll find that invalid, obsolete, and otherwise inappropriate code will gradually infest the pristine website you set out to build. With one stray tag, a well-meaning user could tear apart the layout of your site.

In this chapter, you’ll learn about several PHP functions that you haven’t seen before, which are used for finding and replacing patterns of text in your site’s content. I’ll show you how to use these capabilities to provide a simpler markup language for your users that’s better suited to content formatting. By the time we’ve finished, we’ll have completed a content management system that anyone with a web browser can use—no knowledge of HTML required.

Regular Expressions

To implement our own markup language, we’ll have to write some PHP code to spot our custom tags in the text of jokes and then replace them with their HTML equivalents. For tackling this sort of task, PHP includes extensive support for regular expressions.

A regular expression is a short piece of code that describes a pattern of text that may occur in content like our jokes. We use regular expressions to search for and replace patterns of text. They’re available in many programming languages and environments, and are especially prevalent in web development languages like PHP.

The popularity of regular expressions has everything to do with how useful they are, and absolutely nothing to do with how easy they are to use—because they’re not at all easy. In fact, to most people who encounter them for the first time, regular expressions look like what might eventuate if you fell asleep with your face on the keyboard.

Here, for example, is a relatively simple (yes, really!) regular expression that will match any string that might be a valid email address:

/^[\w.-]+@([\w-]+\.)+[a-z]+$/i
                

Scary, huh? By the end of this section, you’ll actually be able to make sense of that.
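As a preview of where this is heading, here is a minimal sketch that tries the pattern against a few strings using preg_match, a function covered shortly. The sample addresses are made up for illustration, and the pattern is only a rough plausibility check, not a full validator.

```php
<?php
// The email pattern from the text, written as a single-quoted PHP string.
$emailPattern = '/^[\w.-]+@([\w-]+\.)+[a-z]+$/i';

// Hypothetical sample inputs, chosen only for illustration.
$candidates = ['kevin@example.com', 'not an email', 'user@localhost'];

$results = [];
foreach ($candidates as $candidate) {
    // preg_match returns 1 when the pattern matches and 0 when it does not.
    $results[$candidate] = preg_match($emailPattern, $candidate) === 1;

    echo $candidate . ': '
        . ($results[$candidate] ? 'looks like an email address' : 'rejected')
        . PHP_EOL;
}
```

Only the first candidate passes: the second has no @ sign, and the third has no dot after the @.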

The language of a regular expression is cryptic enough that, once you master it, you may feel as if you’re able to weave magical incantations with the code you write. To begin with, let’s start with some very simple regular expressions.

This is a regular expression that searches for the text “PHP” (without the quotes):

/PHP/
                

Fairly simple, right? It’s the text you want to search for, surrounded by a pair of matching delimiters. Traditionally, slashes (/) are used as regular expression delimiters, but another common choice is the hash character (#). You can actually use any character as a delimiter except letters, numbers, whitespace, or backslashes (\). I’ll use slashes for all the regular expressions in this chapter.

Escaping Delimiters

To include a forward slash as part of a regular expression that uses forward slashes as delimiters, you must escape it with a preceding backslash (\/); otherwise, it will be interpreted as marking the end of the pattern.

The same goes for other delimiter characters: if you use hash characters as delimiters, you’ll need to escape any hashes within the expression with backslashes (\#).
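The following sketch shows the two delimiter choices side by side. It relies on the preg_match function, which is introduced next, and the URL and joke text are invented sample values.

```php
<?php
// Hypothetical sample value used only to demonstrate delimiter escaping.
$url = 'http://example.com/jokes';

// With slash delimiters, every literal slash must be written as \/.
$withSlashes = preg_match('/http:\/\//', $url);

// With hash delimiters, the slashes need no escaping at all.
$withHashes = preg_match('#http://#', $url);

// With hash delimiters, a literal hash is what needs the backslash.
$hashInText = preg_match('#joke \#1#', 'This is joke #1');

// Each call returns 1 because each pattern is found in its subject string.
var_dump($withSlashes, $withHashes, $hashInText);
```

Picking a delimiter that does not occur in the pattern, as in the second call, usually gives the most readable result.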

To use a regular expression, you must be familiar with the regular expression functions available in PHP. preg_match is the most basic, and can be used to determine whether or not a regular expression is matched by a particular text string.

Consider this code:

<?php
$text = 'PHP rules!';

if (preg_match('/PHP/', $text)) {
    echo '$text contains the string "PHP".';
} else {
    echo '$text does not contain the string "PHP".';
}
                

In this example, the regular expression finds a match, because the string stored in the variable $text contains “PHP”. This example will therefore output the message shown below.

The regular expression finds a match


Use of Single Quotes Above

Notice that the single quotes around the strings in the code prevent PHP from filling in the value of the variable $text.

By default, regular expressions are case-sensitive. That is, lowercase characters in the expression only match lowercase characters in the string, and uppercase characters only match uppercase characters. If you want to perform a case-insensitive search instead, you can use a pattern modifier to make the regular expression ignore case.

Pattern modifiers are single-character flags following the ending delimiter of an expression. The modifier for performing a case-insensitive match is i. So while /PHP/ will only match strings that contain “PHP”, /PHP/i will match strings that contain “PHP”, “php”, or even “pHp”.

Here’s an example to illustrate this:

<?php
$text = 'What is Php?';

if (preg_match('/PHP/i', $text)) {
    echo '$text contains the string "PHP".';
} else {
    echo '$text does not contain the string "PHP".';
}
                

Again, as shown below, this outputs the same message, despite the string actually containing “Php”.

No need to be picky …


Regular expressions are almost a programming language unto themselves. A dazzling variety of characters have a special significance when they appear in a regular expression. Using these special characters, you can describe in great detail the pattern of characters that a PHP function like preg_match will search for. To show you what I mean, let’s look at a slightly more complex regular expression:

/^PH.*/
                

The caret (^) is placed at the beginning of an expression and indicates that the pattern must match the start of the string. The expression above will only match strings that start with PH.

The dot (.) means “any single character” (other than a newline, by default). The expression /PH./ would match PHP, PHA, PHx and any other string that contains PH followed by one more character.

The asterisk (*) is a quantifier that applies to whatever precedes it (in our pattern, the dot), and it means “zero or more of the preceding character”. P* therefore describes a run of zero or more Ps: anchored at both ends as /^P*$/ (where $ marks the end of the string), it would match PPPPPPP but not PHP.

.* matches any character zero or more times.

Therefore, the pattern /^PH.*/ matches not only the string “PH”, but “PHP”, “PHX”, “PHP: Hypertext Preprocessor”, and any other string beginning with “PH”.
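To check that reading of the pattern, this short sketch runs /^PH.*/ against a handful of made-up strings and collects the ones that match.

```php
<?php
$pattern = '/^PH.*/';

// Hypothetical sample strings: the last two should be rejected.
$samples = [
    'PH',
    'PHP',
    'PHP: Hypertext Preprocessor',
    'I love PHP', // does not start with PH
    'php',        // wrong case, and there is no i modifier
];

$matched = [];
foreach ($samples as $sample) {
    if (preg_match($pattern, $sample)) {
        $matched[] = $sample;
    }
}

// Prints the first three samples only.
print_r($matched);
```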

When you first encounter it, regular expression syntax can be downright confusing and difficult to remember, so if you intend to make extensive use of it, a good reference might come in handy. Regular expressions are a complex and extensive mini-language, and I’m not going to try to cover all of it here. Instead, I’ll introduce the individual characters as we need them. The PHP Manual includes a very thorough regular expression reference, and interactive tools such as regex101.com are incredibly useful for learning visually.

String Replacement with Regular Expressions

As you may recall, we’re aiming in this chapter to make it easier for non-HTML-savvy users to add formatting to the jokes on our website. For example, if a user puts asterisks around a word in the text of a joke (such as 'Knock *knock*…'), we’d like to display the joke with HTML emphasis tags around that word: 'Knock <em>knock</em>…'.

We can detect the presence of plain-text formatting such as this in a joke’s text using preg_match with the regular expression syntax we’ve just learned. However, what we need to do is pinpoint that formatting and replace it with appropriate HTML tags. To achieve this, we need to look at another regular expression function offered by PHP: preg_replace.

preg_replace, like preg_match, accepts a regular expression and a string of text, and attempts to match the regular expression in the string. In addition, preg_replace takes another string of text and replaces every match of the regular expression with that string.

The syntax for preg_replace is as follows:

$newString = preg_replace($regExp, $replaceWith, $oldString);
                

Here, $regExp is the regular expression, and $replaceWith is the string that will replace matches in $oldString. The function returns the new string with all the replacements made. In that code, this newly generated string is stored in $newString.
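As a minimal sketch of that call in action, the snippet below normalizes the capitalization of “PHP” in an invented sentence. The variable names follow the syntax line above.

```php
<?php
// Hypothetical input with inconsistent capitalization.
$oldString = 'I like php, and Php likes me.';

// The i modifier makes the search case-insensitive, so both
// "php" and "Php" are replaced with the literal string "PHP".
$newString = preg_replace('/php/i', 'PHP', $oldString);

echo $newString . PHP_EOL; // I like PHP, and PHP likes me.
echo $oldString . PHP_EOL; // the original string is left untouched
```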

We’re now ready to build our joke formatting function.

Emphasized Text

We could write the relevant preg_replace calls everywhere they’re required in our templates. However, since this formatting code is going to be useful in multiple places, and on any website we build, we’ll create a class for it and place it in our Ninja namespace:

namespace Ninja;

class Markdown {
    private $string;

    public function __construct($markDown) {
        $this->string = $markDown;
    }

    public function toHtml() {
        // convert $this->string to HTML

        return $html;
    }
}
                

The plain-text formatting syntax we’ll support is a simplified form of Markdown, created by John Gruber.

Markdown is a text-to-HTML conversion tool for web writers. Markdown allows you to write using an easy-to-read, easy-to-write plain-text format, then convert it to structurally valid XHTML (or HTML). — the Markdown home page

Since this class will convert Markdown to HTML, it’s named Markdown.

The first action is to use the htmlspecialchars function to convert any HTML code present in the text into plain text, by replacing the characters that have a special meaning to browsers (<, >, &, ") with their HTML entity equivalents. We want to avoid any HTML code appearing in the output, except that which is generated from plain-text formatting. Technically, this breaks one of the features of Markdown: support for inline HTML. “Real” Markdown can contain HTML code, which will be passed through to the browser untouched. The idea is that you can use HTML to produce any formatting that’s too complex to create using Markdown’s plain-text formatting syntax. Since we don’t want to allow this, it might be more accurate to say we’ll support Markdown-style formatting.
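To make the effect of this first step concrete, here is a small sketch that escapes an invented joke containing HTML tags, quotes, and an ampersand. The sample text is an assumption for illustration only.

```php
<?php
// Hypothetical joke text containing characters that browsers treat specially.
$jokeText = 'Why did the <b>chicken</b> say "moo" & leave?';

// ENT_QUOTES also converts single quotes, in addition to <, >, & and ".
$escaped = htmlspecialchars($jokeText, ENT_QUOTES, 'UTF-8');

// The tags are now displayed as text rather than interpreted as HTML:
// Why did the &lt;b&gt;chicken&lt;/b&gt; say &quot;moo&quot; &amp; leave?
echo $escaped . PHP_EOL;
```

Any tags generated afterwards by our own replacements are therefore the only real HTML in the output.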

Let’s start with formatting that will create bold and italic text.

In Markdown, you can emphasize text by surrounding it with a pair of asterisks (*), or a pair of underscores (_). We’ll replace any such pair with <em> and </em> tags.

You may be more accustomed to using <b> and <i> tags for bold and italic text. However, I’ve chosen to respect the most recent HTML standards, which recommend using the more meaningful <strong> and <em> tags, respectively. If bold text doesn’t necessarily indicate strong emphasis in your content, and italic text isn’t representative of emphasis, you might want to use <b> and <i> instead.

To achieve this, we’ll use two regular expressions: one that handles a pair of asterisks, and one that handles a pair of underscores.

Let’s start with the underscores:

/_[^_]+_/
                

Breaking this down:

/: we choose our usual slash character to begin (and therefore delimit) our regular expression.

_: there’s nothing special about underscores in regular expressions, so this will simply match an underscore character in the text.

[^_]: square brackets define a character class, which matches any single character from the set listed between the opening bracket [ and closing bracket ]. The caret (^), when placed immediately after the opening bracket, acts as a logical not. The expression [^_] will match any single character that is not an underscore.

+: the plus character indicates one or more characters that match the preceding expression. [^_]+ can be read as one or more characters that are not an underscore.

_: the second underscore, which marks the end of the italicized text.

/: the end of the regular expression.

In English, the expression /_[^_]+_/ could be translated as: “Find an underscore, followed by one or more characters that aren’t an underscore, and stop at the following underscore.”
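The breakdown above can be checked with a quick experiment. This sketch uses preg_match_all (a standard PHP function not otherwise covered in this chapter) to list every piece of text the pattern matches in an invented sentence.

```php
<?php
// Hypothetical text: two emphasized words and one stray underscore.
$text = 'Knock _knock_. Who is _there_? An under_score alone does nothing.';

// preg_match_all collects every match of the pattern into $matches[0].
preg_match_all('/_[^_]+_/', $text, $matches);

// Prints "_knock_" and "_there_"; the lone underscore has no partner,
// so it never completes a match.
print_r($matches[0]);
```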

Now, it’s easy enough to feed this regular expression to preg_replace, but we have a problem:

                $text = preg_replace('/_[^_]+_/', '<em>emphasized text</em>', $text);
                

The second argument we pass to preg_replace needs to be the text that we want to replace each match with. The problem is, we have no idea what the text that goes between the <em> and </em> tags should be. It’s part of the text that’s being matched by our regular expression!

Thankfully, another feature of preg_replace comes to our rescue. If you surround a portion of the regular expression with parentheses, you can capture the corresponding portion of the matched text and use it in the replacement string. To do this, you’ll use the code $n, where n is 1 for the first parenthesized portion of the regular expression, 2 for the second, and so on, up to 99 for the 99th. Consider this example:

$text = 'banana';
$text = preg_replace('/(.*)(nana)/', '$2$1', $text);
echo $text; // outputs 'nanaba'
                

So $1 is replaced with the text matched by the first grouped portion of the regular expression ((.*)—zero or more non-newline characters), which is ba in this case. $2 is replaced by nana, which is the text matched by the second grouped portion of the regular expression ((nana)). The replacement string '$2$1', therefore, produces 'nanaba'.

We can use the same principle to create our emphasized text, adding parentheses to our regular expression:

/_([^_]+)_/
                

These parentheses have no effect on how the expression works at all, but they create a group of matched characters that we can reuse in our replacement string:

$text = preg_replace('/_([^_]+)_/', '<em>$1</em>', $text);
                

The pattern to match and replace pairs of asterisks looks much the same, except we need to escape the asterisks with backslashes, since the asterisk character normally has a special meaning in regular expressions:

$text = preg_replace('/\*([^\*]+)\*/', '<em>$1</em>', $text);
                

That takes care of emphasized text, but Markdown also supports creating strong emphasis (<strong> tags) by surrounding text with a pair of double asterisks or underscores (**strong emphasis** or __strong emphasis__). Here’s the regular expression to match pairs of double underscores:

/__(.+?)__/s
                

The double underscores at the start and end are straightforward enough, but what’s going on inside the parentheses?

Previously, in our single-underscore pattern, we used [^_]+ to match a series of one or more characters, none of which could be underscores. That works fine when the end of the emphasized text is marked by a single underscore. But when the end is a double underscore, we want to allow for the emphasized text to contain single underscores (for example, __text_with_strong_emphasis__). “No underscores allowed,” therefore, won’t cut it: we must come up with some other way to match the emphasized text.

You might be tempted to use .+ (one or more characters of any kind). Combined with the s pattern modifier, which ensures that the dot (.) will truly match any character, including newlines, that gives us a regular expression like this:

/__(.+)__/s
                

The problem with this pattern is that the + is greedy: it will cause this portion of the regular expression to gobble up as many characters as it can. Consider this joke, for example:

__Knock-knock.__ Who’s there? __Boo.__ Boo who? __Aw, don’t cry about it!__
                

When presented with this text, the regular expression above will see just a single match, beginning with two underscores at the start of the joke and ending with two underscores at the end. The rest of the text in between (including all the other double underscores) will be gobbled up by the greedy .+ as the text to be emphasized!

To fix this problem, we can ask the + to be non-greedy by adding a question mark after it. Instead of matching as many characters as possible, .+? will match as few characters as possible while still matching the rest of the pattern, ensuring we’ll match each piece of emphasized text (and the double-underscores that surround it) individually. This gets us to our final regular expression:

/__(.+?)__/s
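The difference between the greedy and non-greedy versions is easiest to see by running both against the same text. The sketch below uses a shortened version of the joke as sample input.

```php
<?php
// Shortened sample joke with two separately emphasized phrases.
$joke = '__Knock-knock.__ Who is there? __Boo.__';

// Greedy: .+ swallows everything up to the LAST pair of underscores.
$greedy = preg_replace('/__(.+)__/s', '<strong>$1</strong>', $joke);

// Non-greedy: .+? stops at the FIRST pair of underscores it can reach.
$lazy = preg_replace('/__(.+?)__/s', '<strong>$1</strong>', $joke);

echo $greedy . PHP_EOL;
// <strong>Knock-knock.__ Who is there? __Boo.</strong>

echo $lazy . PHP_EOL;
// <strong>Knock-knock.</strong> Who is there? <strong>Boo.</strong>
```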
                

Using the same technique, we can also come up with a regular expression for double-asterisks. This is how the finished code for applying strong emphasis ends up looking:

$text = preg_replace('/__(.+?)__/s', '<strong>$1</strong>', $text);
$text = preg_replace('/\*\*(.+?)\*\*/s', '<strong>$1</strong>', $text);
                

One last point: we must avoid converting pairs of single asterisks and underscores into <em> and </em> tags until after we’ve converted the pairs of double asterisks and underscores in the text into <strong> and </strong> tags. Our toHtml function, therefore, will apply strong emphasis first, then regular emphasis:

namespace Ninja;

class Markdown {
    private $string;

    public function __construct($markDown) {
        $this->string = $markDown;
    }

    public function toHtml() {
        // convert $this->string to HTML
        $text = htmlspecialchars($this->string, ENT_QUOTES,
         'UTF-8');

        // strong (bold)
        $text = preg_replace('/__(.+?)__/s',
         '<strong>$1</strong>', $text);
        $text = preg_replace('/\*\*(.+?)\*\*/s',
         '<strong>$1</strong>', $text);

        // emphasis (italic)
        $text = preg_replace('/_([^_]+)_/',
         '<em>$1</em>', $text);
        $text = preg_replace('/\*([^\*]+)\*/',
         '<em>$1</em>', $text);

        return $text;
    }
}
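To see why the order of these replacements matters, the following sketch applies the asterisk patterns to the same sample text in both orders. The variable names are chosen just for this illustration.

```php
<?php
$text = '**bold**';

// Wrong order: the single-asterisk pattern runs first and consumes
// the inner pair of asterisks, leaving nothing for the strong pattern.
$wrong = preg_replace('/\*([^\*]+)\*/', '<em>$1</em>', $text);
$wrong = preg_replace('/\*\*(.+?)\*\*/s', '<strong>$1</strong>', $wrong);

// Right order: strong emphasis first, then regular emphasis.
$right = preg_replace('/\*\*(.+?)\*\*/s', '<strong>$1</strong>', $text);
$right = preg_replace('/\*([^\*]+)\*/', '<em>$1</em>', $right);

echo $wrong . PHP_EOL; // *<em>bold</em>*
echo $right . PHP_EOL; // <strong>bold</strong>
```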
                

Paragraphs

While we could choose characters to mark the start and end of paragraphs, just as we did for emphasized text, a simpler approach makes more sense. Since your users will type the content into a form field that allows them to create paragraphs using the Enter key, we’ll take a single newline to indicate a line break (<br>) and a double newline to indicate a new paragraph (</p><p>).

As I explained earlier, you can represent a newline character in a regular expression as \n. Other whitespace characters you can write this way include a carriage return (\r) and a tab (\t).

Exactly which characters are inserted into text when the user hits Enter depends on the user’s operating system. In general, Windows computers represent a line break as a carriage return followed by a newline (\r\n), whereas Mac computers used to represent it as a single carriage return character (\r). These days, Macs and Linux computers use a single newline character (\n) to indicate a new line.

In fact, the type of line breaks used can vary between software programs on the same computer. If you’ve ever opened a text file in Notepad to see all the line breaks missing, you’ve experienced the frustration this can cause. Advanced text editors used by programmers usually let you specify the type of line breaks to use when saving a text file.

To deal with these different line-break styles, any of which may be submitted by the browser, we must do some conversion:

// Convert Windows (\r\n) to Unix (\n)
$text = preg_replace('/\r\n/', "\n", $text);

// Convert Macintosh (\r) to Unix (\n)
$text = preg_replace('/\r/', "\n", $text);
                

Avoid Using Double-Quoted Strings with Regular Expressions

All the regular expressions we’ve seen so far in this chapter have been expressed as single-quoted PHP strings. The automatic variable substitution provided by double-quoted PHP strings is sometimes more convenient, but double-quoted strings can cause headaches when used with regular expressions.

Double-quoted PHP strings and regular expressions share a number of special character escape codes. "\n" is a PHP string containing a newline character. Likewise, /\n/ is a regular expression that will match any string containing a newline character. We can represent this regular expression as a single-quoted PHP string ('/\n/') and all is well, because the code \n has no special meaning in a single-quoted PHP string.

If we were to use a double-quoted string to represent this regular expression, we’d have to write "/\\n/", with a double-backslash. The double-backslash tells PHP to include an actual backslash in the string, rather than combining it with the n that follows it to represent a newline character. This string will therefore generate the desired regular expression, /\n/.

Because of the added complexity it introduces, it’s best to avoid using double-quoted strings when writing regular expressions. Note, however, that I have used double quotes for the replacement strings ("\n") passed as the second parameter to preg_replace. In this case, we actually do want to create a string containing a newline character, so a double-quoted string does the job perfectly.
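The quoting rules described in this note can be verified directly. This short sketch compares the two kinds of string and confirms that the single-quoted and double-backslash forms produce the same pattern.

```php
<?php
// In single quotes, \n stays as two characters: a backslash and an n.
$singleLength = strlen('\n');

// In double quotes, \n becomes one character: a newline.
$doubleLength = strlen("\n");

// The two ways of writing the regular expression yield identical strings.
$samePattern = ('/\n/' === "/\\n/");

// Both forms match a subject string that contains a newline.
$text = "first line\nsecond line";
$matchSingle = preg_match('/\n/', $text);
$matchDouble = preg_match("/\\n/", $text);

var_dump($singleLength, $doubleLength, $samePattern, $matchSingle, $matchDouble);
```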

With our line breaks all converted to newline characters, we can convert them to paragraph breaks (when they occur in pairs) and line breaks (when they occur alone):

// Paragraphs
$text = '<p>' . preg_replace('/\n\n/', '</p><p>', $text) . '</p>';

// Line breaks
$text = preg_replace('/\n/', '<br>', $text);
                

Note the addition of <p> and </p> tags surrounding the joke text. Because our jokes may contain paragraph breaks, we must make sure the joke text is output within the context of a paragraph to begin with.

This code does the trick: the line breaks in the text will now become the natural line- and paragraph-breaks expected by the user, removing the requirement to learn anything new to create this simple formatting.

It turns out, however, that there’s a simpler way to achieve the same result in this case: there’s no need to use regular expressions at all! PHP’s str_replace function works a lot like preg_replace, except that it only searches for strings instead of regular expression patterns:

                $newString = str_replace($searchFor, $replaceWith, $oldString);
                

We can therefore rewrite our line-breaking code as follows:

// Convert Windows (\r\n) to Unix (\n)
$text = str_replace("\r\n", "\n", $text);
// Convert Macintosh (\r) to Unix (\n)
$text = str_replace("\r", "\n", $text);

// Paragraphs
$text = '<p>' . str_replace("\n\n", '</p><p>', $text) . '</p>';
// Line breaks
$text = str_replace("\n", '<br>', $text);
                

str_replace is much more efficient than preg_replace, because there’s no need for it to apply the complex rules that govern regular expressions. Whenever str_replace (or str_ireplace, if you need a case-insensitive search) can do the job, you should use it instead of preg_replace.
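Here is a small worked example of those four str_replace steps, applied to an invented joke that uses Windows-style line breaks. It is only a sketch of the sequence described above.

```php
<?php
// Hypothetical input: one single line break and one blank line (Windows style).
$text = "Knock knock.\r\nWho is there?\r\n\r\nBoo.";

// Step 1 and 2: normalize all line-break styles to a single newline.
$text = str_replace("\r\n", "\n", $text);
$text = str_replace("\r", "\n", $text);

// Step 3: a double newline becomes a paragraph break.
$text = '<p>' . str_replace("\n\n", '</p><p>', $text) . '</p>';

// Step 4: any remaining single newline becomes a line break.
$text = str_replace("\n", '<br>', $text);

echo $text . PHP_EOL;
// <p>Knock knock.<br>Who is there?</p><p>Boo.</p>
```

The paragraph step has to run before the line-break step; otherwise every double newline would already have been turned into two <br> tags.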

While supporting the inclusion of hyperlinks in the text of jokes may seem unnecessary, such a feature makes plenty of sense in other applications.

Markdown also supports a more advanced link syntax where you put the link URL at the end of the document, as a footnote, but we won’t be supporting that kind of link in our simplified Markdown implementation. Here’s what a basic hyperlink looks like in Markdown:

[linked text](link URL)
                

Simple, right? You put the text of the link in square brackets, and follow it with the URL for the link in parentheses.

As it turns out, you’ve already learned everything you need to match and replace links like this with HTML links. If you’re feeling up to the challenge, you should stop reading right here and try to tackle the problem yourself!

First, we need a regular expression that will match links of this form. The regular expression is as follows:

/\[([^\]]+)]\((.+)\)/i
                

This is a rather complicated regular expression. You can see how regular expressions have gained a reputation for being indecipherable!

Squint at it for a little while, and see if you can figure out how it works. Try writing out the expression on regex101.com and it will display the regular expression in its groups with some useful highlighting. You can try typing in various strings to see which match.

Let me break it down for you:

/: as with all our regular expressions, we choose to mark its beginning with a slash.

\[: this matches the opening square bracket ([). Since square brackets have a special meaning in regular expressions, we must escape it with a backslash to have it interpreted literally.

([^\]]+): first of all, this portion of the regular expression is surrounded with parentheses, so the matching text will be available to us as $1 when we write the replacement string. Inside the parentheses, we’re after the linked text. Because the end of the linked text is marked with a closing square bracket (]), we can describe it as one or more characters, none of which is a closing square bracket ([^\]]+).

]\(: this will match the closing square bracket that ends the linked text, followed by the opening parenthesis that signals the start of the link URL. The parenthesis needs to be escaped with a backslash to prevent it from having its usual grouping effect. (The square bracket doesn’t need to be escaped with a backslash, because there’s no unescaped opening square bracket currently in play.)

(.+): as URLs can contain (almost) any character, anything typed inside the Markdown parentheses will be matched by .+ and made available as $2 in the replacement string.

\): this escaped parenthesis matches the closing parenthesis ()) at the end of the link URL.

/i: we mark the end of the regular expression with a slash, followed by the case-insensitivity flag, i.

We can therefore convert links with the following PHP code, which narrows the (.+) group to an explicit list of the characters that may appear in a URL:

$text = preg_replace(
    '/\[([^\]]+)]\(([-a-z0-9._~:\/?#@!$&\'()*+,;=%]+)\)/i',
    '<a href="$2">$1</a>', $text);
                

As you can see, $1 is used in the replacement string to substitute the captured link text, and $2 is used for the captured URL.

Additionally, because we’re expressing our regular expression as a single-quoted PHP string, you have to escape the single quote that appears in the list of acceptable characters with a backslash.
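As a quick check of the link pattern, this sketch converts a single Markdown link in an invented sentence. The example.com URL is a placeholder.

```php
<?php
// Hypothetical text containing one Markdown-style link.
$text = 'Read more at [our joke page](https://example.com/jokes?page=2) today.';

$html = preg_replace(
    // $1 captures the linked text, $2 captures the URL.
    '/\[([^\]]+)]\(([-a-z0-9._~:\/?#@!$&\'()*+,;=%]+)\)/i',
    '<a href="$2">$1</a>',
    $text
);

echo $html . PHP_EOL;
// Read more at <a href="https://example.com/jokes?page=2">our joke page</a> today.
```

Because a space is not in the list of acceptable URL characters, the match cannot run on past the closing parenthesis into the rest of the sentence.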

Putting It All Together

Here’s how our finished class for converting Markdown to HTML looks:

<?php
namespace Ninja;

class Markdown
{
    private $string;

    public function __construct($markDown)
    {
        $this->string = $markDown;
    }

    public function toHtml()
    {
        // convert $this->string to HTML
        $text = htmlspecialchars($this->string, ENT_QUOTES,
         'UTF-8');

        // strong (bold)
        $text = preg_replace('/__(.+?)__/s', 
         '<strong>$1</strong>', $text);
        $text = preg_replace('/\*\*(.+?)\*\*/s',
         '<strong>$1</strong>', $text);

        // emphasis (italic)
        $text = preg_replace('/_([^_]+)_/',
         '<em>$1</em>', $text);
        $text = preg_replace('/\*([^\*]+)\*/',
         '<em>$1</em>', $text);

        // Convert Windows (\r\n) to Unix (\n)
        $text = str_replace("\r\n", "\n", $text);
        // Convert Macintosh (\r) to Unix (\n)
        $text = str_replace("\r", "\n", $text);

        // Paragraphs
        $text = '<p>' . str_replace("\n\n",
         '</p><p>', $text) . '</p>';
        // Line breaks
        $text = str_replace("\n", '<br>', $text);

        // [linked text](link URL)
        $text = preg_replace(
            '/\[([^\]]+)]\(([-a-z0-9._~:\/?#@!$&\'()*+,;=%]+)\)/i',
            '<a href="$2">$1</a>',
            $text
        );

        return $text;
    }
}
                

We can then use this class in our template that outputs the joke text, jokes.html.php:

<div class="jokelist">

<ul class="categories">
    <?php foreach ($categories as $category): ?>
    <li><a href="/joke/list?category=<?=$category->id?>">
    <?=$category->name?></a></li>
    <?php endforeach; ?>
</ul>

<div class="jokes">

<p><?=$totalJokes?> jokes have been submitted to the Internet Joke Database.</p>

<?php foreach ($jokes as $joke): ?>
<blockquote>
    <p>
    <?=htmlspecialchars($joke->joketext, 
    ENT_QUOTES, 'UTF-8')?>

    (by <a href="mailto:<?=htmlspecialchars(
    $joke->getAuthor()->email,
    ENT_QUOTES,
        'UTF-8'
); ?>">
        <?=htmlspecialchars(
            $joke->getAuthor()->name,
            ENT_QUOTES,
            'UTF-8'
        ); ?></a> on
<?php
$date = new DateTime($joke->jokedate);

echo $date->format('jS F Y');
?>)

<?php if ($user): ?>
    <?php if ($user->id == $joke->authorId || 
    $user->hasPermission(\Ijdb\Entity\Author::EDIT_JOKES)): ?>
    <a href="/joke/edit?id=<?=$joke->id?>">
    Edit</a>
    <?php endif; ?>
    <?php if ($user->id == $joke->authorId ||
     $user->hasPermission(\Ijdb\Entity\Author::DELETE_JOKES)): ?>
    <form action="/joke/delete" method="post">
        <input type="hidden" name="id" 
            value="<?=$joke->id?>">
        <input type="submit" value="Delete">
    </form>
    <?php endif; ?>
<?php endif; ?>
    </p>
</blockquote>
<?php endforeach; ?>

</div>
                

The line we’re interested in is this:

<?=htmlspecialchars($joke->joketext,
ENT_QUOTES, 'UTF-8')?>
                

However, each joke is already wrapped in a <p> tag in the template, and toHtml adds <p> tags of its own. The tag in the template can therefore be removed:

<div class="jokelist">

<ul class="categories">
    <?php foreach($categories as $category): ?>
    <li><a href="/joke/list?category=<?=$category->id?>">
    <?=$category->name?></a></li>
    <?php endforeach; ?>
</ul>

<div class="jokes">

<p><?=$totalJokes?> jokes have been submitted to the Internet Joke Database.</p>

<?php foreach($jokes as $joke): ?>
<blockquote>
    <!-- Remove the opening tag <p> -->

    <?=htmlspecialchars($joke->joketext, 
    ENT_QUOTES, 'UTF-8')?>

    <!--- … -->
<?php endif; ?>
    <!-- Remove the closing tag </p> -->
</blockquote>
<?php endforeach; ?>

</div>
                

Now, replace the line that shows the joke text with this:

<?php
$markdown = new \Ninja\Markdown($joke->joketext);
echo $markdown->toHtml();
?>
                

This will pass the contents of joketext to the Markdown class as a constructor argument and call the toHtml method to convert the text to HTML.

This is a lot untidier than the original method, as it requires two lines. As with most things in PHP, there is a way to express this using shorter syntax:

<?=(new \Ninja\Markdown($joke->joketext))->toHtml()?>
                

This code can be found in Formatting-Markdown.

With these changes made, take your new plain-text formatting for a spin! Edit a few of your jokes to contain Markdown syntax and verify that the formatting is correctly displayed.

Why Using Markdown is Cool

What’s nice about adopting a formatting syntax like Markdown for your own website is that there’s often plenty of open-source code out there to help you deal with it.

Your newfound regular expression skills will serve you well in your career as a web developer, but if you want to support Markdown formatting on your site, the easiest way to do it would be to not write all the code to handle Markdown formatting yourself!

Commonly used Markdown libraries include Parsedown and cebe/markdown.

Sorting, Limiting and Offsets

We’ve spent a lot of time writing PHP code and, thanks to the DatabaseTable class, it’s been quite some time since you learned about any new SQL.

However, there are a few final MySQL features I’d like to show you before you get your Ninja title.

Sorting

MySQL supports asking for retrieved records in a specific order. At the moment, the Joke List page displays jokes in the order they were posted. It would be better if it showed the newest first.

A SELECT query can contain an ORDER BY clause that specifies the column that the data is sorted by.

For our jokes table, SELECT * FROM `joke` ORDER BY `jokedate` would order the jokes by the date they were posted. You can also specify a modifier of ASC (ascending, counting up) or DESC (descending, counting down). If you leave the modifier out, ASC is used.

SELECT * FROM `joke` ORDER BY `jokedate` DESC
                

This query would select all the jokes and order them by date in descending order, newest first.

Let’s implement this on the website. All our SQL queries are generated by the DatabaseTable class, so we’ll need to amend that to include an ORDER BY clause.

At the moment, the findAll method looks like this:

public function findAll() {
    $result = $this->query('SELECT * FROM ' . 
     $this->table);

    return $result->fetchAll(PDO::FETCH_CLASS, 
     $this->className, $this->constructorArgs);
}
                

Let’s add an optional argument for ORDER BY:

public function findAll($orderBy = null) {

    $query = 'SELECT * FROM ' . $this->table;

    if ($orderBy != null) {
        $query .= ' ORDER BY ' . $orderBy;
    }

    $result = $this->query($query);

    return $result->fetchAll(PDO::FETCH_CLASS,
     $this->className, $this->constructorArgs);
}
                

The SELECT query is now built up in the same way we built the INSERT and UPDATE queries. When a value for $orderBy is supplied, it’s appended to the query along with the ORDER BY clause. By making the argument optional, all of our existing code will still work without modification. We can provide a value for the $orderBy argument only where it’s needed.
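Because $orderBy is concatenated straight into the SQL string, it should only ever come from values written in your own code, never from user input. As a hedged sketch of one way to guard against mistakes, the standalone function below builds the same query but only accepts known column names and an ASC or DESC direction. The names buildSelect and $allowedColumns are hypothetical and aren’t part of the DatabaseTable class:

```php
<?php
// Hypothetical helper, not part of the book's DatabaseTable class.
// Builds "SELECT * FROM `table`" with an optional, validated ORDER BY clause.
function buildSelect(string $table, array $allowedColumns, ?string $orderBy = null): string
{
    $query = 'SELECT * FROM `' . $table . '`';

    if ($orderBy === null) {
        return $query;
    }

    // Split a value such as "jokedate DESC" into a column and a direction
    $parts = preg_split('/\s+/', trim($orderBy));
    $column = $parts[0];
    $direction = strtoupper($parts[1] ?? 'ASC');

    if (count($parts) > 2
        || !in_array($column, $allowedColumns, true)
        || !in_array($direction, ['ASC', 'DESC'], true)) {
        throw new InvalidArgumentException('Invalid ORDER BY value: ' . $orderBy);
    }

    return $query . ' ORDER BY `' . $column . '` ' . $direction;
}

echo buildSelect('joke', ['id', 'jokedate'], 'jokedate DESC');
```

Anything that isn’t a listed column followed by an optional direction causes an exception instead of reaching the database.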

To sort the Joke List page by date descending, amend the Joke controller's list method to supply the argument to the findAll method:

public function list() {

    if (isset($_GET['category'])) {
        $category = $this->categoriesTable->
        findById($_GET['category']);
        $jokes = $category->getJokes();
    }
    else {
        $jokes = $this->jokesTable->findAll('jokedate DESC');
    }       

    // …
                

At the moment, the main Joke List page is sorted newest first. However, if you click on one of the categories, the jokes in it are still listed oldest first.

You might consider adding the same optional argument to the find method:

public function find($column, $value, $orderBy = null) {
    $query = 'SELECT * FROM ' . $this->table . ' 
    WHERE ' . $column . ' = :value';

    $parameters = [
        'value' => $value
    ];

    if ($orderBy != null) {
        $query .= ' ORDER BY ' . $orderBy;
    }

    $query = $this->query($query, $parameters);

    return $query->fetchAll(PDO::FETCH_CLASS,
     $this->className, $this->constructorArgs);
}
                

Although this will be useful, it’s not going to solve the problem. The list of jokes is generated in the Category entity class:

public function getJokes() {
    $jokeCategories = $this->jokeCategoriesTable->
    find('categoryId', $this->id);

    $jokes = [];

    foreach ($jokeCategories as $jokeCategory) {
        $joke = $this->jokesTable->
        findById($jokeCategory->jokeId);
        if ($joke) {
            $jokes[] = $joke;
        }           
    }

    return $jokes;
}
                

Because the find method is called on the DatabaseTable instance that represents the joke_category table, we can’t easily sort by date.

There are a few ways to solve this. We could add a date column to the joke_category table for sorting purposes. We could also use an SQL JOIN, but that would be difficult to implement into our OOP DatabaseTable class.

Instead, we can do the sort in PHP itself, using the usort function. The usort function takes two arguments: an array to be sorted, and the name of a function that compares two values.

The example given in the PHP manual is this:

<?php
function cmp($a, $b)
{
    if ($a == $b) {
        return 0;
    }
    return ($a < $b) ? -1 : 1;
}

$a = [3, 2, 5, 6, 1];

usort($a, "cmp");

foreach ($a as $key => $value) {
    echo "$key: $value\n";
}
                

The code above outputs this:

0: 1
1: 2
2: 3
3: 5
4: 6
                

The array has been sorted smallest to largest. The cmp function is called with two values from the array, and returns 1 if the first should be placed after the second, and -1 if the first should be placed before the second. The important part is this line:

return ($a < $b) ? -1 : 1;
                

The syntax here looks strange if you haven’t come across it before. You actually know what’s happening here, but you’ve not seen it expressed in this way. The code here is a shorthand (or ternary) if statement, and it’s identical in execution to this:

if ($a < $b) {
    return -1;
} else {
    return 1;
}
                

The comparison function can take arguments that are objects, and we can build a comparison function into our Category class like so:

public function getJokes() {
    $jokeCategories = $this->jokeCategoriesTable->
    find('categoryId', $this->id);

    $jokes = [];

    foreach ($jokeCategories as $jokeCategory) {
        $joke = $this->jokesTable->
        findById($jokeCategory->jokeId);
        if ($joke) {
            $jokes[] = $joke;
        }
    }

    usort($jokes, [$this, 'sortJokes']);

    return $jokes;
}

private function sortJokes($a, $b) {
    $aDate = new DateTime($a->jokedate);
    $bDate = new DateTime($b->jokedate);

    if ($aDate->getTimestamp() == $bDate->getTimestamp()) {
        return 0;
    }

    return $aDate->getTimestamp() > $bDate->getTimestamp() ? -1 : 1;
}
                

You can find this code in Formatting-Usort.

There’s a lot going on here, so I’ll go through it line by line. Firstly, the $jokes array is sorted using the usort function: usort($jokes, [$this, 'sortJokes']);. To call a method in a class, rather than just a function, you can use an array containing the object you want to call the method on (in our case, the same instance, $this) and the name of the method to be called (sortJokes).

The sortJokes method starts by converting the dates from each of the $a and $b objects into DateTime instances for easier comparison. The getTimestamp method returns a Unix timestamp: the number of seconds between January 1, 1970 and the date being represented. Using timestamps allows us to compare the dates as integers.

The if statement checks to see if the dates have the same timestamp. If so, it returns 0, indicating that neither should be moved before or after the other in the sorted list.

If the dates are different, either 1 or -1 is returned to sort the dates. Notice I’ve used > where the earlier example used <, which sorts the array in the opposite order to that example and puts the larger timestamps (later dates) first.

There’s a slight performance overhead in using usort instead of ORDER BY and having the database perform the sort, but unless you’re dealing with thousands of records, the difference between the two will be milliseconds at worst!
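Since PHP 7, the same three-way comparison can also be written with the spaceship operator, <=>, which returns -1, 0 or 1 in a single expression. The following is a minimal, self-contained sketch rather than the code from the Category class: it uses plain objects with a jokedate property as stand-ins for the joke entities, and sorts them newest first by putting $b on the left of the comparison:

```php
<?php
// Stand-in objects for joke entities; only the jokedate property matters here.
$jokes = [
    (object) ['joketext' => 'joke one',   'jokedate' => '2017-03-01'],
    (object) ['joketext' => 'joke two',   'jokedate' => '2017-05-20'],
    (object) ['joketext' => 'joke three', 'jokedate' => '2017-01-15'],
];

// $b on the left and $a on the right gives descending order (newest first).
usort($jokes, function ($a, $b) {
    $aDate = new DateTime($a->jokedate);
    $bDate = new DateTime($b->jokedate);

    return $bDate->getTimestamp() <=> $aDate->getTimestamp();
});

foreach ($jokes as $joke) {
    echo $joke->jokedate . ': ' . $joke->joketext . "\n";
}
```

Swapping the operands back to $aDate <=> $bDate would sort oldest first.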

Pagination with LIMIT and OFFSET

Now that you know how to sort the records, we can think a little about scalability. You’ve probably got fewer than a dozen jokes in your database at the moment. What will happen after the website has been online a few months and is starting to get popular? You might get users coming on and posting hundreds of jokes a day.

It won’t take long before the Joke List page takes a very long time to load because it’s displaying hundreds or thousands of jokes. The performance alone will put users off, but nobody is going to sit and read through a page of two thousand jokes.

A common approach is using pagination to display a sensible number—for example, ten jokes per page—and allow clicking a link to move between pages.

Before you continue, add at least 21 jokes to your database so we can test this correctly. Alternatively, for testing purposes, change 10 in the following sections to 2 to display two jokes per page.

What If I Don’t Know any Jokes?

Don’t worry if you can’t think of any jokes. Just add some test data like “joke one”, “joke two”, “joke three”, etc.

Our first task is to display just the first ten jokes. Using SQL, this is incredibly easy. The LIMIT clause can be appended to any SELECT query to restrict the number of records returned:

SELECT * FROM `joke` ORDER BY `jokedate` DESC LIMIT 10
                

We’ll need to build this into the findAll and find methods of the DatabaseTable class as an optional parameter, as we did with the $orderBy variable:

public function find($column, $value, $orderBy = null,
 $limit = null) {
    $query = 'SELECT * FROM ' . $this->table . ' 
    WHERE ' . $column . ' = :value';

    $parameters = [
        'value' => $value
    ];

    if ($orderBy != null) {
        $query .= ' ORDER BY ' . $orderBy;
    }

    if ($limit != null) {
        $query .= ' LIMIT ' . $limit;
    }

    $query = $this->query($query, $parameters);

    return $query->fetchAll(PDO::FETCH_CLASS,
     $this->className, $this->constructorArgs);
}

public function findAll($orderBy = null, $limit = null) {
    $query = 'SELECT * FROM ' . $this->table;

    if ($orderBy != null) {
        $query .= ' ORDER BY ' . $orderBy;
    }

    if ($limit != null) {
        $query .= ' LIMIT ' . $limit;
    }

    $result = $this->query($query);

    return $result->fetchAll(PDO::FETCH_CLASS,
     $this->className, $this->constructorArgs);
}
                

Then, to limit to ten jokes, open up the Joke controller class and provide the value 10 for the new $limit argument:

$jokes = $this->jokesTable->findAll('jokedate DESC', 10);
                

Also supply the new limit in the Category entity class:

$jokeCategories = $this->jokeCategoriesTable->find('categoryId', $this->id, null, 10);
                

You’ll notice I’ve supplied null for the $orderBy argument. Even though the argument is optional, to provide a value for $limit, a value for all the earlier arguments must be provided.
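If you’re running PHP 8.0 or later, named arguments offer another way to skip optional parameters. The sketch below uses a simplified stand-in function to show the call syntax. The name describeFind is made up for this example: it only reports the arguments it received and is not the real find method:

```php
<?php
// Simplified stand-in with the same parameter list as the find method.
// It only reports the arguments it received; it does not run a query.
function describeFind($column, $value, $orderBy = null, $limit = null): string
{
    return $column . '=' . $value
        . ' orderBy=' . ($orderBy ?? 'none')
        . ' limit=' . ($limit ?? 'none');
}

// Positional call: null must be passed for $orderBy to reach $limit
$positional = describeFind('categoryId', 2, null, 10);

// Named argument (PHP 8.0+): $orderBy is skipped and keeps its default
$named = describeFind('categoryId', 2, limit: 10);

echo $positional . "\n";
echo $named . "\n";
```

Both calls pass the same values. The named version simply doesn’t need the placeholder null.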

With that in place, you’ll see only ten jokes on the Joke List page. The problem now is how we’ll view the rest of the jokes!

The solution is to have different pages that can be accessed by a $_GET variable: /joke/list?page=1 or /joke/list?page=2 to select which page to show. Page 1 will show jokes 1–10, page 2 will show jokes 11–20, and so on.

Before doing anything using the page $_GET variable, let’s create the links in the template. We can easily use a for loop to display a set of links to different pages:

for ($i = 1; $i <= 10; $i++) {
    echo '<a href="/joke/list?page=' . $i . '">' . 
    $i . '</a>';
}
                

The problem is, we need to know how many pages there will be. It’s actually very easy to work out. If we’re displaying ten jokes per page, the number of pages is the number of jokes in the database divided by ten, and then rounded up.

With 21 jokes in the system, 21/10 is 2.1, and if we round up, it gives 3 pages. PHP’s ceil function can be used to round up any decimal number.
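The calculation can be sketched as a small function. The name numPages and the $perPage parameter are assumptions for this example only:

```php
<?php
// Hypothetical helper: how many pages are needed to show $totalJokes
// when $perPage jokes are displayed on each page.
function numPages(int $totalJokes, int $perPage = 10): int
{
    // ceil returns a float, so cast the result to an integer
    return (int) ceil($totalJokes / $perPage);
}

echo numPages(21) . "\n"; // 21 / 10 = 2.1, rounded up to 3
echo numPages(20) . "\n"; // 20 / 10 = 2, no rounding needed
echo numPages(0) . "\n";  // 0: an empty table needs no page links
```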

The template already has access to the $totalJokes variable, so we can display the pages at the end of jokes.html.php:

// …
<?php endif; ?>
</blockquote>
<?php endforeach; ?>

Select page:

<?php
// Calculate the number of pages
$numPages = ceil($totalJokes/10);

// Display a link for each page
for ($i = 1; $i <= $numPages; $i++):
?>
    <a href="/joke/list?page=<?=$i?>">
    <?=$i?></a>
<?php endfor; ?>

</div>
                

If you click the links, the $_GET variable will be set. It’s now just a matter of using it to display different sets of jokes.

The SQL clause OFFSET can be used with LIMIT to do exactly what we want:

SELECT * FROM `joke` ORDER BY `jokedate` LIMIT 10 OFFSET 10
                

This query will return ten jokes, but instead of returning the first ten, it skips the first ten records and returns the next ten, starting from joke 11.

We’ll need to turn page numbers into offsets. Page 1 will be OFFSET 0, page 2 will be OFFSET 10, page 3 will be OFFSET 20, and so on. This is a simple calculation: $offset = ($_GET['page']-1)*10.
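Because the page number arrives from the URL, it may be missing, negative, or not a number at all. As a hedged sketch, the conversion could be wrapped in a function that falls back to page 1 for anything that isn’t a positive whole number. The name pageToOffset is hypothetical; filter_var and FILTER_VALIDATE_INT are standard PHP:

```php
<?php
// Hypothetical helper: converts a page number (e.g. from $_GET['page'])
// into an OFFSET value, falling back to page 1 for missing or invalid input.
function pageToOffset($page, int $perPage = 10): int
{
    $page = filter_var($page, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 1]
    ]);

    if ($page === false) {
        $page = 1;
    }

    return ($page - 1) * $perPage;
}

echo pageToOffset('1') . "\n";   // 0
echo pageToOffset('3') . "\n";   // 20
echo pageToOffset('abc') . "\n"; // 0 (invalid input falls back to page 1)
```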

As we did with limit, let’s add OFFSET as an optional argument for the findAll and find methods:

public function findAll($orderBy = null, $limit = null,
 $offset = null) {
    $query = 'SELECT * FROM ' . $this->table;

    if ($orderBy != null) {
        $query .= ' ORDER BY ' . $orderBy;
    }

    if ($limit != null) {
        $query .= ' LIMIT ' . $limit;
    }

    if ($offset != null) {
        $query .= ' OFFSET ' . $offset;
    }

    $result = $this->query($query);

    return $result->fetchAll(PDO::FETCH_CLASS,
     $this->className, $this->constructorArgs);
}

public function find($column, $value,
  $orderBy = null, $limit = null, $offset = null) {
    $query = 'SELECT * FROM ' . $this->table . '
     WHERE ' . $column . ' = :value';

    $parameters = [
        'value' => $value
    ];

    if ($orderBy != null) {
        $query .= ' ORDER BY ' . $orderBy;
    }

    if ($limit != null) {
        $query .= ' LIMIT ' . $limit;
    }

    if ($offset != null) {
        $query .= ' OFFSET ' . $offset;
    }

    $query = $this->query($query, $parameters);

    return $query->fetchAll(PDO::FETCH_CLASS,
     $this->className, $this->constructorArgs);
}
                

Then supply the offset in the list method in the Joke controller:

$page = $_GET['page'] ?? 1;

$offset = ($page-1)*10;

if (isset($_GET['category'])) {
    $category = $this->categoriesTable->
    findById($_GET['category']);
    $jokes = $category->getJokes();
}
else {
    $jokes = $this->jokesTable->findAll('jokedate DESC',
     10, $offset);
}       

$title = 'Joke List';

$totalJokes = $this->jokesTable->total();

$author = $this->authentication->getUser();

return ['template' => 'jokes.html.php',
    'title' => $title,
    'variables' => [
        'totalJokes' => $totalJokes,
        'jokes' => $jokes,
        'user' => $author,
        'categories' => $this->categoriesTable->findAll()
        ]
    ];
}
                

Now, if you click between the different page links, you’ll see ten different jokes on each page.
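The ORDER BY, LIMIT and OFFSET handling is now repeated in both findAll and find. One possible tidy-up, sketched here as a standalone function rather than as part of the DatabaseTable class, is to build those clauses in one place. The name buildClauses is hypothetical. The sketch also casts $limit and $offset to integers, and only adds OFFSET when a LIMIT is present, because MySQL doesn’t accept OFFSET on its own:

```php
<?php
// Hypothetical helper, not part of the book's DatabaseTable class.
// Returns the optional ORDER BY / LIMIT / OFFSET part of a SELECT query.
function buildClauses($orderBy = null, $limit = null, $offset = null): string
{
    $sql = '';

    if ($orderBy != null) {
        $sql .= ' ORDER BY ' . $orderBy;
    }

    if ($limit != null) {
        // Casting to int ensures only a number can reach the query
        $sql .= ' LIMIT ' . (int) $limit;

        if ($offset != null) {
            $sql .= ' OFFSET ' . (int) $offset;
        }
    }

    return $sql;
}

echo 'SELECT * FROM `joke`' . buildClauses('jokedate DESC', 10, 20);
```

Both methods could then append the result of this one function to their query instead of repeating the three if statements.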

This new pagination doesn’t work for the lists within categories, and we’ll fix that shortly. But at the moment, the links aren’t very user friendly. Let’s change the styling of the link that represents the current page.

We can pass the number of the current page to the template:

return ['template' => 'jokes.html.php',
    'title' => $title,
    'variables' => [
    'totalJokes' => $totalJokes,
    'jokes' => $jokes,
    'user' => $author,
    'categories' => $this->categoriesTable->findAll(),
    'currentPage' => $page
    ]
];
                

Then add a CSS class to the page link if it’s the current page:

Select page:

<?php

$numPages = ceil($totalJokes/10);

for ($i = 1; $i <= $numPages; $i++):
    if ($i == $currentPage):
?>
    <a class="currentpage" 
        href="/joke/list?page=<?=$i?>">
        <?=$i?></a>
<?php else: ?>
    <a href="/joke/list?page=<?=$i?>">
    <?=$i?></a>
<?php endif; ?>
<?php endfor; ?>

</div>
                

I’ve added a CSS class currentpage to the link if the link being printed is the current page being viewed. Add some CSS to jokes.css to make the link stand out. You could change the color, make it bold, underlined or however you like. I’ve chosen to surround the number with square brackets:

.currentpage:before {
    content: "[";
}
.currentpage:after {
    content: "]";
}
                

You can find this code in Formatting-Pagination.

Pagination in Categories

We have a small bug in the code at the moment. If you click on one of the categories, it won’t supply the correct offset value.

To fix this, we can add an $offset argument to the Category entity’s getJokes method. While you’re there, to improve flexibility you may as well supply $limit as an argument too, instead of hardcoding it in the method:

public function getJokes($limit = null, $offset = null) {
    $jokeCategories = 
     $this->jokeCategoriesTable->find('categoryId', 
     $this->id, null, $limit, $offset);

    $jokes = [];

    foreach ($jokeCategories as $jokeCategory) {
        $joke =  
        $this->jokesTable->findById($jokeCategory->jokeId);
        if ($joke) {
            $jokes[] = $joke;
        }
    }

    usort($jokes, [$this, 'sortJokes']);

    return $jokes;
}
                

Then provide the values when the method is called in the list method:

if (isset($_GET['category'])) {
    $category = 
    $this->categoriesTable->findById($_GET['category']);
    $jokes = $category->getJokes(10, $offset);
}
                

With that done, the pagination will work … kind of. You can manually enter the $_GET variables in the URL—for example, http://192.168.10.10/joke/list?category=1&page=1. However, the links we created don’t work.

There are two problems:

  1. The page links don’t include the category variable.
  2. The number of page links displayed is based on the total number of jokes in the database, not the number of jokes in the selected category.

Let’s fix these one at a time. The easiest task is providing the category in the link. In the list method, we’ll need to pass the category to the template:

return ['template' => 'jokes.html.php',
    'title' => $title,
    'variables' => [
        'totalJokes' => $totalJokes,
        'jokes' => $jokes,
        'user' => $author,
        'categories' => $this->categoriesTable->findAll(),
        'currentPage' => $page,
        'categoryId' => $_GET['category'] ?? null
        ]
    ];
                

Then amend the links in the template to provide the category variable if required:

Select page:

<?php

$numPages = ceil($totalJokes/10);

for ($i = 1; $i <= $numPages; $i++):
    if ($i == $currentPage):
?>
    <a class="currentpage"
        href="/joke/list?page=<?=$i?><?=!empty($categoryId) ? '&category=' . $categoryId : '' ?>">
        <?=$i?></a>
<?php else: ?>
    <a href="/joke/list?page=<?=$i?><?=!empty($categoryId) ? '&category=' . $categoryId : '' ?>">
    <?=$i?></a>
<?php endif; ?>
<?php endfor; ?>

</div>
                

I’ve used the shorthand if statement shown earlier to append &category=$categoryId to the link when a category is set.
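Another way to build these links, shown here only as a sketch, is to let PHP’s http_build_query function assemble the query string, so that optional values such as the category are added without string concatenation in the template. The pageLink function name is hypothetical:

```php
<?php
// Hypothetical helper: builds the URL for a page link, adding the
// category to the query string only when one has been selected.
function pageLink(int $page, $categoryId = null): string
{
    $params = ['page' => $page];

    if (!empty($categoryId)) {
        $params['category'] = $categoryId;
    }

    // Build "page=2&category=3", then escape it for use in an href attribute
    $url = '/joke/list?' . http_build_query($params, '', '&');

    return htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
}

echo pageLink(2) . "\n";
echo pageLink(2, 3) . "\n";
```

In the template, each link’s href would then be written as <?=pageLink($i, $categoryId)?>.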

We’ve fixed the first problem, but the number of page links being displayed is still calculated based on the number of jokes in the entire table, rather than just a category.

At the moment, the total method in the DatabaseTable class returns the total number of records in a given table. To count a subset of the records, it will need a WHERE clause. We can implement it in the same way as the find method:

public function total($field = null, $value = null) {
    $sql = 'SELECT COUNT(*) FROM `' . $this->table . '`';
    $parameters = [];

    if (!empty($field)) {
        $sql .= ' WHERE `' . $field . '` = :value';
        $parameters = ['value' => $value];
    }

    $query = $this->query($sql, $parameters);

    $row = $query->fetch();
    return $row[0];
}
                

The total method now supports doing something like echo $this->jokesTable->total('authorId', 4);, which would give us the total number of jokes by the author with the id of 4.

We can’t do the same to count the number of jokes in a category, as there’s no categoryId column in the joke table. We need to call the total method on the jokeCategoriesTable instance: $this->jokeCategoriesTable->total('categoryId', 2);, which would count the number of jokes in the category with the id of 2.

Instead of implementing this in the list method, let’s add a new method to the Category entity class that returns the number of jokes in that particular category, so that we can write $totalJokes = $category->getNumJokes();. Here’s the method:

public function getNumJokes() {
    return $this->jokeCategoriesTable->total('categoryId',
     $this->id);
}
                

This can then be called from the list method in the Joke controller:

public function list() {

    $page = $_GET['page'] ?? 1;

    $offset = ($page-1)*10;

    if (isset($_GET['category'])) {
        $category = 
        $this->categoriesTable->findById($_GET['category']);
        $jokes = $category->getJokes(10, $offset);
        $totalJokes = $category->getNumJokes();
    }
    else {
        $jokes = $this->jokesTable->findAll('jokedate DESC',
        10, $offset);
        $totalJokes = $this->jokesTable->total();
    }       

    $title = 'Joke List';
    // …
                

You can find this code in Final-Website.

Notice that I’ve moved the original $totalJokes variable into the else branch of the if statement. When a category is selected, $totalJokes is the total number of jokes in the selected category. When no category has been chosen, $totalJokes stores the total number of jokes in the database.

Achievement Unlocked: Ninja

That’s it! You’re done, and you get your PHP black belt.

In this chapter, I showed you some additional tools that will be useful when you develop your next website. You have a basic understanding of regular expressions, along with the SQL features of LIMIT and OFFSET, and you know how to combine them to paginate data sets.

You now have all the tools you need to build a real website. You know how to think about writing code, and you know how to separate out project-specific code from the code you can use in future projects. You also have an understanding of the concepts behind PHP frameworks. You can jump into Symfony, Zend or Laravel, and although the code will be different, all the concepts you’ve learned in this book will be familiar.

What Next?

You have all the tools you need to build a fully functional PHP website and put it live on the web. Go ahead and publish your first website. It’s a great feeling!

With programming, there’s always more to learn. There are different techniques and approaches you can try out, and a lot of different tools that will help you develop more efficiently and reduce bugs.

You’re never “done”, you never complete the game. You just keep playing. Each time you learn something new, be it a new tool, a new technique, or even a new language, it will extend what you knew before and you’ll wonder how you ever coped without it. Things also change constantly, and it’s difficult to keep up. Why do you think we’re on the sixth edition of this book? Don’t be disappointed: learning is fun, and as long as you don’t fall into the trap of thinking you know everything, you’ll go a long way.

Now that you’ve finished this book, you do, however, have more than enough knowledge to work on your own projects, or even get a job as a junior PHP developer!

Before taking the next few steps, I recommend getting at least two or three projects finished to ensure you’re comfortable with everything from this book. It won’t sink in right away, and as you go forward, you’ll find yourself solving different sets of problems. It will take you a few attempts to get everything clear in your mind.

Once you’ve done that, you can move on to the next few steps. What are those steps?

  1. Composer. Composer is a package management tool that’s used by almost all PHP projects these days. If you want to use someone else’s code in your project, you’ll need to know how to use Composer.

  2. Take a look at some PHP frameworks to see how other people do things. In 2017, I’d recommend Laravel and Symfony as starting points, but that’s likely to change within a few years.

  3. PHPUnit. Test-driven development has really taken off in PHP over the last few years, and for good reason. Once you start using TDD, it’s difficult to go back. Everything seems so much tidier and easier. Rather than having to load up your website, fill in your form, then check the record was inserted into a database, you can just run a script that does all that for you!

  4. Git. Git is a vital tool for software developers. You may have come across the website GitHub, which allows sharing code and collaborating with other developers. To use the site, you’ll need to understand git. But at its most basic, it’s an incredible tool. No more copy/pasting code after making a change, or commenting out large sections. Just delete it, and git will keep track of any changes you make!

With that said, there’s little else I need to add. However you proceed from this point, rest assured you’re starting out with a solid grounding in the essentials and a good understanding of the tools and techniques used by modern PHP websites. That’s more than can be said for many developers working today. Take that advantage and use it.

Most importantly, go out there and write some code!
