Chapter 3. Web Programming Basics

kmakice Feeling a little shaky about the whole coding thing? Take a crash course in the languages you will use to build your Twitter app.

As demonstrated in the first two chapters, Twitter is a rich playground for both members and developers. Before we get into specifics on how you can join in the fun, we should spend some time reviewing the mechanics of working with the languages you will be asked to use to build your new Twitter web application.

This chapter provides an overview of the basic knowledge and tools needed to create the applications described later in the book. Although the API can be used to create desktop and mobile applications as well, this book focuses on the web platform built with PHP and MySQL. Even for experienced programmers, it won’t hurt to skim this chapter so you understand the scope of what is to come. However, if you are confident in your skills with XHTML, CSS, PHP, and MySQL, you can skip to Chapter 4 and jump right into the methods available in the Twitter API.

Note

This chapter reflects what is needed to build and install the suite of sample applications to get you started using the Twitter API. It is not meant to be a replacement for resources dedicated to improving individual skill sets for XHTML, CSS, PHP, MySQL, or server management. Suggested reading on those topics can be found at the end of this chapter.

XHTML

The extra “X” may look intimidating, but Extensible Hypertext Markup Language (XHTML) isn’t much different from regular HTML. The main purpose of the revision is to help make web pages better supported by paying more attention to the structure of the data. Whereas HTML did not seek to separate data from presentation, XHTML puts the burden of presentation on the stylesheets.

HTML is not XML. Compliance with XML means that the tags need to be “well-formed,” or follow the conventions required by documents of that type. These conventions include use of lowercase in tag names (XML is case-sensitive) and the insistence that all tags are closed, either explicitly with a second tag (<p></p>) or implicitly as part of a singleton tag (<br />). Attribute values also have to be enclosed in quotes, either single or double. In HTML, you can omit the quotes around integer and Boolean values and the browser will still understand them.
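One way to see these rules in action is to hand a fragment to an XML parser and check whether the parser accepts it. The following is a minimal sketch, assuming a PHP installation with the standard DOM extension enabled. The function name is_well_formed and the sample strings are hypothetical and are not part of the sample applications.

```php
<?php
// Hypothetical helper: returns true if the markup parses as well-formed XML.
function is_well_formed($markup)
{
    // Collect parser errors quietly instead of printing warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $ok  = $doc->loadXML($markup);

    // Restore the earlier error-handling setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok;
}

$samples = array(
    '<p>Closed singleton<br /></p>',    // well-formed
    '<p>Unclosed singleton<br></p>',    // <br> is never closed
    '<P>Mismatched case</p>',           // XML is case-sensitive
    '<a href=index.html>No quotes</a>', // attribute value must be quoted
);

foreach ($samples as $sample) {
    echo (is_well_formed($sample) ? 'ok:  ' : 'bad: ') . $sample . "\n";
}
```

Keep in mind that a well-formedness check like this is weaker than validation: a fragment can parse cleanly and still use tags or attributes that XHTML Strict does not allow.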

The advantage of XHTML crystallizes when using XML tools. For example, XSLT is used for transforming documents, and XForms lets you edit XML documents in simple ways. Integration with other tools, such as MathML, SMIL, and SVG, is not possible with regular HTML. Of the three versions of XHTML—Strict, Transitional, and Frameset—Strict is the one to use. A lot of problematic tags are deprecated, so using Strict makes your code better prepared for future standards. It also makes it easier for many different devices to properly interpret the page content, since there are fewer unexpected tags and attributes to encounter.

Note

The easiest way to ensure strict compliance is to use the World Wide Web Consortium’s (W3C) validator, at http://validator.w3.org. Many text editors also have validation and formatting tools built in.

The rest of the material in this section is pretty straightforward. I include it only to make sure we are all on the same (web) page when it comes to the markup.

Web Pages

Open up any XHTML-friendly text editor (such as TextMate), and you are likely to find a template for a web page that resembles Example 3-1. As soon as it is created, you can forget about these tags; they aren’t useful outside of the page definition.

Example 3-1. Sample XHTML for a web page

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <style type="text/css" media="all">@import url(my_styles.css);</style>
    <title>The Basic Web Page</title>
</head>
<body>
This is the stuff you can see.
</body>
</html>

The stuff that is meant to be visible to the person visiting your website is located between the <body></body> tags. After changing details such as the name and location of your external CSS file and the title of the page, the tags to focus on are the ones that will describe the content you want to display. Here is some information about the types of tags used in the sample applications for this book:

Block tags

These elements have the inherent property of creating a new line of content when displayed, forming discrete blocks of content when rendered by the browser. The most general block element is the <div> tag, which is primarily used to identify each section using the unique id attribute. The stuff contained within a given division is isolated from the rest of the page and styled according to the attributes of the enclosing <div> tag.

More familiar forms of block elements include the heading tags (<h1>, <h2>, etc.) and <p>. Each has specific properties, such as text size and weight (boldness), for which you can specify values other than the defaults. The break tag, <br />, is sometimes considered a block element too, because it forces a new line to be rendered in the page display.

Example: <div id="me"><p class="others"><br /></p></div>

Inline tags

For the content contained within block elements, different words may require different presentation. XHTML allows for some special inline elements to handle this without explicitly defining a style. The two most commonly used elements are <strong> and <em>. By default, the former makes text boldface, and the latter displays text in italics.

Example: This is <strong>boldface</strong>.

List tags

List tags are a special category of XHTML element, sort of a cross between block and inline elements. Wrapping content in list tags will force a line break (as with a block element), but these tags are also used to indent or mark specific lines of text. There are tags for three kinds of lists: unordered lists (<ul>), ordered lists (<ol>), and definition lists (<dl>). Which tags you use dictates how the list items are presented.

List tags are nested. Each item in an ordered or unordered list must be wrapped in <li></li> tags, and the collection of all the list items must be wrapped in <ul></ul> or <ol></ol> tags. (Definition lists are a special case that we won’t go into here, as the sample applications in this book don’t use definition tags.) Unordered lists place bullet characters before each item. Ordered lists use a sequence of numbers, letters, or Roman numerals.

Example: <ul><li>tom</li><li>jerry</li></ul>

Hyperlink tags

Arguably the most important tag in the bag of XHTML tricks is the hyperlink tag. Without it, web pages would have to be bookmarked and loaded separately. Linking is what makes Google searches work.

You can make any text, image, or object an active hyperlink by wrapping it in anchor tags (<a></a>). Anchors can be used to point to somewhere new, using the href attribute and a referenced web document, and to name a particular bit of content. Naming comes in handy to connect XHTML to JavaScript or links from other pages.

Another attribute, title, further empowers links. The title attribute can store useful descriptive or instructive information that becomes visible in many browsers when you move your mouse over the hyperlink.

Example: <a href="#" title="hi">click</a>

Image tags

Adding images to your web page may slow down the page loading a bit, but it also makes the experience more enjoyable for visually perceptive visitors. The bandwidth and access limitations of a decade ago, which discouraged the use of images on web pages, are disappearing, as most U.S. homes now have broadband access.^[60] Graphics—whether they’re big pictures dominating the page or a few custom icons used for bullet points in a list—enhance communication and enjoyment.

Note

Even with faster connections becoming the norm, it is still good practice to be aware of what images and other media do to bandwidth. Optimize your images for delivery on the Web by using PNG or GIF screenshots, or by compressing JPEG photos.

The <img /> tag—a singleton that does not have a matching closing tag—has an attribute, src, that points to a file accessible via the Web. The file can be in any one of a number of formats, but for graphics the most acceptable formats are .gif, .jpeg, and .png.

Although most browsers don’t require it, image tags also need an alt attribute to be compliant with W3C standards. Alternative text has two important purposes. First, before the browser successfully downloads the image, it uses the text in the alt attribute as a placeholder in the page display. More importantly, alternative text is read aloud by screen readers and other assistive browsers to describe images that visually impaired visitors cannot see. Alt text can be blank for images meant for decoration only, but otherwise, the value should be a short and meaningful description of the image.

The other attributes that are helpful to include are width and height. These default to the actual pixel size of the image file if not included, but specifying them in the code allows the browser to carve out the appropriate bit of screen real estate while the image loads. The rest of the page is then unaffected by when the graphic decides to show up. These attributes also allow for resizing a graphic on the fly, as their values will override the dimensions of the image for purposes of rendering the web page.

Example: <img src="hi.jpg" alt="Hi" width="8" height="8" />

Warning

Using the width and height attributes of the image tag will only resize the display, not the file size. That 10-megapixel shot from your new digital camera will still slowly load into the browser as a giant 4-megabyte file!

Form tags

There is an important group of tags that deserves greater mention. Without them, a web page would not be able to gather information from visitors and respond in a relevant manner. Pointing and clicking on links is fine, but forms are what put the interaction in the Web.

Forms allow people to enter data into a web page and send that data back to the host server. Every blog or wiki uses a form to let people publish information. Every online purchase relies on a web form to manage the transaction. Even a simple Google search consists of an input element and a submit button. Here’s a list of the form tags used in the sample applications:

<form>

To let the browser know that the aforementioned kind of interaction is permissible, you need to define an area of the web page as a web form. This is done by wrapping content in <form></form> tags and specifying a couple of key attributes.

The action attribute points to the destination server and file that are expecting the form data. This can be an email or FTP address, but usually it’s another web page that will be able to parse the data and formulate some response to the submission.

The method attribute lets the browser know how to encapsulate the data as it is sent to the server. The GET method appends the data to the URL as a visible query string, whereas the POST method sends the data in the body of the request, separate from the URL, for processing on the server end. Posting is the more common choice.

Example: <form action="process.php" method="post"></form>

<input />

The <input /> tag is one way to collect information from the person submitting the form. The nature of the input field is dictated by its type attribute, which can be filled with values like button, checkbox, file, hidden, image, radio, and reset. For the purposes of this book, we are interested in three specific types: text, password, and submit.

Each text input is displayed as a one-line box into which text can be typed. You can set the value attribute of this kind of field to pre-populate the form field with some text (such as the previously saved value). There is also a size attribute that gives you some control over how long the box should be.

Example: <input type="text" name="user" value="visible" />

The password input is nearly identical to the text input. The big difference is that the text being entered is masked with bullets or asterisks. That offers the person doing the typing protection from onlookers. However, the text is still plainly visible in the source and is not inherently secure upon form submission. If possible, use secure protocols and avoid returning the password back to the form.

Example: <input type="password" name="pw" value="hidden" />

Warning

Text fields are open data entry. That means if it can be typed on the keyboard, it can be sent to the server. This puts some burden on the developer to filter malicious or error-producing content from being fully processed.

This kind of validation can also be done on the client side with JavaScript, but on the server side it must be done with a scripting language such as PHP. Without server-side validation, malicious users (or those that keep scripts disabled for their own browsing safety) can very easily bypass your filters.

The other type of interest is submit. This turns the input into a button that can be pressed to send the data to the server. In this case, the value attribute becomes the label of the button. If you leave the value out, the browser supplies its own default label (often something like “Submit” or “Submit Query”).

Example: <input type="submit" name="send" value="Go!" />

Each input tag must also have a name attribute. This is what differentiates this particular blob of data from all the rest of the data you may ask for in your web form. If two elements have the same name, the data will look like a list on the server side and may not be interpreted correctly. As with other tags, the name also serves as a hook for JavaScript and other dynamic languages.

<textarea>

Sometimes one line of text isn’t enough. Imagine trying to blog (or even microblog) with only space for one short line at a time. The brains behind XHTML thought of this when they developed the <textarea> tag. This element does require a closing tag. Any text between the two tags appears in a big box in the web form.

Note

If you use styles to change the font family, style, or other formatting, the dimensions will change as the size of the characters changes: 3 rows by 70 columns will look smaller with 9pt type than it will with 12pt. It is more effective to use CSS (discussed in the next section) to determine the size of form objects:

textarea { width:400px; height:80px; }

Example: <textarea rows="3" cols="70">My novel</textarea>

<fieldset>

An XHTML element that can be useful for making sense of web forms is the <fieldset> tag. The <fieldset></fieldset> pair creates a visual grouping of any form elements contained within to make it clear to the visitor that they are meant to be together. You can then add a special nested tag—<legend>—that will wrap around the text you want to use for the title of this section. Fieldsets are useful with CSS styles to create tabs, hiding all but the group of form elements that is currently in use.

Example: <fieldset><legend>Login:</legend></fieldset>

Forms are used in the sample applications to authenticate Twitter accounts and store some basic configuration information. Example 3-2 shows all the form parts put together.

Example 3-2. XHTML for a simple web form

<form action="" method="post">
    <fieldset>
        <legend>
            Authenticate:
        </legend>
        <div id="username">
            Username:<br />
            <input type="text" name="twitter_username" size="25"
                value="you can see this" />
        </div>
        <div id="password">
            Password:<br />
            <input type="password" name="twitter_password" size="25"
                value="you can't see this" />
        </div>
    </fieldset>
    <input type="submit" name="submit_button" value=" Go! " />
</form>
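On the receiving end, the script named in the action attribute finds each posted value under the name of its input. The following is a minimal sketch of server-side checking for the form in Example 3-2, assuming PHP. The function name check_login_form, the error messages, and the username rule (1 to 15 letters, digits, or underscores) are illustrative assumptions, not code from the sample applications.

```php
<?php
// Hypothetical validator for the fields posted by Example 3-2.
// Returns an array of error messages; an empty array means the input passed.
function check_login_form(array $post)
{
    $errors = array();

    // Missing fields, or fields that arrive as lists, are treated as empty.
    $username = (isset($post['twitter_username']) && is_string($post['twitter_username']))
        ? trim($post['twitter_username']) : '';
    $password = (isset($post['twitter_password']) && is_string($post['twitter_password']))
        ? $post['twitter_password'] : '';

    // Assumed rule: 1 to 15 letters, digits, or underscores.
    if (!preg_match('/^[A-Za-z0-9_]{1,15}$/', $username)) {
        $errors[] = 'Please enter a valid username.';
    }
    if ($password === '') {
        $errors[] = 'Please enter a password.';
    }

    return $errors;
}

// Only run the check when the form was actually submitted with POST.
if (isset($_SERVER['REQUEST_METHOD']) && $_SERVER['REQUEST_METHOD'] === 'POST') {
    foreach (check_login_form($_POST) as $error) {
        // Escape anything echoed back into the page.
        echo '<p class="error">' . htmlspecialchars($error) . "</p>\n";
    }
}
```

Because the check lives in a function that takes an array, it can be exercised with hand-built test data as well as with the real $_POST values.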

A Nod to Some Other XML Structures

Web pages and XHTML are just one form of strict XML structures. The blogs you read typically have RSS feeds, which are formatted using structured tags. The content you will receive from the Twitter API can also be XML.

In this book, we will both read and create new RSS and XML structures. Example 3-3 shows the initial shell for an RSS feed, waiting for dynamic content to be added.

Example 3-3. General RSS structure

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
    <channel>
        <title>Sample RSS Feed</title>
        <link>http://www.blogschmog.net/feed/</link>
        <description>This is my blog.</description>
        <language>en</language>
        <pubDate>Wed, 10 Dec 2008 13:18:56 +0000</pubDate>
    </channel>
</rss>
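To give a feel for how dynamic content might be poured into that shell, here is a minimal sketch that builds the same structure with PHP’s DOM extension and appends one item per entry. The function name build_feed, the shape of the $items array, and the example.com link are assumptions made for illustration. The publication date is generated at run time rather than copied from Example 3-3.

```php
<?php
// Hypothetical helper: builds the RSS shell from Example 3-3 and
// appends one <item> per entry in $items.
function build_feed(array $items)
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;

    $rss = $doc->createElement('rss');
    $rss->setAttribute('version', '2.0');
    $doc->appendChild($rss);

    $channel = $doc->createElement('channel');
    $rss->appendChild($channel);

    // Channel details taken from Example 3-3, except for the date.
    $details = array(
        'title'       => 'Sample RSS Feed',
        'link'        => 'http://www.blogschmog.net/feed/',
        'description' => 'This is my blog.',
        'language'    => 'en',
        'pubDate'     => gmdate(DATE_RSS),
    );
    foreach ($details as $tag => $text) {
        $node = $doc->createElement($tag);
        // Text nodes are escaped (&, <, >) when the document is serialized.
        $node->appendChild($doc->createTextNode($text));
        $channel->appendChild($node);
    }

    // Dynamic content: each entry is assumed to have 'title' and 'link' keys.
    foreach ($items as $entry) {
        $item = $doc->createElement('item');
        foreach (array('title', 'link') as $tag) {
            $node = $doc->createElement($tag);
            $node->appendChild($doc->createTextNode($entry[$tag]));
            $item->appendChild($node);
        }
        $channel->appendChild($item);
    }

    return $doc->saveXML();
}

echo build_feed(array(
    array('title' => 'Fish & chips', 'link' => 'http://www.example.com/1'),
));
```

Building the feed through the DOM rather than by gluing strings together means stray ampersands and angle brackets in the content cannot break the well-formedness of the result.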

CSS

If markup languages are all about structure, stylesheet languages are all about presentation. Cascading Style Sheets (CSS) are used most commonly for web page design, but CSS also applies to other XML structures, such as Scalable Vector Graphics (SVG). Once upon a time, rendering instructions were intermixed with the data structure, causing lots of problems with making things look the same when viewed through different system platforms or on different hardware. With separate stylesheets, however, the same structure can be presented in a way that best fits the context of use.

CSS is used to change the appearance and layout of the structured data, spanning everything from colors to fonts to arrangement of content. It can even hide or reveal content, making the web page interactive when associated with event handlers. This is particularly powerful when teamed with a dynamic scripting language like PHP, as you can create a web page that detects a person’s browser signature or IP address and then select a stylesheet on the fly that is optimized for display in that context.
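As a rough illustration of that last idea, the sketch below picks a stylesheet based on the browser signature and writes the matching tag into the page. It assumes PHP. The function name pick_stylesheet, both file names, and the substrings being matched are hypothetical, and real browser detection is considerably messier than this.

```php
<?php
// Hypothetical helper: picks a stylesheet name from the browser signature.
// Both file names and the substrings being matched are illustrative.
function pick_stylesheet($userAgent)
{
    // stripos() is a case-insensitive search; false means "not found".
    if (stripos($userAgent, 'iPhone') !== false
        || stripos($userAgent, 'Mobile') !== false) {
        return 'my_styles_mobile.css';
    }
    return 'my_styles.css';
}

// The signature is missing when the script is not run through a web server.
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$href  = htmlspecialchars(pick_stylesheet($agent));

echo '<link rel="stylesheet" type="text/css" href="' . $href . '" />' . "\n";
```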

Stylesheets are built from three basic parts. The selector references the tag element by name, ID, or class. The property is the part that indicates what is to be changed, and the value indicates how it should be presented:

selector { property:value; }

Note

Some related properties and values can be combined to reduce the number of lines of definition the stylesheet requires to display content in the way you want. For instance, borders (the lines surrounding a particular element) have three properties that can be set either separately (border-width:1px; border-style:solid; border-color:black;) or as one combination property (img { border: 1px solid black; }).

The browser will prioritize the style changes and render the display. Sometimes, this may produce unanticipated results—for instance, text in a paragraph that you want to appear in red may show up as blue, forcing you to investigate what is causing (or sometimes preventing) the change. Referencing a selector with a different property value later in the rendering process, or simply not knowing which properties are inherited for which embedded tags, can lead to a lot of hand-wringing.

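In Example 3-4, the first line of visible text (“body”) is rendered in blue, while the next line (“This text is black.”) is black. The paragraph does not pick up the blue from its parent because the * rule assigns black to the <p> element directly, and a rule that matches an element always beats a value the element would merely inherit.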

Example 3-4. Inline style tag

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
    <title>Test of CSS inheritance</title>

<style type="text/css">
*{color:black;}
body {color:blue;}
.my_class {color:gray;}
#my_div {color:red;}
#my_div a {color:green;}
#my_div h4, #my_div p {color:purple;}
</style>

</head>
<body>body
<p>This text is black.</p>
<div id="my_div">
    <p>This text is purple.</p>
    <h4>header 4</h4>
    my_div
    <a href="#">link</a>
</div>
<p class="my_class">This text is gray.</p>
</body>
</html>

Warning

Be careful when using the * CSS selector—it’s easy to accidentally override another style without realizing it. Using a “reset” stylesheet typically addresses this nicely by setting a baseline style for the document body and then telling all child elements to inherit the styles that you don’t explicitly specify for them.

The browser’s prioritization is based on a weighting formula that calculates how important a particular rule is: more specific style definitions are given a greater weight. A selector of #my_div .my_class p, for instance, would be deemed more important than simply p or .my_class, regardless of the order in which the browser encounters those selectors.
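To make the weighting concrete, here is a deliberately simplified sketch that counts the IDs, classes, and tag names in a selector and compares the results in that order of importance. It assumes PHP 7 or later (for the <=> operator) and handles only the kinds of selectors used in this chapter. Attribute selectors, pseudo-classes, inline styles, and !important are ignored. The function name specificity is hypothetical.

```php
<?php
// Hypothetical helper: returns array(ids, classes, elements) for a selector
// made up only of tag names, #ids, .classes, and the * wildcard.
function specificity(string $selector): array
{
    $ids     = preg_match_all('/#[\w-]+/', $selector);
    $classes = preg_match_all('/\.[\w-]+/', $selector);

    // Remove the #id and .class tokens so only tag names (and *) remain.
    $stripped = preg_replace('/[#.][\w-]+/', '', $selector);
    $elements = preg_match_all('/(?<![\w-])[a-zA-Z][\w-]*/', $stripped);

    return [$ids, $classes, $elements];
}

$selectors = ['#my_div .my_class p', 'p', '#my_div a', '.my_class'];

// Sort from least to most specific; arrays of equal length are compared
// one position at a time, so IDs outrank classes and classes outrank tags.
usort($selectors, function ($a, $b) {
    return specificity($a) <=> specificity($b);
});

foreach ($selectors as $selector) {
    echo implode(',', specificity($selector)) . '  ' . $selector . "\n";
}
```

When two rules end up with the same weight, the one the browser encounters later wins, which is where the order of the stylesheet starts to matter.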

The World Wide Web Consortium officially recognizes CSS1 as the recommended standard. However, there are two newer versions, CSS2 (currently in candidate status) and CSS3 (now in development), that may one day supersede it. These later iterations add support for fine control of positioning, media types, and effects such as shadowing.

I promise that the CSS for the sample applications in this book won’t get too fancy.

Assigning Styles to Structure

Example 3-4 shows several ways in which selectors can be constructed to refer to HTML elements and to dictate how the browser displays content. The syntax relies on a handful of characters:

Hash (#)

The hash references a specific ID attribute associated with a tag element in the XHTML structure. Each ID value can be assigned to only one element per page.

Example: #whoopi { color:purple; }

Full stop (.)

The full stop, or period, references a specific class attribute associated with one or more tag elements in the XHTML structure. Classes can be used as many times as you want and are great for formatting repeating content, such as dynamic lists.

Example: .kermit { color:green; }

They can also be used in conjunction with a specific kind of tag to set a style for that combination of element and class. Because this is more specific, it will take precedence over the class alone.

Example: h4.kermit { color:black; }

Asterisk (*)

The asterisk is a wildcard character that can be used to set the default style for any unassigned tag elements in the page. This is often used as the first style declaration, to set the font size and family for the entire page.

Example: * { font-family: sans-serif; font-size: .9em; }

You can also use the wildcard to apply styles to everything contained in a specified selector. However, this can lead to frustration if you accidentally overwrite a specific style declared elsewhere in the sheet.

Example: #sidebar * { color: blue; }

Space ( )

Spaces are used to build a complex string of selectors into a nested style. A nested style is inherently weighted with greater priority than a single-element selector because it is more specific. You can, for example, define all anchor tags to be underlined (that’s the default interpretation by the browser, actually) but turn off the underlining for links in a specific division.

Example: #subtle a { text-decoration:none; }

Comma (,)

Commas are used to group several selectors together and assign them the same style properties at once. This is useful when a number of selectors share the same subset of styles. Any differences can be declared separately for each selector.

Example: #one, #two, #three { color:red; }

Nested styles are both a blessing and a curse. Such fine control is great, but the more complicated the CSS becomes, the more hair you’ll pull from your scalp trying to figure out why a link buried deep in the web page won’t turn blue. Maintaining good stylesheets for a large website is an art. Reuse and revise your stylesheets as you build your applications to limit conflicts.

Laying Out Your Web Page Content

In the early days of web design, the people designing web pages often had experience building printed pages, where the position of each bit of text or picture can be finely controlled. The Web, however, is a flexible medium. The same page can be displayed on big monitors and small, with a Mac or a PC, using Internet Explorer or Netscape Navigator. Flexibility and precision didn’t always jibe.

Although later versions of CSS promise that fine control, even CSS1 provides several properties that let you dictate where chunks of content are presented on the page. The following are the properties used in the sample applications described in this book:

padding and margin

The “box model” best describes margins, padding, and borders. If you picture the content of an element as the innermost box and the element border as the outermost box, the padding is the space between the content and the inside edge of the border. The margin is the space between the border and the nearest other elements on the page. This extra space is like a force field that protects other content from crowding the parts of the element.

Values for the padding and margin properties can take many forms, including percentages and specific lengths. Lengths are typically measured in terms of pixels (px), which are absolute, or em-length (em), which is a relative measure based on the font size of the element (traditionally tied to the width of the letter “M” in the chosen font). Margins can also have a special value, auto, which will attempt to balance the spacing between elements. This is often used to center a block of content on a web page.

Example: #content { margin:20px; padding: 1.5em; }

float

Floating an element moves it to the right or the left of the other elements within the parent element, as content flows around it. Floats are used to create sidebars or to describe the relationship between an image and its surrounding text.

Example: #sidebar { float:right; }

width

This property sets the horizontal dimension of an element. The width is measured without consideration of the margin, border, or padding values (or as if all three values were set to zero). Remember that there is both a left and a right side to any element, so a division with a width of 100, padding of 10, border of 1, and margin of 20 would take up 162 pixels across the page (100 + 10 + 10 + 1 + 1 + 20 + 20 = 162).

Example: #column { width:350px; }

text-align

This property sets the alignment of text within a block element. The usual suspects are possible values (left, right, center, justify). text-align cannot be used to affect the placement of a division; instead, use the float property to move divisions to one side of the page or the other.

Example: h2 { text-align:center; }

display

We sometimes take for granted, based on experience, that certain tags will be rendered in specific ways. Header tags result in larger, boldfaced text, with some vertical spacing above and below. List items are indented and have space between each new item. All of that presentation, though, is part of an accepted interpretation of style properties that the browser adds to the page upon seeing those tags.

With the display property, you can assign some of those properties to any tag element. Division tags can become inline objects and anchor tags can become block elements. Either can be displayed as list items. This kind of reappropriation of inherent properties is not frequently done, but display does have one very useful value: none. This effectively hides the element and all of its contents from display. With the incorporation of JavaScript, this value can be changed on the fly, making pop-up content and contextual menus possible.

display differs from another property, visibility, in that with display the hidden elements are not factored into the positioning of the visible elements on the page.

Example: #help { display:none; }
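The arithmetic described under width above is easy to get wrong when you are juggling several columns, so it can help to spell it out. This is a minimal sketch in PHP; the function name rendered_width is hypothetical, and it assumes all four values are given in pixels and are the same on the left and right sides.

```php
<?php
// Hypothetical helper: total horizontal space an element occupies, in pixels.
// Padding, border, and margin each apply to both the left and right sides.
function rendered_width($width, $padding, $border, $margin)
{
    return $width + 2 * ($padding + $border + $margin);
}

// The division described under width: 100 wide, padding 10, border 1, margin 20.
echo rendered_width(100, 10, 1, 20) . "px\n"; // prints "162px"
```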

Decorating the Web Page Content

The other big use of CSS is to make the page look pretty. Most presentational control of this fashion revolves around fonts and colors. The sample applications in this book use some of these basic controls to make the web pages look clean:

font-family

In word processors, changing the shape of the letters to match a typographical set of characters is as easy as selecting from a menu. Your computer scans your operating system for the presence of fonts and lists all the ones it finds as options. A web browser has access to this same list of fonts, but the basic list differs from one computer platform to the next and can be expanded by installing extra fonts. To make a web browser try to use a specific font, the CSS must be coded to pass that request on to the client.

The font-family property allows the web developer to send a comma-separated list of font names to the browser, which will use the first one that it recognizes as available in its system. For this reason, any exotic fonts should be either avoided or listed with other, more common fonts that are likely to be available. The list of values should end with a generic family name (such as serif, sans-serif, cursive, or monospace) to direct the browser, as a last resort, to pick the system- or user-defined preferred font for that general category.

Example: * { font-family:Helvetica,Arial,sans-serif; }

font-size

The size of the text is dictated by the font-size property, which can accept values that range from percentages to length units to keywords (smaller, larger, xx-large, etc.) that are interpreted by the browser. The values are most commonly expressed in terms of px (pixels), pt (points), or em. Em-length is often preferred, since it is calculated from the size the text would otherwise inherit. A font-size value of .9em would cause the text to be rendered at 90% of the size it would be without the new style definition, and 2em would cause the text to be rendered twice as large.

Example: p { font-size:.9em; }

color, background-color

Colors can be assigned to any object to describe both its text (or border) and the background fill upon which that text rests. There are a number of predefined color names: aqua, black, blue, fuchsia, gray, green, lime, maroon, navy, olive, orange, purple, red, silver, teal, white, and yellow. The color can also be expressed as a hexadecimal value (#00ffcc) or its decimal equivalent, written as rgb(0,255,204), to indicate how much red, green, and blue to mix to form the desired color.

The background-color property also accepts a value of transparent, which means whatever colors, images, or text are underneath the layered object will be visible. This is the default, which is why setting the background color of the body will result in all elements having a background of that color.

Example: #sidebar { color:green; background-color:black; }

text-decoration

This property deals with some of the less common style changes for text, such as strikethroughs, overlines, and underlines. It’s useful if you want to remove an underline, such as the one added to a hyperlink by default. text-decoration can also cause things to blink, which is probably a sign that we aren’t learning from past mistakes.

Example: a { text-decoration:none; }
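The relationship between the hexadecimal and decimal color forms mentioned under color above is simple enough to show in a few lines. This is a minimal sketch in PHP that handles only six-digit values; the function name hex_to_rgb is hypothetical.

```php
<?php
// Hypothetical helper: converts a six-digit hexadecimal color such as
// "#00ffcc" into its red, green, and blue decimal parts.
function hex_to_rgb($hex)
{
    $hex = ltrim($hex, '#');
    if (!preg_match('/^[0-9a-fA-F]{6}$/', $hex)) {
        return null; // not a six-digit hex color
    }
    return array(
        hexdec(substr($hex, 0, 2)), // red
        hexdec(substr($hex, 2, 2)), // green
        hexdec(substr($hex, 4, 2)), // blue
    );
}

list($r, $g, $b) = hex_to_rgb('#00ffcc');
echo "rgb($r,$g,$b)\n"; // prints "rgb(0,255,204)"
```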

Getting the Browser to Recognize Styles

Once you’ve defined your styles, you have to get the browser to locate your styling genius and use it to render your web pages. CSS can be used inline as part of the XHTML or referenced as a separate file outside of the web page.

For inline references, the style attribute can be added to any tag and filled with style properties and values, as in:

<div style="font-size:36pt;"></div>

Whatever styles are defined in this attribute will apply to the content of the tag itself. Since embedded objects can inherit some or all of these properties, the style may cascade down to structures contained within the styled element. Changing the font size of the body, for instance, will also change the size of the text in all of its <div> and other block tags, unless that setting is overruled by another style.

Another way to bring styles into a web page is to define a whole block of properties at once using the <style> tag. This tag is typically used between the <head> tags that are used to define the page. By placing several properties between the style tags, you can define the look of an entire page in one easy-to-edit location in the code.

A third way to assign styles to the elements of a web page is to bring in the defined properties as an external file. There are several advantages to doing so, not the least of which is that it makes the amount of code in your web page much smaller. External files can also be shared across all of your sites, giving you one place to edit a style whose properties you want to affect your entire site.

External files can be brought to the browser in one of two ways. The first is the <link> tag, which is placed between the <head> tags of the HTML. The browser will look for the file referenced in the href attribute, as in:

<link rel="stylesheet" type="text/css" href="my_styles.css" />

The second method is to leverage a CSS convention, @import, between <style> tags:

<style type="text/css" media="all">@import url(my_styles.css);</style>

The big advantage of this method is that older browsers can’t and won’t attempt to interpret this code; they will simply render the page based on the HTML they can see. This increases the chances of compatibility with older systems.

When the browser looks for the specified file, it will start in the directory containing the page it is trying to render. You can use relative paths (../css/my_styles.css) that climb up and down the directory structure, server paths (/css/my_styles.css) that start at the domain root, or full URLs to point to the file.

PHP

I can remember the early days of web design, when sites consisted of static content, hardcoded into a certain state by the developer/designer/editor/marketer. Small sites with three to five pages were tough enough to maintain. Once a website grew beyond about two dozen pages, time spent making frequent changes could eat up a hefty percentage of the workday.

Dynamic web pages used to be generated on the fly primarily with runtime programming languages such as Perl. There are more options today, including proprietary platforms such as ASP and ColdFusion. PHP (which stood for “Personal Home Page” when it was first released in 1995) is a widely used free server option that powers over 20 million websites, including popular tools such as Wikipedia, WordPress, and even Facebook.

PHP bears a lot of resemblance to Perl, largely because that was what PHP’s creator, Rasmus Lerdorf, meant it to replace. It has had five major revisions; the most recent (PHP 5) is now a half-decade old. Although PHP was intended for web page generation, it can also be used from the command line, making it amenable to automation with crontabs.

The language is much richer than what you will encounter in this book. My intention here is to explain only the part of PHP that will be used in the sample applications, which are intended to be bare-bones examples of some of the things you can do with the Twitter API.

How to Accept Candy from Strangers

Here you are, ready to set the world on fire with your creativity and build a useful tool for the Twitter community. Your application will require not only data from an API over which you have no control, but also user input typed into web forms and content feeds originating from remote sources. Before accepting such input, you should know about some of the risks in doing so.

SQL injection attacks

Database servers are separate beasts from the rest of the web application. They have their own security and usually reside on a completely different machine. The data they currently contain and are willing to store is of little use if they can’t communicate with the web pages that are being rendered, but the necessary connection between the web application and the SQL database creates an opportunity for malicious users.

A SQL injection attack can happen if the data you pass to SQL statements has not been properly filtered to remove escape characters. If these special characters get passed to the database, they could be interpreted as a query. The injected code may be a hexadecimal string that looks like gibberish to you, but the SQL server may still be able to decode and interpret those characters. This kind of attack isn’t specific to PHP or MySQL; it can occur whenever one scripting language is embedded within another.
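To make the risk concrete, here is a minimal sketch of how unfiltered input can rewrite a query. The users table, the name column, and the $submitted_name value are hypothetical, and nothing is sent to a database; the script only builds and prints the query text. The use of addslashes() is purely to illustrate escaping. With a real connection you would normally rely on the escaping function or parameterized queries offered by your database library.

```php
<?php
// Hypothetical value typed into a web form by a malicious user.
$submitted_name = "x' OR '1'='1";

// Unsafe: the input is dropped straight into the SQL text,
// so the quote in the input closes the value early.
$unsafe_query = "SELECT * FROM users WHERE name = '$submitted_name'";

// Safer: quotes are escaped so the input stays inside the quoted value.
$escaped_name = addslashes($submitted_name);
$escaped_query = "SELECT * FROM users WHERE name = '$escaped_name'";

echo $unsafe_query . "\n";
echo $escaped_query . "\n";
```

In the first query the WHERE clause now contains an extra OR condition supplied by the visitor; in the second, the same characters remain part of the name being searched for.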

CRLF injection attacks

CRLF refers to the carriage return and line feed combination of ASCII characters often used in Windows applications to indicate a new line (for Unix-based systems, the end of line is denoted with a line feed only). The attack is characterized by an attacker injecting CRLF sequences into data the system accepts. Since these characters are often used to parse data, extra line endings can allow one entry to become several.
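As an illustration, the sketch below removes carriage returns and line feeds from a value before it is written as a single log entry. The strip_crlf() helper and the sample input are hypothetical and are not part of the sample applications.

```php
<?php
// Hypothetical helper: remove CR and LF so one entry cannot become several.
function strip_crlf($value) {
    return str_replace(array("\r", "\n"), '', $value);
}

// The attacker tries to forge a second log line.
$input = "guest\r\nadmin logged in";
$log_line = 'User: ' . strip_crlf($input);

echo $log_line . "\n";
```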

Cross-site request forgery (CSRF) attacks

This kind of attack is also known as a “one-click” attack. It is made possible by exploiting ill-conceived browser cookie strategies to send commands to another site that relies only on the cookie for authentication. What makes CSRF so scary is that the links that send the commands can be included as the source of an image file. The act of loading a web page in a browser or viewing an email message in an HTML-enabled email client can be enough to kick-start the attack.

There are a few prerequisites for CSRF to work. First, the attacker has to target a site that is known not to check the Referer header. Second, that site must contain a web form that does something useful for the hacker (like transferring funds). The attacker must then simulate that form perfectly, with all the required form names and value constraints, and get the target to load up the malicious web page where the attack is embedded while that person is still logged into the remote site.

Note

The best way to prevent all these kinds of attacks is to treat all remote data (whether it comes from web form submissions, RSS feeds, or even the Twitter API) as hostile. Escape any variables you don’t know are filled with good data.

Strings, Arrays, and Objects

The most important concept to grasp is the cornerstone of programming, the variable. This is the entity into which you store data so you can manipulate it and create things such as search engines and blogs showing dogs sleeping upside down. All of this important stuff is dependent on our ability to name and retrieve information as part of the programming logic.

In PHP, strings, arrays, and objects are all referenced in the same way: with a $ and the name of the variable. PHP keeps track of what kind of variable it is based on the context of what you are trying to do with it. A string contains a discrete value, which could be an integer or a bunch of text. An array is a list of strings, referenced with a key that can either be an integer or text. You will use objects whenever you deal with files, databases, or XML structures.

When setting a string, the content is encapsulated in quotes. Single quotes (') cause the content to be taken literally, without interpretation. This means if you include another variable, the value of that variable is not substituted. Double quotes (") cause PHP to attempt to interpret any variables contained in the string and resolve them to the values they represent. Because of this disparity, single-quoted strings can be slightly faster to process. If you wish to include a quotation mark of the same type that is encapsulating the string, you have to escape it by adding a backslash (\) before it. This tells PHP to treat it as a literal character rather than as the end of the string:

$truth = "<div id=\"sarcasm\">The Packers play 'football'</div>";
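A short sketch of the difference between the two quoting styles follows. The $team variable and the sample sentences are made up for the illustration.

```php
<?php
$team = 'Bears';

$literal  = 'Go $team!';             // single quotes: no substitution
$resolved = "Go $team!";             // double quotes: $team is replaced by its value
$escaped  = "He said \"Go $team!\""; // escaped double quotes inside a double-quoted string

echo $literal . "\n";
echo $resolved . "\n";
echo $escaped . "\n";
```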

Sometimes short lines of text are not enough. You may want to store a lot of text all at once and not have to deal with all the concatenation and character escaping. Enter the heredoc. With this string-creation tool, you signify the start and the end of the text you want to save and just type away between those points. The start point is in the form <<<EOS, where EOS refers to the sigil text PHP should look for as a signal to stop storing content in the variable. That end text cannot be indented. For example:

$truthier = <<<GO_BEARS
<div id="fan">
    <div id="diehard">
        <h4>Chicago Bears</h4>
        <p>$this_person is a diehard fan of #{$player['jersey']}.</p>
    </div>
</div>
GO_BEARS;

With a heredoc, you can freely insert quotes and variables without the need to escape them. The only special thing you have to consider is when referencing arrays: array values must be wrapped in curly braces ({}) to let PHP know to evaluate them and use their stored values instead.

Arrays are only a little more complicated. You let PHP know that a variable is meant to be an array by using the array() function. If you do not include data in the parentheses, the variable will simply be set or reset as an empty array. Associative arrays can be created using the convention of key => value pairs listed with commas as delimiters:

$cURL_options = array(
    CURLOPT_USERAGENT      => 'Twitter Up and Running - '.$app_title,
    CURLOPT_USERPWD        => "$twitter_username:$twitter_password",
    CURLOPT_RETURNTRANSFER => 1
    );

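Each value is then referenced as $array[$index], where $index can be an integer key or, for associative arrays, a string. In the example above, the keys are PHP's predefined cURL constants (which resolve to integers), so they are written without quotes, as in:

$UserAgent = $cURL_options[CURLOPT_USERAGENT];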

You can add a new indexed value to an array by leaving the index blank. This technique is used within loops as an easy way to build the array iteratively:

foreach ($cURL_options as $key => $value) {
    $option_keys[] = $key;
    $option_values[] = $value;
}

Arrays can be nested. In this case, instead of storing a string for a given key, an entire array is stored. If $twitter_data were an array that contained the individual arrays containing each Twitter member’s most recent status update, a single tweet could be extracted as:

$my_tweet = $twitter_data[0]['tweet'];
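For illustration, a small array shaped like the one just described might be built and read as follows. The screen names and tweet text are invented sample data, not real API output.

```php
<?php
// Each element of $twitter_data is itself an associative array.
$twitter_data = array(
    array('screen_name' => 'alice', 'tweet' => 'Hello, world'),
    array('screen_name' => 'bob',   'tweet' => 'Lunch time'),
);

// Append one more member using the empty-index form.
$twitter_data[] = array('screen_name' => 'carol', 'tweet' => 'Good night');

// The first index picks the member, the second picks a field.
$my_tweet = $twitter_data[0]['tweet'];
$last_author = $twitter_data[2]['screen_name'];

echo $my_tweet . ' / ' . $last_author . "\n";
```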

PHP objects will come into play whenever you parse data from the Twitter API. The XML files that are returned are turned into objects that can be manipulated by referencing shared methods within them, as in:

$feed_title = $doc->getElementsByTagName('title')->item(0)->nodeValue;

In this case, the value being returned is whatever text was wrapped within <title></title> tags in the XML structure. The text is extracted from the object and placed into the string $feed_title for later use.
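The $doc object in that line has to be created first. Here is a minimal sketch, assuming PHP's DOM extension is available and using a made-up XML string in place of a real API response:

```php
<?php
// Sample XML standing in for data returned by a remote feed.
$xml_text = '<feed><title>My Updates</title><entry><title>First</title></entry></feed>';

// Parse the text into a DOM object.
$doc = new DOMDocument();
$doc->loadXML($xml_text);

// item(0) is the first <title> element found in document order.
$feed_title = $doc->getElementsByTagName('title')->item(0)->nodeValue;

echo $feed_title . "\n";
```

Because getElementsByTagName() returns every matching element, including nested ones, the index passed to item() determines which title is read.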

Manipulating the Data You Collect

Once you’ve captured some information in one of the variable forms, you may need to do something to it to make it useful. Like any good programming language, PHP has a number of functions that perform commonly required manipulations:^[61]

array_search(term_to_find,array_to_search)

Rather than building a loop to check every value in your array, you can use the array_search() function to look for a given value. The function returns the key for the first match it finds, or false if there is no match.

base64_encode(string_to_encode) base64_decode(string_to_decode)

This function encodes data as base64, returning a new string that is a bit (33%) larger than the original. It is used to make data safe for transfer to destinations where some of the characters may cause problems. The base64_decode() function will convert text that has previously been encoded with base64_encode() back into its original string.

basename(path_to_file)

This function takes a string containing the path to a file and returns the base name of the file. For instance, if the path is /path/to/file.php, basename() will return file.php.

bin2hex(ascii_text)

This function takes an ASCII string and returns the hexadecimal equivalent.

ceil(value_to_round_up)

The math function ceil() turns a value into the next highest integer, or the ceiling for the number.

date(format_template,timestamp_to_format)

The date() function turns a numeric version of the time into a formatted string. The template for that resulting string is dictated by the letter codes used as placeholders for the various parts of the date and time. For example, “l, F jS, Y G:i:s a” would produce something like, “Thursday, December 25th, 2008 6:21:34 am.”

dechex(decimal_value)

The dechex() function converts an integer into its hexadecimal string representation. On 32-bit systems, the result can be no more than eight characters long. This is the opposite of hexdec(), which performs the conversion the other way.

explode(delimiter,string_to_break_into_an_array)

The explode() function converts a long delimited string into an array of strings. You specify the character(s) to use as a delimiter, and it splits the provided string into parts at every instance of that delimiter. The parts then become separate indexed values of a new array. The companion to this function is implode().

gmdate(format_template,timestamp_to_format)

This function works in the same way as date(), turning a numeric timestamp into a formatted string to represent the date and time. The difference is that the time is expressed in Greenwich Mean Time (GMT) rather than in the local server time zone.

hash(hashing_algorithm,string_to_hash)

Hashing involves use of a specific algorithm, such as sha256, to turn a string into a fixed-length digest. Unlike encryption, this is a one-way process. It is used to store sensitive information, such as passwords, for comparison with user-entered information as a way of verifying access. There is no practical way to reverse the calculation, although lookup lists can be generated to re-map digests back to the original strings. That isn’t easy, especially if you salt the password by adding a random string to the password being stored.

Example: $secured = $salt.hash('sha256',$salt.$password);

htmlentities(string_to_encode)

This function recognizes that some characters—namely, the ampersand (&), double quote ("), single quote ('), less-than symbol (<), and greater-than symbol (>)—have special meaning in HTML and may cause text containing them to be rendered inappropriately. The htmlentities() function is often used to prevent text submitted by the user from rendering as HTML by substituting translations for these characters. It is similar to htmlspecialchars() but substitutes more comprehensively.

You can reverse the translation by using html_entity_decode().

implode(delimiter,array_to_make_a_string)

This is an important function that bridges the gap between strings and arrays. You can use implode() to turn an array into a single string by specifying a delimiter character(s) that will connect each value in the array. This is the reverse of the process performed by the explode() function.

in_array(term_to_find,array_in_which_to_search)

Like the array_search() function, this function is a way to search through the values stored in an array for keyword matches. The in_array() function simply confirms with a Boolean whether or not a match is found.

is_int(value_to_test)

This function tests to see whether a given variable value is an integer, returning a Boolean as a result.

mt_rand()

The mt_rand() function is one of many random number generators that aren’t truly random, but that generate pseudorandom values that work well enough to make programs seem unpredictable. This one is a bit faster than the rand() function, which is why it is preferred.

number_format(number_to_format,decimal_places)

This versatile function accepts up to four parameters to return a string that adds formatting to a numeric value.

pack(format_template,arguments_to_convert ...)

The pack() function uses formatting codes inherited from Perl to turn a list of arguments into a binary string. The binary string can be turned back into an array of arguments with unpack().

preg_replace(pattern_to_find,replacement_value,string_to_search)

This is a search-and-replace function that uses pattern matching to locate the places to insert replacement text into a string.

Example: $x = preg_replace('/&(?!\w+;)/', '&amp;', $x);

reset(array)

When arrays are navigated, the internal pointer is incremented to reference items with higher and higher indices. The reset() function returns that pointer to the first element in the array.

sizeof(array)

This function counts the number of items in an array. sizeof() is an alias for count().

stripslashes(quoted_string)

For most PHP configurations, quotation marks within the user-submitted text are automatically escaped with backslashes to allow for operations such as saving to a database, where they could mess things up. There are times, though, when that is overkill. The stripslashes() function will remove the backslashes that were added to escape those characters.

str_pad(string_to_pad,minimum_length,pad_text,instructions)

Back in the day when all formatting was done with monospaced characters, lining up numbers and text in columns required some padding spaces or other characters. The str_pad() function makes that easy by asking for a minimum length for the resulting string and the text you want to use to make a shorter string that target size. By default, the padded characters will be added to the right of the initial string, but you can specify instructions to pad to the left side (for numbers, typically) or on both sides (to center the string).

The sample applications make use of str_pad() in password encryption to make sure that the salt strings are all the same size.

str_replace(text_to_find,replacement_value,string_to_search)

This function works like preg_replace() but is more straightforward, replacing regular expression matching with a simple string to match. It returns a copy of the searched string with every occurrence of the search text replaced.

str_rot13(string_to_shift)

This encoding function simply shifts every letter it finds in a string by 13 places in the alphabet. Any numbers or other nonalphabetic characters are ignored. Decoding is accomplished simply by running the result back through the same str_rot13() function.

strlen(string)

Just as sizeof() returns the number of items in an array, strlen() returns the number of characters in a string.

strnatcmp(first_string,second_string)

Whereas strcmp() treats numbers as text, the strnatcmp() function is able to order strings based on “natural ordering,” or the way a human would read and interpret a string. That is, instead of ordering a list “1,10,2,20...”, this function would correctly interpret the order as “1,2,10,20...”. It returns a negative number, zero, or a positive number (typically −1, 0, or 1) to indicate whether the first string should be ordered before, alongside, or after the second. It is more often used with usort() to help order entire arrays of data.

strpos(string_to_search,text_to_find)

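Like array_search() does for arrays, the strpos() function returns the position of the first match of the given text found within the specified string. Since it returns the Boolean false if the text is not found in the string, this function is also useful as a conditional to confirm whether a match exists. Because a match at the very start of the string returns position 0, which PHP treats as false in a loose comparison, test the result with the strict operators (=== or !==).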

strrev(string_to_reverse)

This function simply reverses the character order of a given string. The result can be decoded by running it through the same strrev() function again.

strtolower(string_to_convert)

When case is not important (as is typical of usernames), it is good practice to turn strings into lowercase for storage and comparison. The strtolower() function changes any capital letters into lowercase.

strtotime(datetime_as_text)

There are many different ways of conveying a date and time. Many of these are variations of the numbers and text we commonly use to report time in a readable format, such as “January 30 2000 11:13:00 GMT.” For PHP to be able to do any calculation on that moment in history, the text has to be converted to its numeric counterpart. This is performed with the function strtotime(). Passing the string “now” returns the current Unix timestamp.

substr(string_to_shorten,starting_position,length_of_substring)

To extract a part of a larger string, use the substr() function to specify the starting point within that string and optionally the length of the new string. If the starting position is negative, the extraction will be done from the end of the original string, rather than the beginning.

trim(string_to_trim)

Sometimes data is entered or recalled with leading or trailing whitespace. This can include ordinary spaces as well as tabs, returns, and newlines. The trim() function will examine both ends of a string and clip off any whitespace it finds there.

urlencode(string_to_pass_as_URL) urldecode(string_passed_in_URL)

The urlencode() function converts any characters that have special meaning in a URL (such as an ampersand) into characters safe for transfer in a URL query string. Some are converted to a different character (for example, a space is converted to a plus sign, +), whereas most become hexadecimal codes tagged with a % symbol, as in %7E.

This adjustment should be made before any text is passed to a form as a URL query string. The urldecode() function will return a string that restores all %## translations back to their original characters.

usort(array_to_sort,name_of_sorting_function)

Arrays can be sorted in very complex ways. The usort() function allows you to reference another function for comparing two values and use it to order the contents of a given array. The function is typically one of custom design and must provide the same kind of response as an existing comparison function, such as strnatcmp(). That is, the comparison between two values should return an integer less than, equal to, or greater than zero to indicate which of the two values should precede the other.

utf8_encode(string_to_encode)

The utf8_encode() function converts a string from ISO-8859-1 to UTF-8, a Unicode encoding that can represent characters outside the single-byte range.

This is not a complete list of all that PHP can do, of course, but it does represent all you need to know to follow along with the programming done in this book.
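To show a few of these functions working together, here is a brief sketch. The sample strings, filenames, and variable names are hypothetical and are not taken from the sample applications.

```php
<?php
// explode() and implode() convert between delimited strings and arrays.
$tags = explode(',', 'php,mysql,xhtml');
$tag_line = implode(' | ', $tags);

// array_search() returns the key of a match; in_array() only confirms it exists.
$key = array_search('mysql', $tags);
$has_css = in_array('css', $tags);

// strpos() can return 0 for a match at the start, so compare strictly.
$starts_with_at = (strpos('@kmakice hello', '@') === 0);

// trim(), strtolower(), and substr() tidy up user input.
$username = strtolower(trim("  KMakice \n"));
$initial = substr($username, 0, 1);

// usort() orders an array using a comparison function such as strnatcmp().
$files = array('img10.png', 'img2.png', 'img1.png');
usort($files, 'strnatcmp');

// htmlentities() and urlencode() make text safe for HTML and query strings.
$safe_html = htmlentities('<b>Tom & Jerry</b>');
$safe_url = urlencode('tom & jerry');

// gmdate() formats a timestamp in GMT; 0 is the start of the Unix epoch.
$epoch = gmdate('Y-m-d H:i:s', 0);

echo $tag_line . "\n" . implode(',', $files) . "\n" . $safe_html . "\n";
```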

Note

For very thorough documentation on PHP’s hundreds of functions, visit http://www.php.net.

Knowing Your Environment

There are other ways for users to communicate with your PHP program besides explicitly filling out a web form. Every time a visitor clicks on a link on your website, she makes a formal request to the server for content. That action automatically creates a slew of special predefined variables that are available to scripts to let them know something about the context of the request.

One of the more useful of these variables is $_SERVER, an associative array that contains a variety of information about the paths and script locations. The $_SERVER values are filled in by the web server, so if the script is not requested over the Web, this array will be empty (as is the case with automated tasks configured as cron jobs).

Warning

The server does fill in these values, but many of them are based on user input (the HTTP request). As such, even some environmental variables can be used for an attack if not properly sanitized.

Even if the script is requested over the Web, not every variable will be filled with information. For example, HTTP_REFERER is supposed to contain the address of the page that the user was on when he clicked a link to your web page. However, if that person loaded your web page from a bookmark, or if the ISP host through which he accesses the Internet doesn’t provide that address information, then the value for $_SERVER['HTTP_REFERER'] will be empty.

Although more information about the server request is made available by PHP, these are the variables that are used in the sample applications for this book:

__FILE__: Whereas a $_SERVER variable will reflect the file that was requested via the Web, __FILE__ always refers to the full path and filename of the script in which it is invoked. That means an included file will have a different __FILE__ value than the main script that called it.
Note
__FILE__ is referenced using two underscore characters on each side of “FILE.”
$_SERVER['DOCUMENT_ROOT']: The document root is the directory path on the server from the top of the configured file structure to the directory root under which the current script is running.
$_SERVER['HTTP_HOST']: The host usually contains the domain information part of the URI request. SERVER_NAME likely holds the same value.
$_SERVER['QUERY_STRING']: If there is a query string—the part of a URI that comes after the filename, following a question mark (?)—that entire part of the request string will be stored in this variable. This information is likely also pre-parsed into name and value pairs in the $_GET associative array.
$_SERVER['REQUEST_METHOD']: Several kinds of HTTP request methods are possible, but most web pages deal with just two: GET and POST. This server variable stores the name of the method used to access the script and can be useful in determining where to find any user-provided input parameters.
$_SERVER['REQUEST_URI']: This variable contains the URI that was used to trigger the web page script, once the request arrived on the server. This does not include the protocol and domain, only the server path after the web root (as in /index.php). The REQUEST_URI will also include the query string, if one exists.
$_SERVER['SCRIPT_NAME']: The SCRIPT_NAME is the current script’s path. This is often the same value contained in REQUEST_URI, but without a query string, if one exists. Unlike __FILE__, this variable will reflect the path to the requested file, not the one invoking the variable.
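Because any of these values may be missing, it is prudent to check before using them. The sketch below wraps that check in a hypothetical server_value() helper. Passing the array in as a parameter is only a convenience so the function can be tried with sample data as well as with the real $_SERVER array.

```php
<?php
// Hypothetical helper: return a $_SERVER-style value, or a default if it is not set.
function server_value($server, $name, $default = '') {
    return isset($server[$name]) ? $server[$name] : $default;
}

// Real request data; both may be absent when run from the command line.
$method   = server_value($_SERVER, 'REQUEST_METHOD', 'GET');
$referrer = server_value($_SERVER, 'HTTP_REFERER');

// Sample data standing in for a real request.
$sample = array('REQUEST_URI' => '/index.php?page=2', 'QUERY_STRING' => 'page=2');
$uri    = server_value($sample, 'REQUEST_URI', '/');
$host   = server_value($sample, 'HTTP_HOST', 'unknown');

echo basename(__FILE__) . ' ' . $method . ' ' . $uri . ' ' . $host . "\n";
```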

Web forms and links are usually submitted to the server using either the GET or POST HTTP method. These methods encapsulate the data in different ways, and PHP then parses it to fill special associative arrays, $_GET and $_POST. When processing a web form or accepting query string parameters as part of a request, your program can reference these arrays to check for the presence of user input.

Note

If you accept the same form variables using either method, you will have to determine which method takes precedence. For instance, if posted data is preferred, that means your program only has to check $_GET['variable_name'] if $_POST['variable_name'] is missing.
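One way to express that precedence is sketched below. The request_value() helper and the field names are hypothetical, and the arrays are filled by hand here only to simulate a request.

```php
<?php
// Hypothetical helper: prefer posted data, then fall back to the query string.
function request_value($name, $default = null) {
    if (isset($_POST[$name])) {
        return $_POST[$name];
    }
    if (isset($_GET[$name])) {
        return $_GET[$name];
    }
    return $default;
}

// Simulate a request that supplies the same field both ways.
$_GET['status'] = 'from the query string';
$_POST['status'] = 'from the form';
$_GET['page'] = '2';

$status = request_value('status');
$page = request_value('page');
$missing = request_value('nothing', 'n/a');

echo $status . ', ' . $page . ', ' . $missing . "\n";
```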

Controlling the Flow of Logic

Programming is about more than storage and calculation. It is also about the flow of logic. PHP offers the usual assortment of flow controllers to help your application make decisions about what to do.

if…elseif…else

The if statement acts like a decision tree, checking to see whether a given expression is true before executing a particular part of the program. The expression can be very complicated or involve multiple comparisons, but in the end it is either a true statement or a false one. If the expression is false, the statements between the curly braces are ignored.

The if statement can be extended using elseif and else. The former performs another check on a different expression, proceeding with that section of code if the expression is true and moving on without processing if it’s false. Multiple elseif statements can be strung together in sequence, but each will be evaluated only if all of the expressions that preceded it proved false. An else statement is the last one in the flow structure and serves as a catchall. It does not evaluate an expression; if all of the previous expressions fail, the statements within the else brackets are executed by default.

if ($a == 4) { $x = 1; }
elseif ($b == 0) { $x = 2; }
elseif ($c == $b) { $x = 3; }
else { $x = 0; }

while

A while loop will continue indefinitely, for as long as the expression is true. The moment its state becomes false, the while block will end and execution will continue with the next line of code. The expression is reexamined with each iteration.

while ($row = mysql_fetch_assoc($sql_result)) {
    $rss_feed_stored =  $row['rss_url'];
}

Note

Be careful about the expressions you use in a while loop. If you evaluate a condition that never changes, the loop will never exit.
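One way to guard against a runaway loop is to make sure something in the condition changes on every pass. In this sketch the loop stops when the queue is empty or a safety limit is reached. The $queue contents and the limit of 100 are arbitrary choices for the example.

```php
<?php
$queue = array('first', 'second', 'third');
$processed = array();
$passes = 0;

// array_shift() shrinks $queue each time, so the condition eventually fails.
while (count($queue) > 0 && $passes < 100) {
    $processed[] = array_shift($queue);
    $passes += 1;
}

echo $passes . ' passes: ' . implode(', ', $processed) . "\n";
```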

for

A for loop will execute the statements in its block repeatedly until its test expression becomes false, typically when a counter passes a maximum value. The initial state, the test expression, and the increment for each iteration are all defined in the opening line of the loop.

The first part of the for expression is evaluated once, at the very beginning of the first loop, to set the initial conditions. The second and third parts of the expression are then evaluated at the start and end of each iteration, until the second expression proves false. The three parts are separated by semicolons (;).

for ($i=1; $i <= $max_value; $i += 1) {
    echo $i . ' of ' . $max_value;
}

foreach

The foreach loop is an easy way to iterate over the contents of an array (or, as of PHP 5, objects). At the start of each iteration, the next value in the array is assigned to a variable that can then be referenced in the statements contained in the foreach block. For associative arrays, a special form allows the key and value pair to be assigned as separate variables, so both are readily available during execution.

foreach ($array_of_arrays as $this_array) {
    foreach ($this_array as $key => $value) {
        echo $key . ' is filled with ' . $value;
    }
}

switch…case…break

Because some variables may have many different values that trigger distinct responses—this is true for codes for status messages returned to the user as a response to submitting a form—the if...elseif construct can grow unwieldy. The switch statement simplifies the code needed in that situation by allowing you to use one variable expression with many possible values. Each value, or case, contains its own set of statements to process. There is also a default that is executed if the variable value does not find a match.

Unlike an if...elseif chain, a switch block does not stop automatically when it finds a value match. PHP compares the value against each case in order, skipping the statements of any case that does not match. When it finds a match, it starts executing at the next line of code and continues until the end of the switch block unless it encounters a break statement telling it to stop. If you forget to include a break at the end of the statements for a given case, the statements for every case that follows it will also be executed, including the default.

switch ($root) {
    case 'user':
        $thisFeed = $xml->id;
        break;
    case 'status':
        $thisFeed = $xml->user->id;
        break;
    default:
        break;
}
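The effect of a missing break can be seen in this small sketch. The describe() function and its status values are invented for the illustration.

```php
<?php
// Hypothetical function showing how execution falls through without a break.
function describe($status) {
    $log = array();
    switch ($status) {
        case 'new':
            $log[] = 'new';
            // no break here, so execution continues into the next case
        case 'open':
            $log[] = 'open';
            break;
        case 'closed':
            $log[] = 'closed';
            break;
        default:
            $log[] = 'unknown';
            break;
    }
    return implode(',', $log);
}

echo describe('new') . "\n";
echo describe('open') . "\n";
echo describe('other') . "\n";
```

A status of 'new' collects the entries for both 'new' and 'open' because nothing stops execution between them.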

Note

A break can be used elsewhere in your code to stop the execution of a loop. It allows PHP to escape the current for, foreach, and while structures, too. If they are nested, it will escape only the current block.
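The following sketch, using made-up loop values, shows that a break inside a nested loop ends only the innermost loop:

```php
<?php
$visited = array();

foreach (array('a', 'b') as $letter) {
    for ($i = 1; $i <= 5; $i += 1) {
        if ($i == 3) {
            break; // leaves the for loop only; the foreach keeps going
        }
        $visited[] = $letter . $i;
    }
}

echo implode(' ', $visited) . "\n";
```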

try…catch

PHP 5 introduced exception handlers that “catch” errors that are “thrown” while PHP runs the code contained in the try block. Every try must have at least one catch block; it can also have more than one to handle each exception differently. If no exception is thrown, the catch blocks are skipped and the rest of the code is executed as normal. If an exception is thrown and no catch block is configured to catch it, it is passed up to any enclosing try block, and an exception that is never caught stops the script with a fatal error.

If an exception is caught, any statements in the catch block will be executed. This code may or may not be programmed to terminate the program. The purpose of exception handling is to either exit gracefully or allow the program to continue despite the error.

try {
    # some statements that cause PHP to choke
} catch (Exception $e) { $form_error = 15; }
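For a version that can actually be run, consider this sketch. The parse_count() function, its message text, and the error code are hypothetical; the point is only to show which statements run when an exception is thrown.

```php
<?php
// Hypothetical function that throws when it is given bad input.
function parse_count($text) {
    if (!is_numeric($text)) {
        throw new Exception('Not a number: ' . $text);
    }
    return (int) $text;
}

$form_error = 0;
try {
    $count = parse_count('twelve'); // throws, so the next line is skipped
    $count += 1;
} catch (Exception $e) {
    $form_error = 15;               // arbitrary error code for the example
    $message = $e->getMessage();
    $count = 0;
}

echo $form_error . ': ' . $message . "\n";
```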

die() or exit()

There are some cases where errors encountered by PHP should simply stop execution of the script. Both die() and exit() act in the same capacity, killing the running script where it stands and outputting the error message provided as a parameter:

$filehandle = fopen($file,$type) or die("can't open file $file");

The exit() function can also be useful when debugging, allowing you to temporarily insert the statement before some problem code can be executed to allow you to check the state of the program at that moment.

Note

It is better to avoid using either die() or exit() in your final scripts. Instead, implement more user-friendly error handling.
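As one possible alternative to die(), the sketch below checks the result of fopen() and records a message that the page can display. The open_log() helper, the error text, and the file path are hypothetical. The @ operator is used only to suppress PHP's own warning for the demonstration.

```php
<?php
// Hypothetical helper: return a file handle, or false with a message in $error.
function open_log($file, &$error) {
    $handle = @fopen($file, 'a'); // @ hides the warning so we can report our own
    if ($handle === false) {
        $error = 'Sorry, the log file could not be opened.';
        return false;
    }
    $error = '';
    return $handle;
}

// A path inside a directory that is assumed not to exist.
$missing_path = sys_get_temp_dir() . '/no_such_directory_for_example/activity.log';
$handle = open_log($missing_path, $error);

if ($handle === false) {
    echo $error . "\n";   // the script carries on instead of stopping
} else {
    fclose($handle);
}
```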

File Management

Files that store and later retrieve data are staples of web applications. They are easier to use than databases, since they don’t require authentication (although they may require file permissions to be granted to the directory in which they reside). Files can also be used to create static versions of a website through automated tasks to cut down on overhead when databases are required.

In our sample applications, file management functions are used to create log files that report on activity by automated tasks. Here are the methods you’ll use:

fopen(file_name,mode_for_opening)

To read from or write to a file, PHP must be able to reference it in some way in the code. The fopen() function creates a file handle that binds to a stream connecting PHP to the contents of the specified file. For this to be successful, PHP must have access to that file, meaning that it not only must be reachable but also must be configured with the appropriate permissions.

Note

Windows servers reference the path to the file differently than Linux servers do: any backslashes (\) must be escaped, as in c:\\my_documents\\my_file.txt. Alternatively, you can use forward slashes and avoid the problem.

There are a few different ways PHP can open the file. fopen() asks you to declare the mode (a code that dictates whether you can read from and/or write to the file), where the pointer should be located when it opens the file, and what to do if the file already exists. For logging—where you only need to add a line of text to whatever is already in the file, and worry about reading the contents at a later time—the mode "a", for append, is sufficient.

fwrite(file_handle,text_to_write)

Once a file is open in a mode that permits writing, the fwrite() function can be used to enter a string of text into that file. The file is referenced through a file handle, not by the path and name of the file itself. Prior use of fopen() is required to create that association.

Note

Be aware that different operating systems have different conventions to determine line endings. Unless you are OK with your text running together in one long, wrapped line, using the correct convention is important. For Linux, the newline (\n) is sufficient. Older Macintosh systems look for a carriage return (\r), while Mac OS X uses the newline as Linux does, and Windows machines require both (\r\n).

fclose(file_handle)

Assuming the file handle pointing to an existing file is valid, fclose() terminates the association between PHP and the file stream.

unlink(file_name)

This function deletes the file specified in the parameter, provided it exists and permissions on the server allow it.

file_get_contents(uri_to_retrieve)

If you want to create a virtual browser and retrieve content available on the Web, the file_get_contents() function can help. Enter an encoded URI, and a string will be returned containing the contents of that file, as rendered by the web server. If the retrieval fails, the function will return a Boolean false.

This function can also point to a local file, acting as three functions in one by opening, reading, and closing the file connection. file_get_contents() is used in the sample applications in this book to create a TinyURL that can be included in a direct message to a Twitter member.

Warning

file_get_contents() won’t work if the configuration setting allow_url_fopen is disabled.
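
The file functions above can be sketched together as one small logging round trip. This is only an illustration under stated assumptions: the temporary directory stands in for the application's log directory, and the filename and log messages are made up.

```php
<?php
// Assumption: a temporary directory stands in for the real log directory.
$log_file = sys_get_temp_dir() . '/sample_task_' . uniqid() . '.log';

// Open in append mode ("a"); the file is created if it does not exist yet
$filehandle = fopen($log_file, 'a');
if ($filehandle === false) {
    throw new RuntimeException("can't open file $log_file");
}

// Each entry ends with a newline so the lines do not run together
fwrite($filehandle, 'Task started' . "\n");
fwrite($filehandle, 'Task finished' . "\n");
fclose($filehandle);

// file_get_contents() opens, reads, and closes a local file in one call
$contents = file_get_contents($log_file);
$lines = explode("\n", trim($contents));

// Delete the file once it is no longer needed
unlink($log_file);
```

Checking the return value of fopen() before writing is one way to handle a missing directory or bad permissions without relying on die().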

Connecting to the Database

Teaming the dynamic programming of PHP with the power of SQL query statements can make for some potent applications. To make this hookup, however, you need to make use of a special group of functions to access and interact with a MySQL database.

Note

With PHP 5, an improved version of this group of old MySQL functions was added. The mysqli extension (the “i” is for “improved”) has a procedural interface and an object-oriented interface. Using this extension has several speed and security benefits, and it is recommended that you upgrade your code to take advantage of it. The older mysql functions described in this section were deprecated in PHP 5.5 and removed in PHP 7, so they are available only on older installations.

For more information on converting from mysql to mysqli, see http://forge.mysql.com/wiki/Converting_to_MySQLi, as well as the main documentation at http://us3.php.net/mysqli. There are some configuration changes that may be needed for PHP to be able to use the new functions.

mysql_connect(database_host,username,password)

This function initiates the connection between PHP and the MySQL database server by specifying the server location with access information. The function mysql_connect() returns a link to the database upon success.

mysql_select_db(database_name)

A database server can host multiple databases. For your queries to retrieve the data you want, you have to specify one of the available databases on the server. The database name passed to this function will become the current active database associated with the open database link. All communication will be with that database.

The link to the database connection can be specified, but if it isn’t, this function will use the link last opened by mysql_connect(). If no connection has been made previously, it will try to make a new connection without any parameters.

mysql_query(sql_query_statement)

The mysql_query() function passes a single SQL statement to the current active database through a valid link to the database server. For queries that are meant to fetch data, the function returns a result set. For other queries that are intended to perform some action (like an INSERT or DELETE statement), a Boolean is returned to indicate success or failure.

When you are using the command line or files to run SQL statements, a semicolon is required to let MySQL know when to stop reading the query. This function deals with that internally, and therefore SQL statements passed as parameters should not include semicolons.

mysql_affected_rows()

For queries that change data rather than return records, such as INSERT, UPDATE, or DELETE statements, this function returns the number of rows that were affected. If the query fails, -1 is returned.

mysql_num_rows()

For queries where records are expected, this function returns the number of records in the result set.

mysql_fetch_array(sql_result_set)

Generally, you’ll connect to a database to get data from it. The result set returned by mysql_query() is a resource, not an array, so it cannot be looped over directly; instead, call mysql_fetch_array() in the condition of a while loop to read one row at a time. Each call returns the next row as an array and moves the internal row pointer forward, returning a Boolean false when there is no more data.

mysql_real_escape_string(string_to_escape)

As with URIs, there are characters that have special meaning for MySQL and thus should not be included in the queries you submit. Failing to screen for these characters may lead to trouble, ranging from merely causing the query to return an error all the way to allowing malicious activities with the database. The mysql_real_escape_string() function examines MySQL’s own library of special characters and replaces them with safer versions. Because of the potential for disaster and the ease of its use, there is no reason not to escape all text strings before sending them to the database.

mysql_free_result(result_from_last_query)

This function clears the results of the last query, which has two advantages. First, it lets the server know that the memory used to store the previous data is now available for other things. Second, it eliminates the chance that you may accidentally reuse the same result set.

Note

Technically, the mysql_free_result() function only needs to be called when memory consumption is an issue, typically for very large result sets. For smaller data, using this function can result in higher memory use than not. However, the by-product of its use is a clean break from data that is no longer needed to complete the script.

mysql_close(database_connection_handle)

Assuming the handle representing the database link is valid, this function disconnects PHP from the database server. The link to the database will automatically end when the script finishes running, but it is good practice to close it explicitly.
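
The connect, query, fetch, free, and close sequence can be sketched as follows. Note the substitution: this sketch uses the procedural mysqli functions rather than the older mysql functions described above, since the older ones are not available in PHP 7 and later. The helper name findLogMessages() and the task_log table with its message and task_name columns are assumptions made up for illustration.

```php
<?php
// Hypothetical helper; the table and column names are assumptions.
function findLogMessages($host, $user, $password, $database, $task_name) {
    // Depending on the PHP version and mysqli error mode, a failed
    // connection either returns false or throws an exception.
    $link = mysqli_connect($host, $user, $password, $database);
    if (!$link) {
        return false;
    }

    // Escape text before placing it in a query
    $safe_name = mysqli_real_escape_string($link, $task_name);
    $sql = "SELECT message FROM task_log WHERE task_name = '$safe_name'";

    $messages = array();
    $result = mysqli_query($link, $sql);
    if ($result) {
        // Each call returns the next row; the loop ends when no rows remain
        while ($row = mysqli_fetch_array($result, MYSQLI_ASSOC)) {
            $messages[] = $row['message'];
        }
        mysqli_free_result($result);
    }

    mysqli_close($link);
    return $messages;
}
```

The mysqli functions take the database link as an explicit parameter, and mysqli_connect() accepts the database name directly, so no separate call to select the database is needed in this sketch.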

Building a Custom Function

In any given program, you may decide to write to files or interact with a database. You may do a particular kind of sort on an array, for example. If you only have one program, where you put the code to do these things won’t make that much difference. However, if you have several pages that all do the same kinds of things, being able to reuse your code becomes exponentially more important.

Imagine writing a program to build a simple web page with a fancy header menu. Your strategy is to build the page, then duplicate the code for 11 other similar pages, making minor adjustments to the visible content. That may be the quickest and most efficient way to build the pages, but what happens when you want to make changes to the fancy header menu? Instead of updating it in one place, you need to make the changes 12 different times!

Making use of includes and custom functions is one way to make your code easier to maintain and simultaneously cut down on the number of lines of code on any given application page. The sample applications in this book use included files with custom functions to do just that.

Including more code in your application

Bringing additional code into the scope of your PHP script is easy with the include() statement. This statement looks in the specified path for the desired file and evaluates its content as if it had been typed into the original script:

include $root_path.'environment.php';

Note

The period (.) between the $root_path variable and the rest of the text is a concatenation operator. It connects two or more strings together to form one long string.

Debugging with includes used to be a bit problematic, since problems parsing the included code didn’t stop the calling script from running. Now, though, PHP makes sure all of the code it sees is syntactically correct before executing anything.

To do its thing, the include() statement either needs to know the exact path where the file is located or needs to be able to find the file among the locations added to the include_path list. If it can’t find it, the script will issue a warning but will continue to run until the absence of that code proves fatal. If you don’t want that behavior, there is another command—require()—that works just like include() but kills the application the moment it can’t find an included file.

Included files don’t have to live in the web path, which is another reason to use them. If you have access to your server account’s document root, you can move all of your included files into a path that other people can’t see from the Web. This eliminates the chance that someone will call that script by accident or intentionally try to make use of it. It also allows you to store hardcoded access information—such as the username and password you are going to use to get into your database—without exposing that sensitive information directly to the Internet.

Once you have a separate file attached to your PHP script, you have to figure out what to put in it. It is certainly possible to simply add a bunch of code that does something like setting commonly used variables or opening a log file. The script will treat that included code as if it had been written into the calling file, so the application will work. However, where you include the file will be meaningful. If you need to manipulate some variable data and you place that code in an included file, you have to make sure those variables are filled with what you need before you include the file.

For this reason (and others), it is good practice to contain your external code in functions that can be called when needed in the original script rather than run at a specific point when the file is included. If you do this, you can group the include() statements at the top of the script where it is easy to see what you need, and you can define multiple functions in an external file that can be used anywhere after that point in the calling script.

Defining your own function

I make sense of functions using a shopping metaphor. My includes directory, where I put the external application files, is the big mall. Each file is a wing or level of that mall, and any functions defined in it are the stores located in that wing. Stores tend to have their own structures and purposes once you cross their thresholds, and you can go back to them whenever you need the particular goods they carry.

To construct your function-store, you need to first create that threshold. You can accomplish this by creating a new block (between curly braces) that declares the function, provides a name for it, and defines any parameters it will accept. Example 3-5 shows a simple function called name_of_function that takes up to two parameters. The first parameter is required (PHP will choke if at least one parameter isn’t included), but the second, because I’m assigning it an empty string as the default value, is optional.

Example 3-5. A custom function returning an array

function name_of_function ($parameter1,$parameter2='') {
    $foo = $parameter1 + 1;
    $bar = $foo . ' was required ' . $parameter2;
    return array($foo,$bar);
}

Within the block of code—the inside of the function-store—are the interesting goods. For the function created in Example 3-5, we want to increment the first parameter and format a little message based on the new value and the optional text that we can pass to the function. The last statement is where we use the return() construct to deliver the goods—in this case, an array with the two manipulated values. In essence, this is the part where the stuff you buy gets put in a bag for you to carry home.

Note

PHP differentiates between constructs and functions, even though both look the same in the documentation (with parentheses added after their names). include() is a construct that does not require the parentheses to run. function_exists() is a function with a parameter and a need for parentheses to evaluate what is passed to it.

Constructs do not appear in the list of known functions when using function_exists() to find out if code you want to use is there.

Custom functions are invoked in the same way that built-in PHP functions are called. How you code them depends on what kind of information is being returned. If it will return a Boolean (true or false, 1 or 0) or nothing at all, the function does not usually need to be assigned to a variable. This is common for functions that are used as expressions in loops. However, if the function will return a value, you’ll want to capture it in a variable when you invoke it. When multiple values are returned, as is the case in Example 3-5, the function needs to send them back contained in an array, which is then received in the original script using the list() construct. Here are some examples from the sample applications:

getHeader($app_title,'css/uar.css');

$scrambled_password  = scramblePassword($twitter_password);

list($thisFeed,$foo) = parseFeed($rss_feed);

Remember that the variables are versatile: they can be numbers, strings, list arrays, associative arrays, or even objects. Because PHP is a “loosely typed” language, you may need to use the is_type functions (is_int(), is_string(), etc.) to find out what kind of data you have. The array-to-list transfer is only needed if you need to send back different kinds of variables (a string and an array, for instance) or if you want to be able to parse the resulting data on the function side. In that case, the function can simply fill an array with the manipulated values and return that instead.
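
To make the array-to-list transfer concrete, here is a short sketch that calls the function from Example 3-5 both with and without its optional parameter. The variable names and the argument values are made up for illustration.

```php
<?php
// The function from Example 3-5, repeated so the sketch runs on its own
function name_of_function($parameter1, $parameter2 = '') {
    $foo = $parameter1 + 1;
    $bar = $foo . ' was required ' . $parameter2;
    return array($foo, $bar);
}

// Both parameters supplied; list() unpacks the returned array in order
list($count, $message) = name_of_function(41, 'by the caller');

// Only the required parameter supplied; $parameter2 falls back to ''
list($count2, $message2) = name_of_function(1);

// Because PHP is loosely typed, a type check can confirm what came back
$count_is_number = is_int($count);
```

After the first call, $count holds 42 and $message holds "42 was required by the caller". After the second call, $message2 ends with a trailing space because the optional text was empty.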

Because your code will be assembled on the fly and is intended to be reused in many ways by different applications, there is a built-in function that may add some grace to it. The function_exists() function checks a list of all defined functions, both those built into PHP and those defined by the programmer, and returns true if a function with the specified name is detected:

if (function_exists('apiRequest')) { $data = apiRequest($url); }

This is useful as a way to make sure the proper code is available and, if not, to allow the program to handle the problem gracefully, either by reporting it to an administrator or by simply exiting with a web page response the user can read. The alternative is an error that could kill the program and reveal some of your code in an ugly message.

SimpleXML

When it comes to sharing data, XML is the prevalent way to format information into a structure that is easily parsed. Your PHP application will need to parse XML at some point if you get data from APIs, including the API Twitter provides. Before PHP 5, you had to build your own parser, looking in the string for patterns to divide up the formatted data in a meaningful way. Now, however, PHP comes with an extension called SimpleXML that does the heavy lifting for you.

The SimpleXML extension turns XML structures into embedded objects containing the data as associative arrays. Each embedded tag in the XML becomes another link in the PHP object, as in $xml->childnode->node['attribute']. The sample applications use SimpleXML to parse the data received from the Twitter API and to create new XML documents with the data the applications collect. Here are the methods you’ll see:

SimpleXMLElement(well_formed_xml)

When you want to start building XML from scratch, SimpleXMLElement can help. It turns a well-formed XML string or a path to a file containing XML data into an object that can be iterated, edited, and expanded with a variety of methods. The object is a collection of tag names associated with the content the tags contain and the attributes they are assigned. Nested tags show up as objects that themselves can be parsed into tag names, attributes, and values.

Example: $xml = new SimpleXMLElement($base_xml);

simplexml_load_string(well_formed_xml_string)

For simply parsing XML data that already exists, this function takes the data in the form of a string and returns a navigable object.

Example: $xml = simplexml_load_string($data);

getName()

The getName() method returns the name of the root tag for a particular XML object. When the XML object is first created, this will be the main root of the document, but this method can be used for nested objects as well.

Example: $root = $xml->getName();

children()

The children() method creates an iterative array of objects representing the nodes directly below the XML object calling it. Each child object can then be explored for a name, value, attributes, or any other XML objects it contains.

Example: foreach($xml->children() as $x) {$ids[] = $x->id;}

addChild(name_of_new_node,value_stored_in_new_node)

This method is what allows you to create your own XML documents. It accepts a text value to describe the name of the new nested tag and any value you want it to contain. It returns the new XML object, which can then be edited and expanded as if it had been part of the original XML.

Example: $xml->channel->addChild('newchild','my text');

asXML()

All of the manipulation of addChild() only changes the XML object that PHP has stored in memory. To make the changes real, you need to turn the object back into a string of well-formed XML text. This can be printed or stored in a string for later use in the application. asXML() will also accept a filename parameter and write the file directly to a document.

Example: echo $xml->asXML();
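
The SimpleXML methods above can be sketched together in one short script. The XML string here is a made-up stand-in for data an API might return; its tag names and values are assumptions for illustration.

```php
<?php
// Assumption: a small inline document stands in for data received from an API.
$data = '<statuses>'
      . '<status><id>101</id><text>first</text></status>'
      . '<status><id>102</id><text>second</text></status>'
      . '</statuses>';

// Parse the string into a navigable object
$xml = simplexml_load_string($data);

// Name of the root tag
$root = $xml->getName();

// Collect the id of every node directly below the root
$ids = array();
foreach ($xml->children() as $x) {
    $ids[] = (string) $x->id;   // cast the node object to a plain string
}

// Add a new nested node, then fill it with its own children
$new_status = $xml->addChild('status');
$new_status->addChild('id', '103');
$new_status->addChild('text', 'third');

// Turn the in-memory object back into a string of XML text
$output = $xml->asXML();
```

Casting each value with (string) is a deliberate choice here: without it, the array would hold SimpleXMLElement objects rather than plain text.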

DOM

SimpleXML is great, but there are other ways to extract data from an XML document. The Document Object Model, or DOM, is a standard object model to describe markup and make it able to be manipulated by other applications. It is what gives JavaScript the ability to access and change web pages on the fly, and web browsers use it to represent the pages they render from HTML. DOM is particularly useful for accessing the markup out of order, navigating back and forth in the nested nodes, or jumping directly to different parts of the document.

The W3C DOM has three parts. The Core describes any structured document. There are also specific object descriptions for both XML and HTML. As of PHP 5, the DOM extension has been added to the PHP arsenal. The sample applications use DOM to parse RSS feeds. The following are the methods you’ll encounter.

DOMDocument()

As with the SimpleXMLElement object, a new DOMDocument object can be created using this method to hold the XML data you want to explore. This gives PHP a framework for loading and parsing the structured data.

Example: $doc = new DOMDocument();

load(url_to_well_formed_XML)

To fill the new DOM object with XML data, you have to let PHP know where to look for that data. For RSS feeds, the simplest technique is just to point to the URI for the data you are trying to parse. The path can also point to a local file. The method returns a Boolean true or false to indicate success. The object must already exist before it can be filled.

Example: $doc->load($url);

getElementsByTagName(name_of_node)

The DOM can retrieve specific nodes from the full document by tag name. The getElementsByTagName() method returns a list object (a DOMNodeList) containing all of the matching nodes; individual nodes are pulled from the list with item(), which gives you access to more tools to explore each node in detail.

Example: $items = $doc->getElementsByTagName('item');

nodeValue

At the node level, the value a particular node contains within its start and end tags can be retrieved with nodeValue. Note that nodeValue is a property rather than a method, so it is written without parentheses. The value is returned as a string, whether the node holds a date, text, a number, or some other kind of data.

Example: $title = $item->item(0)->nodeValue;

getAttribute()

Another tool at the node level is getAttribute(), which returns the value associated with a tag attribute, such as the link stored in the href of an anchor tag. If an attribute is not found, an empty string is returned. For singleton tags, the attributes are where the node data is located.

Example: $link = $node->item(0)->getAttribute('href');
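
As a rough sketch of these DOM tools working together, the script below parses a tiny feed. To stay self-contained it uses loadXML(), which reads from a string, in place of load(), which reads from a URI or file. The feed content is made up for illustration.

```php
<?php
// Assumption: a small inline feed stands in for a remote RSS document.
$rss = '<rss version="2.0"><channel><title>Sample feed</title>'
     . '<item><title>First post</title>'
     . '<enclosure url="http://example.com/1.mp3"/></item>'
     . '<item><title>Second post</title></item>'
     . '</channel></rss>';

$doc = new DOMDocument();
$loaded = $doc->loadXML($rss);   // true on success, like load()

// Collect the title of each <item>, skipping the channel's own <title>
$titles = array();
$items = $doc->getElementsByTagName('item');
foreach ($items as $item) {
    $titles[] = $item->getElementsByTagName('title')->item(0)->nodeValue;
}

// Read attributes from the first <enclosure> tag in the document
$enclosure = $doc->getElementsByTagName('enclosure')->item(0);
$url = $enclosure->getAttribute('url');
$missing = $enclosure->getAttribute('length');   // not present: empty string
```

Searching for title inside each item, rather than across the whole document, keeps the channel title out of the results.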

cURL

PHP includes a library for retrieving URLs called cURL. This free software can handle a wide range of data transfers and HTTP methods. PHP includes a few tools to allow your code to access this important functionality.

Note

The PHP developer community typically shares code to make coding easier for everyone. One way this is done is by providing classes that can be used to extend your own installation of PHP and make the code a bit easier. You can find some cURL classes to replace the functions discussed here at http://www.phpclasses.org/searchtag/curl/by/package/tag/curl/.

Although cURL can do much more, the sample applications use the following cURL functions whenever they need to interact with the Twitter API:

curl_init()

To use cURL to access a remote file, you must first create a handle that can be used to reference the connection later in the program. The handle can be initialized as an empty shell and filled later by setting the CURLOPT_URL option, or it can be given a URL right away by passing one to curl_init() as a parameter. The new handle is returned upon success.

Example: $cURL = curl_init();

curl_setopt(cURL_handle,name_of_option,option_value)

The curl_setopt() function allows you to configure an initialized cURL handle by setting some of its many options. The options are named with the CURLOPT_ prefix and include the following, which are used in the sample applications for this book:

CURLOPT_HTTPGET

If set to true, this option causes the cURL handle to use the GET method for the next HTTP request. The Twitter API methods where data is queried and received all use GET. This is the default for a new cURL handle.

CURLOPT_POST

If set to true, this option causes the cURL handle to use the POST method for the next HTTP request. All of the Twitter API methods involving data changes use POST.

CURLOPT_POSTFIELDS

When the server application on the other end of the request requires an HTTP POST request, the fields containing your data must be passed as an encoded string using this option.

Note

If you are experiencing problems using CURLOPT_POSTFIELDS, it may be because of the default headers being passed by cURL. The Expect header may have a value like “100-continue,” which makes cURL hold back your data until the server responds with a status code of 100, or Continue. To get around this, you can simply clear the value in the Expect header to have cURL post without waiting:

curl_setopt($cURL, CURLOPT_HTTPHEADER,
    array('Expect:'));

CURLOPT_RETURNTRANSFER

When executing the configured HTTP request, this option should be set to true to prevent cURL from outputting the content it gets. Instead, curl_exec() returns the content as a string that you can parse.

CURLOPT_USERPWD

Twitter’s API requires authentication for many of its interactions with member data. The username and password for the accessing account can be appended to the cURL request using this option and a value in the format [username]:[password].

CURLOPT_URL

This is perhaps the most important option—it tells cURL where to go to get the goods (i.e., which URL to fetch). It can be set when the handle is initialized as well.

CURLOPT_USERAGENT

When using someone’s API, it is courteous to let the developers know who you are. Setting the User-Agent header in the HTTP request to something meaningful using this option will accomplish this.

The curl_setopt() function returns a Boolean to indicate success.

Example: curl_setopt($cURL, CURLOPT_URL, $curl_url);

curl_setopt_array(cURL_handle,array_of_options)

When you have to set a number of options at once, it may be easier to first create an associative array filled with the option names and values to send to cURL all at once. curl_setopt_array() sets multiple options for a cURL session. Specify the active cURL handle and the array of options, and the function will return a Boolean to indicate success. If any single option fails, the remainder of the array will be ignored and the function will return false.

Example: curl_setopt_array($cURL, $cURL_options);

curl_exec(cURL_handle)

The meat of this suite of functions is curl_exec(), which actually executes the session as it is currently configured. If the CURLOPT_RETURNTRANSFER option has not been set to true, this function will output whatever it finds at the other end of its request and return a Boolean to indicate success. Otherwise, curl_exec() returns the contents to save in a string.

Example: $twitter_data = curl_exec($cURL);

curl_getinfo(cURL_handle,information_name)

Each request has a meta-information channel that describes the session that was just executed. curl_getinfo() gets that information and either returns it as an associative array containing all the available data or, if you include a specific variable name, returns just that variable’s value. The meta-information includes url, content_type, http_code, filetime, total_time, size_download, speed_download, and download_content_length.

Example: $status = curl_getinfo($cURL, CURLINFO_HTTP_CODE);

curl_close(cURL_handle)

When you’re finished, use curl_close() to end the session and free up all of the application resources that were devoted to the HTTP request.

Example: curl_close($cURL);
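
Put together, the cURL functions above might be used along these lines. This is a sketch rather than the code from the sample applications: the helper name fetchUrl() and its return format are assumptions, and it requires the cURL extension to be installed.

```php
<?php
// Hypothetical helper: fetch a URL with GET and return array(body, http_status).
function fetchUrl($url, $user_agent) {
    $cURL = curl_init();

    $cURL_options = array(
        CURLOPT_URL            => $url,
        CURLOPT_HTTPGET        => true,
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_USERAGENT      => $user_agent,
    );
    if (!curl_setopt_array($cURL, $cURL_options)) {
        curl_close($cURL);
        return array(false, 0);
    }

    $body = curl_exec($cURL);             // false if the request failed
    $status = curl_getinfo($cURL, CURLINFO_HTTP_CODE);
    curl_close($cURL);

    return array($body, $status);
}
```

A caller could then write list($twitter_data, $status) = fetchUrl($curl_url, 'MySampleApp/1.0'); and check both values before parsing. The User-Agent text here is only a placeholder.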

Note

If your server does not support cURL, there are other ways to get remote content. The file_get_contents() function explained in the section File Management of this chapter is an option, provided PHP has been configured to turn on allow_url_fopen. A more complex option is fsockopen(), which should work everywhere.

Debugging

Let’s face it: programming involves a lot of trial and error. The first draft of an application rarely works perfectly, and the more code it contains, the more potential there is for problems that will need to be investigated and fixed. The process of debugging an application is greatly helped by good documentation and the ability to see what is happening at the points where problems are occurring.

Programming languages, including PHP, include ways for you to annotate your code without it affecting the processing that is done. Comments do come with a little extra overhead since there is more text to deal with, but the benefits greatly outweigh the trivial cost of including them. PHP comments work in a similar way to those in C or Perl: text that follows a special marker is interpreted as nonexecutable content and ignored when the program runs.

Comments are meant to serve as reminders for the programmer and to communicate to other programmers what is happening at various points in the code. Unlike with HTML, where anyone can look at the source behind the rendered page, the only people who will see PHP comments are programmers with an interest in and access to the script files.

There are two kinds of PHP comments. The first type is the single-line comment. By prefacing text with a double-slash character combination (//), you signal that the text between the comment character and the end of the line should be ignored during execution. For example:

// This is good for line-by-line annotation

The hash character (#) will work, too.

Note

I’ve always preferred hashes, because there is less to type and because they stand out visually in a way the slashes do not. However, most modern text editors with PHP libraries will be able to understand what comments are and display them in a different color (typically light gray) to separate them from the rest of the code.

PHP also supports a way to comment out multiple lines of text: it looks for the character combination /* to indicate the start of a comment block and the combination */ to signal the end of the block. Anything PHP encounters between these two markers, including line breaks, is considered a comment and is ignored during execution. For example:

/*
 *   Use Comments
 *
 *      Liberal use of comments can only help a programmer make
 *      sense of code, particularly as time passes and the
 *      reasons for using a particular function fade from memory.
 *      Except for the first and last lines to open and close the
 *      comment, the asterisks on the left are optional and are
 *      included merely for aesthetic purposes.
 */

Comments help a programmer remember or understand what a section of code does and maybe why certain programming decisions were made, but they don’t do anything to help fix problems in the code when they arise. To do that, you need a way to peer into the inner workings of the application, from the server’s perspective. Outputting variable values at different points in the program is a simple technique for debugging in PHP, to see what is changing and where.

The echo() construct is one way to output data: when it’s executed, the variables that come after it are evaluated and displayed in the terminal or browser for all to see. For example:

$debug = 1;
if ($debug) { echo $div_info; }

You probably don’t want a bunch of debugging text to show up in a production site, but for early coding and in development servers, being able to see values change is worthwhile. I sometimes build a debugging switch into the code that allows me to change one value high in the code that will tell all of my debugging output to display.

There is another command, print(), that does the same thing. “What’s the difference between echo() and print()?” is a common question about PHP. The quick answer is, “Not much. Use what you want.” However, there are some subtle differences that can affect your coding decisions. First, echo() is a little faster, since it doesn’t return any value (print() always returns 1, which lets it be used as part of a larger expression). The other important difference is that echo, when written without parentheses, can accept multiple parameters separated by commas, so you can output several strings without concatenating them. For print(), only one string is allowed as a parameter.

For arrays, output becomes a little more complicated; all of the keys and values run together, making it very difficult to read. Fortunately, PHP has a special function for investigating array contents, called print_r(), that will display the contents of an array in a way formatted to be readable by humans. Similarly, the var_dump() function will iterate through arrays and objects, displaying all of the information they contain and indenting to show how the content structure is nested:

print_r ($array);
var_dump ($array);
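
The debugging switch and the readable array output can be combined in a small helper. This is a sketch under stated assumptions: the function name debugOut() and the sample array are made up for illustration.

```php
<?php
$debug = 1;   // change to 0 to silence all debugging output

// Hypothetical helper: print a labeled, readable dump only while debugging.
function debugOut($debug, $label, $value) {
    if ($debug) {
        // Passing true makes print_r() return its text instead of printing it
        echo $label, ': ', print_r($value, true), "\n";
    }
}

// Made-up sample data
$member = array('name' => 'example_user', 'followers' => 42);
debugOut($debug, 'member', $member);
```

Because every debugging statement goes through one function, changing the single $debug value near the top of the script turns all of the output on or off.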

Finally, there is the exit() command, discussed earlier in the section Controlling the Flow of Logic. If your application does a lot of things, such as writing to a database or changing saved states that you might later need to revert, a well-placed exit() will save you some headaches by killing the application at the desired point. Just remember to remove it after you figure out a solution to the coding problem!

MySQL

There are a number of reasons it may become necessary to store data outside of your application in order to support what it does. We do this when we want to pre-populate web forms with saved information, and when we want to show only a part of a larger data set. The raw information is stored in a database. A database can take many forms, including a simple text file where you write the information you want to retrieve later. The dominant form of database, however, is the relational database management system (RDBMS).

MySQL is one widely distributed relational database system that is commonly installed on web hosting servers for your use. You may be limited to creating a certain number of databases. If so, don’t worry; in this book we’ll only use one.

Note

The Structured Query Language (SQL) has been around in some form for almost four decades. One of the criticisms of the language is that different servers have slightly different syntax variations, so SQL statements that run well in Microsoft SQL Server won’t necessarily work in MySQL. This book deals specifically with MySQL syntax.

MySQL databases can be very powerful. The interaction with a typical web application, though, boils down to a handful of statements for the following tasks: creating tables, selecting data, inserting new data, updating existing data, and deleting data you no longer want.

Creating a New Table

When you first create a database, there is no there there. It is just an empty shell; you could use PHP to connect to it, but that’s it. To make it a useful place to store and filter data, you must first create a structure to contain the data.

The CREATE TABLE statement accomplishes this by describing the fields each row, or tuple, of data can store. Each field is assigned a name and a data type, plus other information about its size, default value, and whether it can accept NULL values. MySQL also would like to know how you might want to access the data in the future. You can aid data retrieval by specifying which fields are going to be indexed for search. The primary key—the field that contains the unique identifiers to distinguish one record from the next—will always be indexed, but you can also specify that indexes be maintained for other fields. Example 3-6 shows a table called access with five fields.

Example 3-6. Sample CREATE TABLE statement

CREATE TABLE IF NOT EXISTS `access` (
    `record_id` int(4) NOT NULL,
    `password` varchar(255) NOT NULL,
    `created_at` datetime NOT NULL,
    `date_processed` timestamp NOT NULL default CURRENT_TIMESTAMP,
    `is_enabled` tinyint(4) NOT NULL default '1',
    PRIMARY KEY  (`record_id`),
    KEY `is_enabled` (`is_enabled`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=74462 ;

Most database clients—including the web-based phpMyAdmin tool many web hosting companies use to allow user access to MySQL (see Figure 3-1)—provide some basic GUI or form that you can use instead of actually writing the SQL statement to create a table. The important definitions are still the same, regardless of the method of creation: you need to tell the database what kind of data you want to store and how you are likely to try to retrieve that information later.

Figure 3-1. phpMyAdmin is a web-based client for MySQL databases

Retrieving Information from the Database

Most of your application’s interaction with the database is likely to be in the form of SELECT statements. These SQL commands ask the database to look in the contents of its tables and return specific data that matches your criteria.

SELECT statements have a few main parts:

SELECT names the fields you want to get.
FROM indicates where to get the data.
WHERE specifies some conditions to filter all the data into a useful set.
ORDER BY allows you to sort the results on one of the returned fields.

Additionally, you can take advantage of the relational nature of MySQL by joining multiple tables together to create a new collection of data that isn’t explicitly stored in the database. LEFT OUTER JOIN will connect two tables on a common field without requiring that the second table have a matching record. If it doesn’t, the fields that would have been filled by that table are returned with NULL values. An INNER JOIN, by contrast, returns only the rows for which both tables have a matching record. Example 3-7 shows a SELECT query involving two tables used by one of the sample applications, tweetbroadcast_tweets and tweetbroadcast_groups.

Example 3-7. Sample SELECT statement

SELECT  DISTINCT t.status_id,
        t.author_username, t.author_fullname, t.author_avatar,
        t.tweet_text, t.tweet_html, t.pub_date,
        concat(year(t.pub_date),' ',LPAD(dayofyear(t.pub_date),3,'0')) as tweet_day
FROM    tweetbroadcast_tweets t
        INNER JOIN tweetbroadcast_groups m ON m.other = t.author_username
WHERE   m.owner = 'kmakice'
ORDER BY t.pub_date desc
LIMIT 0,20

The LIMIT clause restricts how much of the full data set is returned. It takes two numbers, separated by a comma: the first is the offset of the first record to return (0 tells MySQL to start at the top), and the second is the maximum number of records to return. Because the results are sorted by pub_date in descending order, the query in Example 3-7 will return the 20 most recent tweets authored by people in my broadcast group.
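When results are shown a page at a time, the offset has to be calculated from the page number. The sketch below shows one way to do that; limit_clause() is a hypothetical helper, not a function from the sample applications.

```php
<?php
// Hypothetical helper: builds a LIMIT clause for a 1-based page number.
function limit_clause($page, $per_page = 20) {
    $page     = max(1, (int) $page);      // never go below the first page
    $per_page = max(1, (int) $per_page);  // always return at least one row
    $offset   = ($page - 1) * $per_page;  // rows to skip before the first returned row
    return 'LIMIT ' . $offset . ',' . $per_page;
}

echo limit_clause(1), "\n";  // LIMIT 0,20
echo limit_clause(3), "\n";  // LIMIT 40,20
```

Casting both values to integers before building the string also keeps stray user input out of the SQL statement.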

The last clause of interest to us is GROUP BY. This is an aggregation instruction that tells MySQL to return only one record for each unique group, summing, counting, or calculating the other fields over the values in all of the group’s records. For example, if I had a table that stored all of my family’s Twitter status updates (my wife, my son, and yes, my dog all use Twitter), I could generate some stats for each author by grouping on the username field and counting records. Such a query would tell us that my dog tweets more than my son. It’s very sad on several levels.
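To make the aggregation concrete, the sketch below performs the same counting in plain PHP on a handful of made-up rows. The family_tweets table, its field names, and the sample data are all hypothetical; in a real application MySQL would do this work and return one row per username.

```php
<?php
// The SQL being illustrated; family_tweets and its fields are hypothetical names.
$sql = 'SELECT username, COUNT(*) AS tweet_count
        FROM family_tweets
        GROUP BY username';

// Stand-in rows, as if every record had been fetched without the GROUP BY.
$rows = array(
    array('username' => 'dog',  'tweet_text' => 'Squirrel!'),
    array('username' => 'son',  'tweet_text' => 'Homework done'),
    array('username' => 'dog',  'tweet_text' => 'Squirrel again!'),
    array('username' => 'wife', 'tweet_text' => 'Coffee time'),
    array('username' => 'dog',  'tweet_text' => 'Nap'),
);

// Collapse the rows into one count per username, which is what
// GROUP BY username with COUNT(*) asks the database to do.
$tweet_counts = array();
foreach ($rows as $row) {
    $user = $row['username'];
    if (!isset($tweet_counts[$user])) {
        $tweet_counts[$user] = 0;
    }
    $tweet_counts[$user]++;
}

print_r($tweet_counts);
```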

Alternatively, if your goal is simply to eliminate redundancy in the data, you can use DISTINCT in the SELECT clause to limit the data set to just one instance of any given combination of the selected fields. If I collected my son’s tweet archive several times and had duplicates of the same records, without using DISTINCT his aggregated statistics would be much higher than the actual count. DISTINCT is also handy in statements using GROUP BY as a way to count the number of different values in the data set, as in:

SELECT COUNT(DISTINCT username)

The fields used in the SELECT list and the WHERE conditions for the search can be manipulated using functions in MySQL. They work in a similar fashion to functions in other languages, like PHP, in that you pass a value as a parameter and get some kind of response. For example, DateDiff() will examine two dates and return the number of days that separate them. These kinds of functions become very useful when trying to compare similar data expressed in incompatible formats and for shaping the way the information is returned in the query.

Changing Information in the Database

Of course, the only way to get something out of the database is to put something in. The data doesn’t appear by itself; it must be added and kept up-to-date with the help of some special editing statements that insert, change, and delete values in table fields.

Although it is possible to transfer information from one part of a database to another by combining INSERT and SELECT statements, for our purposes it is sufficient to add data one row at a time. To add data to a table, the INSERT INTO statement lets you specify a table and the field(s) you want to fill. You then specify the data using the VALUES clause, listing the new information in the same order you listed the field names. See Example 3-8.

Example 3-8. Sample SQL statements to change a database

/* This inserts a new row of data into 'autotweet_profiles' */
INSERT INTO autotweet_profiles
    (user_name, password, rss_url)
VALUES
    ('kmakice', 'DKFSHOIER*S(R(WE%', 'http://www.blogschmog.net/feed/');

/* This changes the values in three fields of a record in the table */
UPDATE   autotweet_profiles
SET      password = 'DKRKSIKDLER*KDOUFLIEO*',
         rss_url = 'http://www.makice.net/blogschmog/feed/',
         is_enabled = 1
WHERE    user_name = 'kmakice';

/* This removes all records for 'kmakice' from the table */
DELETE FROM autotweet_profiles
WHERE    user_name = 'kmakice';

UPDATE is a statement that works on existing records in a table, allowing you to change the stored values of specific fields. As with SELECT, you must first identify which table is to be targeted and which records are being changed, using the WHERE clause. Updates also use a SET clause to list the fields of interest with their new values. UPDATE statements can affect more than one row, based on the criteria defined in the WHERE clause. All of the matching records will be set with the same values specified in the SET clause.

Finally, there is the DELETE FROM statement, which removes records from data tables. To delete data, you specify the affected table and the criteria that need to be matched. Any records matching the WHERE clause will be removed.

A Place to Call /home

The first thing you’ll need for your new web application is a home. All of your brainy ideas and masterful code won’t be any more useful to other people than an email from your grandmother if your code has nowhere to run and do something interesting. This section briefly looks at some of the things to consider when searching for a server from which to publish your new application.

Selecting a Host Server

There are a number of factors you will need to consider when selecting a web host to publish and protect your work. The most important one (for your bank account, at any rate) is cost.

Hosting services can have a few different kinds of configurations. These include:

Shared hosting: Racks of machines are set up, and your account shares physical and virtual space with other accounts. If their sites go down, so do yours. If you stress the processor with a bunch of big queries, other accounts suffer the consequences, too.
Virtual Private Server (VPS): Your slice of the big shared hosting pie includes CPU, RAM, and disk space that are not affected by and will not affect what happens on other VPS accounts. You can have root access to your virtual machine.
Dedicated server: You own the server, but you don’t have to maintain it. This arrangement is like VPS, except that “virtual” is replaced with “physical” (your server is a real machine that no one else uses).

The best analogy I’ve seen is a housing analogy: dedicated hosts give you a mansion, VPS hosts give you an apartment, and shared hosts make you live in a dorm room.

Shared hosting services can offer web space and a lot of built-in support for about $10 a month. These companies try to make it very easy for people to install open source tools such as blogging and chat applications, and they will almost always support the most popular development platforms. What they won’t do very well is help you with your code. If something breaks, odds are good you will have to work out your own solutions or turn to the community of web developers who share their expertise in online forums. Shared hosts frequently have slow and crowded databases, which may become problematic if your awesome new web application takes off.

The low-cost options also put limits on the amount of traffic, or bandwidth, you are permitted. For most small sites, this limit may seem impossibly high; it may be 10 times as large as the amount of hard drive space you are allowed, which may be a few hundred gigabytes. Text takes less bandwidth than images or movies, but if a few hundred thousand people start visiting your website, even the text adds up.

Don’t underestimate the importance of bandwidth limits when it comes to a Twitter application. News of interesting tools spreads quickly among Twitter’s several million accounts, and developers are often overwhelmed with the response—just ask Ryo Chijiiwa, the creator of Twitterank (see Tools for Statistics). It is not uncommon to have to switch web hosting to an account or a company that can better handle the traffic. It would be best to be proactive and find a hosting company that is prepared to scale with your application. Upgradable VPSs, particularly those with some allowances for bursting past your allotted limits, should meet these needs.

Note

Before you let people know about your great new Twitter application, check with your web host about what happens if you unexpectedly exceed your bandwidth limit. In some cases, automatic charges are levied based on the amount of traffic you have. In other cases, your account—including any other websites you may be hosting—will essentially be shut down until you do something to upgrade your host configuration.

Other factors to consider include the hosting service’s track record for keeping the servers up and running (99% uptime should be a minimum requirement), availability and responsiveness of tech support, domain name registration, secure FTP access, usage statistics reporting, and whether your account can support secure transactions. For the purposes of this book, the most important criteria involve common but vital functionality available on most modern web servers. The following are the primary requirements:

MySQL database server
PHP scripting language
A place outside the web path to place supporting code
cron jobs for scheduling tasks

The web host on which the sample code in this book was developed was a Linux server with PHP version 5.2.6 and MySQL version 5.0.67. In computing, things change quickly and incrementally, so even the most on-the-ball server administrators may lag a little behind these server application developers. If your host is reasonably close to the latest releases, you should be fine. Even if it’s a bit behind, though, you should find that most of the core functionality you use works with earlier versions.

Note

Both PHP and MySQL do add some useful functions in major new releases, such as str_split() in PHP and LOAD XML syntax in MySQL. If you are having difficulty getting something to work, double-check the version of the server software against the version requirements for that function listed on the PHP and MySQL documentation websites.

Automation

When you build a web application, the person using the site triggers much of its functionality. There’s no need to fetch data until someone shows up at the website asking for it. However, there are situations where you won’t want to rely on web traffic as a catalyst for your program.

Servers do have some easy ways around this, such as using a cron job in Linux or the Task Scheduler in Windows. The latter platform also supports proprietary scheduling tools such as nnCron and VisualCron, which add a GUI and a lot of functionality to the process of automating tasks.

Traditionally, a special text file known as a crontab handles the work of scheduling tasks to be performed at regular intervals. A special syntax instructs the server when and where to look for scripts to run: you supply the path to the script and tell the server when to run it by specifying values for the minutes, hours, day of the month, month, and day of the week fields, with an asterisk (*) serving as a wildcard to allow any value. In this example, script.cgi will run at 3:30 a.m. every Saturday:

30 3 * * 6 /path/to/script.cgi
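To see how the fields line up, the sketch below splits a simple crontab entry into labeled parts. parse_cron_line() is a hypothetical helper written for this illustration; it only handles plain entries separated by whitespace, not the full crontab syntax.

```php
<?php
// Hypothetical helper: splits a crontab entry into its five schedule fields
// plus the command. Returns false if the entry is incomplete.
function parse_cron_line($line) {
    // The limit of 6 keeps any spaces inside the command intact.
    $parts = preg_split('/\s+/', trim($line), 6);
    if (count($parts) < 6) {
        return false;
    }
    return array(
        'minute'       => $parts[0],
        'hour'         => $parts[1],
        'day_of_month' => $parts[2],
        'month'        => $parts[3],
        'day_of_week'  => $parts[4], // 0 = Sunday through 6 = Saturday
        'command'      => $parts[5],
    );
}

print_r(parse_cron_line('30 3 * * 6 /path/to/script.cgi'));
```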

The crontab file is usually tucked away out of the reach of casual users. Through a web host control panel, however, scheduling a cron job is as easy as filling out a web form. You select the intervals for how frequently the task should run, and the crontab entry is generated for you.

The only tricky part is how to reference the PHP script. Some systems will accept either a URL to a web page or a server path to the .php file; others require just the server path. Since the server just sees the file as text and not as a powerful script, you must also tell the crontab to render it as PHP. This is accomplished by simply referencing the path to the PHP interpreter:

/path/to/php /path/to/your/cron_task.php

Note

If you can’t get your cron job to parse the PHP, you can try to force the server to do so using a text-based browser (/path/to/lynx) or an HTTP request (/path/to/wget) instead of calling the PHP parsing engine directly (/path/to/php). You will have to use a URL instead of the file path to reference the PHP script.

When the cron job runs, the web host may have its server configured to email you the status of the script along with its output. Those notification emails can add up—scheduling a task to run once a minute will result in 1,440 emails each day.

Pseudoautomation hacks

If you don’t have access to a crontab file, there are still ways you can simulate automation.

One way is to use a free pinging service to start your program each day. Companies such as Site24x7 will try to load a web page at regular intervals and generate a report on how successful their attempts were. Although this is useful to generate server uptime statistics, it can also have the consequence of launching a PHP page. Once launched, that script can do all sorts of things, including mining data or cleaning up files. You can also insert a call to your backend task either as part of the script generating the HTML or in the HTML itself, using <script> or <img> tags that point to the PHP file you want to run.

These tricks are not recommended, however, since they are unreliable and can get in the way of the web content you want to display. If that functionality (for example, a data-mining operation) takes a long time to run, your page load times may suffer. This is particularly problematic if the web page is heavily trafficked. There is also the risk that no one will visit your website on a given day. If that happens, nothing will trigger the backend task, and therefore nothing will get processed. As a result, these techniques are only really useful in situations where the jobs don’t have to run regularly.

Note

If your web host can’t support cron jobs, that may be a good reason to find another host. Much of the magic possible with the Twitter API comes while you are sleeping and your code is running on its own.
