kmakice Feeling a little shaky about the whole coding thing? Take a crash course in the languages you will use to build your Twitter app.
As demonstrated in the first two chapters, Twitter is a rich playground for both members and developers. Before we get into specifics on how you can join in the fun, we should spend some time reviewing the mechanics of working with the languages you will be asked to use to build your new Twitter web application.
This chapter provides an overview of the basic knowledge and tools needed to create the applications described later in the book. Although the API can be used to create desktop and mobile applications as well, this book focuses on the web platform built with PHP and MySQL. Even for experienced programmers, it won’t hurt to skim this chapter so you understand the scope of what is to come. However, if you are confident in your skills with XML, CSS, PHP, and MySQL, you can skip to Chapter 4 and jump right into the methods available in the Twitter API.
This chapter reflects what is needed to build and install the suite of sample applications to get you started using the Twitter API. It is not meant to be a replacement for resources dedicated to improving individual skill sets for XHTML, CSS, PHP, MySQL, or server management. Suggested reading on those topics can be found at the end of this chapter.
The extra “X” may look intimidating, but Extensible Hypertext Markup Language (XHTML) isn’t much different from regular HTML. The main purpose of the revision is to help make web pages better supported by paying more attention to the structure of the data. Whereas HTML did not seek to separate data from presentation, XHTML puts the burden of presentation on the stylesheets.
HTML is not XML. Compliance with XML means that the tags need to be “well-formed,” or follow the conventions required by documents of that type. These conventions include use of lowercase in tag names (XML is case-sensitive) and the insistence that all tags are closed, either explicitly with a second tag (<p></p>) or implicitly as part of a singleton tag (<br />). Attribute values also have to be enclosed in quotes, either single or double. In HTML, you can omit the quotes around integer and Boolean values and the browser will still understand them.
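One quick way to see these rules in action is to hand a few fragments to a strict XML parser. The sketch below uses Python's standard library purely for illustration (the sample applications in this book use PHP), and the helper name is_well_formed is made up for this example. Note that a generic XML parser checks only well-formedness: it rejects unclosed tags, mismatched case between opening and closing tags, and unquoted attributes, but it does not enforce the XHTML convention that tag names be lowercase.

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment):
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False

# Explicit close (<p></p>) and singleton close (<br />): accepted.
print(is_well_formed('<p>Closed properly<br /></p>'))   # True
# <br> is never closed: rejected.
print(is_well_formed('<p>Unclosed break<br></p>'))      # False
# Attribute value without quotes: rejected.
print(is_well_formed('<img src="hi.jpg" width=8 />'))   # False
# Opening and closing tags differ in case: rejected.
print(is_well_formed('<P>Mismatched case</p>'))         # False
```

A full XHTML check still needs a validator such as the W3C service, since validation also covers which tags and attributes are allowed.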
The advantage of XHTML crystallizes when using XML tools. For example, XSLT is used for transforming documents, and XForms lets you edit XML documents in simple ways. Integration with other tools, such as MathML, SMIL, and SVG, is not possible with regular HTML. Of the three versions of XHTML—Strict, Transitional, and Frameset—Strict is the one to use. A lot of problematic tags are deprecated, so using Strict makes your code better prepared for future standards. It also makes it easier for many different devices to properly interpret the page content, since there are fewer unexpected tags and attributes to encounter.
The easiest way to ensure strict compliance is to use the World Wide Web Consortium’s (W3C) validator, at http://validator.w3.org. Many text editors also have validation and formatting tools built in.
The rest of the material in this section is pretty straightforward. I include it only to make sure we are all on the same (web) page when it comes to the markup.
Open up any XHTML-friendly text editor (such as TextMate), and you are likely to find a template for a web page that resembles Example 3-1. As soon as it is created, you can forget about these tags; they aren’t useful outside of the page definition.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <style type="text/css" media="all">@import url(my_styles.css);</style>
    <title>The Basic Web Page</title>
</head>
<body>
    This is the stuff you can see.
</body>
</html>
The stuff that is meant to be visible to the person visiting your website is located between the <body></body> tags. After changing details such as the name and location of your external CSS file and the title of the page, the tags to focus on are the ones that will describe the content you want to display. Here is some information about the types of tags used in the sample applications for this book:
Block elements have the inherent property of creating a new line of content when displayed, forming discrete blocks of content when rendered by the browser. The most general block element is the <div> tag, which is primarily used to identify each section using the unique id attribute. The content within a given division is isolated from the rest of the page and styled according to the attributes of the enclosing <div> tag.
More familiar forms of block elements include the heading tags (<h1>, <h2>, etc.) and <p>. Each has specific properties, such as text size and weight (boldness), for which you can specify values other than the defaults. The break tag, <br />, is sometimes considered a block element too, because it forces a new line to be rendered in the page display.
Example: <div id="me"><p class="others"><br /></p></div>
For the content contained within block elements, different words may require different presentation. XHTML allows for some special inline elements to handle this without explicitly defining a style. The two most commonly used elements are <strong> and <em>. By default, the former makes text boldface, and the latter displays text in italics.
Example: This is <strong>boldface</strong>.
List tags are a special category of XHTML element, sort of a cross between block and inline elements. Wrapping content in list tags will force a line break (as with a block element), but these tags are also used to indent or mark specific lines of text. There are tags for three kinds of lists: unordered lists (<ul>), ordered lists (<ol>), and definition lists (<dl>). Which tags you use dictates how the list items are presented.
List tags are nested. Each item in an ordered or unordered list must be wrapped in <li></li> tags, and the collection of all the list items must be wrapped in <ul></ul> or <ol></ol> tags. (Definition lists are a special case that we won’t go into here, as the sample applications in this book don’t use definition tags.) Unordered lists place bullet characters before each item. Ordered lists use a sequence of numbers, letters, or Roman numerals.
Example: <ul><li>tom</li><li>jerry</li></ul>
Arguably the most important tag in the bag of XHTML tricks is the hyperlink tag. Without it, web pages would have to be bookmarked and loaded separately. Linking is what makes Google searches work.
You can make any text, image, or object an active hyperlink by wrapping it in anchor tags (<a></a>). Anchors can be used to point to somewhere new, using the href attribute and a referenced web document, and to name a particular bit of content. Naming comes in handy to connect XHTML to JavaScript or to links from other pages.
Another attribute, title, further empowers links. The title attribute can store useful descriptive or instructive information that becomes visible in many browsers when you move your mouse over the hyperlink.
Example: <a href="#" title="hi">click</a>
Adding images to your web page may slow down the page loading a bit, but it also makes the experience more enjoyable for visually perceptive visitors. The bandwidth and access limitations of a decade ago, which discouraged the use of images on web pages, are disappearing, as most U.S. homes now have broadband access.[60] Graphics—whether they’re big pictures dominating the page or a few custom icons used for bullet points in a list—enhance communication and enjoyment.
Even with faster connections becoming the norm, it is still good practice to be aware of what images and other media do to bandwidth. Optimize your images for delivery on the Web by using PNG or GIF screenshots, or by compressing JPEG photos.
The <img /> tag, a singleton that does not have a matching closing tag, has an attribute, src, that points to a file accessible via the Web. The file can be in any one of a number of formats, but for graphics the most acceptable formats are .gif, .jpeg, and .png.
Although most browsers don’t require it, image tags also need an alt attribute to be compliant with W3C standards. Alternative text has two important purposes. First, before the browser successfully downloads the image, it uses the text in the alt attribute as a placeholder in the page display. More importantly, alternative text is used by web browsers for visually impaired people to describe the images those visitors cannot see. Alt text can be blank for images meant for decoration only, but otherwise, the value should be a short and meaningful description of the image.
The other attributes that are helpful to include are width and height. These default to the actual pixel size of the image file if not included, but specifying them in the code allows the browser to carve out the appropriate bit of screen real estate while the image loads. The rest of the page is then unaffected by when the graphic decides to show up. These attributes also allow for resizing a graphic on the fly, as their values will override the dimensions of the image for purposes of rendering the web page.
Example: <img src="hi.jpg" alt="Hi" width="8" height="8" />
Using the width and height attributes of the image tag will only resize the display, not the file size. That 10-megapixel shot from your new digital camera will still slowly load into the browser as a giant 4-megabyte file!
There is an important group of tags that deserves greater mention. Without them, a web page would not be able to gather information from visitors and respond in a relevant manner. Pointing and clicking on links is fine, but forms are what put the interaction in the Web.
Forms allow people to enter data into a web page and send that data back to the host server. Every blog or wiki uses a form to let people publish information. Every online purchase relies on a web form to manage the transaction. Even a simple Google search consists of an input element and a submit button. Here’s a list of the form tags used in the sample applications:
<form>
To let the browser know that the aforementioned kind of interaction is permissible, you need to define an area of the web page as a web form. This is done by wrapping content in <form></form> tags and specifying a couple of key attributes.
The action attribute points to the destination server and file that are expecting the form data. This can be an email or FTP address, but usually it’s another web page that will be able to parse the data and formulate some response to the submission.
The method attribute lets the browser know how to encapsulate the data as it is sent to the server. The GET method turns the data into a visible query string appended to the URL, whereas the POST method sends the data in the body of the request, separate from the URL, for processing on the server end. POST is the more common choice for forms.
Example: <form action="process.php" method="post"></form>
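To make the difference between the two methods concrete, here is a minimal sketch in Python (for illustration only; the sample applications use PHP). It encodes the same pair of form fields once and shows where the encoded string ends up for each method. The field names and values are invented for the example.

```python
from urllib.parse import urlencode

# Hypothetical form fields, as a browser would collect them from <input /> tags.
fields = {"user": "kermit", "status": "hello world"}
encoded = urlencode(fields)

# GET: the encoded fields are appended to the URL, where anyone can see them.
get_url = "process.php?" + encoded

# POST: the URL stays clean and the same encoded string travels in the request body.
post_url = "process.php"
post_body = encoded

print(get_url)    # process.php?user=kermit&status=hello+world
print(post_url)   # process.php
print(post_body)  # user=kermit&status=hello+world
```

Either way the server receives the same name/value pairs; the method only changes how they are carried.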
<input />
The <input /> tag is one way to collect information from the person submitting the form. The nature of the input field is dictated by its type attribute, which can be filled with values like button, checkbox, file, hidden, image, radio, and reset. For the purposes of this book, we are interested in three specific types: text, password, and submit.
Each text input is displayed as a one-line box into which text can be typed. You can set the value attribute of this kind of field to pre-populate the form field with some text (such as the previously saved value). There is also a size attribute that gives you some control over how long the box should be.
Example: <input type="text" name="user" value="visible" />
The password input is nearly identical to the text input. The big difference is that the text being entered is masked with bullets or asterisks. That offers the person doing the typing protection from onlookers. However, the text is still plainly visible in the source and is not inherently secure upon form submission. If possible, use secure protocols and avoid returning the password back to the form.
Example: <input type="password" name="pw" value="hidden" />
Text fields are open data entry. That means if it can be typed on the keyboard, it can be sent to the server. This puts some burden on the developer to filter malicious or error-producing content from being fully processed.
This kind of validation can also be done on the client side with JavaScript, but on the server side it must be done with a scripting language such as PHP. Without server-side validation, malicious users (or those that keep scripts disabled for their own browsing safety) can very easily bypass your filters.
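As a rough sketch of what server-side filtering can look like, the following Python fragment (illustration only; the sample applications do this work in PHP) checks a submitted username against a whitelist and escapes free text before it is echoed back into a page. The function names and the username rule (letters, digits, and underscores, up to 15 characters) are assumptions made for this example, not requirements taken from the Twitter API.

```python
import html
import re

# Assumed rule for illustration: letters, digits, underscores, 1 to 15 characters.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,15}")

def clean_username(raw):
    """Return the trimmed username if it passes the whitelist, else None."""
    candidate = raw.strip()
    if USERNAME_PATTERN.fullmatch(candidate):
        return candidate
    return None

def safe_for_display(raw):
    """Escape markup characters before echoing free text back into a page."""
    return html.escape(raw)

print(clean_username("  kmakice "))          # kmakice
print(clean_username("<script>"))            # None
print(safe_for_display("<b>bold</b>"))       # &lt;b&gt;bold&lt;/b&gt;
```

Whitelisting what is allowed tends to be safer than trying to list everything that is forbidden, because unexpected input is rejected by default.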
The other type of interest is submit. This turns the input into a button that can be pressed to send the data to the server. In this case, the value attribute becomes the label of the button. If you omit the value, the browser supplies its own default label, such as “Submit” or “Submit Query.”
Example: <input type="submit" name="send" value="Go!" />
Each input tag must also have a name attribute. This is what differentiates this particular blob of data from all the rest of the data you may ask for in your web form. If two elements have the same name, the data will look like a list on the server side and may not be interpreted correctly. As with other tags, the name also serves as a hook for JavaScript and other dynamic languages.
<textarea>
Sometimes one line of text isn’t enough. Imagine trying to blog (or even microblog) with only space for one short line at a time. The brains behind XHTML thought of this when they developed the <textarea> tag. This element does require a closing tag. Any text between the two tags appears in a big box in the web form.
If you use styles to change the font family, style, or other formatting, the dimensions will change as the size of the characters changes: 3 rows by 70 columns will look smaller with 9pt type than it will with 12pt. It is more effective to use CSS (discussed in the next section) to determine the size of form objects:
textarea {width:400px; height:80px;}
Example: <textarea rows="3" cols="70">My novel</textarea>
<fieldset>
An XHTML element that can be useful for making sense of web forms is the <fieldset> tag. The <fieldset></fieldset> pair creates a visual grouping of any form elements contained within to make it clear to the visitor that they are meant to be together. You can then add a special nested tag, <legend>, that will wrap around the text you want to use for the title of this section. Fieldsets are useful with CSS styles to create tabs, hiding all but the group of form elements that is currently in use.
Example: <fieldset><legend>Login:</legend></fieldset>
Forms are used in the sample applications to authenticate Twitter accounts and store some basic configuration information. Example 3-2 shows all the form parts put together.
<form action="" method="post">
    <fieldset>
        <legend> Authenticate: </legend>
        <div id="username">
            Username:<br />
            <input type="text" name="twitter_username" size="25" value="you can see this" />
        </div>
        <div id="password">
            Password:<br />
            <input type="password" name="twitter_password" size="25" value="you can't see this" />
        </div>
    </fieldset>
    <input type="submit" name="submit_button" value=" Go! " />
</form>
Web pages and XHTML are just one form of strict XML structures. The blogs you read typically have RSS feeds, which are formatted using structured tags. The content you will receive from the Twitter API can also be XML.
In this book, we will both read and create new RSS and XML structures. Example 3-3 shows the initial shell for an RSS feed, waiting for dynamic content to be added.
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
    <title>Sample RSS Feed</title>
    <link>http://www.blogschmog.net/feed/</link>
    <description>This is my blog.</description>
    <language>en</language>
    <pubDate>Wed, 10 Dec 2008 13:18:56 +0000</pubDate>
</channel>
</rss>
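To give a feel for how dynamic content might be dropped into a shell like this one, here is a small Python sketch (illustration only; the sample applications build their feeds with PHP). It rebuilds part of the channel and appends one item. The add_item helper and the item's title, link, and description are hypothetical values chosen for the example; the item, title, link, and description element names are standard RSS 2.0.

```python
import xml.etree.ElementTree as ET

# Rebuild the empty channel shell.
rss = ET.Element("rss", version="2.0")
channel = ET.SubElement(rss, "channel")
ET.SubElement(channel, "title").text = "Sample RSS Feed"
ET.SubElement(channel, "link").text = "http://www.blogschmog.net/feed/"
ET.SubElement(channel, "description").text = "This is my blog."

def add_item(channel, title, link, description):
    """Append one <item> to the channel and return it."""
    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    ET.SubElement(item, "description").text = description
    return item

# Hypothetical entry; the ampersand is escaped automatically on output.
add_item(channel, "Hello & welcome", "http://www.blogschmog.net/hello/", "First post.")

xml_text = ET.tostring(rss, encoding="unicode")
print(xml_text)
```

Building the feed through an XML library rather than by gluing strings together keeps the output well-formed even when the content contains characters such as & or <.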
If markup languages are all about structure, stylesheet languages are all about presentation. Cascading Style Sheets (CSS) are used most commonly for web page design, but CSS also applies to other XML structures, such as Scalable Vector Graphics (SVG). Once upon a time, rendering instructions were intermixed with the data structure, causing lots of problems with making things look the same when viewed through different system platforms or on different hardware. With separate stylesheets, however, the same structure can be presented in a way that best fits the context of use.
CSS is used to change the appearance and layout of the structured data, spanning everything from colors to fonts to arrangement of content. It can even hide or reveal content, making the web page interactive when associated with event handlers. This is particularly powerful when teamed with a dynamic scripting language like PHP, as you can create a web page that detects a person’s browser signature or IP address and then selects a stylesheet on the fly that is optimized for display in that context.
Stylesheets are built from three basic parts. The selector references the tag element by name, ID, or class. The property is the part that indicates what is to be changed, and the value indicates how it should be presented:
selector { property:value; }
Some related properties and values can be combined to reduce the number of lines of definition the stylesheet requires to display content in the way you want. For instance, borders (the lines surrounding a particular element) have three properties that can be set either separately (border-width:1px; border-style:solid; border-color:black;) or as one combination property (img { border: 1px solid black; }).
The browser will prioritize the style changes and render the display. Sometimes, this may produce unanticipated results—for instance, text in a paragraph that you want to appear in red may show up as blue, forcing you to investigate what is causing (or sometimes preventing) the change. Referencing a selector with a different property value later in the rendering process, or simply not knowing which properties are inherited for which embedded tags, can lead to a lot of hand-wringing.
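In Example 3-4, the first line of visible text (“body”) is rendered in blue while the next line (“This text is black.”) is black. The paragraph does not pick up the blue of its parent because the * rule matches the <p> element directly, and a rule that matches an element directly takes precedence over a value the element would otherwise inherit.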
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
    <title>Test of CSS inheritance</title>
    <style type="text/css">
        *{color:black;}
        body {color:blue;}
        .my_class {color:gray;}
        #my_div {color:red;}
        #my_div a {color:green;}
        #my_div h4, #my_div p {color:purple;}
    </style>
</head>
<body>body
    <p>This text is black.</p>
    <div id="my_div">
        <p>This text is purple.</p>
        <h4>header 4</h4>
        my_div <a href="#">link</a>
    </div>
    <p class="my_class">This text is gray.</p>
</body>
</html>
Be careful when using the * CSS selector: it’s easy to accidentally override another style without realizing it. Using a “reset” stylesheet typically addresses this nicely by setting a baseline style for the document body and then telling all child elements to inherit the styles that you don’t explicitly specify for them.
The browser’s prioritization is based on a weighting formula that calculates how important a particular rule is: more specific style definitions are given a greater weight. A selector of #my_div .my_class p, for instance, would be deemed more important than simply p or .my_class, regardless of the order in which the browser encounters those selectors.
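The weighting can be approximated by counting the IDs, classes, and element names in a selector and comparing the three counts in that order. The Python sketch below is a deliberately simplified illustration: the specificity function is written for this example, and it ignores attribute selectors, pseudo-classes, inline styles, and !important, all of which a real browser also takes into account.

```python
import re

def specificity(selector):
    """Return (ids, classes, elements) for a simple CSS selector."""
    ids = len(re.findall(r"#[\w-]+", selector))
    classes = len(re.findall(r"\.[\w-]+", selector))
    # Remove the IDs and classes, then count the element names that remain.
    stripped = re.sub(r"[#.][\w-]+", " ", selector)
    elements = len(re.findall(r"[A-Za-z][\w-]*", stripped))
    return (ids, classes, elements)

print(specificity("#my_div .my_class p"))  # (1, 1, 1)
print(specificity(".my_class"))            # (0, 1, 0)
print(specificity("p"))                    # (0, 0, 1)
print(specificity("*"))                    # (0, 0, 0)

# Tuples compare left to right, so one ID outweighs any number of classes.
print(specificity("#my_div a") > specificity(".my_class"))  # True
```

When two rules end up with the same weight, the one declared later in the stylesheet wins, which is why source order still matters for otherwise equal selectors.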
The World Wide Web Consortium officially recognizes CSS1 as the recommended standard. However, there are two newer versions, CSS2 (currently in candidate status) and CSS3 (now in development), that may one day supersede it. These later iterations add support for fine control of positioning, media types, and effects such as shadowing.
I promise that the CSS for the sample applications in this book won’t get too fancy.
Example 3-4 shows several ways in which selectors can be constructed to refer to HTML elements and to dictate how the browser displays content. The syntax relies on a handful of characters:
Hash (#)
The hash references a specific ID attribute associated with a tag element in the XHTML structure. Each ID value should appear only once per page.
Example: #whoopi { color:purple; }
Period (.)
The full stop, or period, references a specific class attribute associated with one or more tag elements in the XHTML structure. Classes can be used as many times as you want and are great for formatting repeating content, such as dynamic lists.
Example: .kermit { color:green; }
They can also be used in conjunction with a specific kind of tag to set a style for that combination of element and class. Because this is more specific, it will take precedence over the class alone.
Example: h4.kermit { color:black; }
Asterisk (*)
The asterisk is a wildcard character that can be used to set the default style for any unassigned tag elements in the page. This is often used as the first style declaration, to set the font size and family for the entire page.
Example: * { font-family: sans-serif; font-size: .9em; }
You can also use the wildcard to apply styles to everything contained in a specified selector. However, this can lead to frustration if you accidentally overwrite a specific style declared elsewhere in the sheet.
Example: #sidebar * { color: blue; }
Spaces are used to build a complex string of selectors into a nested style. A nested style is inherently weighted with greater priority than a single-element selector because it is more specific. You can, for example, define all anchor tags to be underlined (that’s the default interpretation by the browser, actually) but turn off the underlining for links in a specific division.
Example: #subtle a { text-decoration:none; }
Comma (,)
Commas are used to group several selectors together and assign them the same style properties at once. This is useful when a number of selectors share the same subset of styles. Any differences can be declared separately for each selector.
Example: #one, #two, #three { color:red; }
Nested styles are both a blessing and a curse. Such fine control is great, but the more complicated the CSS becomes, the more hair you’ll pull from your scalp trying to figure out why a link buried deep in the web page won’t turn blue. Maintaining good stylesheets for a large website is an art. Reuse and revise your stylesheets as you build your applications to limit conflicts.
In the early days of web design, the people designing web pages often had experience building printed pages, where the position of each bit of text or picture can be finely controlled. The Web, however, is a flexible medium. The same page can be displayed on big monitors and small, with a Mac or a PC, using Internet Explorer or Netscape Navigator. Flexibility and precision didn’t always jibe.
Although later versions of CSS promise that fine control, even CSS1 provides several properties that let you dictate where chunks of content are presented on the page. The following are the properties used in the sample applications described in this book:
padding and margin
The “box model” best describes margins, padding, and borders. If you picture the content of an element as the innermost box and the element border as the outermost box, the padding is the space between the content and the inside edge of the border. The margin is the space between the border and the nearest other elements on the page. This extra space is like a force field that protects other content from crowding the parts of the element.
Values for the padding and margin properties can take many forms, including percentages and specific lengths. Lengths are typically measured in terms of pixels (px), which are absolute, or em-length (em), which is a relative measure equal to the current font size (the name comes from the width of the letter “M” in traditional typography). Margins can also have a special value, auto, which will attempt to balance the spacing between elements. This is often used to center a block of content on a web page.
Example: #content { margin:20px; padding: 1.5em; }
float
Floating an element moves it to the right or the left of the other elements within the parent element, as content flows around it. Floats are used to create sidebars or to describe the relationship between an image and its surrounding text.
Example: #sidebar { float:right; }
width
This property sets the horizontal dimension of an element. The width is measured without consideration of the margin, border, or padding values (or as if all three values were set to zero). Remember that there is both a left and a right side to any element, so a division with a width of 100, padding of 10, border of 1, and margin of 20 would take up 162 pixels across the page (100 + 10 + 10 + 1 + 1 + 20 + 20 = 162).
Example: #column { width:350px; }
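The arithmetic is easy to get wrong when laying out columns, so it can help to write it down once. This short Python sketch (illustration only; rendered_width is a name made up for the example) assumes the same padding, border, and margin on the left and right sides, all in pixels.

```python
def rendered_width(width, padding=0, border=0, margin=0):
    """Total horizontal space in pixels taken up by an element.

    Padding, border, and margin each apply to both the left and right sides.
    """
    return width + 2 * (padding + border + margin)

# The division described above: 100 + 10 + 10 + 1 + 1 + 20 + 20.
print(rendered_width(100, padding=10, border=1, margin=20))  # 162

# A 350-pixel column with no padding, border, or margin takes exactly 350.
print(rendered_width(350))  # 350
```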
text-align
This property sets the alignment of text within a block element. The usual suspects are possible values (left, right, center, justify). text-align cannot be used to affect the placement of a division; instead, use the float property to move divisions to one side of the page or the other.
Example: h2 { text-align:center; }
display
We sometimes take for granted, based on experience, that certain tags will be rendered in specific ways. Header tags result in larger, boldfaced text, with some vertical spacing above and below. List items are indented and have space between each new item. All of that presentation, though, is part of an accepted interpretation of style properties that the browser adds to the page upon seeing those tags.
With the display property, you can assign some of those properties to any tag element. Division tags can become inline objects and anchor tags can become block elements. Either can be displayed as list items. This kind of reappropriation of inherent properties is not frequently done, but display does have one very useful value: none. This effectively hides the element and all of its contents from display. With the incorporation of JavaScript, this value can be changed on the fly, making pop-up content and contextual menus possible.
display differs from another property, visibility, in that with display the hidden elements are not factored into the positioning of the visible elements on the page.
Example: #help { display:none; }
The other big use of CSS is to make the page look pretty. Most presentational control of this fashion revolves around fonts and colors. The sample applications in this book use some of these basic controls to make the web pages look clean:
font-family
In word processors, changing the shape of the letters to match a typographical set of characters is as easy as selecting from a menu. Your computer scans your operating system for the presence of fonts and lists all the ones it finds as options. A web browser has access to this same list of fonts, but the basic list differs from one computer platform to the next and can be expanded by installing extra fonts. To make a web browser try to use a specific font, the CSS must be coded to pass that request on to the client.
The font-family property allows the web developer to send a comma-separated list of font names to the browser, which will use the first one that it recognizes as available in its system. For this reason, any exotic fonts should be either avoided or listed with other, more common fonts that are likely to be available. The list of values should end with a general family type (such as serif, sans-serif, cursive, or monospace) to direct the browser, as a last resort, to pick the system- or user-defined preferred font for that general category.
Example: * { font-family:Helvetica,Arial,sans-serif; }
font-size
The size of the text is dictated by the font-size property, which can accept values that range from percentages to length units to keywords (smaller, larger, xx-large, etc.) that are interpreted by the browser. The values are most commonly expressed in terms of px (pixels), pt (points), or em. Em-length is often preferred, since it is calculated from the current size of the text. A font-size value of .9em would cause the text to be rendered at 90% of the size it would be without the new style definition, and 2em would cause the text to be rendered twice as large.
Example: p { font-size:.9em; }
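Because an em value is relative to the size the text would otherwise have, em-based sizes compound when styled elements are nested. The following Python sketch (illustration only) multiplies the factors together. The effective_size helper is invented for this example, and the 16-pixel starting size is an assumption that matches a common browser default rather than anything guaranteed.

```python
def effective_size(base_px, *em_factors):
    """Apply a chain of em-based font-size values, outermost element first."""
    size = base_px
    for factor in em_factors:
        size *= factor
    return size

BASE = 16  # assumed starting size in pixels

print(effective_size(BASE, 2))                    # 32: twice as large
print(round(effective_size(BASE, 0.9), 2))        # 14.4: 90% of the base
print(round(effective_size(BASE, 0.9, 0.9), 2))   # 12.96: .9em nested inside .9em
```

The last line shows why text can shrink unexpectedly when the same em-based rule applies at more than one level of nesting.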
color, background-color
Colors can be assigned to any object to describe both its text (or border) and the background fill upon which that text rests. There are a number of predefined color names: aqua, black, blue, fuchsia, gray, green, lime, maroon, navy, olive, orange, purple, red, silver, teal, white, and yellow. The color can also be expressed as a hexadecimal value (#00ffcc) or as its decimal equivalent, rgb(0,255,204), to indicate how much red, green, and blue to mix to form the desired color.
The background-color property also accepts a value of transparent, which means whatever colors, images, or text are underneath the layered object will be visible. This is the default, which is why setting the background color of the body will result in all elements having a background of that color.
Example: #sidebar { color:green; background-color:black; }
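The hexadecimal and decimal forms describe the same three numbers: each pair of hex digits is one channel from 0 to 255. A short Python sketch makes the conversion explicit (illustration only; hex_to_rgb is a helper written for this example). It also expands the three-digit shorthand that CSS allows, in which each digit is doubled.

```python
def hex_to_rgb(value):
    """Convert a CSS hex color such as #00ffcc or #0fc to (red, green, blue)."""
    digits = value.lstrip("#")
    if len(digits) == 3:
        # Shorthand form: #0fc means #00ffcc.
        digits = "".join(ch * 2 for ch in digits)
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))

print(hex_to_rgb("#00ffcc"))  # (0, 255, 204)
print(hex_to_rgb("#0fc"))     # (0, 255, 204)
print(hex_to_rgb("#000000"))  # (0, 0, 0), which is black
```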
text-decoration
This property deals with some of the less common style changes for text, such as strikethroughs, overlines, and underlines. It’s useful if you want to remove an underline, such as the one added to a hyperlink by default. text-decoration can also cause things to blink, which is probably a sign that we aren’t learning from past mistakes.
Example: a { text-decoration:none; }
Once you’ve defined your styles, you have to get the browser to locate your styling genius and use it to render your web pages. CSS can be used inline as part of the XHTML or referenced as a separate file outside of the web page.
For inline references, the style attribute can be added to any tag and filled with style properties and values, as in:
<div style="font-size:36pt;"></div>
Whatever styles are defined in this attribute will apply to the content of the tag itself. Since embedded objects can inherit some or all of these properties, the style may cascade down to structures contained within the styled element. Changing the font size of the body, for instance, will also change the size of the text in all of its <div> and <p> tags, unless that setting is overruled by another style.
Another way to bring styles into a web page is to define a whole block of properties at once using the <style> tag. This tag is typically used between the <head> tags that are used to define the page. By placing several properties between the style tags, you can define the look of an entire page in one easy-to-edit location in the code.
A third way to assign styles to the elements of a web page is to bring in the defined properties as an external file. There are several advantages to doing so, not the least of which is that it makes the amount of code in your web page much smaller. External files can also be shared across all of your sites, giving you one place to edit a style whose properties you want to affect your entire site.
External files can be brought to the browser in one of two ways. The first is the <link> tag, which is placed between the <head> tags of the HTML. The browser will look for the file referenced in the href attribute, as in:
<link rel="stylesheet" type="text/css" href="my_styles.css" />
The second method is to leverage a CSS convention, @import, between <style> tags:
<style type="text/css" media="all">@import url(my_styles.css);</style>
The big advantage of this method is that older browsers can’t and won’t attempt to interpret this code; they will simply render the page based on the HTML they can see. This increases the chances of compatibility with older systems.
When the browser looks for the specified file, it will start in the directory containing the page it is trying to render. You can use relative paths (../css/my_styles.css) that climb up and down the directory structure, server paths (/css/my_styles.css) that start at the domain root, or full URLs to point to the file.
I can remember the early days of web design, when sites consisted of static content, hardcoded into a certain state by the developer/designer/editor/marketer. Small sites with three to five pages were tough enough to maintain. Once a website grew beyond about two dozen pages, time spent making frequent changes could eat up a hefty percentage of the workday.
Dynamic web pages used to be created on the fly primarily with runtime programming languages such as Perl. There are more options today, including proprietary platforms such as ASP and ColdFusion. PHP, which stood for “Personal Home Page” when it was first released in 1995, is a widely used free server option that powers over 20 million websites, including popular tools such as Wikipedia, WordPress, and even Facebook.
PHP bears a lot of resemblance to Perl, largely because that was what PHP’s creator, Rasmus Lerdorf, meant it to replace. It has had five major revisions; the most recent (PHP 5) is now a half-decade old. Although PHP was intended for web page generation, it can also be used from the command line, making it amenable to automation with crontabs.
The language is much richer than what you will encounter in this book. My intention here is to explain only the part of PHP that will be used in the sample applications, which are intended to be bare-bones examples of some of the things you can do with the Twitter API.
Here you are, ready to set the world on fire with your creativity and build a useful tool for the Twitter community. Your application will require not only data from an API over which you have no control, but also user input typed into web forms and content feeds originating from remote sources. Before accepting such input, you should know about some of the risks in doing so.
Database servers are separate beasts from the rest of the web application. They have their own security and usually reside on a completely different machine. The data they currently contain and are willing to store is of little use if they can’t communicate with the web pages that are being rendered, but the necessary connection between the web application and the SQL database creates an opportunity for malicious users.
A SQL injection attack can happen if the data you pass to SQL statements has not been properly filtered to remove escape characters. If these special characters get passed to the database, they could be interpreted as a query. The injected code may be a hexadecimal string that looks like gibberish to you, but the SQL server may still be able to decode and interpret those characters. This kind of attack isn’t specific to PHP or MySQL; it can occur whenever one scripting language is embedded within another.
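To make the risk concrete, here is a minimal sketch of how unfiltered input can rewrite a query. The table and column names (users, user_id) and both helper functions are hypothetical, invented for this illustration. No database connection is needed, because the code only builds the SQL text.

```php
<?php
// Hypothetical helper: builds a query by pasting raw input into the SQL text.
function build_unsafe_query($user_id) {
    return "SELECT * FROM users WHERE user_id = $user_id";
}

// Hypothetical helper: accepts the input only if it consists of digits.
function build_checked_query($user_id) {
    if (!is_string($user_id) || !ctype_digit($user_id)) {
        return false; // reject anything that is not a plain whole number
    }
    return "SELECT * FROM users WHERE user_id = " . (int) $user_id;
}

$malicious = '1 OR 1=1';

// The injected text becomes part of the query and would match every row.
$unsafe = build_unsafe_query($malicious);

// The checked version refuses the same input but accepts a real ID.
$rejected = build_checked_query($malicious);
$accepted = build_checked_query('42');

echo $unsafe . "\n";
echo var_export($rejected, true) . "\n";
echo $accepted . "\n";
```

Checking the type works only for values that can be reduced to a number. For free-form text, the prepared statements offered by the mysqli extension keep the data separate from the query text.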
CRLF refers to the carriage return and line feed combination of ASCII characters often used in Windows applications to indicate a new line (for Unix-based systems, the end of line is denoted with a line feed only). A CRLF injection attack is characterized by a hacker injecting CRLF sequences into the system. Since these characters are often used to parse data, extra line endings can allow one entry to become several.
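One possible defense, sketched below, is to strip line-ending characters from any submitted value before it is written somewhere line-oriented, such as a log file. The helper name strip_line_breaks() and the sample log text are assumptions made for this example.

```php
<?php
// Hypothetical helper: replaces carriage returns and line feeds with spaces
// so that one submitted value can never span more than one line.
function strip_line_breaks($text) {
    return str_replace(array("\r", "\n"), ' ', $text);
}

// A submitted "username" that tries to forge a second log entry.
$submitted = "kmakice\r\nadmin logged in successfully";

$raw_entry  = 'Login failed for ' . $submitted;
$safe_entry = 'Login failed for ' . strip_line_breaks($submitted);

// The raw entry occupies two lines; the cleaned entry stays on one.
echo count(explode("\n", $raw_entry)) . "\n";
echo count(explode("\n", $safe_entry)) . "\n";
```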
Cross-site request forgery (CSRF) is also known as a “one-click” attack. It is made possible by exploiting ill-conceived browser cookie strategies to send commands to another site that relies only on the cookie for authentication. What makes CSRF so scary is that the links that send the commands can be included as the source of an image file. The act of loading a web page in a browser or viewing an email message in an HTML-enabled email client can be enough to kick-start the attack.
There are a few prerequisites for CSRF to work. First, the
attacker has to target a site that is known not to check the Referer
header. Second, that site must
contain a web form that does something useful for the hacker (like
transferring funds). The attacker must then simulate that form
perfectly, with all the required form names and value constraints, and
get the target to load up the malicious web page where the attack is
embedded while that person is still logged into the remote
site.
The most important concept to grasp is the cornerstone of programming, the variable. This is the entity into which you store data so you can manipulate it and create things such as search engines and blogs showing dogs sleeping upside down. All of this important stuff is dependent on our ability to name and retrieve information as part of the programming logic.
In PHP, strings, arrays, and objects are all referenced in the
same way: with a $
and the name of the variable. PHP keeps track of what
kind of variable it is based on the context of what you are trying to do
with it. A string contains a discrete value, which could be an integer
or a bunch of text. An array is a list of strings, referenced with a key
that can either be an integer or text. You will use objects whenever you
deal with files, databases, or XML structures.
When setting a string, the content is encapsulated in quotes. Single quotes (') cause the content to be taken literally, without interpretation. This means if you include another variable, the value of that variable is not substituted. Double quotes (") cause PHP to attempt to interpret any variables contained in the string and resolve them to the values they represent. Because of this disparity, single-quoted strings will run slightly faster. If you wish to include a quotation mark of the same type that is encapsulating the string, you have to escape it by adding a backslash (\) before it. This tells PHP to treat it as a literal character rather than as the end of the string:
$truth = "<div id=\"sarcasm\">The Packers play 'football'</div>";
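The difference between the two quoting styles is easiest to see side by side. This is a small sketch with made-up variable names, not code from the sample applications.

```php
<?php
$team = 'Bears';

// Single quotes: the text is stored exactly as typed.
$literal = 'Go $team!';

// Double quotes: $team is replaced with its value.
$interpolated = "Go $team!";

// A backslash lets a quote of the enclosing type appear inside the string.
$escaped = "She said \"Go $team!\"";

echo $literal . "\n";       // Go $team!
echo $interpolated . "\n";  // Go Bears!
echo $escaped . "\n";       // She said "Go Bears!"
```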
Sometimes short lines of text are not enough. You may want to store a lot of text all at once and not have to deal with all the concatenation and character escaping. Enter the heredoc. With this string-creation tool, you signify the start and the end of the text you want to save and just type away between those points. The start point is in the form <<<EOS, where EOS refers to the marker text PHP should look for as a signal to stop storing content in the variable. That end text cannot be indented. For example:
$truthier = <<<GO_BEARS
<div id="fan">
  <div id="diehard">
    <h4>Chicago Bears</h4>
    <p>$this_person is a diehard fan of #{$player['jersey']}.</p>
  </div>
</div>
GO_BEARS;
With a heredoc, you can freely insert quotes and variables without
the need to escape them. The only special thing you have to consider is
when referencing arrays: array values must be wrapped in curly braces
({}
) to let PHP know to evaluate them
and use their stored values instead.
Arrays are only a little more complicated. You let PHP know that a
variable is meant to be an array by using the array()
function. If you do not include
data in the parentheses, the variable will simply be set or reset as an
empty array. Associative arrays can be created using the convention of
key
=>
value
pairs listed with commas as
delimiters:
$cURL_options = array(
    CURLOPT_USERAGENT => 'Twitter Up and Running - '.$app_title,
    CURLOPT_USERPWD => "$twitter_username:$twitter_password",
    CURLOPT_RETURNTRANSFER => 1
);
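Each value is then referenced as $array[$index], where $index can be an integer key or, for associative arrays, a string. The keys in the example above are predefined cURL constants rather than strings, so they are written without quotes, as in:
$UserAgent = $cURL_options[CURLOPT_USERAGENT];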
You can add a new indexed value to an array by leaving the index blank. This technique is used within loops as an easy way to build the array iteratively:
foreach ($cURL_options as $key => $value) {
    $option_keys[] = $key;
    $option_values[] = $value;
}
Arrays can be nested. In this case, instead of storing a
string for a given key, an entire array is stored. If $twitter_data
were an array that contained the
individual arrays containing each Twitter member’s most recent status
update, a single tweet could be extracted as:
$my_tweet = $twitter_data[0]['tweet'];
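As a rough sketch of how such a nested array might be built and read, the following appends two inner arrays and then walks them. The 'user' key and the sample values are invented for the example; only the 'tweet' key comes from the line above.

```php
<?php
// Start with an empty array and append one nested array per member.
$twitter_data = array();
$twitter_data[] = array('user' => 'kmakice', 'tweet' => 'Writing about PHP arrays.');
$twitter_data[] = array('user' => 'example_user', 'tweet' => 'Reading about PHP arrays.');

// The first index selects the member; the second selects a field.
$my_tweet = $twitter_data[0]['tweet'];

// Nested foreach loops walk every field of every member.
$lines = array();
foreach ($twitter_data as $position => $member) {
    foreach ($member as $key => $value) {
        $lines[] = $position . ': ' . $key . ' = ' . $value;
    }
}

echo $my_tweet . "\n";
echo implode("\n", $lines) . "\n";
```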
PHP objects will come into play whenever you parse data from the Twitter API. The XML files that are returned are turned into objects that can be manipulated by referencing shared methods within them, as in:
$feed_title = $doc->getElementsByTagName('title')->item(0)->nodeValue;
In this case, the value being returned is whatever text was
wrapped within <title>
</title>
tags in the XML structure.
The text is extracted from the object and placed into the string
$feed_title
for later use.
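The method names in that line suggest that $doc is a DOMDocument object, so the sketch below assumes that class. The XML text is a made-up stand-in for an API response, not actual Twitter output.

```php
<?php
// A small, made-up XML document standing in for an API response.
$xml_text = '<feed><title>Sample Timeline</title>'
    . '<entry><title>First status</title></entry></feed>';

// Parse the text into a DOMDocument object.
$doc = new DOMDocument();
$doc->loadXML($xml_text);

// getElementsByTagName() returns every matching node in document order;
// item(0) picks the first one, and nodeValue holds its text.
$titles = $doc->getElementsByTagName('title');
$feed_title = $titles->item(0)->nodeValue;

echo $feed_title . "\n";
echo $titles->length . "\n";
```

Because every title element in the document is matched, not just the top-level one, the position passed to item() matters when a feed contains titled entries.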
Once you’ve captured some information in one of the variable forms, you may need to do something to it to make it useful. Like any good programming language, PHP has a number of functions that perform commonly required manipulations:
array_search(term_to_find, array_to_search)
Rather than building a loop to check every value in your array, you can use the array_search() function to look for a value stored in an array. The function returns the key for the first match it finds, or the Boolean false if there is no match.
base64_encode(
string_to_encode
)
base64_decode(
string_to_decode
)
This function encodes data as base64, returning a
new string that is a bit (33%) larger than the original. It is
used to make data safe for transfer to destinations where some of
the characters may cause problems. The base64_decode()
function will
convert text that has previously been encoded with base64_encode()
back into its
original string.
basename(
path_to_file
)
This function takes a string containing the path to
a file and returns the base name of the file. For instance, if the
path is /path/to/file.php, basename()
will return
file.php.
bin2hex(
ascii_text
)
This function takes an ASCII string and returns the hexadecimal equivalent.
ceil(
value_to_round_up
)
The math function ceil()
turns a value into the next
highest integer, or the ceiling for the number.
date(
format_template
,
timestamp_to_format
)
The date()
function turns a numeric version of the time into a
formatted string. The template for that resulting string is
dictated by the letter codes used as placeholders for the various
parts of the date and time. For example, “l, F jS, Y G:i:s a”
would produce something like, “Thursday, December
25th, 2008 6:21:34 am.”
dechex(decimal_value)
The dechex() function converts a decimal number into a hexadecimal string. On a 32-bit system, the resulting string can be no more than eight characters long. This is the opposite of hexdec(), which performs the conversion the other way.
explode(
delimiter
,
string_to_break_into_an_array
)
The explode()
function converts a long delimited string into an array of
strings. You specify the character(s) to use as a delimiter, and
it splits the provided string into parts at every instance of that
delimiter. The parts then become separate indexed values of a new
array. The companion to this function is implode()
.
gmdate(
format_template
,
timestamp_to_format
)
This function works in the same way as date()
, turning a numeric timestamp
into a formatted string to represent the date and time. The
difference is that the time is expressed in Greenwich Mean Time
(GMT) rather than in the local server time zone.
hash(hashing_algorithm, string_to_hash)
Hashing involves use of a specific algorithm, such as sha256, to turn a string into a fixed-length digest. This is a one-way process used to store sensitive information, such as passwords, for comparison with user-entered information as a way of verifying access. There is no mathematical way to reverse the process, although lookup lists can be generated to map digests back to the original strings. Building such a list isn’t easy, especially if you salt the password by adding a random string to the password being stored.
Example: $secured = $salt.hash('sha256',$salt.$password);
htmlentities(
string_to_encode
)
This function recognizes that some
characters—namely, the ampersand (&
), double quote ("
), single quote ('
), less-than symbol (<
), and greater-than symbol (>
)—have special meaning in HTML and
may cause text containing them to be rendered inappropriately. The
htmlentities()
function is
often used to prevent text submitted by the user from rendering
as HTML by substituting translations for these characters. It is similar to
htmlspecialchars()
but
substitutes more comprehensively.
You can reverse the translation by using html_entity_decode()
.
implode(
delimiter
,
array_to_make_a_string
)
This is an important function that bridges the gap
between strings and arrays. You can use implode()
to turn an array into a
single string by specifying a delimiter character(s) that will
connect each value in the array. This is the reverse of the
process performed by the explode()
function.
in_array(
term_to_find
,
array_in_which_to_search
)
Like the array_search()
function, this function is a way to search through the
values stored in an array for keyword matches. The in_array()
function simply confirms
with a Boolean whether or not a match is found.
is_int(
value_to_test
)
This function tests to see whether a given variable value is an integer, returning a Boolean as a result.
mt_rand()
The mt_rand()
function is one of many random number generators that aren’t
truly random, but that generate pseudorandom values that work well
enough to make programs seem unpredictable. This one is a bit
faster than the rand()
function, which is why it is preferred.
number_format(
number_to_format
,
decimal_places
)
This versatile function accepts up to four parameters to return a string that adds formatting to a numeric value.
pack(
format_template
,
arguments_to_convert
...)
The pack()
function uses formatting codes inherited from Perl to turn a
list of arguments into a binary string. The binary string can be
turned back into an array of arguments with unpack()
.
preg_replace(
pattern_to_find
,
replacement_value
,
string_to_search
)
This is a search-and-replace function that uses pattern matching to locate the places to insert replacement text into a string. The example replaces any ampersand that does not begin an HTML entity with &amp;.
Example: $x = preg_replace('/&(?!\w+;)/', '&amp;', $x);
reset(
array
)
When arrays are navigated, the internal pointer is
incremented to reference items with higher and higher indices. The
reset()
function returns
that pointer to the first element in the array.
sizeof(array)
This function counts the number of items in an array. sizeof() is an alias for count().
stripslashes(quoted_string)
When PHP’s magic quotes setting is enabled, quotation marks within user-submitted text are automatically escaped with backslashes to allow for operations such as saving to a database, where they could mess things up. There are times, though, when that is overkill. The stripslashes() function will remove those backslashes.
str_pad(string_to_pad, minimum_length, pad_text, instructions)
Back in the day when all formatting was done with monospaced characters, lining up numbers and text in columns required some padding spaces or other characters. The str_pad() function makes that easy by asking for a minimum length for the resulting string and the text you want to use to pad a shorter string out to that target size. By default, the padding characters will be added to the right of the initial string, but you can specify instructions to pad to the left side (for numbers, typically) or on both sides (to center the string).
The sample applications make use of str_pad()
in password encryption to
make sure that the salt strings are all the same size.
str_replace(text_to_find, replacement_value, string_to_search)
This function works like preg_replace() except it is more straightforward, ditching regular expression matching in favor of a simple string to match. It returns a copy of the string with every occurrence of the search text replaced.
str_rot13(
string_to_shift
)
This encoding function simply shifts every letter it
finds in a string by 13 places in the alphabet. Any numbers or
other nonalphabetic characters are ignored. Decoding is
accomplished simply by running the result back through the same
str_rot13()
function.
strlen(
string
)
Just as sizeof()
returns the number of items
in an array, strlen()
returns the number of characters in a string.
strnatcmp(
first_string
,
second_string
)
Whereas strcmp()
treats numbers as text, the strnatcmp()
function is able to order strings based on “natural
ordering,” or the way a human would read and interpret a string.
That is, instead of ordering a list “1,10,2,20...”, this function
would correctly interpret the order as “1,2,10,20...”. It returns
an integer value between −1 and 1 to indicate whether or not the
first string should be ordered before the second. It is more often
used with usort()
to help
order entire arrays of data.
strpos(string_to_search, text_to_find)
Like array_search() does for arrays, the strpos() function returns the position of the first match of the given text found within the specified string. Since it returns the Boolean false if the text is not found in the string, this function is also useful as a conditional to confirm whether a match exists. Because a match at the very start of the string returns position 0, which PHP also treats as false in a loose comparison, test the result with the strict !== false.
strrev(
string_to_reverse
)
This function simply reverses the character order of a given
string. The result can be decoded by running it through the same
strrev()
function again.
strtolower(
string_to_convert
)
When case is not important (as is typical of usernames), it
is good practice to turn strings into lowercase for storage and
comparison. The strtolower()
function changes any capital letters into lowercase.
strtotime(datetime_as_text)
There are many different ways of conveying a date and time. Many of these are variations of the numbers and text we commonly use to report time in a readable format, such as “January 30 2000 11:13:00 GMT.” For PHP to be able to do any calculation on that moment in history, the text has to be converted to its numeric counterpart. This is performed with the function strtotime(). Passing the string “now” returns the current Unix timestamp, and the function returns a Boolean false if the text cannot be interpreted.
substr(
string_to_shorten
,
starting_position
,
length_of_substring
)
To extract a part of a larger string, use the substr()
function to specify the starting point within that string and
optionally the length of the new string. If the starting position
is negative, the extraction will be done from the end of the
original string, rather than the beginning.
trim(
string_to_trim
)
Sometimes data is entered or recalled with leading
or trailing whitespace. This can include ordinary spaces as well
as tabs, returns, and newlines. The trim()
function will examine both
ends of a string and clip off any whitespace it finds
there.
urlencode(
string_to_pass_as_URL
)
urldecode(
string_passed_in_URL
)
The urlencode() function adjusts a string of text to turn any characters that have special meaning in a URL (such as an ampersand) into characters safe for transfer in a URL query string. Some are converted to a different character (for example, a space is converted to a plus sign, +), whereas most become hexadecimal codes tagged with a % symbol, as in %7E.
This adjustment should be made before any text is passed to
a form as a URL query string. The urldecode()
function will return a
string that restores all %##
translations back to their original characters.
usort(
array_to_sort
,
name_of_sorting_function
)
Arrays can be sorted in very complex ways. The usort()
function allows you to reference another function for
comparing two values and use it to order the contents of a given
array. The function is typically one of custom design and must
provide the same kind of response as an existing comparison
function, such as strnatcmp()
. That is, the comparison
between two values should return an integer between −1 and 1 to
indicate which of the two values should precede the other.
utf8_encode(string_to_encode)
The utf8_encode() function converts a string from ISO-8859-1 to UTF-8, a Unicode encoding that uses multiple bytes to represent characters outside the ASCII range.
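Several of the array and string functions listed above are often used together. The following sketch exercises a handful of them on made-up sample values.

```php
<?php
// explode() splits a delimited string; implode() joins it back together.
$tags = explode(',', 'php,mysql,twitter');
$joined = implode(' | ', $tags);

// in_array() answers yes or no; array_search() reports where.
$has_mysql = in_array('mysql', $tags);
$mysql_key = array_search('mysql', $tags);

// strpos() can return 0 for a match at the start of the string, so
// compare with === or !== rather than testing the value directly.
$starts_with_at = (strpos('@kmakice hello', '@') === 0);
$has_hash = (strpos('@kmakice hello', '#') !== false);

// usort() with strnatcmp() orders numbers inside strings the way people do.
$files = array('tweet10', 'tweet2', 'tweet1');
usort($files, 'strnatcmp');

// str_pad() with STR_PAD_LEFT right-aligns a value in a fixed width.
$padded = str_pad('42', 5, '0', STR_PAD_LEFT);

echo $joined . "\n";               // php | mysql | twitter
echo implode(',', $files) . "\n";  // tweet1,tweet2,tweet10
echo $padded . "\n";               // 00042
```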
This is not a complete list of all that PHP can do, of course, but it does represent all you need to know to follow along with the programming done in this book.
For very thorough documentation on PHP’s hundreds of functions, visit http://www.php.net.
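As a quick illustration of the date and encoding functions described in this section, here is a short sketch. The sample strings are invented, apart from the date text, which reuses the strtotime() example.

```php
<?php
// strtotime() turns readable text into a Unix timestamp.
$timestamp = strtotime('January 30 2000 11:13:00 GMT');

// gmdate() formats that timestamp in GMT, whatever the server time zone is.
$formatted = gmdate('l, F jS, Y G:i:s a', $timestamp);

// urlencode() makes text safe for a query string; urldecode() reverses it.
$query = urlencode('bears & packers');
$restored = urldecode($query);

// htmlentities() keeps submitted markup from being rendered as HTML.
$safe_html = htmlentities('<b>Go Bears</b>');

// base64_encode() and base64_decode() are a matched pair.
$encoded = base64_encode('user:password');
$decoded = base64_decode($encoded);

echo $formatted . "\n";
echo $query . "\n";
echo $safe_html . "\n";
echo $decoded . "\n";
```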
There are other ways for users to communicate with your PHP program besides explicitly filling out a web form. Every time a visitor clicks on a link on your website, she makes a formal request to the server for content. That action automatically creates a slew of special predefined variables that are available to scripts to let them know something about the context of the request.
One of the more useful of these variables is $_SERVER, an associative array that contains a variety of information about the paths and script locations. The $_SERVER values are filled in by the web server, so if the script is not requested over the Web, most of the request-related values will be missing (as is the case with automated tasks configured as cron jobs).
The server does fill in these values, but many of them are based on user input (the HTTP request). As such, even some environmental variables can be used for an attack if not properly sanitized.
Even if the script is requested over the Web, not every variable
will be filled with information. For example, HTTP_REFERER
is supposed to contain the
address of the page that the user was on when he clicked a link to your
web page. However, if that person loaded your web page from a bookmark,
or if the ISP host through which he accesses the Internet doesn’t
provide that address information, then the value for $_SERVER['HTTP_REFERER']
will be empty.
Although more information about the server request is made available by PHP, these are the variables that are used in the sample applications for this book:
__FILE__
Whereas a $_SERVER
variable will reflect the file that was requested via the Web,
__FILE__
always refers to the
full path and filename of the script in which it is invoked. That
means an included file will have a different __FILE__
value than the main script that
called it.
__FILE__
is referenced
using two underscore characters on each
side of “FILE.”
$_SERVER['DOCUMENT_ROOT']
The document root is the directory path on the server from the top of the configured file structure to the directory root under which the current script is running.
$_SERVER['HTTP_HOST']
The host usually contains the domain information part of the
URI request. SERVER_NAME
likely
holds the same value.
$_SERVER['QUERY_STRING']
If there is a query string—the part of a URI that comes
after the filename, following a question mark (?
)—that entire part of the request
string will be stored in this variable. This information is likely
also pre-parsed into name and value pairs in the $_GET
associative array.
$_SERVER['REQUEST_METHOD']
Several kinds of HTTP request methods are possible, but most web pages deal with just two: GET and POST. This server variable stores the name of the method used to access the script and can be useful in determining where to find any user-provided input parameters.
$_SERVER['REQUEST_URI']
This variable contains the URI that was used to trigger the
web page script, once the request arrived on the server. This does
not include the protocol and domain, only the server path after
the web root (as in /index.php). The REQUEST_URI
will also include the query
string, if one exists.
$_SERVER['SCRIPT_NAME']
The SCRIPT_NAME
is the
current script’s path. This is often the same value contained in
REQUEST_URI
, but without a
query string, if one exists. Unlike __FILE__
, this variable will reflect the
path to the requested file, not the one invoking the
variable.
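Since any of these entries can be missing, it is safer to check for each one before using it. The sketch below shows one way to do that. The helper name describe_request() and the sample values are assumptions for illustration, and a hand-built array stands in for $_SERVER so the code also runs from the command line.

```php
<?php
// Hypothetical helper: reads a few request values, substituting a default
// for any entry the server did not provide.
function describe_request($server) {
    $names = array('HTTP_HOST', 'REQUEST_METHOD', 'REQUEST_URI', 'HTTP_REFERER');
    $summary = array();
    foreach ($names as $name) {
        $summary[$name] = isset($server[$name]) ? $server[$name] : '(not set)';
    }
    return $summary;
}

// In a web request you would pass $_SERVER itself.
$sample = array(
    'HTTP_HOST' => 'www.example.com',
    'REQUEST_METHOD' => 'GET',
    'REQUEST_URI' => '/index.php?page=2'
);

$summary = describe_request($sample);
foreach ($summary as $name => $value) {
    echo $name . ': ' . $value . "\n";
}
```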
Web forms and links are usually submitted to the server
using either the GET or POST HTTP method. These methods encapsulate the
data in different ways, and PHP then parses it to fill special
associative arrays, $_GET
and
$_POST
. When processing a web form or
accepting query string parameters as part of a request, your program can
reference these arrays to check for the presence of user input.
If you accept the same form variables using either method, you
will have to determine which method takes precedence. For instance, if
posted data is preferred, that means your program only has to check
$_GET[
'variable_name'
]
if $_POST[
'variable_name'
]
is missing.
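That precedence rule can be wrapped in a small function. This is a minimal sketch, assuming posted data is preferred. The name get_input() is hypothetical, and hand-built arrays stand in for $_POST and $_GET.

```php
<?php
// Hypothetical helper: returns the posted value if there is one, falls back
// to the query string, and finally to a default.
function get_input($name, $post, $get, $default = null) {
    if (isset($post[$name])) {
        return $post[$name];
    }
    if (isset($get[$name])) {
        return $get[$name];
    }
    return $default;
}

// Hand-built arrays stand in for $_POST and $_GET.
$post = array('status' => 'posted text');
$get  = array('status' => 'query text', 'page' => '2');

$status  = get_input('status', $post, $get);          // in both; POST wins
$page    = get_input('page', $post, $get);            // only in GET
$missing = get_input('color', $post, $get, 'none');   // in neither

echo $status . "\n";
echo $page . "\n";
echo $missing . "\n";
```

In a real page, the call would look like get_input('status', $_POST, $_GET).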
Programming is about more than storage and calculation. It is also about the flow of logic. PHP offers the usual assortment of flow controllers to help your application make decisions about what to do.
The if
statement acts like a decision tree, checking to see whether a
given expression is true before executing a particular part of the
program. The expression can be very complicated or involve multiple
comparisons, but in the end it is either a true statement or a false
one. If the expression is false, the statements between the curly
braces are ignored.
The if
statement can be
extended using elseif
and else
. The former performs another check on a
different expression, proceeding with that section of code if the
expression is true and moving on without processing if it’s false.
Multiple elseif
statements can be
strung together in sequence, but each will be evaluated only if all of
the expressions that preceded it proved false. An else
statement is the last one in the flow
structure and serves as a catchall. It does not evaluate an
expression; if all of the previous expressions fail, the statements
within the else
brackets are
executed by default.
if ($a == 4) {
    $x = 1;
} elseif ($b == 0) {
    $x = 2;
} elseif ($c == $b) {
    $x = 3;
} else {
    $x = 0;
}
A while
loop will continue
indefinitely, for as long as the expression is true. The moment its
state becomes false, the while
block will end and execution will continue with the next line of code.
The expression is reexamined with each iteration.
while ($row = mysql_fetch_assoc($sql_result)) {
    $rss_feed_stored = $row['rss_url'];
}
Be careful about the expressions you use in a while
loop. If you evaluate a condition
that never changes, the loop will never exit.
A for loop will execute the statements in its block repeatedly until its test expression becomes false, typically when a counter reaches a maximum value. The initial state, the test expression, and the increment for each iteration are all defined in the opening line of the loop.
The first part of the for
expression is evaluated once, at the very beginning of the first loop,
to set the initial conditions. The second and third parts of the
expression are then evaluated at the start and end of each iteration,
until the second expression proves false. The three parts are
separated by semicolons (;
).
for ($i=1; $i <= $max_value; $i += 1) {
    echo $i . ' of ' . $max_value;
}
The foreach
loop is
an easy way to iterate over the contents of an array (or, as of PHP 5,
objects). At the start of each iteration, the next value in the array
is assigned to a variable that can then be referenced in the
statements contained in the foreach
block. For associative arrays, a special form allows the key and value
pair to be assigned as separate variables, so both are readily
available during execution.
foreach ($array_of_arrays as $this_array) {
    foreach ($this_array as $key => $value) {
        echo $key . ' is filled with ' . $value;
    }
}
Because some variables may have many different values
that trigger distinct responses—this is true for codes for status
messages returned to the user as a response to submitting a form—the
if...elseif
construct can grow
unwieldy. The switch
statement
simplifies the code needed in that situation by allowing you to use
one variable expression with many possible values. Each value, or
case
, contains its own set of
statements to process. There is also a default
that is executed if the variable
value does not find a match.
Unlike an if...elseif chain, the switch block does not stop automatically when it finds a value match. PHP compares the variable against each case in order, skipping the statements of any case that does not match. When it finds a match, PHP starts executing the statements that follow, and it will continue until the end of the switch block unless it encounters a break statement to tell it to stop. If you forget to include a break at the end of the statements for a given case, the statements for every case that follows will be executed as well, including the default.
switch ($root) {
    case 'user':
        $thisFeed = $xml->id;
        break;
    case 'status':
        $thisFeed = $xml->user->id;
        break;
    default:
        break;
}
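The effect of a missing break is easier to see when each case records that it ran. In this sketch, the function trace_switch() and its case values are made up for the demonstration.

```php
<?php
// Hypothetical helper: records which statements run for a given value.
function trace_switch($code) {
    $ran = array();
    switch ($code) {
        case 1:
            $ran[] = 'one';
            // no break here, so execution falls through into case 2
        case 2:
            $ran[] = 'two';
            break;
        case 3:
            $ran[] = 'three';
            break;
        default:
            $ran[] = 'default';
            break;
    }
    return implode(',', $ran);
}

echo trace_switch(1) . "\n";  // one,two
echo trace_switch(2) . "\n";  // two
echo trace_switch(9) . "\n";  // default
```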
A break
can be used
elsewhere in your code to stop the execution of a loop. It allows
PHP to escape the current for
,
foreach
, and while
structures, too. If they are nested,
it will escape only the current block.
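A short sketch with two nested for loops shows that only the inner loop is abandoned. The variable names are arbitrary.

```php
<?php
$visited = array();

for ($row = 1; $row <= 3; $row += 1) {
    for ($col = 1; $col <= 3; $col += 1) {
        if ($col == 2) {
            break; // leaves the inner loop only; the outer loop continues
        }
        $visited[] = $row . '-' . $col;
    }
}

echo implode(' ', $visited) . "\n";  // 1-1 2-1 3-1
```

PHP also accepts a numeric argument, as in break 2, to leave two nested structures at once.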
PHP 5 introduced exception handlers that “catch” exceptions that are “thrown” when PHP tries to run the part of the code contained in the try block. Every try must have at least one catch block; it can also have more than one to handle each type of exception differently. If no exception is thrown, the rest of the code is executed as normal. If an exception is thrown and no catch block is configured to catch it, the exception is passed up to the calling code; if nothing there catches it either, the script stops with a fatal error.
If an exception is caught, any statements in the catch
block will be executed. This code may
or may not be programmed to terminate the program. The purpose of
exception handling is to either exit gracefully or allow the program
to continue despite the error.
try {
    # some statements that cause PHP to choke
} catch (Exception $e) {
    $form_error = 15;
}
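For a version that can actually be run, the sketch below throws its own exception and handles it with two catch blocks. The function parse_status_id() and the error codes other than 15 are hypothetical.

```php
<?php
// Hypothetical helper: throws an exception when the input is unusable.
function parse_status_id($text) {
    if (!ctype_digit($text)) {
        throw new InvalidArgumentException('Not a numeric ID: ' . $text);
    }
    return (int) $text;
}

$form_error = 0;
$status_id = null;

try {
    $status_id = parse_status_id('12a45');
    // This line is skipped, because the call above throws.
    $form_error = -1;
} catch (InvalidArgumentException $e) {
    // The more specific handler is listed first.
    $form_error = 15;
} catch (Exception $e) {
    // Catchall for any other exception type.
    $form_error = 99;
}

echo $form_error . "\n";  // 15
```

The script keeps running after the catch block, which is what allows the program to report the problem to the user instead of stopping.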
There are some cases where errors encountered by PHP
should simply stop execution of the script. Both die()
and exit()
act in the same capacity, killing
the running script where it stands and outputting the error message
provided as a parameter:
$filehandle = fopen($file,$type) or die("can't open file $file");
The exit()
function can
also be useful when debugging, allowing you to temporarily insert the
statement before some problem code can be executed to allow you to
check the state of the program at that moment.
Files that store and later retrieve data are staples of web applications. They are easier to use than databases, since they don’t require authentication (although they may require file permissions to be granted to the directory in which they reside). Files can also be used to create static versions of a website through automated tasks to cut down on overhead when databases are required.
In our sample applications, file management functions are used to create log files that report on activity by automated tasks. Here are the methods you’ll use:
fopen(
file_name
,
mode_for_opening
)
To read from or write to a file, PHP must be able to
reference it in some way in the code. The fopen()
function creates a file handle that binds to a stream
connecting PHP to the contents of the specified file. For this to
be successful, PHP must have access to that file, meaning that it
not only must be reachable but also must be configured with the
appropriate permissions.
Windows servers reference the path to the file differently than Linux servers do: any backslashes (\) must be escaped, as in c:\\my_documents\\my_file.txt. Alternatively, you can use forward slashes and avoid the problem.
There are a few different ways PHP can open the file.
fopen()
asks you to declare
the mode (a code that dictates whether you can read from and/or
write to the file), where the pointer should be located when it
opens the file, and what to do if the file already exists. For
logging—where you only need to add a line of text to whatever is
already in the file, and worry about reading the contents at a
later time—the mode "a"
, for
append, is sufficient.
fwrite(
file_handle
,
text_to_write
)
Once a file is open in a mode that permits writing, the
fwrite()
function can be used to enter a string of text into that
file. The file is referenced through a file handle, not by the
path and name of the file itself. Prior use of fopen()
is required to create that
association.
Be aware that different operating systems have different conventions to determine line endings. Unless you are OK with your text running together in one long, wrapped line, using the correct convention is important. For Linux, the newline (\n) is sufficient. Older Macintosh systems look for a carriage return (\r), and Windows machines require both (\r\n).
fclose(file_handle)
Assuming the file handle pointing to an existing file is
valid, fclose()
terminates the association between PHP and the file
stream.
unlink(file_name)
This function deletes the file specified in the parameter, provided it exists and permissions on the server allow it.
file_get_contents(uri_to_retrieve)
If you want to create a virtual browser and retrieve content
available on the Web, the file_get_contents()
function can help. Enter an encoded URI, and a string will
be returned containing the contents of that file, as rendered by
the web server. If the retrieval fails, the function will return a
Boolean false.
This function can also point to a local file, acting as
three functions in one by opening, reading, and closing the file
connection. file_get_contents()
is used in the
sample applications in this book to create a TinyURL that can be
included in a direct message to a Twitter member.
file_get_contents()
won’t work if the configuration setting allow_url_fopen
is disabled.
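The file functions above can be strung together into a small logging routine. What follows is only a minimal sketch, not the code from the sample applications: the log location (a temporary file) and the message text are assumptions made for illustration. It uses PHP's built-in PHP_EOL constant so the line ending matches whatever operating system the script runs on.

```php
<?php
// Hypothetical log location; the sample applications may store logs elsewhere.
$log_file = sys_get_temp_dir() . '/twitter_app_' . uniqid() . '.log';

// Mode "a" (append) creates the file if needed and writes at the end of it.
$handle = fopen($log_file, 'a');
if ($handle === false) {
    exit('Could not open ' . $log_file);
}

// PHP_EOL supplies the line ending used by the server's operating system.
fwrite($handle, date('Y-m-d H:i:s') . ' task started' . PHP_EOL);
fwrite($handle, date('Y-m-d H:i:s') . ' task finished' . PHP_EOL);
fclose($handle);

// file_get_contents() opens, reads, and closes a local file in one call.
$log_text = file_get_contents($log_file);
$line_count = substr_count($log_text, PHP_EOL);

// Delete the temporary log so the sketch leaves nothing behind.
unlink($log_file);
```

In a real automated task you would probably keep the log file around and skip the unlink() call; it is included here only to show the function in use.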
Teaming the dynamic programming of PHP with the power of SQL query statements can make for some potent applications. To make this hookup, however, you need to make use of a special group of functions to access and interact with a MySQL database.
With PHP 5, an improved version of this group of old
MySQL functions was added. The mysqli
extension (the “i” is for “improved”)
has a procedural interface and an object-oriented interface. Using
this extension has several speed and security benefits, and it is
recommended that you upgrade your code to take advantage of it.
For more information on converting from mysql
to mysqli
, see http://forge.mysql.com/wiki/Converting_to_MySQLi, as
well as the main documentation at http://us3.php.net/mysqli. There are some configuration
changes that may be needed for PHP to be able to use the new
functions.
mysql_connect(database_host, username, password)
This function initiates the connection between PHP and the
MySQL database server by specifying the server location with
access information. The function mysql_connect()
returns a link to the database upon success.
mysql_select_db(database_name)
A database server can host multiple databases. For your queries to retrieve the data you want, you have to specify one of the available databases on the server. The database name passed to this function will become the current active database associated with the open database link. All communication will be with that database.
The link to the database connection can be specified, but if
it isn’t, this function will use the link last opened by mysql_connect()
. If no connection
has been made previously, it will try to make a new connection
without any parameters.
mysql_query(sql_query_statement)
The mysql_query()
function passes a single SQL statement to the current active
database through a valid link to the database server. For queries
that are meant to fetch data, the function returns a result set.
For other queries that are intended to perform some action (like
an INSERT
or DELETE
statement), a Boolean is returned
to indicate success or
failure.
When you are using the command line or files to run SQL statements, a semicolon is required to let MySQL know when to stop reading the query. This function deals with that internally, and therefore SQL statements passed as parameters should not include semicolons.
mysql_affected_rows()
For queries where no records are expected, such as
DELETE
statements, this
function returns the number of rows that were affected. If the
query fails, a negative value is returned.
mysql_num_rows()
For queries where records are expected, this function returns the number of records in the result set.
mysql_fetch_array(sql_result_set)
Generally, you’ll connect to a database to get data from it.
The result set returned by mysql_query() can be read one row at a
time using mysql_fetch_array(), typically by calling it in the
condition of a while loop. Each call returns the next row as an
array and moves the result pointer forward, returning a Boolean
false when there is no more data.
mysql_real_escape_string(string_to_escape)
As with URIs, there are characters that have special
meaning for MySQL and thus should not be included in the queries
you submit. Failing to screen for these characters may lead to trouble,
ranging from merely causing the query to return an error all the way to allowing
malicious activities with the database. The mysql_real_escape_string()
function
examines MySQL’s own library of special characters and replaces
them with safer versions. Because of the potential for disaster
and the ease of its use, there is no reason not to escape all text
strings before sending them to the database.
mysql_free_result(result_from_last_query)
This function clears the results of the last query, which has two advantages. First, it lets the server know that the memory used to store the previous data is now available for other things. Second, it eliminates the chance that you may accidentally reuse the same result set.
Technically, the mysql_free_result()
function only
needs to be called when memory consumption is an issue,
typically for very large result sets. For smaller data, using
this function can result in higher memory use than not. However,
the by-product of its use is a clean break from data that is no
longer needed to complete the script.
mysql_close(database_connection_handle)
Assuming the handle representing the database link is valid, this function disconnects PHP from the database server. The link to the database will automatically end when the script finishes running, but it is good practice to close it explicitly.
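To see how these pieces fit together, here is a rough sketch of a complete round trip to the database. Because the older mysql functions were removed in PHP 7.0, the sketch uses the procedural mysqli equivalents recommended above; note that they take the database link as an explicit parameter. The function names findEnabledSince() and listRecentRecords() are hypothetical, and the query assumes a table shaped like the access table defined later in this chapter.

```php
<?php
// Hypothetical helper: returns enabled rows created on or after $since.
function findEnabledSince($link, $since)
{
    // Escape the text before placing it inside the SQL statement.
    $safe_since = mysqli_real_escape_string($link, $since);

    // No trailing semicolon is needed when the query is sent from PHP.
    $sql = "SELECT record_id, created_at FROM access "
         . "WHERE is_enabled = 1 AND created_at >= '" . $safe_since . "' "
         . "ORDER BY created_at";

    $result = mysqli_query($link, $sql);
    if ($result === false) {
        return array();
    }

    // Fetch one row per call until no rows remain.
    $rows = array();
    while ($row = mysqli_fetch_array($result, MYSQLI_ASSOC)) {
        $rows[] = $row;
    }

    mysqli_free_result($result);
    return $rows;
}

// Hypothetical wrapper showing the connect, select, query, close sequence.
// Depending on the PHP version and mysqli_report() settings, a failed
// connection may throw an exception instead of returning false.
function listRecentRecords($host, $username, $password, $database_name)
{
    $link = mysqli_connect($host, $username, $password);
    if (!$link) {
        return array();
    }
    mysqli_select_db($link, $database_name);
    $rows = findEnabledSince($link, '2009-01-01 00:00:00');
    mysqli_close($link);
    return $rows;
}
```

You would call listRecentRecords() with your own host, username, password, and database name; nothing in the sketch runs against a server until you do.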
In any given program, you may decide to write to files or interact with a database. You may do a particular kind of sort on an array, for example. If you only have one program, where you put the code to do these things won’t make that much difference. However, if you have several pages that all do the same kinds of things, being able to reuse your code becomes exponentially more important.
Imagine writing a program to build a simple web page with a fancy header menu. Your strategy is to build the page, then duplicate the code for 11 other similar pages, making minor adjustments to the visible content. That may be the quickest and most efficient way to build the pages, but what happens when you want to make changes to the fancy header menu? Instead of updating it in one place, you need to make the changes 12 different times!
Making use of includes and custom functions is one way to make your code easier to maintain and simultaneously cut down on the number of lines of code on any given application page. The sample applications in this book use included files with custom functions to do just that.
Bringing additional code into the scope of your PHP
script is easy with the include()
statement. It looks
in the specified path for the desired file and evaluates its content
as if it had been typed into the original script:
include $root_path.'environment.php';
The period (.
) between the $root_path
variable and the rest of the text is a concatenation operator.
It connects two or more strings together to form one long
string.
Debugging with includes used to be a bit problematic, since problems parsing the included code didn’t stop the calling script from running. Now, though, a parse error in an included file halts the script at the point where the file is included, although any code that comes before that point has already run.
To do its thing, the include()
statement either needs to know
the exact path where the file is located or needs to be able to find
the file among the locations added to the include_path
list. If it can’t find it, the
script will issue a warning but will continue to run until the absence
of that code proves fatal. If you don’t want that behavior, there is
another command—require()
—that
works just like include()
but
kills the application the moment it can’t find an included
file.
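The difference between a found and a missing include is easy to try for yourself. This sketch is illustrative only: it writes a throwaway file containing a hypothetical greet() function to the temporary directory, includes it, and then attempts to include a file that does not exist. The @ operator is used to hide the warning so the return value can be inspected.

```php
<?php
// Build a throwaway include file; greet() is a made-up function.
$include_path = sys_get_temp_dir() . '/greeting_' . uniqid() . '.php';
file_put_contents(
    $include_path,
    '<?php function greet($name) { return "Hello, " . $name; }'
);

// The included code is evaluated as if it had been typed here.
include $include_path;
$message = greet('Twitter');

// A missing file only raises a warning with include; the script keeps
// running and the statement evaluates to false.
$missing_path = sys_get_temp_dir() . '/no_such_file_' . uniqid() . '.php';
$loaded = @include $missing_path;

// Swapping in require for the line above would stop the script instead.

unlink($include_path);
```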
Included files don’t have to live in the web path, which is another reason to use them. If you have access to your server account’s document root, you can move all of your included files into a path that other people can’t see from the Web. This eliminates the chance that someone will call that script by accident or intentionally try to make use of it. It also allows you to store hardcoded access information—such as the username and password you are going to use to get into your database—without exposing that sensitive information directly to the Internet.
Once you have a separate file attached to your PHP script, you have to figure out what to put in it. It is certainly possible to simply add a bunch of code that does something like setting commonly used variables or opening a log file. The script will treat that included code as if it had been written into the calling file, so the application will work. However, where you include the file will be meaningful. If you need to manipulate some variable data and you place that code in an included file, you have to make sure those variables are filled with what you need before you include the file.
For this reason (and others), it is good practice to contain
your external code in functions that can be called when needed in the
original script rather than run at a specific point when the file is
included. If you do this, you can group the include()
statements at the top of the
script where it is easy to see what you need, and you can define
multiple functions in an external file that can be used anywhere after
that point in the calling script.
I make sense of functions using a shopping metaphor. My includes directory, where I put the external application files, is the big mall. Each file is a wing or level of that mall, and any functions defined in it are the stores located in that wing. Stores tend to have their own structures and purposes once you cross their thresholds, and you can go back to them whenever you need the particular goods they carry.
To construct your function-store, you need to first create that
threshold. You can accomplish this by creating a new block (between
curly braces) that declares the function, provides a name for it, and
defines any parameters it will accept. Example 3-5 shows a simple
function called name_of_function
that takes up to two parameters. The first parameter is required—PHP
will choke if at least one parameter isn’t included—but the second,
because I’m assigning it an empty string as the default value, is
optional.
function name_of_function ($parameter1,$parameter2='') {
    $foo = $parameter1 + 1;
    $bar = $foo . ' was required ' . $parameter2;
    return array($foo,$bar);
}
Within the block of code—the inside of the function-store—are
the interesting goods. For the function created in Example 3-5, we want to
increment the first parameter and format a little message based on the
new value and the optional text that we can pass to the function. The
last statement is where we use the return()
construct to deliver the
goods—in this case, an array with the two manipulated values. In
essence, this is the part where the stuff you buy gets put in a bag
for you to carry home.
PHP differentiates between constructs and functions, even though both look the same in the
documentation (with parentheses added after their names). include()
is a construct that does not
require the parentheses to run. function_exists()
is a function with a
parameter and a need for parentheses to evaluate what is passed to
it.
Constructs do not appear in the list of known functions when
using function_exists()
to
find out if code you want to use is there.
Custom functions are invoked in the same way that built-in PHP
functions are called. How you code them depends on what kind of
information is being returned. If it will return a Boolean (true
or false
, 1 or 0) or nothing at all, the
function does not usually need to be assigned to a variable. This is
common for functions that are used as expressions in loops. However,
if the function will return a value, you’ll want to capture it in a
variable when you invoke it. When multiple values are returned—as is
the case in Example 3-5—the function needs
to send them back contained in an array, which is then received in the
original script using the list()
function. Here are some examples
from the sample applications:
getHeader($app_title,'css/uar.css');
$scrambled_password = scramblePassword($twitter_password);
list($thisFeed,$foo) = parseFeed($rss_feed);
Remember that the variables are versatile: they can be numbers, strings, list
arrays, associative arrays, or even objects. Because PHP is a “loosely
typed” language, you may need to use the is_
type
functions
(is_int()
, is_string()
, etc.) to find out what kind
of data you have. The array-to-list transfer is only needed if you
need to send back different kinds of variables (a string and an array,
for instance) or if you want to be able to parse the resulting data on
the function side. In that case, the function can simply fill an array
with the manipulated values and return that instead.
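To make the return-and-receive handoff concrete, here is a small sketch that defines the function from Example 3-5 and calls it both with and without the optional parameter. The variable names on the receiving side ($number, $message, and so on) are my own choices for illustration.

```php
<?php
// The function from Example 3-5: increments the first parameter and
// builds a message from the new value and the optional text.
function name_of_function($parameter1, $parameter2 = '')
{
    $foo = $parameter1 + 1;
    $bar = $foo . ' was required ' . $parameter2;
    return array($foo, $bar);
}

// list() unpacks the returned array into separate variables.
list($number, $message) = name_of_function(4, 'today');

// The is_type functions report what kind of data came back.
$number_is_int = is_int($number);
$message_is_string = is_string($message);

// The second parameter is optional, so one argument is enough.
list($only_number, $short_message) = name_of_function(9);
```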
Because your code will be assembled on the fly and is intended
to be reused in many ways by different applications, there is a
built-in function that may add some grace to it. The function_exists()
function checks a list
of all defined functions, both those built into PHP and those defined
by the programmer, and returns true
if a function with the specified name is detected:
if (function_exists('apiRequest')) { $data = apiRequest($url); }
This is useful as a way to make sure the proper code is available and, if not, to allow the program to handle the problem gracefully, either by reporting it to an administrator or by simply exiting with a web page response the user can read. The alternative is an error that could kill the program and reveal some of your code in an ugly message.
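A slightly fuller version of that check might look like the following sketch. Here apiRequest() is deliberately left undefined so the fallback branch runs; the URL and the wording of the message are placeholders. The last two lines show the point made earlier about constructs: function_exists() finds a built-in function such as strlen() but does not recognize include.

```php
<?php
// Placeholder URL; apiRequest() is intentionally not defined in this sketch.
$url = 'http://example.com/feed.xml';

if (function_exists('apiRequest')) {
    $data = apiRequest($url);
    $status = 'fetched';
} else {
    // Handle the missing code gracefully instead of triggering a fatal error.
    $data = '';
    $status = 'The request code is unavailable. Please contact the administrator.';
}

// Built-in functions are found; language constructs are not.
$builtin_found = function_exists('strlen');
$construct_found = function_exists('include');
```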
When it comes to sharing data, XML is the prevalent way to format information into a structure that is easily parsed. Your PHP application will need to parse XML at some point if you get data from APIs, including the API Twitter provides. Before PHP 5, you had to build your own parser, looking in the string for patterns to divide up the formatted data in a meaningful way. Now, however, PHP comes with an extension called SimpleXML that does the heavy lifting for you.
The SimpleXML extension turns XML structures into embedded objects
containing the data as associative arrays. Each embedded tag in the XML
becomes another link in the PHP object, as in $xml->childnode->node['attribute']
. The
sample applications use SimpleXML to parse the data received from the
Twitter API and to create new XML documents with the data the
applications collect. Here are the methods you’ll see:
SimpleXMLElement(well_formed_xml)
When you want to start building XML from scratch, SimpleXMLElement
can help. It turns a well-formed XML string or a
path to a file containing XML data into an object that can be
iterated, edited, and expanded with a variety of methods. The
object is a collection of tag names associated with the content
the tags contain and the attributes they are assigned. Nested tags
show up as objects that themselves can be parsed into tag names,
attributes, and values.
Example: $xml = new
SimpleXMLElement($base_xml);
simplexml_load_string(well_formed_xml_string)
For simply parsing XML data that already exists, this function takes the data in the form of a string and returns a navigable object.
Example: $xml =
simplexml_load_string($data);
getName()
The getName()
method returns the name of the root tag for a particular
XML object. When the XML object is first created, this will be the
main root of the document, but this method can be used for nested
objects as well.
Example: $root =
$xml->getName();
children()
The children()
method creates an iterative array of objects representing
the nodes directly below the XML object calling it. Each child
object can then be explored for a name, value, attributes, or any
other XML objects it contains.
Example: foreach($xml->children() as $x) {$ids[] =
$x->id;}
addChild(name_of_new_node, value_stored_in_new_node)
This method is what allows you to create your own XML documents. It accepts a text value to describe the name of the new nested tag and any value you want it to contain. It returns the new XML object, which can then be edited and expanded as if it had been part of the original XML.
Example: $xml->channel->addChild('newchild','my
text');
asXML()
All of the manipulation of addChild()
only changes the XML
object that PHP has stored in memory. To make the changes real,
you need to turn the object back into a string of well-formed XML
text. This can be printed or stored in a string for later use in
the application. asXML()
will also accept a filename parameter and write the file directly
to a document.
Example: echo
$xml->asXML();
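Here is how those SimpleXML calls might work together on a small document. The XML string is invented for this sketch (it is not real Twitter API output), and the values are cast to strings because SimpleXML hands back objects rather than plain text.

```php
<?php
// A made-up feed used only for illustration.
$data = '<rss version="2.0"><channel><title>Sample feed</title>'
      . '<item><id>1</id></item><item><id>2</id></item></channel></rss>';

$xml = simplexml_load_string($data);

// Name of the root tag and one of its attributes.
$root = $xml->getName();
$version = (string) $xml['version'];

// Walk the nodes directly below <channel>, keeping only the items.
$ids = array();
foreach ($xml->channel->children() as $child) {
    if ($child->getName() === 'item') {
        $ids[] = (string) $child->id;
    }
}

// Add a new node in memory, then turn the object back into XML text.
$xml->channel->addChild('newchild', 'my text');
$output = $xml->asXML();
```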
SimpleXML is great, but there are other ways to extract data from an XML document. The Document Object Model, or DOM, is a standard object model to describe markup and make it able to be manipulated by other applications. It is a required part of JavaScript, to give the scripting language the ability to access and change web pages on the fly. Some web browsers also use DOM to render web pages from HTML. DOM is particularly useful for accessing the markup out of order, navigating back and forth in the nested nodes, or jumping directly to different parts of the document.
The W3C DOM has three parts. The Core describes any structured document. There are also specific object descriptions for both XML and HTML. As of PHP 5, the DOM extension has been added to the PHP arsenal. The sample applications use DOM to parse RSS feeds. The following are the methods you’ll encounter.
DOMDocument()
As with the SimpleXMLElement
object, a new DOMDocument
object can be created using
this method to hold the XML data you want to explore. This gives
PHP a framework for loading and parsing the structured
data.
Example: $doc = new
DOMDocument();
load(url_to_well_formed_XML)
To fill the new DOM object with XML data, you have
to let PHP know where to look for that data. For RSS feeds, the
simplest technique is just to point to the URI for the data you
are trying to parse. This path can also point to local files,
returning a Boolean true
or
false
to indicate success. The object must
already exist before it can be filled.
Example: $doc->load($url);
getElementsByTagName(name_of_node)
The DOM can retrieve specific nodes from the full
document by tag name. The getElementsByTagName()
method returns a DOMNodeList object containing all of the matching
elements, each of which can be retrieved with item() and
explored in detail.
Example: $items =
$doc->getElementsByTagName('item');
nodeValue
At the node level, the value a particular node
contains within its start and end tags can be read from its
nodeValue
property (it is a property, not a method, so no parentheses are
used). The value is returned as a string, which may represent a
date, text, number, or other data, depending on the kind
of node being explored.
Example: $title =
$item->item(0)->nodeValue;
getAttribute()
Another tool at the node level is getAttribute()
, which returns the
value associated with a tag attribute, such as the link stored in
the href
of an anchor tag. If
an attribute is not found, an empty string is returned. For
singleton tags, the attributes are where the node data is
located.
Example: $link =
$node->item(0)->getAttribute('href');
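The DOM methods can be tried without a live feed. This sketch makes two assumptions worth calling out: the feed markup is invented, and it is loaded from a string with loadXML() rather than from a URI with load(), simply so the example does not depend on a network connection.

```php
<?php
// A made-up feed; the link tags carry their data in an attribute.
$feed = '<rss><channel>'
      . '<item><title>First post</title><link href="http://example.com/1"/></item>'
      . '<item><title>Second post</title><link href="http://example.com/2"/></item>'
      . '</channel></rss>';

$doc = new DOMDocument();
$loaded = $doc->loadXML($feed);   // true on success

// Collect every <item> node, wherever it sits in the document.
$items = $doc->getElementsByTagName('item');
$item_count = $items->length;

$titles = array();
$links = array();
foreach ($items as $item) {
    // nodeValue is a property; getAttribute() is a method.
    $titles[] = $item->getElementsByTagName('title')->item(0)->nodeValue;
    $links[] = $item->getElementsByTagName('link')->item(0)->getAttribute('href');
}

// Asking for an attribute that is not there yields an empty string.
$missing = $items->item(0)->getAttribute('nope');
```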
PHP includes a library for retrieving URLs called cURL. This free software can handle a wide range of data transfers and HTTP methods. PHP includes a few tools to allow your code to access this important functionality.
The PHP developer community typically shares code to make coding easier for everyone. One way this is done is by providing classes that can be used to extend your own installation of PHP and make the code a bit easier. You can find some cURL classes to replace the functions discussed here at http://www.phpclasses.org/searchtag/curl/by/package/tag/curl/.
Although cURL can do much more, the sample applications use the following cURL functions whenever they need to interact with the Twitter API:
curl_init()
To use cURL to access a remote file, you must first
create a handle that can be used to reference the connection later
in the program. curl_init()
can be initialized as an empty shell and filled later, either by
setting the CURLOPT_URL
option
or by passing a URL as a parameter. The new handle is returned
upon success.
Example: $cURL =
curl_init();
curl_setopt(cURL_handle, name_of_option, option_value)
The curl_setopt()
function allows you to configure an initialized cURL handle
by setting some of its many options. The options are named with
the CURLOPT_
prefix and include
the following, which are used in the sample applications for this
book:
CURLOPT_HTTPGET
If set to true
, this option causes the cURL
handle to use the GET method for the next HTTP request. The
Twitter API methods where data is queried and received all
use GET. This is the default for a new cURL handle.
CURLOPT_POST
If set to true
, this option causes the cURL
handle to use the POST method for the next HTTP request. All
of the Twitter API methods involving data changes use
POST.
CURLOPT_POSTFIELDS
When the server application on the other end of the request requires an HTTP POST request, the fields containing your data must be passed as an encoded string using this option.
If you are experiencing problems using CURLOPT_POSTFIELDS, it may be
because of the default headers being passed by cURL. The
Expect header may have a value of “100-continue,” which tells
cURL to post your data only after the server responds with a
status code of 100, or Continue. To get around this, you can simply
clear the value in the Expect header to have cURL post
without waiting:
curl_setopt($cURL, CURLOPT_HTTPHEADER, array('Expect:'));
CURLOPT_RETURNTRANSFER
When executing the configured HTTP request,
this option should be set to true
to prevent cURL from
outputting the content it gets. You want to fill strings to
parse.
CURLOPT_USERPWD
Twitter’s API requires authentication for many
of its interactions with member data. The username and
password for the accessing account can be appended to the
cURL request using this option and a value in the format
[username]:[password].
CURLOPT_URL
This is perhaps the most important option—it tells cURL where to go to get the goods (i.e., which URL to fetch). It can be set when the handle is initialized as well.
CURLOPT_USERAGENT
When using someone’s API, it is courteous to
let the developers know who you are. Setting the User-Agent
header in the HTTP
request to something meaningful using this option will
accomplish this.
The curl_setopt()
function returns a Boolean to indicate success.
Example: curl_setopt($cURL,
CURLOPT_URL, $curl_url);
curl_setopt_array(cURL_handle, array_of_options)
When you have to set a number of options at once, it
may be easier to first create an associative array filled with the
option names and values to send to cURL all at once. curl_setopt_array()
sets multiple
options for a cURL session. Specify the active cURL handle and the
array of options, and the function will return a Boolean to
indicate success. If any single option fails, the remainder of the
array will be ignored and the function will return false
.
Example: curl_setopt_array($cURL,
$cURL_options);
curl_exec(cURL_handle)
The meat of this suite of functions is curl_exec()
, which actually executes
the session as it is currently configured. If the CURLOPT_RETURNTRANSFER
option has not
been set to true
, this function
will output whatever it finds at the other end of its request and
return a Boolean to indicate success. Otherwise, curl_exec()
returns the contents to
save in a string.
Example: $twitter_data =
curl_exec($cURL);
curl_getinfo(cURL_handle, information_name)
Each request has a meta-information channel that describes
the session that was just executed. curl_getinfo()
gets that information and either returns it as an
associative array containing all the available data or, if you
include a specific variable name, returns just that variable’s
value. The meta-information includes url
, content_type
, http_code
, filetime
, total_time
, size_download
, speed_download
, and download_content_length
.
Example: $status =
curl_getinfo($cURL, CURLINFO_HTTP_CODE);
curl_close(cURL_handle)
When you’re finished, use curl_close()
to end the session and free up all of the
application resources that were devoted to the HTTP
request.
Example: curl_close($cURL);
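Put together, a complete GET request might be wrapped up along these lines. Treat this as a sketch rather than the sample applications' actual code: the function name fetchUrl() and the default User-Agent string are assumptions of mine, and only options described above are used.

```php
<?php
// Hypothetical wrapper: fetches $url and returns the HTTP status code
// together with the response body.
function fetchUrl($url, $user_agent = 'MySampleApp/0.1')
{
    $cURL = curl_init();

    // Set several options at once instead of calling curl_setopt() repeatedly.
    curl_setopt_array($cURL, array(
        CURLOPT_URL            => $url,
        CURLOPT_HTTPGET        => true,
        CURLOPT_RETURNTRANSFER => true,   // return the content instead of printing it
        CURLOPT_USERAGENT      => $user_agent,
    ));

    // With CURLOPT_RETURNTRANSFER set, this is the body or false on failure.
    $body = curl_exec($cURL);
    $status = curl_getinfo($cURL, CURLINFO_HTTP_CODE);

    // On PHP 8 and later this call no longer has any effect.
    curl_close($cURL);

    return array($status, $body);
}
```

A call such as list($status, $body) = fetchUrl($curl_url); would then give you both the status code to check and the string to parse.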
If your server does not support cURL, there are
other ways to get remote content. The file_get_contents()
function
explained in the section File Management of
this chapter is an option, provided PHP has been configured to
turn on allow_url_fopen
. A
more complex option is fsockopen()
, which should work
everywhere.
Let’s face it: programming involves a lot of trial and error. The first draft of an application rarely works perfectly, and the more code it contains, the more potential there is for problems that will need to be investigated and fixed. The process of debugging an application is greatly helped by good documentation and the ability to see what is happening at the points where problems are occurring.
Programming languages, including PHP, include ways for you to annotate your code without it affecting the processing that is done. Comments do come with a little extra overhead since there is more text to deal with, but the benefits greatly outweigh the trivial cost of including them. PHP comments work in a similar way to those in C or Perl: text that follows a special marker is interpreted as nonexecutable content and ignored when the program runs.
Comments are meant to serve as reminders for the programmer and to communicate to other programmers what is happening at various points in the code. Unlike with HTML, where anyone can look at the source behind the rendered page, the only people who will see PHP comments are programmers with an interest in and access to the script files.
There are two kinds of PHP comments. The first type is the
single-line comment. By prefacing text with a double-slash character
combination (//
), you signal that the text between the comment character
and the end of the line should be ignored during execution. For example:
// This is good for line-by-line annotation
The hash character (#
) will
work, too.
I’ve always preferred hashes, because there is less to type and because they stand out visually in a way the slashes do not. However, most modern text editors with PHP libraries will be able to understand what comments are and display them in a different color (typically light gray) to separate them from the rest of the code.
PHP also supports a way to comment out multiple lines of text: it
looks for the character combination /*
to indicate the start of a comment block
and the combination */
to signal the
end of the block. Anything PHP encounters between these two markers,
including line breaks, is considered a comment and is ignored during
execution. For example:
/*
 * Use Comments
 *
 * Liberal use of comments can only help a programmer make
 * sense of code, particularly as time passes and the
 * reasons for using a particular function fade from memory.
 * Except for the first and last lines to open and close the
 * comment, the asterisks on the left are optional and are
 * included merely for aesthetic purposes.
 */
Comments help a programmer remember or understand what a section of code does and maybe why certain programming decisions were made, but they don’t do anything to help fix problems in the code when they arise. To do that, you need a way to peer into the inner workings of the application, from the server’s perspective. Outputting variable values at different points in the program is a simple technique for debugging in PHP, to see what is changing and where.
The echo()
construct is one way to output data: when it’s executed, the
variables that come after it are evaluated and displayed in the terminal
or browser for all to see. For example:
$debug = 1; if ($debug) { echo $div_info; }
You probably don’t want a bunch of debugging text to show up in a production site, but for early coding and in development servers, being able to see values change is worthwhile. I sometimes build a debugging switch into the code that allows me to change one value high in the code that will tell all of my debugging output to display.
There is another command—print()
—that does the same thing. “What’s the difference between
echo()
and print()
?” is a common question about PHP.
The quick answer is, “Not much. Use what you want.” However, there are
some subtle differences that can affect your coding decisions. First,
echo()
is a little faster, since
it doesn’t return any value (print()
always returns the integer 1, which lets it be used
inside a larger expression). The other important
difference is that echo()
can
accept multiple parameters, so you can concatenate strings together with
commas. For print()
, only one
string is allowed as a parameter.
For arrays, output becomes a little more complicated; echoing an array
displays only the word “Array” rather than its keys and values.
Fortunately, PHP has a special function for investigating array
contents, called print_r()
, that
will display the contents of an array in a way formatted to be readable
by humans. Similarly, the var_dump()
function will iterate through
arrays and objects, displaying all of the information they contain and
indenting to show how the content structure is nested:
print_r ($array); var_dump ($array);
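The debugging switch and the output commands can be combined in a few lines. This is a sketch with made-up data; the one detail not mentioned above is that print_r() accepts a second argument of true, which makes it return the formatted text instead of printing it, handy if you would rather write it to a log.

```php
<?php
// One switch near the top of the script controls all debugging output.
$debug = 1;

// Sample data invented for this sketch.
$array = array('user' => 'kmakice', 'ids' => array(1, 2));

// Capture the human-readable version as a string.
$readable = print_r($array, true);

$print_result = 0;
if ($debug) {
    // echo accepts several comma-separated values.
    echo 'Debugging ', 'is ', 'on', PHP_EOL;

    // print takes a single string and always returns 1.
    $print_result = print 'Array contents:' . PHP_EOL;

    echo $readable;
    var_dump($array);
}
```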
Finally, there is the exit()
command, discussed earlier in the
section Controlling the Flow of Logic. If your
application does a lot of things, such as writing to a database or
changing saved states that you might later need to revert, a well-placed
exit()
will save you some
headaches by killing the application at the desired point. Just remember
to remove it after you figure out a solution to the coding
problem!
There are a number of reasons it may become necessary to store data outside of your application in order to support what it does. We do this when we want to pre-populate web forms with saved information, and when we want to show only a part of a larger data set. The raw information is stored in a database. A database can take many forms, including a simple text file where you write the information you want to retrieve later. The dominant form of database, however, is the relational database management system (RDBMS).
MySQL is one widely distributed relational database system that is commonly installed on web hosting servers for your use. You may be limited to creating a certain number of databases. If so, don’t worry; in this book we’ll only use one.
The Structured Query Language (SQL) has been around in some form for almost four decades. One of the criticisms of the language is that different servers have slightly different syntax variations, so SQL statements that run well in Microsoft SQL Server won’t necessarily work in MySQL. This book deals specifically with MySQL syntax.
MySQL databases can be very powerful. The interaction with a typical web application, though, boils down to a handful of statements for the following tasks: creating tables, selecting data, inserting new data, updating existing data, and deleting data you no longer want.
When you first create a database, there is no there there. It is just an empty shell; you could use PHP to connect to it, but that’s it. To make it a useful place to store and filter data, you must first create a structure to contain the data.
The CREATE TABLE statement accomplishes this by describing the fields each row, or tuple, of data can store. Each field is assigned a name and a data type, plus other information about its size, default value, and whether it can accept NULL values. MySQL also would like to know how you might want to access the data in the future. You can aid data retrieval by specifying which fields are going to be indexed for search. The primary key (the field that contains the unique identifiers to distinguish one record from the next) will always be indexed, but you can also specify that indexes be maintained for other fields. Example 3-6 shows a table called access with five fields.
/* Backticks quote table and field names; single quotes are for string values */
CREATE TABLE IF NOT EXISTS `access` (
  `record_id` int(4) NOT NULL,
  `password` varchar(255) NOT NULL,
  `created_at` datetime NOT NULL,
  `date_processed` timestamp NOT NULL default CURRENT_TIMESTAMP,
  `is_enabled` tinyint(4) NOT NULL default '1',
  PRIMARY KEY (`record_id`),
  KEY `is_enabled` (`is_enabled`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=74462;
Most database clients—including the web-based phpMyAdmin tool many web hosting companies use to allow user access to MySQL (see Figure 3-1)—provide some basic GUI or form that you can use instead of actually writing the SQL statement to create a table. The important definitions are still the same, regardless of the method of creation: you need to tell the database what kind of data you want to store and how you are likely to try to retrieve that information later.
Most of your application’s interaction with the database is likely to be in the form of SELECT statements. These SQL commands ask the database to look in the contents of its tables and return specific data that matches your criteria.
SELECT statements have a few main parts:
SELECT names the fields you want to get.
FROM indicates where to get the data.
WHERE specifies some conditions to filter all the data into a useful set.
ORDER BY allows you to sort the results on one of the returned fields.
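To see the four parts working together, here is a minimal sketch. The demo_tweets table and its rows are hypothetical, made up only for this illustration and not part of the sample applications. A temporary table is used so that nothing lingers after the connection closes.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TEMPORARY TABLE demo_tweets (
  status_id  INT          NOT NULL,
  username   VARCHAR(32)  NOT NULL,
  tweet_text VARCHAR(140) NOT NULL,
  pub_date   DATETIME     NOT NULL,
  PRIMARY KEY (status_id)
);

INSERT INTO demo_tweets (status_id, username, tweet_text, pub_date) VALUES
  (1, 'kmakice',    'Writing chapter 3', '2009-01-05 09:00:00'),
  (2, 'family_dog', 'Woof',              '2009-01-05 10:30:00'),
  (3, 'kmakice',    'Still writing',     '2009-01-06 08:15:00');

SELECT status_id, tweet_text, pub_date   -- the fields to return
FROM demo_tweets                         -- where to look for them
WHERE username = 'kmakice'               -- keep only the matching rows
ORDER BY pub_date DESC;                  -- newest first
```

Run against these three sample rows, the query returns the two kmakice tweets, with status_id 3 listed before status_id 1.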
Additionally, you can take advantage of the relational nature of MySQL by joining multiple tables together to create a new collection of data that isn’t explicitly stored in the database. LEFT OUTER JOIN will connect two tables with a common field, without requiring that the second table have a matching record. If it doesn’t, the fields that would have been filled by that table are returned with NULL values. An INNER JOIN requires that both tables have a common record. Example 3-7 shows a SELECT query involving two tables used by one of the sample applications, tweetbroadcast_tweets and tweetbroadcast_groups.
SELECT DISTINCT
  t.status_id,
  t.author_username,
  t.author_fullname,
  t.author_avatar,
  t.tweet_text,
  t.tweet_html,
  t.pub_date,
  concat(year(t.pub_date),' ',LPAD(dayofyear(t.pub_date),3,'0')) as tweet_day
FROM tweetbroadcast_tweets t
INNER JOIN tweetbroadcast_groups m ON m.other = t.author_username
WHERE m.owner = 'kmakice'
ORDER BY t.pub_date desc
LIMIT 0,20
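The difference between the two join types is easiest to see side by side. The following is only a sketch: demo_authors and demo_updates are hypothetical tables invented for this comparison, with one author who has never posted.

```sql
-- Hypothetical tables and data, for illustration only.
CREATE TEMPORARY TABLE demo_authors (
  username VARCHAR(32) NOT NULL,
  PRIMARY KEY (username)
);

CREATE TEMPORARY TABLE demo_updates (
  status_id  INT          NOT NULL,
  username   VARCHAR(32)  NOT NULL,
  tweet_text VARCHAR(140) NOT NULL,
  PRIMARY KEY (status_id)
);

INSERT INTO demo_authors (username) VALUES ('kmakice'), ('quiet_user');

INSERT INTO demo_updates (status_id, username, tweet_text) VALUES
  (1, 'kmakice', 'Hello'),
  (2, 'kmakice', 'Hello again');

-- INNER JOIN: only authors that have at least one matching update.
SELECT a.username, u.tweet_text
FROM demo_authors a
INNER JOIN demo_updates u ON u.username = a.username;

-- LEFT OUTER JOIN: every author, with NULL in tweet_text
-- for an author who has no matching update.
SELECT a.username, u.tweet_text
FROM demo_authors a
LEFT OUTER JOIN demo_updates u ON u.username = a.username;
```

With this data, the first query returns two rows (both for kmakice), while the second returns three: the same two plus a quiet_user row whose tweet_text is NULL.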
The LIMIT clause restricts how much of the full data set is returned. It takes two numbers, separated by a comma: the first is the zero-based offset of the first record to return (0 tells MySQL to start at the top), and the second is the maximum number of records to return, counting from that record. Because the results are sorted by pub_date in descending order, the query in Example 3-7 will return the 20 most recent tweets authored by people in my broadcast group.
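A common use of the two LIMIT numbers is paging through results. As a rough sketch, the offset for a given page is (page number - 1) multiplied by the page size. The demo_pages table below is hypothetical and holds nothing but the numbers 1 through 7.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TEMPORARY TABLE demo_pages (
  n INT NOT NULL,
  PRIMARY KEY (n)
);

INSERT INTO demo_pages (n) VALUES (1), (2), (3), (4), (5), (6), (7);

-- Page 1, three rows per page: skip 0 rows, return up to 3 (1, 2, 3).
SELECT n FROM demo_pages ORDER BY n LIMIT 0, 3;

-- Page 2: skip (2 - 1) * 3 = 3 rows, return up to 3 (4, 5, 6).
SELECT n FROM demo_pages ORDER BY n LIMIT 3, 3;

-- Page 3: skip 6 rows; only one row is left (7).
SELECT n FROM demo_pages ORDER BY n LIMIT 6, 3;
```

The last query shows that the second number is a ceiling, not a promise: if fewer records remain, MySQL simply returns what is there. An ORDER BY clause matters here, since without one the order of the rows (and therefore the contents of each page) is not guaranteed.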
The last clause of interest to us is GROUP BY. This is an aggregation instruction that tells MySQL to return only one record for each unique group, summing, counting, or calculating the other fields over the values in all of the group’s records. For example, if I had a table that stored all of my family’s Twitter status updates (my wife, my son, and yes, my dog all use Twitter), I could generate some stats for each author by grouping on the username field and counting records. Such a query would tell us that my dog tweets more than my son. It’s very sad on several levels.
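A query along those lines might look like the following sketch. The demo_family_tweets table, the usernames, and the counts are all invented for illustration.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TEMPORARY TABLE demo_family_tweets (
  status_id INT         NOT NULL,
  username  VARCHAR(32) NOT NULL,
  pub_date  DATETIME    NOT NULL,
  PRIMARY KEY (status_id)
);

INSERT INTO demo_family_tweets (status_id, username, pub_date) VALUES
  (1, 'wife', '2009-01-05 08:00:00'),
  (2, 'dog',  '2009-01-05 09:00:00'),
  (3, 'dog',  '2009-01-05 12:00:00'),
  (4, 'son',  '2009-01-06 07:30:00'),
  (5, 'wife', '2009-01-06 18:45:00'),
  (6, 'dog',  '2009-01-07 06:15:00');

-- One output row per username, with aggregates computed over that group.
SELECT username,
       COUNT(*)      AS tweet_count,   -- how many rows are in the group
       MAX(pub_date) AS latest_tweet   -- most recent date in the group
FROM demo_family_tweets
GROUP BY username
ORDER BY tweet_count DESC;
```

With this made-up data, the result has three rows: dog with 3 tweets, wife with 2, and son with 1.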
Alternatively, if your goal is simply to eliminate redundancy in the data, you can use DISTINCT in the SELECT clause to limit the data set to just one instance of any given combination of the selected fields. If I collected my son’s tweet archive several times and had duplicates of the same records, without using DISTINCT his aggregated statistics would be much higher than the actual count. DISTINCT is handy in statements using GROUP BY as a way to count the number of different values in the data set, as in:
SELECT COUNT(DISTINCT username)
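To show the effect of duplicates on a count, here is a small sketch. The demo_archive table is hypothetical and deliberately has no primary key, so the same status_id can be stored more than once, as it would be if an archive were collected twice.

```sql
-- Hypothetical table and data, for illustration only.
-- No primary key, so duplicate rows are allowed.
CREATE TEMPORARY TABLE demo_archive (
  status_id INT         NOT NULL,
  username  VARCHAR(32) NOT NULL
);

INSERT INTO demo_archive (status_id, username) VALUES
  (101, 'son'),
  (102, 'son'),
  (101, 'son'),   -- duplicate from a second collection run
  (102, 'son'),   -- duplicate from a second collection run
  (201, 'dog');

-- COUNT(*) counts every stored row; COUNT(DISTINCT ...) counts each
-- different status_id only once within the group.
SELECT username,
       COUNT(*)                  AS rows_stored,
       COUNT(DISTINCT status_id) AS unique_tweets
FROM demo_archive
GROUP BY username;

-- Number of different authors in the whole table.
SELECT COUNT(DISTINCT username) AS authors
FROM demo_archive;
```

For son, rows_stored is 4 while unique_tweets is 2, which is the inflation described above. The final query reports 2 authors.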
The fields used in the SELECT list and the WHERE conditions for the search can be manipulated using functions in MySQL. They work in a similar fashion to functions in other languages, like PHP, in that you pass a value as a parameter and get some kind of response. For example, DATEDIFF() will examine two dates and return the number of days that separate them. These kinds of functions become very useful when trying to compare similar data expressed in incompatible formats and for shaping the way the information is returned in the query.
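Here are two quick illustrations that need no tables at all. The dates are arbitrary values chosen for the example; the second statement reuses the same combination of functions that builds the tweet_day field in Example 3-7.

```sql
-- DATEDIFF() subtracts the second date from the first
-- and returns the difference in whole days.
SELECT DATEDIFF('2009-03-15', '2009-03-01') AS days_apart;   -- 14

-- Build a sortable "year day-of-year" label from a date.
-- February 1 is day 32 of the year, and LPAD() pads it to three digits.
SELECT CONCAT(YEAR('2009-02-01'), ' ',
              LPAD(DAYOFYEAR('2009-02-01'), 3, '0')) AS tweet_day;   -- '2009 032'
```

Note that the order of the DATEDIFF() parameters matters: swapping the two dates in the first statement would return -14.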
Of course, the only way to get something out of the database is to put something in. The data doesn’t appear by itself; it must be added and kept up-to-date with the help of some special editing statements that insert, change, and delete values in table fields.
Although it is possible to transfer information from one part of a database to another by combining INSERT and SELECT statements, for our purposes it is sufficient to add data one row at a time. To add data to a table, the INSERT INTO statement lets you specify a table and the field(s) you want to fill. You then specify the data using the VALUES clause, listing the new information in the same order you listed the field names. See Example 3-8.
/* This inserts a new row of data into 'autotweet_profiles' */
INSERT INTO autotweet_profiles (user_name, password, rss_url)
VALUES ('kmakice', 'DKFSHOIER*S(R(WE%', 'http://www.blogschmog.net/feed/');

/* This changes the values in three fields of a record in the table */
UPDATE autotweet_profiles
SET password = 'DKRKSIKDLER*KDOUFLIEO*',
    rss_url = 'http://www.makice.net/blogschmog/feed/',
    is_enabled = 1
WHERE user_name = 'kmakice';

/* This removes all records for 'kmakice' from the table */
DELETE FROM autotweet_profiles
WHERE user_name = 'kmakice';
UPDATE is a statement that works on existing records in a table, allowing you to change the stored values of specific fields. As with SELECT, you must first identify which table is to be targeted and which records are being changed, using the WHERE clause. Updates also use a SET clause to list the fields of interest with their new values. UPDATE statements can affect more than one row, based on the criteria defined in the WHERE clause. All of the matching records will be set with the same values specified in the SET clause.
Finally, there is the DELETE FROM statement, which removes records from data tables. To delete data, you specify the affected table and the criteria that need to be matched. Any records matching the WHERE clause will be removed.
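Because UPDATE and DELETE act on every row that matches the WHERE clause, one cautious habit (a suggestion of this sketch, not something the sample applications require) is to run a SELECT with the same WHERE clause first to see how many rows are about to change. The demo_profiles table and its rows are hypothetical.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TEMPORARY TABLE demo_profiles (
  user_name  VARCHAR(32)  NOT NULL,
  rss_url    VARCHAR(255) NOT NULL,
  is_enabled TINYINT      NOT NULL DEFAULT 1
);

INSERT INTO demo_profiles (user_name, rss_url, is_enabled) VALUES
  ('kmakice',      'http://example.com/feed/a', 1),
  ('kmakice',      'http://example.com/feed/b', 1),
  ('someone_else', 'http://example.com/feed/c', 1);

-- Preview: how many rows does this WHERE clause match? (2)
SELECT COUNT(*) AS rows_matched
FROM demo_profiles
WHERE user_name = 'kmakice';

-- Both matching rows receive the same new value.
UPDATE demo_profiles
SET is_enabled = 0
WHERE user_name = 'kmakice';

-- Both matching rows are removed; the someone_else row is untouched.
DELETE FROM demo_profiles
WHERE user_name = 'kmakice';

-- One row remains.
SELECT COUNT(*) AS rows_left
FROM demo_profiles;
```

The preview costs one extra query, but it makes an overly broad WHERE clause obvious before any data is changed or lost.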
The first thing you’ll need for your new web application is a home. All of your brainy ideas and masterful code won’t be any more useful to other people than an email from your grandmother if your code can’t be run and do something interesting. This section briefly looks at some of the things to consider when searching for a server from which to publish your new application.
There are a number of factors you will need to consider when selecting a web host to publish and protect your work. The most important one (for your bank account, at any rate) is cost.
Hosting services can have a few different kinds of configurations. These include:
Shared hosting: Racks of machines are set up, and your account shares physical and virtual space with other accounts. If their sites go down, so does yours. If you stress the processor with a bunch of big queries, other accounts suffer the consequences, too.
Virtual private server (VPS) hosting: Your slice of the big shared hosting pie includes CPU, RAM, and disk space that are not affected by and will not affect what happens on other VPS accounts. You can have root access to your virtual machine.
Dedicated hosting: You own the server, but you don’t have to maintain it. This arrangement is like VPS, except that “virtual” is replaced with “physical” (your server is a real machine that no one else uses).
The best analogy I’ve seen is a housing analogy: dedicated hosts give you a mansion, VPS hosts give you an apartment, and shared hosts make you live in a dorm room.
Shared hosting services can offer web space and a lot of built-in support for about $10 a month. These companies try to make it very easy for people to install open source tools such as blogging and chat applications, and they will almost always support the most popular development platforms. What they won’t do very well is help you with your code. If something breaks, odds are good you will have to work out your own solutions or turn to the community of web developers who share their expertise in online forums. Shared hosts frequently have slow and crowded databases, which may become problematic if your awesome new web application takes off.
The low-cost options also put limits on the amount of traffic, or bandwidth, you are permitted. For most small sites, this limit may seem impossibly high; it may be 10 times as large as the amount of hard drive space you are allowed, which may be a few hundred gigabytes. Text takes less bandwidth than images or movies, but if a few hundred thousand people start visiting your website, even the text adds up.
Don’t underestimate the importance of bandwidth limits when it comes to a Twitter application. News of interesting tools spreads quickly among Twitter’s several million accounts, and developers are often overwhelmed with the response—just ask Ryo Chijiiwa, the creator of Twitterank (see Tools for Statistics). It is not uncommon to have to switch web hosting to an account or a company that can better handle the traffic. It would be best to be proactive and find a hosting company that is prepared to scale with your application. Upgradable VPSs, particularly those with some allowances for bursting past your allotted limits, should meet these needs.
Before you let people know about your great new Twitter application, check with your web host about what happens if you unexpectedly exceed your bandwidth limit. In some cases, automatic charges are levied based on the amount of traffic you have. In other cases, your account—including any other websites you may be hosting—will essentially be shut down until you do something to upgrade your host configuration.
Other factors to consider include the hosting service’s track record for keeping the servers up and running (99% uptime should be a minimum requirement), availability and responsiveness of tech support, domain name registration, secure FTP access, usage statistics reporting, and whether your account can support secure transactions. For the purposes of this book, the most important criteria involve common but vital functionality available on most modern web servers. The following are the primary requirements:
MySQL database server
PHP scripting language
A place outside the web path to place supporting code
cron jobs for scheduling tasks
The web host on which the sample code in this book was developed was a Linux server with PHP version 5.2.6 and MySQL version 5.0.67. In computing, things change quickly and incrementally, so even the most on-the-ball server administrators may lag a little behind these server application developers. If your host is reasonably close to the latest releases, you should be fine. Even if it’s a bit behind, though, you should find that most of the core functionality you use works with earlier versions.
Both PHP and MySQL do add some useful functions in major new releases, such as str_split() in PHP and LOAD XML syntax in MySQL. If you are having difficulty getting something to work, double-check the version of the server software against the version requirements of that function listed in the official PHP and MySQL documentation.
When you build a web application, the person using the site triggers much of its functionality. There’s no need to fetch data until someone shows up at the website asking for it. However, there are situations where you won’t want to rely on web traffic as a catalyst for your program.
Servers do have some easy ways around this, such as using a cron job in Linux or the Task Scheduler in Windows. The latter platform also supports proprietary scheduling tools such as nnCron and VisualCron, which add a GUI and a lot of functionality to the process of automating tasks.
Traditionally, a special text file known as a crontab handles the work of scheduling tasks to be performed at regular intervals. A special syntax instructs the server when and where to look for scripts to run: you supply the path to the script and tell the server when to run it by specifying values for the minutes, hours, day of the month, month, and day of the week fields, with an asterisk (*) serving as a wildcard to allow any value. In this example, script.cgi will run at 3:30 a.m. every Saturday:
30 3 * * 6 /path/to/script.cgi
The crontab file is usually tucked away out of the reach of casual users. Through a web host control panel, however, scheduling a cron job is as easy as filling out a web form. You select the intervals for how frequently the task should run, and the crontab entry is generated for you.
The only tricky part is how to reference the PHP script. Some systems will accept either a URL to a web page or a server path to the .php file; others require just the server path. Since the server just sees the file as text and not as a powerful script, you must also tell the crontab to render it as PHP. This is accomplished by simply referencing the path to the PHP interpreter:
/path/to/php /path/to/your/cron_task.php
If you can’t get your cron job to parse the PHP, you can try to force the server to do so using a text-based browser (/path/to/lynx) or an HTTP request (/path/to/wget) instead of calling the PHP parsing engine directly (/path/to/php). You will have to use a URL instead of the file path to reference the PHP script.
When the cron job runs, the web host may have its server configured to email you the status of the script along with its output. Those notification emails can add up—scheduling a task to run once a minute will result in 1,440 emails each day.
If you don’t have access to a crontab file, there are still ways you can simulate automation.
One way is to use a free pinging service to start your program each day. Companies such as Site24x7 will try to load a web page at regular intervals and generate a report on how successful their attempts were. Although this is useful to generate server uptime statistics, it can also have the consequence of launching a PHP page. Once launched, that script can do all sorts of things, including mining data or cleaning up files. You can also insert a call to your backend task either as part of the script generating the HTML or in the HTML itself, using <script> or <img> tags that point to the PHP file you want to run.
These tricks are not recommended, however, because they are unreliable and can get in the way of the web content you want to display. If that functionality (for example, a data-mining operation) takes a long time to run, your page load times may suffer. This is particularly problematic if the web page is heavily trafficked. There is also the risk that no one will visit your website on a given day. If that happens, nothing will trigger the backend task, and therefore nothing will get processed. As a result, these techniques are only really useful in situations where the jobs don’t have to run regularly.
For experienced programmers and web designers, most of what was discussed in this chapter is old news. For new or casual programmers, this chapter will likely serve as a good reference for you as you create your Twitter application.
The next two chapters delve into the nitty-gritty of the Twitter API, covering the methods that let you request and change data as well as the responses those methods send back to you. I hope you got enough out of this chapter to feel confident in taking the next step. If this is your first web application (and you’re trying not to feel overwhelmed), remember that you can come back and use this chapter as a reference when you code. I do want to stress, however, that this is just a sampler plate from the buffet that is Internet development. You won’t need more to understand the sample code in this book, but there is more to be had.
Here are some online resources to consult to give you a more complete picture of what is possible when building web applications:
Additionally, here are a few other books worth mentioning as reference desk companions:
HTML & XHTML: The Definitive Guide, Sixth Edition, by Chuck Musciano and Bill Kennedy (O’Reilly)
Build Your Own Database Driven Website Using PHP and MySQL, Third Edition, by Kevin Yank (SitePoint)
Essential PHP Security: A Guide to Building Secure Web Applications, by Chris Shiflett (O’Reilly)
[60] According to a Pew Internet & American Life Project report called “Home Broadband Adoption 2008,” 55% of American adults now have broadband Internet at home (http://www.pewinternet.org/pdfs/PIP_Broadband_2008.pdf).
[61] I have boiled down the full list of available PHP functions to just the ones used in the sample applications. For the full list, or for more details about any of the functions listed in this section, check out the official PHP documentation available online at http://www.php.net/docs.php.