Appendix E. Dancing with Perl

Written by Alastair McGowan-Douglas

Web work is a very common use of Perl. The Web is a heavily text-based medium, with plain-text HTTP being used to negotiate the transfer of plain-text HTML, XML, or JSON resources.

In the past, CGI (Common Gateway Interface) was used, but today a module called Dancer is a popular alternative. Dancer is a free, open-source micro Web application framework written in Perl that creates Web applications by building a list of HTTP verbs, routes, and methods to handle them. (See www.perldancer.org/quickstart.)

Dancer, like the other popular modern Web frameworks (such as Mojolicious and Catalyst), differs from CGI chiefly in that your entire Dancer application is a single script that you launch yourself and leave running. Compare this with CGI, where each CGI script is run by the Web server, and only when requested.

The following URLs will be used to access the program example we build in this appendix.

http://localhost:3000/student.cgi—CGI style

http://localhost:3000/student—Modern style

The latter URL makes no mention of how the student will be served to the client; in particular, there is no mention of CGI (or PHP or ASP or anything like that). This allows us to later rework the underlying code, and clients will not have to change their bookmarks.

A major benefit of the modern style of structuring a Web application is that we don’t have to compile and run the script on every single request. We can create a single, long-running process that handles each request and returns a response. Launching the application once means it is compiled once; and because we launch it so rarely, we can build much more complex systems that take as long as they need to start up.

The following example will draw upon the Student.pm we created in Appendix C, “Introduction to Moose (A Postmodern Object System for Perl 5).” First, we will show a simple script that contains some example students and allows you to view them; then we will extend the script to create new students.

But first, you will need to install Dancer. For a discussion on setting up your environment for installation of modules (and for a new Perl if you want to upgrade), see Appendix D, “Perlbrew, CPAN, and cpanm.” Once done, you may simply run

cpanm Dancer

and Dancer will be installed, ready for use.

E.1 A New Dancer App

Dancer comes with a utility, dancer, that creates a Dancer application for you. Simply run dancer -a Example and you will get a new directory, Example, containing a functional (but featureless) Dancer application. Feel free to replace Example with any valid module name; a module by this name will be created for you, and this module will be your whole Web site! [1]

1. Avoid choosing Student for this example, because then we will have a collision between the Student.pm that Dancer creates and the Student.pm that we’re going to use for the example. Usually this will not be a problem. In the real world, both Student modules would be namespaced differently; e.g., MyApp::Student for the object and MyApp::Web::Route::Student for the Dancer part. For the purposes of this appendix, it is simpler to avoid this issue instead.

At this point, you will be warned that installing YAML is a good idea. In fact, you will need to. Dancer says to run cpan YAML; but we want to run cpanm YAML to benefit from the environment that we set up in Appendix D.

The new Example directory contains everything you will need to run your new application. Testing it is easy: change into the Example directory and run perl bin/app.pl.

Congratulations! Your first modern Web app has been created. But it doesn’t do anything yet. By following the URL mentioned in Dancer’s output (by default, this is http://0.0.0.0:3000) you can access the Dancer welcome page (see Figure E.1).

Figure E.1 The Dancer welcome page.

Example E.2 shows the contents of lib/Example.pm. This is the file that contains the actual behavior of the application. app.pl is just a helper script that launches it. The welcome page, pictured, is simply the contents of views/index.tt. Templates and views are explained later in this appendix.
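The Example E.2 listing is not reproduced in this copy. As a rough guide, a module freshly generated by the dancer utility usually looks something like the following sketch; the version string and layout in your own copy may differ.

```perl
package Example;
use Dancer ':syntax';

our $VERSION = '0.1';

# GET / renders views/index.tt inside the default layout.
get '/' => sub {
    template 'index';
};

# The module must end with a true value.
true;
```

The get keyword registers a route handler; nothing is served until app.pl launches the application.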

E.1.1 Verbs

The Web uses HTTP to request and send content. What is requested and served is generally called a resource: the R in URL (Uniform Resource Locator). HTTP uses a URL and a verb to tell the server what we want. We already understand the URL part: http://localhost:3000/ means “the root resource at localhost on port 3000 via the HTTP protocol.” Normally this is an index page listing all the other available resources. In this case, it is a page that confirms to us that we have successfully created a Dancer app.

What that URL doesn’t say is that we want to get that resource. That is the verb in HTTP. If you use curl, you can see exactly that, as shown in Example E.3.

The GET in the curl request just shown corresponds to the get in Example.pm. The / in the request corresponds to the '/' in Example.pm. Dancer, therefore, makes it simple to write Perl that closely matches the HTTP requests. An HTTP request for GET / is handled by get '/' in Dancer, and an HTTP request for POST /students is handled by post '/students'.

The common HTTP verbs are GET, POST, PUT, and HEAD. Others exist, but they are used less often. These are described in Table E.1.

Table E.1 HTTP Verbs

Dancer allows us to define different handlers for different verbs. That’s useful. Instead of having to test whether a POST or a GET was made, as we did with CGI, or to POST to a different place and redirect the user back to the form if there was an error, with this method of structuring the code, we can simply choose which function is run based on what the request is for.

That means that we can create a GET / handler that creates a list of students, and a POST / handler that adds to the list.
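The idea of choosing a function from the verb and the path can be illustrated without Dancer at all. The following core-Perl sketch uses a hypothetical dispatch table (%handler_for and dispatch are names invented here); it is not how Dancer is implemented, only a picture of the principle.

```perl
use strict;
use warnings;

# Hypothetical dispatch table keyed by "VERB path". Dancer keeps its own,
# richer route registry; this only illustrates the idea.
my %handler_for = (
    'GET /'  => sub { return 'list of students' },
    'POST /' => sub { return 'student created' },
);

# Look up the handler for this verb and path, or report a 404.
sub dispatch {
    my ($verb, $path) = @_;
    my $handler = $handler_for{"$verb $path"};
    return $handler ? $handler->() : '404 Not Found';
}

print dispatch('GET',  '/'), "\n";
print dispatch('POST', '/'), "\n";
print dispatch('PUT',  '/'), "\n";
```

The same URL reaches a different function depending on the verb, so no handler needs to test which kind of request it received.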

We already have a GET / handler, but it doesn’t produce a list of students yet. The following is a new Example.pm that creates a collection of students and serves it. [2] Remember, Student relies on Moose, MooseX::Declare, and MooseX::ClassAttribute, so you’ll have to install these with cpanm if you haven’t already.

2. To use Student.pm, you should copy it from wherever you have it (or the Internet if you don’t have a copy) and put it next to Example.pm. This is uncommon. Normally, Student.pm would be part of a larger CPAN release, and you would have, therefore, installed Student.pm using cpanm, along with a whole suite of other Moose classes that came with the distribution. In that case, your Student.pm would already exist in @INC and you would not have to copy the file around at all. The other common setup is that your Student.pm and Example.pm are in the same distribution, but different directories, using namespacing as suggested in the previous footnote.

In order to use this new script, we are also going to have to change the template. Dancer is not going to magically know what to do with the template data we provided, after all.
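The new Example.pm listing is not reproduced in this copy. A minimal sketch of the same shape follows. It is an approximation under stated assumptions: plain hashes stand in for the Moose-based Student objects from Appendix C, the template variable name students is chosen for illustration, and the name of the second student is a placeholder (only Bob, 123, and 321 appear in the text).

```perl
package Example;
use Dancer ':syntax';

our $VERSION = '0.1';

# Stand-in data: plain hashes instead of Student objects, whose
# constructor is defined in Appendix C and not repeated here.
my @students = (
    { name => 'Bob',   idnumber => 123 },
    { name => 'Alice', idnumber => 321 },   # placeholder name
);

# GET / hands the whole collection to views/index.tt as "students".
get '/' => sub {
    template 'index', { students => \@students };
};

true;
```

The second argument to template is the data the template receives, which is why the template itself must change before this page shows anything new.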

E.1.2 Templating

Templating is a manner of separating the way you represent a resource from the way you produce the resource. In our example, the list of students we are representing is directly defined inside the module. Every time we quit the application and restart it, the student data will be reset, because the program will be recompiled.

Alternatives to this include storing the data in a file, putting them in a database, or even storing them on somebody else’s network! We could easily script our Web site to fetch data from a different Web site—that’s how RSS readers work.

You can also represent a resource in different ways. In this case, we are going to assume we want to see the students in HTML form because we’re going to assume we are using a browser. However, if we wanted to create an RSS feed of our students, we could use a different template with the same set of students to produce RSS-compliant XML. Or we could represent them as JSON. A Web-based RSS reader would produce resources by collecting them from somebody else’s Web site; and it would represent them by repackaging them into HTML form.

So we really do want to be able to fetch the data in one place and represent it in a different place. This is something we simply couldn’t do in the CGI days; in those days, you had to have a different script for different things—and CGI only provided HTML helpers!

Templating is a way of sending data to text files, and thus constructing the text files consistently based on the data we send them. We define a template, which contains the common factors like the header and footer HTML, plus the HTML we use to represent a resource, and into it we put the data items that make one resource different from another. In this example, templating is done with the Template module (see http://template-toolkit.org for more information), so we install that:

cpanm Template

Next, we need to change config.yml to say template: "template_toolkit". The following is config.yml with comments removed for brevity. Readers are encouraged to read the comments in their own copy of the file. (This file is why we installed the YAML module.)

When the Dancer application launches, the configuration is read from config.yml. This gives Dancer its default configuration, and in this case we are configuring the template system. There are many other configuration options, and you can even set it up so that it uses different values in different environments. This is the mechanism that allows us to put a Web site live without accidentally showing the users messy errors that are only useful to developers debugging the system.

Now that we’ve told it which template system to use, we turn our attention to the views directory. This contains all the templates, and is where Dancer will look if you don’t tell it to look elsewhere. View is the term we give to the concept of being able to display the same resource in different formats. There may be a JSON view, an HTML view, an RSS view, and so forth. To keep it simple, we’ll leave the structure as it is, but it is often beneficial to add subdirectories to views, one for each supported format.

The default index.tt can be opened to see the HTML that makes up the welcome page that was shown in the browser when the application was first run (Figure E.1). Part of this file uses the <% %> syntax of Template::Toolkit, which Dancer’s “simple” template system also uses, to output information like the Dancer version and the environment data on the right.

Now that we’re using Template::Toolkit properly (that’s what the tt stands for in index.tt) we can replace index.tt with other useful templates. The following example lists the new index.tt.
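The index.tt listing is not reproduced in this copy. To give a feel for what such a template might contain, the sketch below renders a hypothetical template body directly through the Template module, configured with the same <% %> tags. The markup, the variable name students, and the sample data are assumptions made for illustration.

```perl
use strict;
use warnings;
use Template;

# Configure the standalone Template object to use <% %> tags, matching
# the tag style used in the Dancer views.
my $tt = Template->new({ START_TAG => '<%', END_TAG => '%>' })
    or die Template->error;

# Hypothetical body for views/index.tt: one list item per student.
my $index_tt = <<'END_TEMPLATE';
<ul>
<% FOREACH student IN students %>
  <li><% student.name %> (ID <% student.idnumber %>)</li>
<% END %>
</ul>
END_TEMPLATE

# Stand-in data; in the application this hashref comes from the route handler.
my $vars = {
    students => [
        { name => 'Bob',   idnumber => 123 },
        { name => 'Alice', idnumber => 321 },   # placeholder name
    ],
};

my $html = '';
$tt->process(\$index_tt, $vars, \$html) or die $tt->error;
print $html;
```

The dot notation works for hash keys and for object methods alike, so the same template body would serve real Student objects with name and idnumber accessors.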

Because changes were made to Example.pm, it is necessary to restart the development Dancer server. Simply kill it with <CTRL>+C (<CTRL>+Z on Windows) and run it again. The script, app.pl, does have an option to restart itself when it detects changes to the files, but this can be unreliable, especially on Windows, and so it is often simpler just to restart it manually. Manual restart also means you can change several files several times before you actually restart it.

Once the server process is restarted, reloading the page in the browser will now show a page similar to Figure E.2 utilizing the main principles of Dancer development: server processes (app.pl), configuration (config.yml), route handling (get '/'), and template systems (index.tt).

Figure E.2 The (unstyled) student list. The right-hand column was removed when index.tt was rewritten, but because layouts/main.tt was not changed, the layout did not change.

E.1.3 Parameters

It is not necessarily a given that a GET request is complete with just the URL to the resource. Many resources are actually entry points into a set of resources. Our entire application so far is actually an entry point to the whole set of Student resources available to the application. Normally we expect to be able to filter pages like this or to access a single resource.

The modern Web offers two common ways of parameterizing. They differ in how they work, and hence how they should be used. Both of them are part of the URL.

A query string is the part of the URL following a question mark. Commonly it is a key/value pair string, where the keys and values are separated by equal signs and the pairs themselves are separated by ampersands:

?name=John&courses=java
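To make the format concrete, here is a simplified core-Perl sketch that splits such a string into a hash. The function name parse_query_string is invented for this illustration, and the sketch deliberately ignores repeated keys; in a Dancer application you would rely on the framework’s own parsing instead.

```perl
use strict;
use warnings;

# Simplified sketch: split "k=v&k2=v2" into a hash, decoding "+" as a
# space and %XX escapes. Repeated keys overwrite each other here.
sub parse_query_string {
    my ($query) = @_;
    my %param;
    for my $pair (split /&/, $query) {
        next unless length $pair;
        my ($key, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        for ($key, $value) {
            tr/+/ /;
            s/%([0-9A-Fa-f]{2})/chr hex $1/ge;
        }
        $param{$key} = $value;
    }
    return \%param;
}

my $params = parse_query_string('name=John&courses=java');
print "$_ = $params->{$_}\n" for sort keys %$params;
```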

Often, this is referred to as the GET string, but this is a misnomer; because it is part of the URL, the HTTP verb used with it is not relevant. A POST request can have a query string, and of course it does not have to be an HTTP request in the first place. It is perfectly valid to have a URL like file:///home/user/html/index.html?user=me, which doesn’t use HTTP at all. [3] Query strings are so called because they are normally used to query the index resource for a more specific set. In the preceding example, we would expect to find students with the name “John” and the course “Java”.

3. It may not be meaningful to do this; the point here is to illustrate the difference between the URL and the protocol (HTTP versus file).

The other type of parameter is in the path itself. This is usually used to be more specific about exactly which resource we want. If we were dealing with a bigger system, with many schools and universities, we might wish to be more specific about whose students we were after. In this case, the URL might look something like this:

http://0.0.0.0:3000/univ/mit/students/

This would be built using the techniques we’ve already seen, plus some techniques we are about to see. The use of get '/univ' is almost certainly going to feature here; but once a system gets this big, it is likely that a new module (for example, University.pm) will be added to lib/ in order to keep the system well organized.

Of course, these parameter types can be combined. The following URL might refer to all students at MIT who are named Joe and are doing advanced math:

http://0.0.0.0:3000/univ/mit/students/?name=joe&course=advanced_math

We can also get a specific student from the whole database using the path form of parameterization, by specifying the student’s ID in the path, like so:

http://0.0.0.0:3000/students/1

This URL has lost the part that specifies “MIT” because a student’s ID is unique in the system; therefore, if we know exactly which student we want, it is of no benefit to specify “MIT” any more. This form of parameterization is the most central to the concept of a URL; the URL /students/1 points exactly to a single resource—a specific student. The index resource can show you different things in different situations, but /students/1 will always show you the current state of student 1.

The students part of the preceding examples, of course, implies a system larger than our example. Dancer actually makes it simple to augment paths in this way. We can easily add the students/ part to our example later on. This means it is possible to create parts of systems separately from other parts, and then include them later on as subsections of a bigger site. We will see that later in this appendix. For now, let’s define a route handler that will get a student based on ID.

The following example is an abridged version of Example.pm. The listed code can be added after the existing code, but of course before the true; line, which must always come last in the module.
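The listing itself is not reproduced in this copy. A sketch of such a handler is shown below as a self-contained module. It is not the original code: the students are plain hashes rather than Student objects, find_student is a helper invented here, the second student’s name is a placeholder, and send_error is one of several ways to produce a 404 in Dancer.

```perl
package Example;
use Dancer ':syntax';

my @students = (
    { name => 'Bob',   idnumber => 123 },
    { name => 'Alice', idnumber => 321 },   # placeholder name
);

# Return the student with the given idnumber, or undef if there is none.
sub find_student {
    my ($id) = @_;
    my ($student) = grep { $_->{idnumber} eq $id } @students;
    return $student;
}

# GET /123: ":id" captures the path segment into params->{id}.
get '/:id' => sub {
    my $student = find_student(params->{id});
    return send_error('No such student', 404) unless $student;

    # Reuse the index template with a one-element list.
    template 'index', { students => [$student] };
};

true;
```

Keeping the lookup in its own function separates “which student?” from “how is it served?”, which makes the lookup easy to replace later.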

Remember to restart the app.pl that you ran earlier, in order to compile the new code and gain the new behavior.

Results can be seen simply by adding a number to the URL in the browser. The students defined in the file have idnumber values of 123 and 321. Adding these numbers to the URL will, therefore, serve the student with the appropriate idnumber (see Figure E.3); any other number will serve the 404 page (see Figure E.4); and no number at all will serve the index of students, as in Figure E.2.

Figure E.3 The response for the URL http://0.0.0.0:3000/123 is, as expected, Bob, with ID 123.

Figure E.4 The response for the URL http://0.0.0.0:3000/124 is a 404 because that resource was not found.

It is important that we used the same template for this single-student view as we did for the list of students. After all, displaying a single result should look the same whether there was only one student in the index or we explicitly requested one student. This encourages consistency, which makes it much easier for both humans and machines to understand the pages we return. It also makes the application much easier to maintain, because there are far fewer files whose purpose must be understood.

Next we address the other type of parameterization: the query string. It is common, but not required, that the query string uses the ampersand/equals sign format of key/value pairs. Dancer will split up this format into the params hashref automatically, but it also provides a means to access the query string directly if the route handler does not expect this format.

The following example is a replacement for the existing get '/' route handler in Example.pm, once again abridged for brevity.
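The listing is not reproduced in this copy. As a sketch of how such a handler could look, the module below filters the index by an optional name parameter using a case-insensitive substring match. The helper students_matching, the hash-based students, and the second student’s name are assumptions made for illustration.

```perl
package Example;
use Dancer ':syntax';

my @students = (
    { name => 'Bob',   idnumber => 123 },
    { name => 'Alice', idnumber => 321 },   # placeholder name
);

# Return every student whose name contains $name (case-insensitively),
# or all students when no name was given.
sub students_matching {
    my ($name) = @_;
    return @students unless defined $name && length $name;
    return grep { index(lc $_->{name}, lc $name) >= 0 } @students;
}

# GET /?name=b: the query string arrives already split up in params.
get '/' => sub {
    my @matches = students_matching(params->{name});
    template 'index', { students => \@matches };
};

true;
```

Note that an unmatched query simply yields an empty list; the handler still renders the index normally.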

We can now use the query string method to query the student index, as in Figure E.5, in which all students whose name contains a b or a B are returned.

Figure E.5 The response to the URL http://0.0.0.0:3000/?name=b.

Astute readers will have realized that if a query is run that returns no results, the page is returned normally, but with no students in it. This differs from the previous example, wherein if the student was not found, a 404 error was returned.

The difference lies in the consideration of resources. This same consideration determines whether you want to use the path form of parameterization or the query form. In the first example, the student’s ID number was part of the URL’s path. That means that the student whose ID is 123 is a separate resource from the student whose ID number is 124: a different path represents a different resource. As it happens, the resource with ID number 124 does not actually exist, while the resource with ID number 123 does. In the first example, the URL that returned 404 was requesting a different resource from the URL that returned the representation of a student.

In the query string example, the same resource was being requested with a different set of query parameters. The student index was being requested in both cases. The index resource always exists, and if your query matches no students, then an empty index is returned. That is to say, the representation of the index resource with the query ?name=x is just representing a resource with no items in it. But the index resource still exists. An empty set is different from a set that does not exist at all.

So a query string does not affect which resource is being requested, but a different path does.

E.1.4 POST

It was mentioned that a POST request can be made to create a new resource under a given resource. That means that if we want to make a new student, we would probably want to post the data to the index resource. Since the index resource always exists, all we have to do is allow it to handle POST requests and define what it should do when it receives one.

We know that a POST request should create a new resource, but how would it know the values with which to create this student? Well, the third form of parameterization, one that was not mentioned previously, is in the body of the request. When an HTTP request is made, it sends a verb and a URL, along with certain other information like the Host, Accept, and User-Agent headers we saw in our curl request earlier. For certain requests [4] it can also send a body, which is essentially everything else. POST parameters are sent in the body.

4. In fact, a body can be sent for any request, but it doesn’t always make sense to do so, and so sometimes the body may be ignored.

As usual, the following example is abridged. It lists the new route handler to go into Example.pm. It can be added anywhere, but it is best to organize handlers by the route they handle. Since this is post '/', it makes sense to put it after the handler for get '/'.
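The listing is not reproduced in this copy. The sketch below shows one way such a handler could be written. The field names name and idnumber, the helper add_student, the hash-based students, and the redirect back to the index are all assumptions for illustration; validation is deliberately left out.

```perl
package Example;
use Dancer ':syntax';

my @students = (
    { name => 'Bob',   idnumber => 123 },
    { name => 'Alice', idnumber => 321 },   # placeholder name
);

# Append a student built from the given fields; returns the new count.
sub add_student {
    my (%field) = @_;
    push @students, { name => $field{name}, idnumber => $field{idnumber} };
    return scalar @students;
}

# POST /: body parameters are available through params, just like
# path and query parameters.
post '/' => sub {
    add_student(
        name     => params->{name},
        idnumber => params->{idnumber},
    );
    redirect '/';
};

true;
```

Because the new student lives only in the @students array, it exists only for as long as the server process keeps running.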

Restarting the application now will allow you to send POST requests to the index page to create new students. Normally this would be done with a form on the page, but that has not been written yet. Instead, we can issue curl requests again:

Having run this curl request, refreshing the index page in the browser will produce output similar to Figure E.6.

Figure E.6 After running the POST request with curl, the response to the URL http://0.0.0.0:3000 now contains three students, rather than two.

Having done this, it may also be noted that the URL that previously returned 404, http://0.0.0.0:3000/124, now returns the new student, Jill.

Sometimes, in the wild, you will see the query string used to create a resource. This is a bad idea; a page in the browser should be able to be refreshed. The browser will warn if this involves sending POST data again, but it assumes that a URL with a query string is safe to request again. The reverse is also true: if POST is used to query a resource like the student index we just made, the browser will assume it is unsafe and warn the user when refreshing. These behaviors are defined by the HTTP and URI standards, so by contradicting the standards, Web sites cause themselves and their users issues. That is why our student index uses a GET request with a query string to fetch students, and a POST request with body data to create students.

A good way to remember the difference is to ask yourself whether it would make sense for a user to copy the URL from the browser and send it to a friend, in order to share the information. If so, it is probably a GET request and should be either the path or the query string of the URL. If not, you are probably creating a resource and you want to POST the data.

Remember, it is not only humans with Web browsers that will send requests to your application. Scripts (for example, search indexers) are going to make assumptions based on the standardized meaning of URLs and HTTP verbs.

Exercise E: May I Have This Dance?

1. Add a form to index.tt that creates a new student. The form should probably have method="post" and action="/".

2. What happens when you POST invalid data? What should happen? Amend post '/' to error sensibly when the POST parameters are invalid. Creating a form assists the user in providing the correct field names, but a script or test using curl is still likely to get it wrong. Use HTTP status codes to indicate errors. Remember that the 400 range means the user did something wrong, and the 500 range means the server (you) did something wrong.

3. In the student index, create links from each student’s name to the full URL for that student (for example, Bob should link to http://0.0.0.0:3000/123). Note that you can either omit everything before /123 (i.e., just use /123 as the URL, and let the browser behave correctly), or you can use Dancer’s uri_for function. The former option is defined as standard behavior for browsers to follow (that is, if the browser sees /123, it will assume the http://0.0.0.0:3000 part), and the latter option will have to be performed in the route handler, because the templates will not be able to use uri_for in the current setup.

4. What happens to the new students when you restart the server process? Why?

5. What happens when you try to run it a second time? This method of writing Web applications is supposed to allow you to run the script as many times as is necessary to handle the amount of traffic to the site. Run the script with -p 3001 and note the different URL reported to you by Dancer.

6. How will you ensure that, if two copies of the application are running at the same time, they will respond to the same URL with the same information? Tip: you will not be able to have an array @students any more, because this array will be unique to each application process, and it cannot be shared. You will have to store the student data externally, and have the script pass on data requests somehow.

7. Amend the application to store Students in a new MySQL database created for the purpose. The same three route handlers can be used for this; the only difference is how you fetch and how you store the students. The query parameters in get '/' will become bind values in a MySQL query, instead of a loop. Dancer provides a database function that you can use to get a handle on your DBI connection, as long as you correctly configure your application.

8. What is the environments/ directory for? Consider whether your configuration changes are for development, production, or all situations.

9. What goes in public/?

10. Spend some time styling up the student index to look a bit more in keeping with the theme of the default template.

11. Investigate Dancer’s prefix function. This will allow you to prefix all your current route handlers with, for example, /student. This will mean that your current / handler will be available on /student/, and /123 will now be /student/123. This makes sense. You are handling student resources here, so the URLs should say so. Most systems deal with more than just one type of resource, after all.

12. The reader is encouraged to read further on deploying Web apps in this manner. There are many options, but common ones include plackup and Starman. It is also possible to have Web servers like Apache and nginx run your application as necessary.
