Appendix. RESTful Primer

Roy Fielding described Representational State Transfer (REST) in 2000, in his doctoral dissertation (http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm), as an architectural style for distributed hypermedia systems. His work wasn’t tied specifically to HTTP and web services, but it has often been applied to the problem of web services design. This appendix takes a quick look at Fielding’s paper, reviews the basics of HTTP, and outlines how these map to the concepts of REST.

Roy Fielding’s REST

REST is an often-invoked and often-misunderstood concept. To help clear up some of the confusion, let’s look at the original source, Fielding’s paper. In the summary of the section on REST, Fielding states:

REST provides a set of architectural constraints that, when applied as a whole, emphasizes scalability of component interactions, generality of interfaces, independent deployment of components, and intermediary components to reduce interaction latency, enforce security, and encapsulate legacy systems.1

The key component in this summary is that REST is merely a set of architectural constraints. There’s nothing in there about HTTP, naming conventions for resources, or how to properly use the HTTP verbs. However, it does mention a few of the goals of service-oriented design mentioned in Chapter 1, “Implementing and Consuming Your First Service.” Most notably, it mentions scalability, agility, isolation, and interoperability. The rest of Fielding’s paper describes how REST helps achieve those goals.

Constraints

One of the primary characteristics of REST is the set of design constraints it proposes. Here’s the list of constraints, with brief descriptions:

Client/server—REST applies to client/server architecture.

Stateless—This is one of the most commonly mentioned constraints. Each request from the client to the server must contain all the information necessary to service the request.

Cache—The cache constraint states that each response from the server must be implicitly or explicitly labeled as cacheable or not cacheable. This often-overlooked feature provides a key part of the scalability of the RESTful style.

Uniform interface—Interactions between the client and server must conform to a uniform interface. In the context of HTTP, this maps to the commonly understood verbs GET, PUT, POST, DELETE, OPTIONS, and HEAD.

Layered system—This constraint states that the architecture can be composed of multiple layers. Further, each layer cannot see beyond the next layer it communicates with. With regard to our service-oriented design, this is analogous to the consumers of web services not having direct knowledge of a database behind a service.

Code-on-demand—This constraint states that the functionality of a client can be extended by downloading additional code. With respect to the web, this refers to technologies such as JavaScript, Flash, or Java applets. However, it is listed as an optional constraint.

Architectural Elements

REST focuses on the architectural elements of a system. This means bringing a focus on the components of a system and their interactions as opposed to the implementation-level details or protocol syntax. Fielding outlines the following elements in his paper:

Data elements—The data elements piece of a RESTful architecture are often an area of focus for design purists. This includes resources, resource identifiers, representations, representation metadata, resource metadata, and control data.

Connectors—Connectors are interfaces for communication between components. The primary types defined are client, server, cache, resolver, and tunnel. For the purposes of our RESTful web services design, the connectors element doesn’t really come into the picture.

Components—REST components are separated by their roles within the architecture. These include the origin server (Apache, Nginx, and so on), the gateway (HAProxy, Squid, CGI, reverse proxy, and so on), the proxy (a client chosen proxy server), and the user agent (Firefox, Safari, Internet Explorer, and so on).

Connectors and components are important, but they are relatively well defined through third-party applications that you usually don’t have to modify. The architectural components are an implementation detail that can be chosen after a service has already been written and is ready for deployment. The only concern going forward for RESTful web services is how these components take advantage of data elements.

Architectural Views

Fielding’s description of REST defines three architectural views that help illuminate how the pieces of a system work together:

Process view—This view shows the interactions between components as data flows through the system.

Connector view—The focus for this view is on the details of network communication between components.

Data view—The data view reveals application state as data flows through the components of the architecture. Because REST specifies a constraint of statelessness, this is most applicable when looking at caching layers. The data view shows where cached responses can come from.

REST and Resources

Now that we’ve taken a fairly detailed look at Fielding’s definition of REST, we’re ready to tie it all together with the goal of designing RESTful web services. Resources are at the heart of the design of services.

The concept of resources lies at the core of a RESTful design. At its most basic level, a resource is simply a stream of bits representing something. This could map to a record in a database, a text file, an image, or a set of search results.

Here are some examples of resources that you might use in the social feed reader example:

• The profile information for a user

• A comment

• An entire comment thread

• A user’s activity stream

• A list of a user’s friends

• The last 100 activities of a user’s friends

Notice that some of the listed resources are actually collections of other resources.

URIs and Addressability

All resources must be addressable. A uniform resource identifier (URI) provides the address for each resource. A resource must have at least one unique URI by which it can be accessed. Note that it is possible to access the same resource through multiple URIs. A good example of this is a specific version of a software release. The following two resources could point to the same thing:

http://ruby-lang.org/pub/ruby/1.9/ruby-1.9.1-p129.tar.gz

http://ruby-lang.org/pub/ruby/1.9/current-release.tar.gz

The two resources point to a release of the Ruby programming language. While these two URIs may reference the same resource at one point in time, this could change. When a newer version of Ruby comes out, the current-release URI points to that. The first URI here represents the canonical address for that resource and should not change.

A final important thing to point out about URIs is that they don’t specify any naming convention. Rail imposes certain conventions through controllers and actions, and many developers think of this as part of REST. It isn’t. While it’s generally good practice to use naming conventions in your URIs, there’s nothing in the constraints of REST that limits URI style.

Representations

Representations are the sequence of bytes and metadata for a resource. In lay terms, a representation is the format that a resource can take. This could be HTML, text, XML, JSON, JPG, TIFF, GIF, or any custom format you can dream up. For Rails developers, this looks most familiar when looking at the actions in a RESTful controller. Take a comment show action from the social feed reader as an example:

image

From this controller action, you can see that each comment resource has three possible representations that can be requested. These are the relative URIs for the three representations of the comment resource:

/comments/2.html—Gets the HTML representation of the comment with ID 2.

/comments/2.xml—Gets the XML representation of the comment with ID 2. Note that Rails automatically calls to_xml on the comment object.

/comments/2.json—Gets the JSON representation of the comment with ID 2. Note that Rails automatically calls to_json on the comment object.

Rails uses a file system–style naming convention in the URI to specify the representation of the requested resource. This is the .html, .json, and .xml at the end of the URI. While these three URIs are different, it is entirely possible for all three to point to the same resource (as you saw earlier with the Ruby software release example). Other methods for specifying which representation you’re requesting are covered in the next section of this chapter.

Yet another resource detail that could be considered part of the representation is the language. From our example, a comment could be represented in English, Spanish, Japanese, Farsi, or any other language. Most of the time, this isn’t an issue because a comment is available in only a single language. Further, automatic translators lack the ability to reliably convert to other languages with any level of quality. However, HTTP provides a method for specifying languages as well.

HTTP and the Uniform Interface

One of the key constraints of REST is the uniform interface. The goal of the uniform interface is to make the interaction between components as simple to understand as possible. This means that components should use the same methods to communicate between each other. Resources and their addresses are part of this uniformity. The uniformity of component interaction is the fixed set of operations that can be performed on resources. HTTP provides a small set of methods to enable simpler APIs along with gateway and caching servers.

HTTP Methods

The HTTP methods are the verbs for the uniform interface. You use them to communicate your needs against resources. The methods in HTTP are GET, POST, PUT, DELETE, HEAD, OPTIONS, TRACE, and CONNECT. For the purposes of RESTful interfaces, we’ll pay attention primarily to the first four methods. OPTIONS and HEAD come into play in special cases, while TRACE and CONNECT don’t concern our services designs. Before we dig into the specifics of each method, it’s worth mentioning two concepts that are important to HTTP and services design: safe and idempotent methods.

Safe and Idempotent Methods

Safe methods are those that do not request any server-side effects. That is, they are methods that only request information. They do not modify resources in any way. GET and HEAD methods are considered safe. This means that programs such as web crawlers can perform GET and HEAD operations without concern for what may happen on the server. An example is following hyperlinks on a page. Because POST is not considered a safe operation, well-behaved crawlers do not submit HTML forms.

Idempotent methods are those that can be replayed any number of times without a change from the first one. In more mathematical terms, these are methods for which the result of running them N > 0 times will always be the same. GET, HEAD, PUT, and DELETE are idempotent. GET and HEAD are idempotent because they have no side effects. PUT is idempotent because putting the same resource at the same URI again results in no change from the first time the PUT was run. DELETE is idempotent because once a resource is deleted, deleting it again doesn’t have any effect.

GET

The GET method is a read request for a specific resource. It retrieves information, which is identified by the URI. This could be a request that simply returns a static asset from disk (such as an image or a regular HTML file), or it could be a request that requires processing to return the result (for example, a search results page). So the GET method does literally what it says: It gets a resource. Further, it should produce no side effects. So it’s safe and idempotent.

POST

At its most basic level, a POST is appended to an existing resource. It is a request that includes an entity (some data) that the server should accept. It can be used to annotate an existing resource (such as adding a comment to a list of comments), appending data, or simply providing data to some backend process.

Rails applications commonly use POST as a create or insert for a record. This is very much like an append to a collection. Consider comments from the social feed reader application, for example. A POST to /comments would create a new comment. Thus, the request can be viewed as an append to the collection of comments. Further, it assigns an ID and creates a whole new resource at /comments/:id.

One final thing about POST is that it is neither safe nor idempotent. Running POST multiple times results in multiple resources being created. In some cases, you might want to take more care in creating a resource on a POST. This could be a way to hack around the fact that POST is not idempotent. In the earlier comments example, before doing an insert, you could check the database first to see if the comment body is the same as the last comment posted by the user. Of course, that’s beyond the realm of what HTTP cares about. It doesn’t specify exactly what the server-side behavior should be.

PUT

The PUT method places a resource at a specific URI. The request includes data and specifies that the server should store that data under the request URI. If there is already a resource at that URI, the data should be viewed as a modified version of the preexisting resource. Otherwise, that URI should become a reference to the data supplied in the request.

In simpler terms, this means that PUT can be either an insert with a client-supplied URI or an update. For most Rails programs, PUT isn’t used for inserts at all. This is because the server is usually responsible for supplying the full URI for created resources. For creating a comment in the social feed reader, it would look like this:

POST to /comments—Here the body of the POST would supply the information for the comment. The server would then do an insert in a database to create the resource.

PUT to /comments/2—With Rails, this would be an update to the comment with an ID of 2. In a standard Rails application, this would never indicate an insert. However, HTTP specifies that this could indeed be an insert.

Some HTTP purists will tell you that a PUT should always be an insert or a complete replace of the resource at the specified URI. This means that you wouldn’t be able to do partial updates as you commonly do in Rails. Instead, you’d have to do something like this:

PUT to /comments/2/body

In this case, the body of the request would be the replacement of the body of the comment. So to update any two attributes, you would have to make multiple requests to the specific attribute resources. Obviously, doing things this way would be completely maddening. Indeed, Roy Fielding has stated that PUT doesn’t adhere to a strict definition of “store at this URI.” Here’s what he had to say on the subject on a mailing list in 2006:

FWIW, PUT does not mean store. I must have repeated that a million times in webdav and related lists. HTTP defines the intended semantics of the communication—the expectations of each party. The protocol does not define how either side fulfills those expectations, and it makes damn sure it doesn’t prevent a server from having absolute authority over its own resources. Also, resources are known to change over time, so if a server accepts an invalid Atom entry via PUT one second and then immediately thereafter decides to change it to a valid entry for later GETs, life is grand.2

So the next time a REST or HTTP zealot tells you that partial updates aren’t allowed with the formal definition of PUT, you can point the person to Roy’s own words.

We still need to make sure that the server-side results of a PUT request remain idempotent. If a partial update happens with a PUT, the result of running that multiple times should not be any different from running it once.

DELETE

The DELETE method is fairly simple and straightforward. It deletes the resource at a specified URI. However, the operation can be overridden on the server.

HEAD and OPTIONS

The HEAD and OPTIONS methods are the final two methods worth mentioning for the RESTful services design. These two lesser-used methods provide information about resources.

HEAD works exactly like GET, with one difference. It doesn’t return the actual resource. Instead, it returns only the header information for a resource. This includes things like content-size, last-modified, content-type, and ETag headers. This can be useful when you’re determining whether a cached resource should be considered stale.

OPTIONS is a method for getting information about what operations can be performed on a resource. As an example, if you wanted the comments API to conform more closely to HTTP conventions, an OPTIONS request on /comments/2 might return the following header:

   Allow: GET, HEAD, PUT, DELETE

The response contained in the header tells you that you can perform any on of those HTTP methods on the specific comments resource.

HTTP Headers

HTTP headers work on both sides of client/server communication. Clients issue request headers, and the server issues response headers. Together, these help the two parties agree on and convey information about resources.

Request Headers

Request headers let the client specify additional information to the server about the requested representation of a resource. This list is by no means complete, but here are some common request headers, with examples that a client may specify:

Accept: text/plain, text/html—This tells the server that the client is looking for a plain-text or HTML representation of the resource. Other options could be application/json, text/javascript, application/xml, text/xml, image/jpeg, and countless others.

Accept-Encoding: compress, gzip—This header tells the server whether the client can take a gzip or compressed response.

User-Agent: pauldix-service-client/1.0—In most cases, the user agent field isn’t required. However, some servers reject requests that don’t specify a user agent of some kind. It’s generally part of being a good Internet citizen to specify a user agent that server administrators can find information on.

Accept-Language: en—This header tells the server what language the resource should be returned in. This example requests English.

If-Modified-Since: Tue, 30 Jun 2009 08:30:31 GMT—This header tells the server that the client has a version of the resource that was last generated at 8:30 AM on June 30. If the server has a newer version, it should return that in the response. If that version is the latest, then the server should return an empty response body with a 304 Not Modified response code. This is one piece of the caching mechanisms that is built into HTTP.

If-None-Match: "sdfzlkjsd"—This header is another way for the client to tell the server that it has a cached version of a resource. The supplied string matches up with an ETag in the response header. If the client-supplied ETag matches that of the most current version of the requested resource, the server should return an empty response body with a 304 Not Modified response code. Otherwise, the server should return the latest version of the requested resource.

Content-Type: application/x-www-form-urlencoded—The content type field is used in POST and PUT requests to specify to the server the format of the request body. The example here is for a form POST. Two other common examples are application/json and application/xml.

Content-Length: 1274—This tells the server the length of the request body, in bytes.

Cookie: user_id=PaulDix; sort=date—Cookies are set in the request header. This is a list of name/value pairs separated by semicolons.

Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==—This header is for specifying client credentials for a request. This example uses basic authentication and passes the user name and password as a base 64–encoded string.

Complete List of Request Headers

A more complete list of request headers can be found at the W3 definition site, at http://www.w3.org/Protocols/HTTP/HTRQ_Headers.html.

Representation Headers

RESTful purists argue that the representation of a resource should be specified by the client in the request headers. This includes the format (JSON, XML, HTML, and so on) and the language. Obviously, Rails doesn’t follow this exactly. You specify to Rails the format you want with the end part of the URI. For example, /comments/2.json tells Rails that you want the JSON-formatted representation of the comment with ID 2. When it comes to languages, Rails leaves application developers on their own.

There are three arguments against taking the HTTP purist’s approach of specifying representation in headers. First, it’s easier for most client-side developers to deal with specifying the format in the URI rather than in the header. Seeing a .json or .xml extension on the end of the URI is understandable almost immediately. While using the URI to specify format may not be exactly technically correct, it often results in more accessible APIs.

Second, some cache servers, such as those at Akamai, cache only based on the URI. So if you specify which representation you want in the request header, your caching layer may end up broken. This is a definite reason to keep this information in the URI.

Finally, language representations may not map to exactly the same resource. When using the request headers to specify representation, you are always asking for the same resource each time, but with different options. With formatting such as XML and JSON, this is a direct machine representation of the same thing. With languages, machine translation is rarely the best way to go from one language to another. It’s often better if you have human translators create the different language versions. In that case, the translated versions represent distinct resources and should be specified with different URIs.

Response Headers

Response headers from the server give the client metadata about the resource being returned. Often, the response headers have matching corollaries in the request headers. Here is a list of response headers that you will most often be concerned with:

Age: 30—This gives the time the resource has been in proxy cache, in seconds.

Allow: GET, HEAD, PUT, DELETE—This lists the HTTP methods allowed on the resource. You saw this earlier, in the response to an OPTIONS request.

Cache-Control: max-age=120—This header allows either the client or server more control over caching than the default mechanisms. These commands are intended for proxy caches that lie between the client and server. Chapter 8, “Load Balancing and Caching,” covers this in greater detail.

Content-Encoding: gzip—This tells the client the encoding of the response. It is used for compression and gzip.

Content-Length: 234—This gives the length of the response body, in bytes.

Content-Type: application/json; charset=utf-8—This tells the client the MIME type of the response. This should match up with one of the types in the Accept header from the request.

ETag: sdlkf234—The ETag gives a unique identifier of some sort for the resource. This identifier should match up with the version of the resource in some way. This could be an ID combined with a last_updated field or a hash of the 2 or some other value. This is used for caching purposes and matches with the If-None-Match request header.

Expires: Wed, 01 Jul 2009 09:02:12 GMT—This tells the client or a proxy cache when a response should be considered stale. Either of these can verify whether the resource is up-to-date for a later request.

Last-Modified: Wed, 01 Jul 2009 09:03:45 GMT—This is another caching mechanism. This matches up with the client-side header If-Modified-Since. Thus, with later requests for the same resource, the client can use the Last-Modified value for If-Modified-Since.

Set-Cookie: user_id=paul—This sets the client-side cookie.

HTTP Status Codes

Status codes are three-digit integers that specify the status of a response. With regard to our RESTful services, status codes should be used as part of the API and the uniform interface. While REST doesn’t state a specific need for the use of status codes, it is generally considered good design practice to conform to the definitions of HTTP.

The first digit of the status code indicates the general nature of the response. Here are the classes of responses from the W3 definition:

• 1xx—Informational status which states that the request has been received and is continuing

• 2xx—Success status which states that the request was received and accepted

• 3xx—Redirection status which states that to complete the request, further action must be taken at another URI

• 4xx—Client error status which states that the request is either improperly formatted or cannot be completed for some reason

• 5xx—Server error status which states that the server failed to complete the request despite it being valid

One thing to note about status codes is that the HTTP methods often define which specific code should be taken on certain actions.

Conclusion

REST and HTTP provide basic guidelines for building services. While the exact definition of REST is debatable, we can focus on the set of constraints that make for a RESTful architecture. Four constraints often resurface while you’re designing services:

Client/server—REST is a client/server architecture.

Stateless—The server contains no state between requests.

Cache—Responses are labeled as cacheable or not cacheable.

Uniform interface—Interactions between a client and a server should conform to a uniform interface.

At the center of RESTful design lies the concept of resources. Resources are the objects that requests can be performed against. They are addressable through URIs and have representations (format and language).

The HTTP verbs are the building blocks of the uniform interface. They can be roughly mapped to database actions and CRUD. Here they are again:

GET—Maps to SQL SELECT.

PUT—Maps to SQL UPDATE or INSERT with a specified key.

POST—Maps to SQL INSERT. The key is generated by the database.

DELETE—Maps to SQL DELETE.

Ultimately, the important thing to remember about REST is that it is merely a set of constraints and guidelines for designing and architecting scalable client/server systems. Many developers like to get embroiled in arguments about what represents REST and what doesn’t, but often the distinctions aren’t that strong. The best approach usually involves a number of trade-offs and compromises.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.136.233.153