14 Accessing Web APIs


Previous chapters have described how to access data from local .csv files, as well as from local databases. While working with local data is common for many analyses, more complex shared data systems leverage web services for data access. Rather than store data on each analyst’s computer, data is stored on a remote server (i.e., a central computer somewhere on the internet) and accessed similarly to how you access information on the web (via a URL). This allows scripts to always work with the latest data available when performing analysis of data that may be changing rapidly, such as social media data.

In this chapter, you will learn how to use R to programmatically interact with data stored by web services. From an R script, you can read, write, and delete data stored by these services (though this book focuses on the skill of reading data). Web services may make their data accessible to computer programs like R scripts by offering an application programming interface (API). A web service’s API specifies where and how particular data may be accessed, and many web services follow a particular style known as REpresentational State Transfer (REST).¹ This chapter covers how to access and work with data from these RESTful APIs.

¹Fielding, R. T. (2000). Architectural styles and the design of network-based software architectures. University of California, Irvine, doctoral dissertation. https://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm. Note that this is the original specification and is very technical.

14.1 What Is a Web API?

An interface is the point at which two different systems meet and communicate, exchanging information and instructions. An application programming interface (API) thus represents a way of communicating with a computer application by writing a computer program (a set of formal instructions understandable by a machine). APIs commonly take the form of functions that can be called to give instructions to programs. For example, the set of functions provided by a package like dplyr makes up the API for that package.

While some APIs provide an interface for leveraging some functionality, other APIs provide an interface for accessing data. One of the most common sources of these data APIs is web services: websites that offer an interface for accessing their data.

With web services, the interface (the set of “functions” you can call to access the data) takes the form of HTTP requests—that is, requests for data sent following the HyperText Transfer Protocol.

This is the same protocol (way of communicating) used by your browser to view a webpage! An HTTP request represents a message that your computer sends to a web server: another computer on the internet that “serves,” or provides, information. That server, upon receiving the request, will determine what data to include in the response it sends back to the requesting computer. With a web browser, the response data takes the form of HTML files that the browser can render as webpages. With data APIs, the response data will be structured data that you can convert into R data types such as lists or data frames.

In short, loading data from a web API involves sending an HTTP request to a server for a particular piece of data, and then receiving and parsing the response to that request.

Learning how to use web APIs will greatly expand the available data sets you may want to use for analysis. Companies and services with large amounts of data, such as Twitter,² iTunes,³ or Reddit,⁴ make (some of) their data publicly accessible through an API. This chapter will use the GitHub API⁵ to demonstrate how to work with data stored in a web service.

²Twitter API: https://developer.twitter.com/en/docs

³iTunes search API: https://affiliate.itunes.apple.com/resources/documentation/itunes-store-web-service-search-api/

⁴Reddit API: https://www.reddit.com/dev/api/

⁵GitHub API: https://developer.github.com/v3/

14.2 RESTful Requests

There are two parts to a request sent to a web API: the name of the resource (data) that you wish to access, and a verb indicating what you want to do with that resource. In a way, the verb is the function you want to call on the API, and the resource is an argument to that function.

14.2.1 URIs

Which resource you want to access is specified with a Uniform Resource Identifier (URI).⁶ A URI is a generalization of a URL (Uniform Resource Locator)—what you commonly think of as a “web address.” A URI acts a lot like the address on a postal letter sent within a large organization such as a university: you indicate the business address as well as the department and the person to receive the letter, and will get a different response (and different data) from Alice in Accounting than from Sally in Sales.

⁶Uniform Resource Identifier (URI) Generic Syntax (official technical specification): https://tools.ietf.org/html/rfc3986

Like postal letter addresses, URIs have a very specific format used to direct the request to the right resource, illustrated in Figure 14.1.

A figure shows the schema of a URI with an example. — Figure 14.1 The format (schema) of a URI.

The URI format reads "https://domain.com:9999/example/page?type=husky&name=dubs#nose". The parts of the URI are labeled as follows: https is the scheme, domain.com is the domain, 9999 is the port, example/page is the path, type=husky&name=dubs is the query, and nose is the fragment.

Not all parts of the URI are required. For example, you don’t necessarily need a port, query, or fragment. Important parts of the URI include:

scheme (protocol): The “language” that the computer will use to communicate the request to the API. With web services this is normally https (secure HTTP).
domain: The address of the web server to request information from.
path: The identifier of the resource on that web server you wish to access. This may be the name of a file with an extension if you’re trying to access a particular file, but with web services it often just looks like a folder path!
query: Extra parameters (arguments) with further details about the resource to access.

The domain and path usually specify the location of the resource of interest. For example, www.domain.com/users might be an identifier for a resource that serves information about all the users. Web services can also have “subresources” that you can access by adding extra pieces to the path. For example, www.domain.com/users/layla might provide access to the specific resource (“layla”) that you are interested in.
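To make the parts of a URI concrete, the following sketch splits an example URI into its components using only base R. The example URI and the `uri_pattern` and `uri_parts` names are illustrative assumptions; the regular expression is adapted from the generic pattern given in the URI specification (RFC 3986), and dedicated packages provide more robust parsing.

```r
# Example URI containing every part described above (illustrative only)
uri <- "https://domain.com:9999/example/page?type=husky&name=dubs#nose"

# Generic URI pattern adapted from RFC 3986 (Appendix B)
uri_pattern <- "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?"

# regexec() returns the full match followed by each parenthesized group
matched <- regmatches(uri, regexec(uri_pattern, uri))[[1]]

# The domain and the optional port are separated by a colon
authority <- strsplit(matched[5], ":", fixed = TRUE)[[1]]

uri_parts <- list(
  scheme = matched[3],
  domain = authority[1],
  port = if (length(authority) > 1) authority[2] else NA_character_,
  path = matched[6],      # includes the leading slash
  query = matched[8],     # text after the ? and before the #
  fragment = matched[10]  # text after the #
)

str(uri_parts)
```

Note that this sketch keeps the query as a single string; splitting it into individual parameters is a separate step.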

With web APIs, the URI is often viewed as being broken up into three parts, as shown in Figure 14.2:

The base URI is the domain that is included on all resources. It acts as the “root” for any particular endpoint. For example, the GitHub API has a base URI of https://api.github.com. All requests to the GitHub API will have that base.
An endpoint is the location that holds the specific information you want to access. Each API will have many different endpoints at which you can access specific data resources. The GitHub API, for example, has different endpoints for /users and /orgs so that you can access data about users or organizations, respectively.

Note that many endpoints support accessing multiple subresources. For example, you can access information about a specific user at the endpoint /users/:username. The colon : indicates that the subresource name is a variable—you can replace that part of the endpoint with whatever string you want. Thus if you were interested in the GitHub user nbremer,⁷ you would access the /users/nbremer endpoint.

⁷Nadieh Bremer, freelance data visualization designer: https://www.visualcinnamon.com

Subresources may have further subresources (which may or may not have variable names). The endpoint /orgs/:org/repos refers to the list of repositories belonging to an organization. Variable names in endpoints might alternatively be written inside of curly braces {}—for example, /orgs/{org}/repos. Neither the colon nor the braces are programming language syntax; instead, they are common conventions used to communicate how to specify endpoints.
Query parameters allow you to specify additional information about which exact information you want from the endpoint, or how you want it to be organized (see Section 14.2.1.1 for more details).
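Because the colon and curly-brace notations are only documentation conventions, replacing a variable with an actual value is up to you. The sketch below shows one way to do so in base R. The `fill_endpoint()` helper is hypothetical (it is not part of R or of any API), and `example-org` is a made-up organization name.

```r
# Hypothetical helper: replace :name or {name} placeholders in an
# endpoint template with the values supplied as named arguments
fill_endpoint <- function(template, ...) {
  values <- list(...)
  # Handle longer names first so that :org does not clobber :organization
  ordered_names <- names(values)[order(-nchar(names(values)))]
  for (name in ordered_names) {
    value <- as.character(values[[name]])
    template <- gsub(paste0(":", name), value, template, fixed = TRUE)
    template <- gsub(paste0("{", name, "}"), value, template, fixed = TRUE)
  }
  template
}

user_endpoint <- fill_endpoint("/users/:username", username = "nbremer")
repos_endpoint <- fill_endpoint("/orgs/{org}/repos", org = "example-org")

print(user_endpoint)   # "/users/nbremer"
print(repos_endpoint)  # "/orgs/example-org/repos"
```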

Figure 14.2 The anatomy of a web API request URI.

The URI reads "https://api.github.com/search/repositories?q=dplyr&sort=forks". The breakdown of the URI is as follows: https://api.github.com is the base URI, /search/repositories is the endpoint, and q=dplyr&sort=forks is the query.

Remember

One of the biggest challenges in accessing a web API is understanding what resources (data) the web service makes available and which endpoints (URIs) can request those resources. Read the web service’s documentation carefully—popular services often include examples of URIs and the data returned from them.

The full URI for a request is constructed by appending the endpoint and any query parameters to the base URI. For example, you could access a GitHub user by combining the base URI (https://api.github.com) and endpoint (/users/nbremer) into a single string: https://api.github.com/users/nbremer. Sending a request to that URI will return data about the user; you can send this request from an R program or by visiting that URI in a web browser, as shown in Figure 14.3. In short, you can access a particular data resource by sending a request to a particular endpoint.
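In R, combining the two pieces is a matter of string concatenation. The following is a minimal sketch; the variable names are chosen for illustration, and no request is actually sent.

```r
# Combine the base URI and the endpoint into a single string
base_uri <- "https://api.github.com"
endpoint <- "/users/nbremer"
resource_uri <- paste0(base_uri, endpoint)

print(resource_uri)  # "https://api.github.com/users/nbremer"
```

Keeping the base URI in its own variable makes it easy to reuse for every endpoint of the same web service.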

A screenshot displays the GitHub API response. It displays details such as login, id, node_id, avatar_url, html_url, followers_url, and so on. — Figure 14.3 GitHub API response returned by the URI `https://api.github.com/users/nbremer`, as displayed in a web browser.

Indeed, one of the easiest ways to make a request to a web API is by navigating to the URI using your web browser. Viewing the information in your browser is a great way to explore the resulting data, and make sure you are requesting information from the proper URI (i.e., that you haven’t made a typo in the URI).

Tip

The JSON format (see Section 14.4) of data returned from web APIs can be quite messy when viewed in a web browser. Installing a browser extension such as JSONView^a will format the data in a somewhat more readable way. Figure 14.3 shows data formatted with this extension.

^a https://chrome.google.com/webstore/detail/jsonview/chklaanhfefbnpoihckbnefhakgolnmc

14.2.1.1 Query Parameters

Web URIs can optionally include query parameters, which are used to request a more specific subset of data. You can think of them as additional optional arguments that are given to the request function—for example, a keyword to search for or criteria to order results by.

The query parameters are listed at the end of a URI, following a question mark (?) and are formed as key–value pairs similar to how you named items in lists. The key (parameter name) is listed first, followed by an equals sign (=), followed by the value (parameter value), with no spaces between anything. You can include multiple query parameters by putting an ampersand (&) between each key–value pair. You can see an example of this syntax by looking at the URL bar in a web browser when you use a search engine such as Google or Yahoo, as shown in Figure 14.4. Search engines produce URLs with a lot of query parameters, not all of which are obvious or understandable.

A screenshot shows the search engine URLs of Google and Yahoo.

Figure 14.4 Search engine URLs for Google (top) and Yahoo (bottom) with query parameters (underlined in blue). The “search term” parameter for each web service is underlined in red.

The URL obtained after searching for Informatics in Google is "http://www.google.com/search?source=hp&q=informatics&oq=informatics&gs_i=psy-ab.1.0/10...." and the URL for the same search made in Yahoo is "http://search.yahoo.com/search?p=informatics&fr=yfp-t&fp=1&toggle=1&cop=mss&ei=UTF-8."

Notice that the exact query parameter name used differs depending on the web service. Google uses a q parameter (likely for “query”) to store the search term, while Yahoo uses a p parameter.
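The key–value structure becomes easier to see if you split a query string apart. As a rough sketch in base R (the variable names are illustrative, and the URL is the Yahoo example from Figure 14.4), you could separate the pairs at each ampersand and then separate each key from its value at the equals sign:

```r
yahoo_url <- "http://search.yahoo.com/search?p=informatics&fr=yfp-t&fp=1&toggle=1&cop=mss&ei=UTF-8"

# Keep only the text after the question mark (and before any fragment)
query_string <- sub("^[^?]*\\?", "", yahoo_url)
query_string <- sub("#.*$", "", query_string)

# Split into key=value pairs, then split each pair at the equals sign
pairs <- strsplit(strsplit(query_string, "&", fixed = TRUE)[[1]], "=", fixed = TRUE)
keys <- vapply(pairs, `[`, character(1), 1)
values <- vapply(pairs, `[`, character(1), 2)

# Store the parameters as a named list, much like named list elements
query_params <- as.list(setNames(values, keys))

print(query_params[["p"]])  # the search term: "informatics"
```

This sketch does not decode percent-encoded characters, which real query strings often contain.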

Similar to arguments for functions, API endpoints may either require query parameters (e.g., you must provide a search term) or optionally allow them (e.g., you may provide a sorting order). For example, the GitHub API has a /search/repositories endpoint that allows users to search for a specific repository: you are required to provide a q parameter for the query, and can optionally provide a sort parameter for how to sort the results. A search for repositories matching dplyr, sorted by number of forks, would use the URI https://api.github.com/search/repositories?q=dplyr&sort=forks.
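Writing query strings by hand is error-prone, so it can help to generate them from a named list. The sketch below assumes a hypothetical helper called `add_query_params()` (not part of any package) and uses base R's `URLencode()` to escape characters such as spaces that are not allowed in a URI.

```r
# Hypothetical helper: append key=value pairs from a named list to a URI
add_query_params <- function(uri, query_params = list()) {
  if (length(query_params) == 0) {
    return(uri)  # nothing to append
  }
  keys <- names(query_params)
  # Percent-encode each value so spaces and symbols are safe in a URI
  values <- vapply(
    query_params,
    function(value) utils::URLencode(as.character(value), reserved = TRUE),
    character(1)
  )
  # Join each key and value with "=", and join the pairs with "&"
  query_string <- paste(keys, values, sep = "=", collapse = "&")
  paste0(uri, "?", query_string)
}

search_uri <- add_query_params(
  "https://api.github.com/search/repositories",
  list(q = "dplyr", sort = "forks")
)

print(search_uri)
# "https://api.github.com/search/repositories?q=dplyr&sort=forks"
```

A search term containing a space, such as "data science", would be encoded as `q=data%20science`.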

14.3 Accessing Web APIs from `R`