Case 3 – demonstrating RESTful API cache functionality

In this section, we will be using the following URL: https://api.github.com/.

GitHub (https://github.com/) is a place for developers and their code repositories. The GitHub API is well known among developers, who come from various programming backgrounds. As we can see in the following screenshot, the response is returned as JSON. The request succeeded, since the HTTP status code that was returned was 200, that is, OK or Success:

Response from https://api.github.com with HTTP Status Code 200

As you can see, we made a basic call to https://api.github.com. The content that was returned contains links to the API's endpoints, some of which include placeholder parameters to be supplied for specific calls, such as {/gist_id}, {/target}, and {query}.

Let's send a request to the API again, but this time without any changes or updates in the parameter values. The content displayed will be the same as in the previous response, but the HTTP status code will differ: we will get 304 Not Modified instead of 200 OK:

 

HTTP Status code 304 for https://api.github.com

This HTTP status code (304, or Not Modified) demonstrates REST's caching functionality. Since the resource has not changed since the previous request, the server does not resend the content and the client reuses its cached copy. This saves both processing time and bandwidth. Cacheability is one of the important properties of RESTful web services. The following Python code reveals the cache property of the RESTful API by passing an additional header through the headers parameter of requests.get():

import requests
url = 'https://api.github.com'

#First Request
results = requests.get(url)
print("Status Code: ", results.status_code)
print("Headers: ", results.headers)

#Second Request with 'headers'
etag = results.headers['ETag']
print("ETag: ",etag)

results = requests.get(url, headers={'If-None-Match': etag})
print("Status Code: ", results.status_code)

requests is used to call url twice in the code. The second request is supplied with etag as the value of the If-None-Match request header. This header asks the server to send the content only if the resource's current ETag differs from the supplied one. The value itself was read from the ETag key of the first response's headers. An ETag is an identifier for a specific version of a resource, so a matching ETag means that the copy the client already holds is still valid. This exhibits the cache ability. For more information on ETag, please refer to https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/ETag.

The ETag is collected from results.headers and forwarded with the second request, which returns HTTP Status Code: 304. The output is as follows:

Status Code: 200
Headers: Content-Type: application/json; charset=utf-8
Headers: {'X-GitHub-Request-Id': 'A195:073C:37F223:79CCB0:5C8144B4', 'Status': '200 OK','ETag': 'W/"7dc470913f1fe9bb6c7355b50a0737bc"', 'Content-Encoding': 'gzip','Date': 'Thu, 07 Mar 2019 16:20:05 GMT',........, 'Content-Type': 'application/json; charset=utf-8', ....., 'Server': 'GitHub.com'}

ETag: W/"7dc470913f1fe9bb6c7355b50a0737bc"
Status Code: 304
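
A 304 response carries no body, so a client only benefits from it if it has kept the earlier response. The following is a minimal sketch of that idea, not part of the original example: the function name fetch_with_etag, the session parameter, and the dictionary-based cache are all assumptions made for illustration. The session argument can be any object with a requests-style get(url, headers=...) method, such as requests.Session().

```python
def fetch_with_etag(session, url, cache):
    """Return (status_code, body), reusing the cached body on 304.

    session: any object with a requests-style .get(url, headers=...) method
             (for example requests.Session()); hypothetical parameter name.
    cache:   a plain dict mapping url -> (etag, body); illustrative only.
    """
    headers = {}
    cached = cache.get(url)
    if cached is not None:
        # Ask the server to send the body only if the ETag has changed.
        headers['If-None-Match'] = cached[0]

    response = session.get(url, headers=headers)

    if response.status_code == 304 and cached is not None:
        # Not Modified: no body was sent, so serve the stored copy.
        return response.status_code, cached[1]

    etag = response.headers.get('ETag')
    if response.status_code == 200 and etag:
        # Remember this version so the next call can be conditional.
        cache[url] = (etag, response.text)
    return response.status_code, response.text
```

Calling fetch_with_etag(requests.Session(), url, cache) twice with the same cache dictionary should mirror the output shown previously: a 200 on the first call and, provided the resource has not changed, a 304 on the second, with the body taken from the cache.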

In this section, we have learned about various APIs, accessed them using their features, and demonstrated a number of important concepts that are relevant to web scraping methods. In the next section, we will scrape data using APIs.
