Harvesting additional results from the Google+ API using pagination

By default, the Google+ APIs return a maximum of 25 results, but we can extend the previous scripts by increasing the maximum value and harvesting more results through pagination. As before, we will communicate with the Google+ API through a URL and the urllib library. We will keep a counter that increases with each request, so we can step through a bounded number of pages and gather more results.

How to do it

The following script shows how you can harvest additional results from the Google+ API:

import urllib2
import json

GOOGLE_API_KEY = "{Insert your Google API key}"
target = "packtpub.com"
token = ""
loops = 0

while loops < 10:
  api_response = urllib2.urlopen("https://www.googleapis.com/plus/v1/people?query="+target+"&key="+GOOGLE_API_KEY+"&maxResults=50&pageToken="+token).read()

  json_response = json.loads(api_response)
  token = json_response['nextPageToken']

  if len(json_response['items']) == 0:
    break

  for result in json_response['items']:
    name = result['displayName']
    print name
    # Drop the query string from the image URL before downloading it
    image = result['image']['url'].split('?')[0]
    f = open(name+'.jpg','wb+')
    f.write(urllib2.urlopen(image).read())
    f.close()
  loops+=1

How it works

The first big change in this script is that the main code has been moved into a while loop:

token = ""
loops = 0

while loops < 10:

Here, the number of loops is set to a maximum of 10 to avoid sending too many requests to the API servers. This value can of course be changed to any positive integer. The next change is to the request URL itself; it now contains two additional trailing parameters, maxResults and pageToken. Each response from the Google+ API contains a nextPageToken value, which is a pointer to the next set of results and is sent back as the pageToken parameter of the following request. Note that if there are no more results, a token value is still returned. The maxResults parameter is self-explanatory, but can only be increased to a maximum of 50:

  api_response = urllib2.urlopen("https://www.googleapis.com/plus/v1/people?query="+target+"&key="+GOOGLE_API_KEY+"&maxResults=50&pageToken="+token).read()
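Building the URL by string concatenation works for a simple target, but it breaks as soon as the search term contains spaces or other reserved characters. The following is a minimal sketch of an alternative, assuming a hypothetical helper named build_people_url (not part of the original script) that lets urlencode handle the escaping. The import is written so that it runs on both Python 2 and Python 3:

```python
try:
    from urllib import urlencode        # Python 2
except ImportError:
    from urllib.parse import urlencode  # Python 3

BASE_URL = "https://www.googleapis.com/plus/v1/people"

def build_people_url(query, api_key, page_token="", max_results=50):
    """Hypothetical helper: assemble the request URL with escaped parameters."""
    # A list of pairs keeps the parameters in a fixed order.
    params = [
        ("query", query),
        ("key", api_key),
        ("maxResults", max_results),
        ("pageToken", page_token),
    ]
    return BASE_URL + "?" + urlencode(params)

# "KEY" is a placeholder, not a real API key.
first_page_url = build_people_url("packtpub.com", "KEY")
```

The resulting string can be passed to urllib2.urlopen in the same way as the concatenated URL in the script.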

The next part reads the JSON response in the same way as before, but this time it also extracts the nextPageToken value:

  json_response = json.loads(api_response)
  token = json_response['nextPageToken']
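If a response were ever to omit one of these keys, the direct lookup above would raise a KeyError. The following is a minimal sketch of a more defensive read. It assumes a made-up sample payload in place of a live API call, and the helper name extract_page is illustrative rather than part of the original script:

```python
import json

# Made-up sample payload standing in for a real API response.
SAMPLE_RESPONSE = '{"nextPageToken": "abc123", "items": [{"displayName": "Example User"}]}'

def extract_page(raw_body):
    """Return (items, next_token) from a JSON response body."""
    data = json.loads(raw_body)
    # dict.get returns the default instead of raising when a key is absent.
    items = data.get('items', [])
    next_token = data.get('nextPageToken', '')
    return items, next_token

items, token = extract_page(SAMPLE_RESPONSE)
```

An empty token and an empty item list then act as natural signals that there is nothing more to fetch.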

The main while loop stops once the loops variable reaches 10, but sometimes you may only get one page of results. The next part of the code checks how many results were returned; if there were none, it exits the loop early:

  if len(json_response['items']) == 0:
    break

Finally, we ensure that we increase the value of the loops integer each time. A common coding mistake is to leave this out, meaning the loop will continue forever:

  loops+=1
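
The two stopping conditions described above, the loop cap and the empty page, can be exercised without touching the network. The following is a minimal sketch, assuming a hypothetical harvest function and a made-up set of pages served by fake_fetch; none of these names come from the original script:

```python
def harvest(fetch_page, max_loops=10):
    """Collect items page by page until the cap or an empty page is hit.

    fetch_page is a hypothetical callable: it takes a page token and
    returns a dict with 'items' and, optionally, 'nextPageToken'.
    """
    results = []
    token = ""
    loops = 0
    while loops < max_loops:
        page = fetch_page(token)
        items = page.get('items', [])
        if len(items) == 0:
            break  # no results on this page, so stop early
        results.extend(items)
        token = page.get('nextPageToken', '')
        loops += 1  # without this, the cap would never be reached
    return results

# Made-up pages keyed by token, standing in for the remote API.
FAKE_PAGES = {
    "": {'items': ['a', 'b'], 'nextPageToken': 't1'},
    "t1": {'items': ['c'], 'nextPageToken': 't2'},
    "t2": {'items': [], 'nextPageToken': 't3'},
}

def fake_fetch(token):
    return FAKE_PAGES[token]

all_items = harvest(fake_fetch)
```

Passing the fetch step in as a function keeps the loop logic separate from the HTTP request, so the same loop could later be given a function that calls the real API.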