Using the Pocket API to retrieve stories

Now that you've diligently saved your articles to Pocket, the next step is to retrieve them. To do this, we'll use the Pocket API. Sign up for an API account at https://getpocket.com/developer/apps/new, then follow these steps:

  1. Click on Create a New App in the upper-left corner and fill in the details to get your API key.
  2. Make sure to check all of the permissions so that you can add, change, and retrieve articles.

  3. Once you have that filled in and submitted, you will receive your consumer key.
  4. You can find the key in the upper-left corner, under My Apps.

  5. Once that is set, you are ready to move on to the next step: setting up authorization.
  6. Authorization requires your consumer key and a redirect URL. The redirect URL can be anything; here, I have used my Twitter account:
import requests 
import pandas as pd 
import json 
pd.set_option('display.max_colwidth', 200) 
 
CONSUMER_KEY = 'enter_your_consumer_key_here'
 
auth_params = {'consumer_key': CONSUMER_KEY, 'redirect_uri': 'https://www.twitter.com/acombs'} 
 
tkn = requests.post('https://getpocket.com/v3/oauth/request', data=auth_params) 
 
tkn.text 

The preceding code prints the response body, a short string of the form code=some_long_request_token.
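If the POST fails (an invalid consumer key, for example), Pocket returns a non-200 status and reports the reason in its X-Error response headers, so a quick check before parsing is worthwhile. A minimal sketch:

# check that the request succeeded before parsing the body;
# Pocket reports problems in the X-Error / X-Error-Code headers
if tkn.status_code != 200:
    print('Request failed:', tkn.status_code,
          tkn.headers.get('X-Error-Code'), tkn.headers.get('X-Error'))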

  7. The output will have the code you'll need for the next step. Place the following in your browser bar:

https://getpocket.com/auth/authorize?request_token=some_long_access_code&redirect_uri=https%3A//www.twitter.com/acombs

  8. If you change the redirect URL to one of your own, make sure to URL encode it (that's the %3A type of thing you see in the preceding URL). You can also let Python do the encoding, as in the sketch below.
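Rather than encoding by hand, you can build the full authorization URL with the standard library. A minimal sketch; quote() percent-encodes the redirect URL for you, and request_token here just reuses the parsing of tkn.text shown in the next step:

from urllib.parse import quote

# pull the request token out of the earlier response and
# percent-encode the redirect URL before assembling the link
request_token = tkn.text.split('=')[1]
redirect_uri = quote('https://www.twitter.com/acombs', safe='')
print('https://getpocket.com/auth/authorize?request_token='
      + request_token + '&redirect_uri=' + redirect_uri)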
  9. At this point, you should be presented with an authorization screen. Go ahead and approve it, and then we can move on to the next step:
# below we parse out the access code from the tkn.text string 
ACCESS_CODE = tkn.text.split('=')[1] 
 
usr_params = {'consumer_key': CONSUMER_KEY, 'code': ACCESS_CODE} 
 
usr = requests.post('https://getpocket.com/v3/oauth/authorize', data=usr_params) 
 
usr.text 

The preceding code prints the response body, which has the form access_token=some_access_token&username=your_username.

  10. We'll use the access token from that output to move on to retrieving the stories. First, we retrieve the stories tagged n (the ones we aren't interested in):
# below we parse out the access token from the usr.text string 
ACCESS_TOKEN = usr.text.split('=')[1].split('&')[0] 
 
no_params = {'consumer_key': CONSUMER_KEY,
             'access_token': ACCESS_TOKEN,
             'tag': 'n'}
 
no_result = requests.post('https://getpocket.com/v3/get', data=no_params) 
 
no_result.text 

The preceding code prints the response body, a long JSON string describing every article we tagged n.

Each article entry in this JSON carries several keys, but we are really only interested in the URL at this point.
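If you're curious what else comes back, you can peek at the keys of a single entry before extracting the URLs. A minimal sketch (sample and first_item are just illustrative names):

# inspect the keys of one article entry in the response
sample = json.loads(no_result.text)['list']
first_item = next(iter(sample.values()))
print(sorted(first_item.keys()))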

  11. We'll go ahead and create a list of all the URLs from this:
# parse the JSON and pull the resolved URL from each article entry
no_jf = json.loads(no_result.text)
no_jd = no_jf['list']

no_urls = []
for i in no_jd.values():
    no_urls.append(i.get('resolved_url'))
 
no_urls 

The preceding code prints the list of resolved URLs for the stories tagged n.
  12. This list contains all the URLs of stories we aren't interested in. Let's now put them in a DataFrame and tag them as such:
no_uf = pd.DataFrame(no_urls, columns=['urls']) 
no_uf = no_uf.assign(wanted=lambda x: 'n')
 
no_uf 

The preceding code displays the DataFrame of unwanted URLs, each tagged n in the wanted column.
  13. Now we're all set with the unwanted stories. Let's do the same thing with the stories we are interested in:
yes_params = {'consumer_key': CONSUMER_KEY,
              'access_token': ACCESS_TOKEN,
              'tag': 'y'}

yes_result = requests.post('https://getpocket.com/v3/get', data=yes_params)

# parse the JSON and pull the resolved URL from each entry, as before
yes_jf = json.loads(yes_result.text)
yes_jd = yes_jf['list']

yes_urls = []
for i in yes_jd.values():
    yes_urls.append(i.get('resolved_url'))

yes_uf = pd.DataFrame(yes_urls, columns=['urls'])
yes_uf = yes_uf.assign(wanted=lambda x: 'y')
 
yes_uf 

The preceding code displays the DataFrame of URLs for the stories we are interested in, each tagged y.
  14. Now that we have both types of stories for our training data, let's join them together into a single DataFrame:
df = pd.concat([yes_uf, no_uf]) 
 
df.dropna(inplace=True) 
 
df 

The preceding code displays the combined DataFrame of URLs, covering both the stories we are interested in and those we aren't.
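If you want the labeled URLs to survive between sessions, you can write the frame to disk at this point. A minimal sketch; the filename is arbitrary:

# optionally persist the labeled training URLs
df.to_csv('pocket_training_urls.csv', index=False)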

Now that we're set with all our URLs and their corresponding tags in a single frame, we'll move on to downloading the HTML for each article. We'll use another free service for this, called Embedly.
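As a preview of that step, Embedly's Extract endpoint takes an API key and a URL and returns the article as JSON. A minimal sketch, assuming you've signed up for a key (EMBEDLY_KEY is a placeholder and get_html is just an illustrative helper):

# hypothetical preview of an Embedly Extract call
EMBEDLY_KEY = 'enter_your_embedly_key_here'

def get_html(url):
    r = requests.get('https://api.embedly.com/1/extract',
                     params={'key': EMBEDLY_KEY, 'url': url})
    return json.loads(r.text).get('content')

get_html(df.iloc[0]['urls'])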
