Setting up news feeds and Google Sheets through IFTTT

Hopefully, you have an IFTTT account set up at this point, but if not, go ahead and set that up now. Once that is done, you'll need to set up integration with feeds and with Google Sheets:

  1. First, search for feeds in the search box on the home page, click on Services, and then click through to set up the service.

  2. You'll just need to click Connect.

  3. Next, search for Google Drive under Services.

  4. Click on that. It should take you to a page where you select the Google account you want to connect. Choose the account, and then click Allow to enable IFTTT to access your Google Drive account. Once that's done, you should see confirmation that the service is connected.

  5. Now, with our channels connected, we can set up our feed. Click on New Applet in the dropdown under your username in the upper right-hand corner.

  6. Click on +this, search for RSS Feed, and then click on it.

  7. From here, click on New feed item.

  8. Then, add the feed URL to the box and click Create trigger. Once that is done, you'll be brought back to add the +that action.

  9. Click on +that, search for Sheets, and then click on its icon.

  10. We want our news items to flow into a Google Drive spreadsheet, so click on Add row to spreadsheet. You'll then have an opportunity to customize the spreadsheet.

I gave the spreadsheet the name NewStories and placed it in a Google Drive folder called IFTTT. Click Create Action to finish the applet, and soon you'll start seeing news items flow into your Google Drive spreadsheet. Note that it will only add new items as they come in, not items that existed at the time you created the sheet. I recommend adding a number of feeds; you will need to create an individual applet for each. It is best to add feeds for the sites that are in your training set, in other words, the ones you saved with Pocket.

Give those stories a day or two to build up in the sheet, and its rows will fill in with the title, URL, and HTML body of each new story.

Fortunately, the full article HTML body is included. This means we won't have to use Embedly to download it for each article. We will still need to download the articles from Google Sheets, and then process the text to strip out the HTML tags, but this can all be done rather easily.
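The tag stripping itself is handled by the get_text helper we wrote earlier, which we'll apply shortly. If you don't have it on hand, a minimal sketch using BeautifulSoup looks like the following. This is one reasonable implementation; the exact function from earlier in the chapter may differ:

from bs4 import BeautifulSoup 
 
def get_text(html): 
    # Parse the article HTML and return only the visible text 
    soup = BeautifulSoup(html, 'html.parser') 
    return soup.get_text() 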

To pull down the articles, we'll use a Python library called gspread, which can be pip installed. Once that is installed, you need to follow the directions for setting up OAuth 2, which can be found at http://gspread.readthedocs.org/en/latest/oauth2.html. You will end up downloading a JSON credentials file. Once you have that file, it is critical that you find the email address stored under its client_email key. You then need to share the NewStories spreadsheet you are sending the stories to with that email: just click on the blue Share button in the upper-right corner of the sheet and paste the email in there. You will receive a failed-delivery message in your Gmail account afterward, but that is expected, as the service account address is not a real mailbox. Make sure to swap in your path to the file and the name of the file in the following code:

import gspread 
from oauth2client.service_account import ServiceAccountCredentials 
 
# Path to the JSON credentials file you downloaded 
JSON_API_KEY = 'the/path/to/your/json_api_key/here' 
 
# Scopes needed to read spreadsheets through the Drive API 
scope = ['https://spreadsheets.google.com/feeds', 
         'https://www.googleapis.com/auth/drive'] 
 
credentials = ServiceAccountCredentials.from_json_keyfile_name(JSON_API_KEY, scope) 
gc = gspread.authorize(credentials) 
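As an aside, if you'd rather not dig through the credentials file by hand for the address to share the sheet with, you can read it programmatically. This short snippet assumes only the standard service-account JSON layout and the JSON_API_KEY path defined above:

import json 
 
# Print the address that the spreadsheet must be shared with 
with open(JSON_API_KEY) as f: 
    print(json.load(f)['client_email']) 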

If everything went well, the authorization code will run without errors. Next, you can download the stories:

import pandas as pd 
import numpy as np 
 
# Open the spreadsheet and grab its first worksheet 
sh = gc.open("NewStories") 
ws = sh.sheet1 
 
# Columns 2, 3, and 4 hold each story's title, URL, and HTML body 
zd = list(zip(ws.col_values(2), ws.col_values(3), ws.col_values(4))) 
 
zf = pd.DataFrame(zd, columns=['title', 'urls', 'html']) 
 
# Drop any rows with empty cells 
zf.replace('', np.nan, inplace=True) 
zf.dropna(inplace=True) 
 
zf 

Running this displays the DataFrame of downloaded stories.

With that, we have downloaded all of the articles from our feeds and placed them into a DataFrame. We now need to strip out the HTML tags. We can use the get_text function from earlier (sketched above) to retrieve the text, and then transform it using our tf-idf vectorizer:

# Strip the HTML tags from each article body 
zf.loc[:, 'text'] = zf['html'].map(get_text) 
 
zf.reset_index(drop=True, inplace=True) 
 
# Vectorize the cleaned text with the tf-idf vectorizer fit on the training data 
test_matrix = vect.transform(zf['text']) 
 
test_matrix 

The output shows the resulting sparse tf-idf matrix.

Here, we see that our vectorization was successful. Let's now pass it into our model to get back the results:

# Score each story with the trained classifier ('y' = wanted, 'n' = not wanted) 
results = pd.DataFrame(model.predict(test_matrix), columns=['wanted']) 
 
results 

The output shows the prediction for each story.

We see here that we have results for each of the stories. Let's now join them with the stories themselves so that we can evaluate the results:

# Join the predictions to the stories on their shared index 
rez = pd.merge(results, zf, left_index=True, right_index=True) 
 
rez 

The output shows each story alongside its predicted label.

At this point, we can improve the model by going through the results and correcting the errors. You'll need to do this for yourself, but here is how I made changes to my own:

change_to_no = [130, 145, 148, 163, 178, 199, 219, 222, 223, 226, 235, 279, 348, 357, 427, 440, 542, 544, 546, 568, 614, 619, 660, 668, 679, 686, 740, 829] 
 
change_to_yes = [0, 9, 29, 35, 42, 71, 110, 190, 319, 335, 344, 371, 385, 399, 408, 409, 422, 472, 520, 534, 672] 
 
# Flip the mislabeled predictions in place; .loc avoids chained indexing 
# (such as rez.iloc[i]['wanted'] = 'y'), which can fail to modify the DataFrame 
rez.loc[change_to_yes, 'wanted'] = 'y' 
rez.loc[change_to_no, 'wanted'] = 'n' 
 
rez 

The output reflects the corrected labels.

This may look like a lot of changes, but of the over 900 articles evaluated, I had to change very few. By making these corrections, we can now feed this back into our model to improve it even more. Let's add these results to our earlier training data and then rebuild the model:

# Stack the original training data and the newly corrected stories 
combined = pd.concat([df[['wanted', 'text']], rez[['wanted', 'text']]]) 
 
combined 

The output shows the combined DataFrame.

Retrain the model with the following code:

# Refit the vectorizer on the combined corpus, then retrain the classifier 
tvcomb = vect.fit_transform(combined['text']) 
 
model = clf.fit(tvcomb, combined['wanted']) 

Now we have retrained our model with all of the available data. You may want to do this a number of times as you get more results over the days and weeks; the more you add, the better your results will be. To avoid retraining from scratch in every session, you can also persist the fitted objects, as sketched below.
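Here is a minimal sketch of that persistence step using joblib. This is an assumption on my part rather than the chapter's prescribed approach, and the filenames are arbitrary; any pickling mechanism works equally well:

import joblib 
 
# Save the fitted vectorizer and classifier for reuse in a later session 
joblib.dump(vect, 'vectorizer.pkl') 
joblib.dump(model, 'news_model.pkl') 
 
# Later: reload them and score fresh stories without retraining 
vect = joblib.load('vectorizer.pkl') 
model = joblib.load('news_model.pkl') 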

We'll assume you have a well-trained model at this point, and are ready to begin using it. Let's now see how we can deploy this to set up a personalized news feed.
