Setting up news feeds and Google Sheets through IFTTT

Hopefully, you have an IFTTT account set up at this point, but if not, go ahead and set that up now. Once that is done, you'll need to set up integration with feeds and with Google Sheets:

  1. First, search for feeds in the search box on the home page, click on Services, and then click through to set up the service.

  2. You'll just need to click Connect.

  3. Next, search for Google Drive under Services.

  4. Click on that. It should take you to a page where you select the Google account you want to connect. Choose the account, and then click Allow to enable IFTTT to access your Google Drive account. Once that's done, you should see confirmation that the service is connected.

  5. Now, with our channels connected, we can set up our feed. Click on New Applet in the dropdown under your username in the upper right-hand corner.

  6. Click on +this, search for RSS Feed, and then click on it.

  7. From here, click on New feed item.

  8. Then, add the feed URL to the box and click Create trigger. Once that is done, you'll be brought back to add the +that action.

  9. Click on +that, search for Sheets, and then click on its icon.

  10. We want our news items to flow into a Google Drive spreadsheet, so click on Add row to spreadsheet. You'll then have an opportunity to customize the spreadsheet.

I gave the spreadsheet the name NewStories and placed it in a Google Drive folder called IFTTT. Click Create Action to finish the applet, and soon you'll start seeing news items flow into your Google Drive spreadsheet. Note that it will only add new items as they come in, not items that existed at the time you created the sheet. I recommend adding a number of feeds; you will need to create an individual applet for each. It is best to add feeds for the sites that are in your training set, in other words, the ones you saved with Pocket.

Give those stories a day or two to build up in the sheet, and its rows will fill in with the title, URL, and HTML body of each new story.

Fortunately, the full article HTML body is included. This means we won't have to use Embedly to download it for each article. We will still need to download the articles from Google Sheets, and then process the text to strip out the HTML tags, but this can all be done rather easily.
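The tag stripping itself is handled by the get_text helper we wrote earlier, which we'll apply shortly. If you don't have it on hand, a minimal sketch using BeautifulSoup looks like the following. This is one reasonable implementation; the exact function from earlier in the chapter may differ:

from bs4 import BeautifulSoup 
 
def get_text(html): 
    # Parse the article HTML and return only the visible text 
    soup = BeautifulSoup(html, 'html.parser') 
    return soup.get_text() 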

To pull down the articles, we'll use a Python library called gspread, which can be pip installed. Once that is installed, you need to follow the directions for setting up OAuth 2, which can be found at http://gspread.readthedocs.org/en/latest/oauth2.html. You will end up downloading a JSON credentials file. Once you have that file, it is critical that you find the email address stored under its client_email key. You then need to share the NewStories spreadsheet you are sending the stories to with that email: just click on the blue Share button in the upper-right corner of the sheet and paste the email in there. You will receive a failed-delivery message in your Gmail account afterward, but that is expected, as the service account address is not a real mailbox. Make sure to swap in your path to the file and the name of the file in the following code:

import gspread 
from oauth2client.service_account import ServiceAccountCredentials 
 
# Path to the JSON credentials file you downloaded 
JSON_API_KEY = 'the/path/to/your/json_api_key/here' 
 
# Scopes needed to read spreadsheets through the Drive API 
scope = ['https://spreadsheets.google.com/feeds', 
         'https://www.googleapis.com/auth/drive'] 
 
credentials = ServiceAccountCredentials.from_json_keyfile_name(JSON_API_KEY, scope) 
gc = gspread.authorize(credentials) 
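As an aside, if you'd rather not dig through the credentials file by hand for the address to share the sheet with, you can read it programmatically. This short snippet assumes only the standard service-account JSON layout and the JSON_API_KEY path defined above:

import json 
 
# Print the address that the spreadsheet must be shared with 
with open(JSON_API_KEY) as f: 
    print(json.load(f)['client_email']) 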

If everything went well, the authorization code will run without errors. Next, you can download the stories:

import pandas as pd 
import numpy as np 
 
# Open the spreadsheet and grab its first worksheet 
sh = gc.open("NewStories") 
ws = sh.sheet1 
 
# Columns 2, 3, and 4 hold each story's title, URL, and HTML body 
zd = list(zip(ws.col_values(2), ws.col_values(3), ws.col_values(4))) 
 
zf = pd.DataFrame(zd, columns=['title', 'urls', 'html']) 
 
# Drop any rows with empty cells 
zf.replace('', np.nan, inplace=True) 
zf.dropna(inplace=True) 
 
zf 

Running this displays the DataFrame of downloaded stories.

With that, we have downloaded all of the articles from our feeds and placed them into a DataFrame. We now need to strip out the HTML tags. We can use the get_text function from earlier (sketched above) to retrieve the text, and then transform it using our tf-idf vectorizer:

# Strip the HTML tags from each article body 
zf.loc[:, 'text'] = zf['html'].map(get_text) 
 
zf.reset_index(drop=True, inplace=True) 
 
# Vectorize the cleaned text with the tf-idf vectorizer fit on the training data 
test_matrix = vect.transform(zf['text']) 
 
test_matrix 

The output shows the resulting sparse tf-idf matrix.

Here, we see that our vectorization was successful. Let's now pass it into our model to get back the results:

# Score each story with the trained classifier ('y' = wanted, 'n' = not wanted) 
results = pd.DataFrame(model.predict(test_matrix), columns=['wanted']) 
 
results 

The output shows the prediction for each story.

We see here that we have results for each of the stories. Let's now join them with the stories themselves so that we can evaluate the results:

# Join the predictions to the stories on their shared index 
rez = pd.merge(results, zf, left_index=True, right_index=True) 
 
rez 

The output shows each story alongside its predicted label.

At this point, we can improve the model by going through the results and correcting the errors. You'll need to do this for yourself, but here is how I made changes to my own:

change_to_no = [130, 145, 148, 163, 178, 199, 219, 222, 223, 226, 235, 279, 348, 357, 427, 440, 542, 544, 546, 568, 614, 619, 660, 668, 679, 686, 740, 829] 
 
change_to_yes = [0, 9, 29, 35, 42, 71, 110, 190, 319, 335, 344, 371, 385, 399, 408, 409, 422, 472, 520, 534, 672] 
 
# Flip the mislabeled predictions in place; .loc avoids chained indexing 
# (such as rez.iloc[i]['wanted'] = 'y'), which can fail to modify the DataFrame 
rez.loc[change_to_yes, 'wanted'] = 'y' 
rez.loc[change_to_no, 'wanted'] = 'n' 
 
rez 

The output reflects the corrected labels.

This may look like a lot of changes, but of the over 900 articles evaluated, I had to change very few. By making these corrections, we can now feed this back into our model to improve it even more. Let's add these results to our earlier training data and then rebuild the model:

# Stack the original training data and the newly corrected stories 
combined = pd.concat([df[['wanted', 'text']], rez[['wanted', 'text']]]) 
 
combined 

The output shows the combined DataFrame.

Retrain the model with the following code:

# Refit the vectorizer on the combined corpus, then retrain the classifier 
tvcomb = vect.fit_transform(combined['text']) 
 
model = clf.fit(tvcomb, combined['wanted']) 

Now we have retrained our model with all of the available data. You may want to do this a number of times as you get more results over the days and weeks; the more you add, the better your results will be. To avoid retraining from scratch in every session, you can also persist the fitted objects, as sketched below.
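Here is a minimal sketch of that persistence step using joblib. This is an assumption on my part rather than the chapter's prescribed approach, and the filenames are arbitrary; any pickling mechanism works equally well:

import joblib 
 
# Save the fitted vectorizer and classifier for reuse in a later session 
joblib.dump(vect, 'vectorizer.pkl') 
joblib.dump(model, 'news_model.pkl') 
 
# Later: reload them and score fresh stories without retraining 
vect = joblib.load('vectorizer.pkl') 
model = joblib.load('news_model.pkl') 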

We'll assume you have a well-trained model at this point, and are ready to begin using it. Let's now see how we can deploy this to set up a personalized news feed.
