Creating a link

Now, remarkably, we have everything we need to begin our airline fare scraping, with one exception: we need the URL. For this exercise, I'm going to focus on flights leaving from NYC and flying to Europe. Since we don't want to pull down massive quantities of data and risk being blocked, we are going to just pull data for non-stop flights that depart on Saturday and return on the following Saturday. You are, of course, free to change this to whatever fares you'd like to target, but we'll use this for our sample project.
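
If you would rather compute the travel dates than read them off a calendar, the following sketch finds the next Saturday and the Saturday one week later. It is an illustration only: saturday_pair is a hypothetical helper name, not part of the original project, and it relies solely on Python's standard datetime module.

```python
from datetime import date, timedelta


def saturday_pair(today):
    """Return (depart, ret): the next Saturday strictly after `today`
    and the Saturday one week after that."""
    # date.weekday() counts Monday as 0, so Saturday is 5
    days_ahead = (5 - today.weekday()) % 7
    if days_ahead == 0:
        # `today` is itself a Saturday; move to next week's so the date is in the future
        days_ahead = 7
    depart = today + timedelta(days=days_ahead)
    return depart, depart + timedelta(days=7)


depart, ret = saturday_pair(date(2018, 11, 26))
print(depart.isoformat(), ret.isoformat())  # 2018-12-01 2018-12-08
```

Passing date.today() instead of a fixed date would give the upcoming Saturday pair for whenever you run it.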

The next step is to fill out the form in Google Flights. Make sure to choose a future date. Once you have input your data and hit Search, copy the URL string from your browser bar, as seen in the following screenshot:

The URL I copied is for flights that depart on 2018-12-01 and return on 2018-12-08. Those dates can be seen in the search string. If you choose different dates, you should see those reflected in the string you copy. Let's code this now:
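
Because the dates appear as plain text in the search string, one possible approach is to treat the copied URL as a template. The sketch below assumes that everything other than the two dates can stay fixed between searches, which the source does not confirm; URL_TEMPLATE and build_search_url are hypothetical names introduced here for illustration.

```python
# Template taken from the copied URL, with the two dates replaced by placeholders.
# Assumption: only the dates need to change between searches.
URL_TEMPLATE = (
    'https://www.google.com/flights/f=0#f=0&flt=/m/02_286.r/m/02j9z.{depart}'
    '*r/m/02j9z./m/02_286.{ret};c:USD;e:1;s:0*1;sd:1;t:e'
)


def build_search_url(depart, ret):
    """Hypothetical helper: fill the outbound and return dates (YYYY-MM-DD) into the template."""
    return URL_TEMPLATE.format(depart=depart, ret=ret)


example_url = build_search_url('2018-12-01', '2018-12-08')
print(example_url)
```

With the dates shown, the result is identical to the string copied from the browser bar.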

  1. Let's input that string and save it as the variable sats, as seen in the following block of code:
sats = 'https://www.google.com/flights/f=0#f=0&flt=/m/02_286.r/m/02j9z.2018-12-01*r/m/02j9z./m/02_286.2018-12-08;c:USD;e:1;s:0*1;sd:1;t:e'
  2. Next, we'll test that we can successfully retrieve the content that we see on the page. We'll test that with the following line of code, which utilizes selenium:
browser.get(sats)
  3. That one line of code was all we needed to retrieve the page. We can validate that this was successful with a couple of additional lines of code.
  4. First, let's check the title of the page:
browser.title

The resulting output can be seen as follows:

It looks like we were able to get the correct page. Let's now check to see whether we captured everything we were seeking. We can do that by taking a screenshot of the page. We do that with the following line of code:

browser.save_screenshot('/Users/alexcombs/Desktop/test_flights.png') 

Again, the path I used to save the screenshot was based on my machine; you will need to reference a path on your own machine. As you should see based on the following output, we were able to successfully get all the content of the page:

Since we appear to have all the page data we were seeking, we will now move on to how to pull individual data points from the page. To do that, first, we'll need to learn about the Document Object Model (DOM).
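
The checks used in this section (loading the page, reading its title, and saving a screenshot) can also be gathered into one function. The following is a minimal sketch rather than code from the original project: fetch_and_check is a hypothetical name, and browser is assumed to be the Selenium WebDriver instance created earlier.

```python
def fetch_and_check(browser, url, screenshot_path):
    """Hypothetical helper: load `url`, then return the page title and
    whether the screenshot was written to `screenshot_path`."""
    browser.get(url)  # navigate to the search results, as with browser.get(sats)
    title = browser.title  # title of the page that actually loaded
    # Selenium's save_screenshot returns True on success and False on an I/O error
    saved = browser.save_screenshot(screenshot_path)
    return title, saved
```

A call such as fetch_and_check(browser, sats, 'test_flights.png') would return the title alongside a flag showing whether the screenshot file was saved; the file name here is only an example, so substitute a path on your own machine.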
