BMW

The BMW website has a search tool to find local dealerships, available at https://www.bmw.de/de/home.html?entryType=dlo:

[Screenshot: the BMW dealership search tool, German-language page]

This tool takes a location, and then displays the points near it on a map, such as this search for Berlin:

[Screenshot: map of search results for Berlin, page translated into English]

Using Firebug, we find that the search triggers this AJAX request:

https://c2b-services.bmw.com/c2b-localsearch/services/api/v3/
    clients/BMWDIGITAL_DLO/DE/
        pois?country=DE&category=BM&maxResults=99&language=en&
            lat=52.507537768880056&lng=13.425269635701511

Here, the maxResults parameter is set to 99. However, we can increase this to download all locations in a single query, a technique covered in Chapter 1, Introduction to Web Scraping. Here is the result when maxResults is increased to 1000:

>>> url = 'https://c2b-services.bmw.com/c2b-localsearch/services/api/v3/clients/BMWDIGITAL_DLO/DE/pois?country=DE&category=BM&maxResults=%d&language=en&lat=52.507537768880056&lng=13.425269635701511'
>>> jsonp = D(url % 1000)
>>> jsonp
'callback({"status":{
...
})'
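The query string can also be assembled from its individual parameters rather than by formatting one long literal, which makes it easier to see what is being changed. The following is a minimal sketch, written for Python 3; build_poi_url is a hypothetical helper name, not part of the book's code, and the parameter values are copied from the request above. The coordinates are kept as strings so they are sent exactly as captured.

```python
from urllib.parse import urlencode

# Endpoint copied from the AJAX request captured above.
BASE_URL = ('https://c2b-services.bmw.com/c2b-localsearch/services/api/v3/'
            'clients/BMWDIGITAL_DLO/DE/pois')


def build_poi_url(max_results,
                  lat='52.507537768880056',
                  lng='13.425269635701511'):
    """Hypothetical helper: rebuild the search URL with a chosen maxResults."""
    # A list of pairs keeps the parameters in the same order as the original.
    params = [
        ('country', 'DE'),
        ('category', 'BM'),
        ('maxResults', max_results),
        ('language', 'en'),
        ('lat', lat),
        ('lng', lng),
    ]
    return BASE_URL + '?' + urlencode(params)


print(build_poi_url(1000))
```

Whether the server honours an arbitrarily large maxResults is up to the service, so it is worth comparing the returned count against the requested limit.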

This AJAX request provides the data in JSONP format, which stands for JSON with padding. The padding is usually a function call that wraps the pure JSON data as its argument; in this case, the function is named callback. To parse this data with Python's json module, we first need to strip this padding:

>>> import json
>>> pure_json = jsonp[jsonp.index('(') + 1 : jsonp.rindex(')')]
>>> dealers = json.loads(pure_json)
>>> dealers.keys()
[u'status', u'count', u'translation', u'data', u'metadata']
>>> dealers['count']
731
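Slicing between the first "(" and the last ")" works for this response, but it silently accepts input that is not JSONP at all. A slightly more defensive variant is sketched below for Python 3; parse_jsonp is a hypothetical helper name, and the sample string is made-up stand-in data rather than a real response from the service.

```python
import json
import re

# Matches: optional whitespace, a callback name, "(", the payload, ")",
# and an optional trailing semicolon.
JSONP_PATTERN = re.compile(r'^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$', re.DOTALL)


def parse_jsonp(text):
    """Strip the callback wrapper from a JSONP string and parse the JSON."""
    match = JSONP_PATTERN.match(text)
    if match is None:
        raise ValueError('input does not look like JSONP')
    return json.loads(match.group(1))


# Made-up sample with the same top-level shape as the keys shown above.
sample = 'callback({"count": 2, "data": {"pois": []}})'
parsed = parse_jsonp(sample)
print(parsed['count'])
```

The greedy match means the payload ends at the last closing parenthesis, so parentheses inside JSON strings do not truncate it.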

We now have all the German BMW dealers loaded into a Python dictionary; at the time of writing, there were 731 of them. Here is the data for the first dealer:

>>> dealers['data']['pois'][0]
{u'attributes': {u'businessTypeCodes': [u'NO', u'PR'],
 u'distributionBranches': [u'T', u'F', u'G'],
 u'distributionCode': u'NL',
 u'distributionPartnerId': u'00081',
 u'fax': u'+49 (30) 20099-2110',
 u'homepage': u'http://bmw-partner.bmw.de/niederlassung-berlin-weissensee',
 u'mail': u'[email protected]',
 u'outletId': u'3',
 u'outletTypes': [u'FU'],
 u'phone': u'+49 (30) 20099-0',
 u'requestServices': [u'RFO', u'RID', u'TDA'],
 u'services': []},
 u'category': u'BMW',
 u'city': u'Berlin',
 u'country': u'Germany',
 u'countryCode': u'DE',
 u'dist': 6.65291036632401,
 u'key': u'00081_3',
 u'lat': 52.562568863415,
 u'lng': 13.463589476607,
 u'name': u'BMW AG Niederlassung Berlin Filiale Wei\xdfensee',
 u'postalCode': u'13088',
 u'street': u'Gehringstr. 20'}

We can now save the data of interest. Here is a snippet to write the name, latitude, and longitude of these dealers to a spreadsheet:

import csv

# Python 2: the csv module expects byte strings, so encode the unicode name
with open('bmw.csv', 'w') as fp:
    writer = csv.writer(fp)
    writer.writerow(['Name', 'Latitude', 'Longitude'])  # header row
    for dealer in dealers['data']['pois']:
        name = dealer['name'].encode('utf-8')
        lat, lng = dealer['lat'], dealer['lng']
        writer.writerow([name, lat, lng])

After running this example, the contents of the bmw.csv spreadsheet will look similar to this:

Name,Latitude,Longitude
BMW AG Niederlassung Berlin Filiale Weißensee,52.562568863415,13.463589476607
Autohaus Graubaum GmbH,52.4528925,13.521265
Autohaus Reier GmbH & Co. KG,52.56473,13.32521
...
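Under Python 3 the export needs small changes, because the csv module there works with text rather than byte strings, so the name is no longer encoded by hand. The sketch below assumes Python 3; write_dealers is a hypothetical helper name, and the two records are copied from the output above as stand-in data. It writes to an in-memory buffer so the result can be inspected without touching the disk.

```python
import csv
import io


def write_dealers(pois, fp):
    """Write each dealer's name, latitude and longitude to an open text file."""
    writer = csv.writer(fp)
    writer.writerow(['Name', 'Latitude', 'Longitude'])  # header row
    for dealer in pois:
        writer.writerow([dealer['name'], dealer['lat'], dealer['lng']])


# Stand-in data: two records taken from the output shown above.
sample_pois = [
    {'name': 'BMW AG Niederlassung Berlin Filiale Weißensee',
     'lat': 52.562568863415, 'lng': 13.463589476607},
    {'name': 'Autohaus Graubaum GmbH',
     'lat': 52.4528925, 'lng': 13.521265},
]

buffer = io.StringIO(newline='')
write_dealers(sample_pois, buffer)
print(buffer.getvalue())
```

To write a real file instead, the same function can be passed a handle opened with open('bmw.csv', 'w', newline='', encoding='utf-8'); the newline='' argument lets the csv module control line endings itself.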

The full source code for scraping this data from BMW is available at https://bitbucket.org/wswp/code/src/tip/chapter09/bmw.py.

Note

Translating foreign content

You may have noticed that the first screenshot for BMW was in German, but the second was in English. This is because the text for the second was translated using the Google Translate browser extension. This is a useful technique when trying to understand how to navigate a website in a foreign language. When the BMW website is translated, it still works as usual. Be aware, though, that Google Translate will break some websites, for example, when the content of a select box is translated and a form depends on the original value.

Google Translate is available as the Google Translate extension for Chrome, as the Google Translator addon for Firefox, and as part of the Google Toolbar for Internet Explorer. Alternatively, http://translate.google.com can be used for translations; however, this often breaks functionality because Google is hosting the content.
