Unordered list

But how can we collect all the information from the nested list at once? This looks like a natural task for recursion: for each element of the list, we store the link and the date (where one exists), and, if the element has nested elements, we run the same function on them as well. To keep track of the nesting levels, we also add a level property. Consider the following example. It may seem overly complex at first, but the try/except and if statements keep the code working when some values are missing: the function still works if there is no date, link, or nested list. We also use next because we don't want to collect all the text elements (and waste time and memory on them); we only need the first two:

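def dictify(ul, level=0):
    """Recursively convert a <ul> element into a nested dictionary."""
    result = dict()

    for li in ul.find_all("li", recursive=False):
        # stripped_strings is a generator, so we only pull the strings we need
        text = li.stripped_strings
        key = next(text)
        try:
            time = next(text).replace(':', '').strip()
        except StopIteration:
            time = None  # this item has no date

        # use a separate name so the ul argument is not shadowed
        nested, link = li.find("ul"), li.find('a')
        if link:
            link = _abs_link(link.get('href'))

        r = {'url': link,
             'time': time,
             'level': level}

        if nested:
            r['children'] = dictify(nested, level=(level + 1))

        result[key] = r
    return result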
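To see the recursion and the use of next in isolation, here is a minimal sketch that runs without any HTML. It assumes a made-up stand-in for the page: each item is a pair of (list of text strings, list of child items). The walk function, the sample data, and the battle names are hypothetical and only mirror the shape of dictify. It also uses the two-argument form of next, which returns a default instead of raising StopIteration, as a possible alternative to the try/except block.

```python
def walk(items, level=0):
    """Recursively convert nested (texts, children) pairs into a dict."""
    result = {}
    for texts, children in items:
        strings = iter(texts)
        key = next(strings)          # first string: the name
        # two-argument next() returns None when there is no second string
        date = next(strings, None)   # any further strings are never read
        entry = {"time": date, "level": level}
        if children:
            entry["children"] = walk(children, level + 1)
        result[key] = entry
    return result


# Hypothetical sample data standing in for a nested <ul>
sample = [
    (["Battle A", "1 Sept 1939", "extra text"], [
        (["Battle A1"], []),
    ]),
    (["Battle B"], []),
]

tree = walk(sample)
```

Here, tree["Battle A"]["children"]["Battle A1"]["level"] is 1, and "Battle B" has a time of None and no children key, which is the same behavior the paragraph above describes for missing values.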

This function is not very elegant, but it does its job. Now, let's try running it on all the fronts:

theaters = {}

for front in fronts:
    list_element = front.find_next_siblings("div", "div-col columns column-width")[0].ul
    theaters[front.text[:-6]] = dictify(list_element)

If you want, you can print the resulting data out – it works! Now, we have all the links – all that is left is to scrape them. But first, let's store what we have obtained so far as a JSON file. For that, we need to open the file as we did with the CSVs, in 'w' (write) mode. Once that is done, we can use the json package to dump the dictionary into the file. Take a look at the following snippet:

import json

with open('all_battles.json', 'w') as f:
    json.dump(theaters, f)
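As a quick check that nothing is lost on the way to disk, the file can be read back with json.load. The sketch below is an illustration only: it uses a small made-up dictionary and a temporary file (both assumptions, not the chapter's data), and it passes indent and ensure_ascii=False to json.dump, which are optional arguments that make the file easier to read by eye.

```python
import json
import os
import tempfile

# Hypothetical sample with the same shape as the scraped data
theaters_sample = {
    "Some Front": {
        "Battle A": {"url": None, "time": "1941", "level": 0},
    }
}

# Write to a temporary location so we don't touch all_battles.json
path = os.path.join(tempfile.mkdtemp(), "all_battles_sample.json")

with open(path, "w", encoding="utf-8") as f:
    json.dump(theaters_sample, f, indent=2, ensure_ascii=False)

with open(path, encoding="utf-8") as f:
    restored = json.load(f)
```

Python's None is written as JSON null and comes back as None, so restored should compare equal to theaters_sample.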

That was easy! Now, let's parse the information from a specific battle's page.

In this chapter, and generally throughout the book, we try to keep basic functions simple and universal, so that they can be used repeatedly without any change. All specific decisions are made at a higher level. This not only helps us reuse the code; it also makes the code more transparent and predictable. You won't need to remember all the decisions you made at the lower level.
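One way to picture this split is a link helper. The sketch below is not the chapter's _abs_link function, which is not shown in this section; abs_link, battle_link, and BASE_URL are hypothetical names, and the base URL is a placeholder. The low-level function only joins a base with an href, while the decision about which site we are scraping lives one level up.

```python
from urllib.parse import urljoin


def abs_link(href, base):
    """Low-level helper: join a (possibly relative) href with a base URL."""
    return urljoin(base, href)


# Higher level: the specific decision (which site we scrape) lives here.
BASE_URL = "https://example.org"  # hypothetical placeholder


def battle_link(href):
    """Resolve an href against the site chosen for this project."""
    return abs_link(href, BASE_URL)


relative = battle_link("/wiki/Some_Battle")
absolute = battle_link("https://other.example/page")
```

With this layout, abs_link can be reused for any site unchanged, and switching the target site means editing only BASE_URL.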