Building our first task in Luigi

Luckily, Luigi allows us to start small. We'll start by building a task that pulls all of the links to the battles, using the code from our wikiwwii package. First, we will import everything we need in a separate file, luigi_fronts.py:

# luigi_fronts.py
from pathlib import Path
import json

import luigi
from wikiwwii.collect.battles import collect_fronts

URL = 'https://en.wikipedia.org/wiki/List_of_World_War_II_battles'
folder = Path(__file__).parents[1] / 'data'

Here, we declared the URL of the battles page, imported our collect_fronts function, and specified a folder, relative to this file, to store the data in. Now, let's write the task itself. In the following, we'll create a task class, define the URL as a Luigi parameter with a default value (more on that later), and add (or, rather, override) two methods: output, which returns a local target pointing to the data file, and run, which describes the actual code to run. The run method collects the data and writes it to the file defined in output. Indeed, the actual business logic here takes one line, thanks to the wikiwwii package we made in the previous chapter:

class ScrapeFronts(luigi.Task):
    url = luigi.Parameter(default=URL, description='page url')

    def output(self):
        # name the file after the last segment of the URL
        name = self.url.split('/')[-1]
        path = str(folder / f'{name}.json')
        return luigi.LocalTarget(path)

    def run(self):
        data = collect_fronts(self.url)
        # write the result to the path declared in output()
        with open(self.output().path, 'w') as f:
            json.dump(data, f)
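
To see which file the output method points to, here is a minimal standalone sketch of the same naming logic. The helper target_path is hypothetical (it is not part of Luigi or wikiwwii), and the folder is passed in explicitly instead of being derived from __file__:

```python
from pathlib import Path

URL = 'https://en.wikipedia.org/wiki/List_of_World_War_II_battles'


def target_path(url, folder):
    """Hypothetical helper: build the JSON path the way output() does."""
    # the last segment of the URL becomes the file name
    name = url.split('/')[-1]
    return folder / f'{name}.json'


path = target_path(URL, Path('data'))
print(path.name)
```

With the default URL, the file is named after the Wikipedia page, so a different url parameter would produce a different target file.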

Let's run it! Luigi provides a convenient command-line interface for that:

$ python -m luigi --module luigi_fronts ScrapeFronts --local-scheduler

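Here, we specify the module to use, luigi_fronts, and a specific task, ScrapeFronts. The --local-scheduler flag tells Luigi to schedule the task within this process instead of connecting to a central scheduler. As a result, you should get the following summary: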

===== Luigi Execution Summary =====

Scheduled 1 tasks of which:
* 1 ran successfully:
- 1 ScrapeFronts(url=https://en.wikipedia.org/wiki/List_of_World_War_II_battles)

This progress looks :) because there were no failed tasks or missing dependencies

===== Luigi Execution Summary =====

The most important indicator here is the smiley face, which means that everything ran smoothly. As you can check, the data file was created. As an experiment, try running the task one more time. It will result in the following:

===== Luigi Execution Summary =====

Scheduled 1 tasks of which:
* 1 complete ones were encountered:
- 1 ScrapeFronts(url=https://en.wikipedia.org/wiki/List_of_World_War_II_battles)

Did not run any tasks
This progress looks :) because there were no failed tasks or missing dependencies

===== Luigi Execution Summary =====

This means Luigi checked the task's output and found that the result file is already there, so it considered the task complete and did not run it again. If we need to re-run the task anyway, we'll have to delete or rename the file.
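
The following is a rough sketch of that behavior using only the standard library. It is not Luigi's code: run_once and the sample payload are hypothetical names made up for illustration, and the sketch only mimics the idea that a task whose output file exists is treated as complete:

```python
import json
import tempfile
from pathlib import Path


def run_once(path, produce):
    """Hypothetical helper: call produce() only if path is missing."""
    if path.exists():
        # the output is already there, so the work is skipped
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, 'w') as f:
        json.dump(produce(), f)
    return True


def payload():
    # stand-in for the real scraping call
    return {'fronts': []}


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / 'data' / 'fronts.json'
    first = run_once(target, payload)   # file is missing, so it runs
    second = run_once(target, payload)  # file exists, so it is skipped
    target.unlink()                     # delete the file to force a re-run
    third = run_once(target, payload)   # runs again

print(first, second, third)
```

Deleting the file between calls plays the same role as deleting or renaming the JSON file before re-running the Luigi task.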
