Luigi

Luigi is a Python module, developed at Spotify, that helps you build complex pipelines of batch jobs. According to the Luigi GitHub documentation (https://github.com/spotify/luigi), Luigi handles dependency resolution, workflow management, and visualization, and it comes with built-in support for Hadoop.

Luigi was built to address the plumbing typically associated with long-running batch processes. The goal is to chain many tasks together, automate them, and build in the assumption that failures will happen. The chained tasks can be anything and can be written in any language. Some examples are:

  • Long-running tasks, such as Hadoop (http://hadoop.apache.org/) jobs
  • Dumping data to/from databases
  • Running machine learning algorithms, or anything else

Features of Luigi:

  • Luigi has a built-in mechanism to parallelize workflow steps
  • It comes with a set of commonly-used task templates that speed up adoption
  • It supports Python MapReduce jobs in Hadoop, as well as Hive and Pig jobs
  • It includes filesystem abstractions for HDFS and local files that make all filesystem operations atomic, so a pipeline cannot crash in a state containing partial data
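The atomicity guarantee in the last bullet can be illustrated with a general pattern: write to a temporary file, then rename it into place. The following is a minimal sketch of that pattern using only the standard library; the function name atomic_write is hypothetical, and this is not Luigi's own code.

```python
import os
import tempfile


def atomic_write(path, text):
    """Write text to path so readers see either no file or the complete file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the destination so the final
    # rename stays on a single filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(text)
        # os.replace swaps the finished file into place in one step.
        os.replace(tmp_path, path)
    except BaseException:
        # On failure, remove the partial temporary file; the destination
        # path is never left half-written.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the process dies midway through the write, only the temporary file is affected, so a downstream task never mistakes partial output for a finished result.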

Conceptually, Luigi is comparable to GNU Make: you define a set of tasks, and those tasks depend on other tasks. Luigi is also similar to Oozie and Azkaban, with one important difference: it is not built only for Hadoop-based workloads and can easily be extended to other kinds of tasks.
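To make the Make comparison concrete, here is a minimal sketch of Make-style dependency resolution in plain Python: a task is skipped when its output file already exists, and upstream tasks run before the tasks that need them. The Task class and build function are hypothetical stand-ins written for illustration; they are not Luigi's API.

```python
import os
import tempfile


class Task:
    """Hypothetical stand-in for a workflow task; not Luigi's Task class."""

    def __init__(self, output_path, requires=(), action=None):
        self.output_path = output_path
        self.requires = list(requires)
        self.action = action

    def complete(self):
        # As with a Make target, a task counts as done once its output exists.
        return os.path.exists(self.output_path)


def build(task, log):
    """Run upstream tasks first, then this one, skipping completed tasks."""
    if task.complete():
        return
    for upstream in task.requires:
        build(upstream, log)
    task.action(task)
    log.append(task.output_path)


def write_raw(task):
    with open(task.output_path, "w") as handle:
        handle.write("3\n1\n2\n")


def write_sorted(task):
    with open(task.requires[0].output_path) as handle:
        lines = sorted(handle.read().split())
    with open(task.output_path, "w") as handle:
        handle.write("\n".join(lines) + "\n")


workdir = tempfile.mkdtemp()
raw = Task(os.path.join(workdir, "raw.txt"), action=write_raw)
sorted_task = Task(os.path.join(workdir, "sorted.txt"),
                   requires=[raw], action=write_sorted)

log = []
build(sorted_task, log)  # runs raw first, then sorted
build(sorted_task, log)  # does nothing: the output already exists
```

Because completion is judged by the presence of the output file, rerunning the pipeline after a failure only repeats the steps that did not finish.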

One of the core concepts in Luigi is that the workflow is defined in a Python file, so the dependency graph is specified in Python. This may or may not be a drawback, depending on whether your organization is Python-heavy. In any case, basic Python knowledge is enough to get started with Luigi.

The main thing to understand is that Luigi can trigger work that is not written in Python.
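One common way for a Python-defined workflow to trigger other programs is to launch them as child processes. The sketch below assumes that approach and uses only the standard library; run_external is a hypothetical helper, not part of Luigi. The Python interpreter stands in for the external program here only because it is guaranteed to exist on the machine running the sketch; in practice the command could be a shell script, a compiled binary, or a cluster job launcher.

```python
import os
import subprocess
import sys
import tempfile


def run_external(command, output_path):
    """Run an external program and store its stdout at output_path.

    Returns False without running anything if the output already exists.
    """
    if os.path.exists(output_path):
        return False
    # check=True raises CalledProcessError on a non-zero exit code,
    # so a failed run never produces an output file.
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    tmp_path = output_path + ".tmp"
    with open(tmp_path, "w") as handle:
        handle.write(result.stdout)
    os.replace(tmp_path, output_path)  # publish the finished file in one step
    return True


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "greeting.txt")
command = [sys.executable, "-c", "print('hello from a child process')"]

first = run_external(command, target)   # runs the command
second = run_external(command, target)  # skipped: output already exists
```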
