Apache Oozie

Apache Oozie is an open source, Java-based web application for creating and scheduling job pipelines (workflows), and it is well integrated with the Hadoop stack.

Oozie can be used to schedule and run jobs in a Hadoop cluster. It can chain small jobs into a larger workflow and run them in the order defined by the configured pipeline to achieve the required use case. Oozie triggers the configured workflow and relies on the Hadoop engine to execute the individual jobs in it.
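The idea of chaining small jobs into one workflow can be sketched in a few lines of Python. This is only an illustrative model, not Oozie's API: the Action class, the run_workflow function, and the job names are hypothetical, and each job is a plain function standing in for work that the Hadoop engine would actually execute. The success and failure transitions loosely mirror the way a workflow action names the node to run next.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Action:
    """One node of a hypothetical workflow: a job plus its two transitions."""
    name: str
    run: Callable[[], bool]  # stand-in for a Hadoop job; returns True on success
    ok: str                  # node to move to when the job succeeds
    error: str               # node to move to when the job fails


def run_workflow(actions: Dict[str, Action], start: str) -> List[str]:
    """Follow transitions from `start` until a terminal node is reached.

    Assumes the transitions form an acyclic graph, so the loop terminates.
    """
    visited = []
    node = start
    while node not in ("end", "kill"):
        action = actions[node]
        visited.append(action.name)
        node = action.ok if action.run() else action.error
    visited.append(node)
    return visited


# Three small jobs combined into one pipeline (all names are made up).
workflow = {
    "ingest": Action("ingest", lambda: True, ok="transform", error="kill"),
    "transform": Action("transform", lambda: True, ok="report", error="kill"),
    "report": Action("report", lambda: True, ok="end", error="kill"),
}

print(run_workflow(workflow, "ingest"))
# ['ingest', 'transform', 'report', 'end']
```

If any job returns False, the walk stops at the "kill" node instead of "end", which is the behavior one would typically want from a pipeline whose later steps depend on earlier ones.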

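Oozie detects the completion of its tasks by two mechanisms: callback and polling. When a job is configured, a callback URL can be specified, and this URL is invoked when the job completes. If the callback is not received for any reason, Oozie can poll the job to find out whether it has completed.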
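The interplay of the two detection mechanisms can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not Oozie's implementation: the CompletionTracker class, its method names, and the status strings are hypothetical, the callback is a direct method call rather than an HTTP request, and poll_status is any function the caller supplies to look up a job's state.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CompletionTracker:
    """Hypothetical tracker: prefers callbacks, falls back to polling."""

    def __init__(self) -> None:
        self._completed: Dict[str, str] = {}

    def callback(self, job_id: str, status: str) -> None:
        """Called by a finished job (stands in for hitting a callback URL)."""
        self._completed[job_id] = status

    def wait(
        self,
        job_id: str,
        poll_status: Callable[[str], Optional[str]],
        poll_interval: float = 0.01,
        max_polls: int = 5,
    ) -> Tuple[str, str]:
        """Return (status, how_detected) for the given job."""
        for _ in range(max_polls):
            # 1. A callback that has already arrived wins.
            if job_id in self._completed:
                return self._completed[job_id], "callback"
            # 2. Otherwise ask the execution engine directly.
            status = poll_status(job_id)
            if status is not None:
                return status, "polling"
            time.sleep(poll_interval)
        return "UNKNOWN", "timeout"


tracker = CompletionTracker()

# Job 1 reports back itself, so no polling is needed.
tracker.callback("job-1", "SUCCEEDED")
print(tracker.wait("job-1", lambda job_id: None))
# ('SUCCEEDED', 'callback')

# Job 2 never calls back; its state is discovered by polling.
print(tracker.wait("job-2", lambda job_id: "SUCCEEDED"))
# ('SUCCEEDED', 'polling')
```

Polling here acts purely as a safety net: a lost callback delays detection by at most one polling interval instead of leaving the job's status unknown.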

This figure shows the basic working of Oozie:

Figure 16: Basic working of Oozie

The Oozie client invokes the Oozie server, which stores the workflow definitions in a database along with the execution details of each triggered Oozie task. The database also holds the status and callback URLs for all the jobs in the workflow. The Oozie server then uses the Hadoop engine for the actual execution of the jobs and receives callback triggers as the individual jobs complete and when the whole workflow completes.
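As a rough illustration of the server-side bookkeeping described here, the following Python sketch keeps job status and callback URLs in an in-memory SQLite database. Everything in it is an assumption made for the example: the table layout, the function names, the status values, and the oozie.example callback URL format are hypothetical and do not reflect Oozie's real schema or URLs.

```python
import sqlite3


def create_store() -> sqlite3.Connection:
    """Create a hypothetical status table in an in-memory database."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE actions ("
        " workflow_id TEXT, action TEXT, status TEXT, callback_url TEXT,"
        " PRIMARY KEY (workflow_id, action))"
    )
    return db


def submit_action(db: sqlite3.Connection, workflow_id: str, action: str) -> str:
    """Record a started job and return the callback URL handed to it."""
    # Made-up URL format, for illustration only.
    url = f"http://oozie.example/callback?wf={workflow_id}&action={action}"
    db.execute(
        "INSERT INTO actions VALUES (?, ?, 'RUNNING', ?)",
        (workflow_id, action, url),
    )
    return url


def on_callback(db: sqlite3.Connection, workflow_id: str, action: str, status: str) -> None:
    """Update the stored status when a job reports completion."""
    db.execute(
        "UPDATE actions SET status = ? WHERE workflow_id = ? AND action = ?",
        (status, workflow_id, action),
    )


def workflow_status(db: sqlite3.Connection, workflow_id: str) -> str:
    """Derive the workflow's status from the statuses of its jobs."""
    rows = db.execute(
        "SELECT status FROM actions WHERE workflow_id = ?", (workflow_id,)
    ).fetchall()
    statuses = {row[0] for row in rows}
    if not statuses:
        return "UNKNOWN"
    if "FAILED" in statuses:
        return "FAILED"
    if "RUNNING" in statuses:
        return "RUNNING"
    return "SUCCEEDED"


db = create_store()
submit_action(db, "wf-1", "ingest")
submit_action(db, "wf-1", "transform")

on_callback(db, "wf-1", "ingest", "SUCCEEDED")
print(workflow_status(db, "wf-1"))  # RUNNING

on_callback(db, "wf-1", "transform", "SUCCEEDED")
print(workflow_status(db, "wf-1"))  # SUCCEEDED
```

Keeping the state in a database rather than in memory means the workflow's progress can still be looked up after the process that submitted the jobs has gone away.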

For more details, refer to the Apache Oozie documentation at http://oozie.apache.org/.
