Omega

Omega is Google's next-generation cluster management system.

Omega is specifically focused on a cluster scheduling architecture that uses parallelism, shared state, and optimistic concurrency control.

From past experience, Google noticed that as clusters and their workloads grow, the scheduler is at risk of becoming a scalability bottleneck.

Google's production job scheduler has experienced all of this. Over the years, it has evolved into a complicated, sophisticated system that is hard to change.

A schematic overview of the scheduling architectures can be seen in the following figure:

[Figure: Omega scheduling architectures]
  • contrib project to Hadoop 0.20 branch and is not a very large code base.
  • Corona is integrated with the fair-scheduler.
  • YARN is more interested in the capacity scheduler.

Google identified the following two prevalent scheduler architectures shown in the preceding figure:

  • Monolithic schedulers: These use a single, centralized scheduling algorithm for all jobs (Google's existing scheduler is one of these). They do not make it easy to add new policies and specialized implementations, and may not scale up to the cluster sizes planned for the future.
  • Two-level schedulers: These have a single active resource manager that offers compute resources to multiple parallel, independent scheduler frameworks, as in Mesos and Hadoop On Demand (HOD). Their architectures appear to provide flexibility and parallelism, but in practice their conservative resource visibility and locking algorithms limit both, and make it hard to place difficult-to-schedule "picky" jobs or to make decisions that require access to the state of the entire cluster.
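
The limited resource visibility of the two-level model can be illustrated with a small sketch. This is a simplified toy model and not the Mesos or HOD API: the ResourceManager class, the picky_framework function, and the machine names are all hypothetical. It assumes that machines held in an outstanding offer are invisible to every other framework until the offer is answered.

```python
class ResourceManager:
    """Hypothetical two-level resource manager that hands out exclusive offers."""

    def __init__(self, free_cpus):
        self.free_cpus = dict(free_cpus)  # machine name -> free CPUs
        self.offered = set()              # machines held in an outstanding offer

    def make_offer(self, max_machines):
        # A framework sees only the machines in its offer, never the whole cluster.
        names = [n for n in sorted(self.free_cpus) if n not in self.offered]
        names = names[:max_machines]
        self.offered.update(names)
        return {n: self.free_cpus[n] for n in names}

    def respond(self, offer, accepted):
        # accepted maps machine name -> CPUs taken; the rest of the offer is released.
        for name, cpus in accepted.items():
            self.free_cpus[name] -= cpus
        self.offered.difference_update(offer)


def picky_framework(offer, cpus_needed):
    """Accept only a machine that can hold the whole task; otherwise decline."""
    for name, free in offer.items():
        if free >= cpus_needed:
            return {name: cpus_needed}
    return {}


rm = ResourceManager({"m1": 2, "m2": 2, "m3": 8})
offer_a = rm.make_offer(2)   # framework A is shown m1 and m2 only
offer_b = rm.make_offer(2)   # framework B is shown what is left: m3
declined = picky_framework(offer_a, 6)   # A needs 6 CPUs and cannot see m3
rm.respond(offer_a, declined)
```

In this sketch, framework A declines its offer even though the cluster has a machine that fits, because that machine is locked inside framework B's offer.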

The solution is Omega, a new parallel scheduler architecture built around shared state, using lock-free optimistic concurrency control to achieve both implementation extensibility and performance scalability.
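
The shared-state idea with optimistic concurrency control can be sketched roughly as follows. This is a minimal single-process illustration and not Omega's actual implementation: the CellState and Machine classes, the per-machine version counter, and the schedule function are assumptions made for the example. A real system would also need the commit step to be atomic across concurrent schedulers.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    free_cpus: int
    version: int = 0  # bumped on every successful commit


@dataclass
class CellState:
    """Hypothetical shared cluster state visible to every scheduler."""
    machines: dict = field(default_factory=dict)

    def snapshot(self):
        # Each scheduler decides on a private copy; no lock is held meanwhile.
        return {n: Machine(m.free_cpus, m.version) for n, m in self.machines.items()}

    def try_commit(self, name, cpus, seen_version):
        """Apply a claim only if the machine is unchanged since the snapshot."""
        m = self.machines[name]
        if m.version != seen_version or m.free_cpus < cpus:
            return False  # conflict: another scheduler changed this machine
        m.free_cpus -= cpus
        m.version += 1
        return True


def schedule(cell, cpus, max_attempts=3):
    """Place one task: snapshot, decide locally, commit, retry on conflict."""
    for _ in range(max_attempts):
        view = cell.snapshot()
        candidates = [n for n, m in view.items() if m.free_cpus >= cpus]
        if not candidates:
            return None
        choice = max(candidates, key=lambda n: view[n].free_cpus)
        if cell.try_commit(choice, cpus, view[choice].version):
            return choice
    return None


cell = CellState({"m1": Machine(4), "m2": Machine(2)})
view_a = cell.snapshot()  # scheduler A's view of the full cell
view_b = cell.snapshot()  # scheduler B's view, taken at the same time
first = cell.try_commit("m1", 3, view_a["m1"].version)   # A commits first
second = cell.try_commit("m1", 1, view_b["m1"].version)  # B's view is now stale
```

Both schedulers see the entire cell and decide in parallel. The second commit is rejected because its snapshot is out of date, so scheduler B would take a fresh snapshot and retry, as schedule does.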

Omega's approach reflects a greater focus on scalability, but makes it harder to enforce global properties, such as capacity, fairness, and deadlines.

For more information, refer to http://research.google.com/pubs/pub41684.html.
