Omega is Google's next-generation cluster management system.
Omega is specifically focused on a cluster scheduling architecture that uses parallelism, shared state, and optimistic concurrency control.
From past experience, Google noticed that as clusters and their workloads grow, the scheduler risks becoming a scalability bottleneck.
Google's production job scheduler has experienced exactly this. Over the years, it has evolved into a complicated, sophisticated system that is hard to change.
A schematic overview of the scheduling architectures can be seen in the following figure:
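Google identified two prevalent scheduler architectures, both shown in the preceding figure: monolithic schedulers, which use a single centralized scheduling algorithm for all jobs, and two-level schedulers, in which a single resource manager offers resources to multiple independent scheduler frameworks.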
The solution is Omega, a new parallel scheduler architecture built around shared state that uses lock-free optimistic concurrency control to achieve both implementation extensibility and performance scalability.
Omega's approach reflects a greater focus on scalability, but makes it harder to enforce global properties, such as capacity, fairness, and deadlines.
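The shared-state, optimistic approach described above can be illustrated with a small sketch. This is not Omega's implementation: the class CellState, the function schedule, the per-machine version counters, and the all-or-nothing commit are all hypothetical names and simplifications chosen for illustration. Each scheduler takes a private snapshot of the shared state, makes a placement decision without holding any lock, and then tries to commit; the commit fails if another scheduler changed the same machine in the meantime, and the loser retries against fresh state.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class CellState:
    """Hypothetical shared cluster state: free CPUs and a version per machine."""
    free_cpus: dict
    versions: dict = field(default_factory=dict)

    def __post_init__(self):
        # Every machine starts at version 0.
        for machine in self.free_cpus:
            self.versions.setdefault(machine, 0)

    def snapshot(self):
        # Each scheduler plans against its own private copy of the state.
        return copy.deepcopy(self)

    def commit(self, claims, seen_versions):
        """Apply all claims or none (assumed atomic in this single-threaded sketch).

        The commit is rejected if any claimed machine changed since the
        snapshot was taken, or if it no longer has enough free CPUs.
        """
        for machine, cpus in claims.items():
            if self.versions[machine] != seen_versions[machine]:
                return False  # conflict: another scheduler got there first
            if self.free_cpus[machine] < cpus:
                return False
        for machine, cpus in claims.items():
            self.free_cpus[machine] -= cpus
            self.versions[machine] += 1
        return True


def schedule(cell, cpus_needed, max_retries=3):
    """Optimistic loop: snapshot, decide, try to commit, retry on conflict."""
    for attempt in range(1, max_retries + 1):
        local = cell.snapshot()
        # Illustrative policy: first machine (by name) with enough free CPUs.
        machine = next(
            (m for m, free in sorted(local.free_cpus.items()) if free >= cpus_needed),
            None,
        )
        if machine is None:
            return None, attempt
        if cell.commit({machine: cpus_needed}, local.versions):
            return machine, attempt
    return None, max_retries


# Two schedulers snapshot the same state and both pick machine "m1".
cell = CellState(free_cpus={"m1": 4, "m2": 4})
snap_a = cell.snapshot()
snap_b = cell.snapshot()
ok_a = cell.commit({"m1": 3}, snap_a.versions)  # first writer succeeds
ok_b = cell.commit({"m1": 3}, snap_b.versions)  # stale snapshot is rejected
# Scheduler B retries against the current state and lands elsewhere.
machine_b, attempts_b = schedule(cell, 3)
print(ok_a, ok_b, machine_b, cell.free_cpus)
```

The sketch also hints at the trade-off mentioned above: no scheduler blocks another, so throughput scales with the number of schedulers, but no single component sees every decision before it is made, so global policies such as fairness have to be checked at commit time or after the fact.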
For more information, refer to http://research.google.com/pubs/pub41684.html.