Apache Atlas

Atlas is a scalable and extensible set of core foundational governance services. It enables enterprises to meet their compliance requirements within Hadoop effectively and efficiently, and it allows integration with the whole enterprise data ecosystem.

At the time of writing this book, Apache Atlas is an incubating project in the Apache Software Foundation, initiated by Hortonworks. The project aims to solve data governance issues in Apache Hadoop and also helps Hadoop integrate well with other enterprise data applications in the organization.

The high-level Atlas architecture, as detailed at http://atlas.apache.org/Architecture.html, is shown in the following figure:

There are a few other commercial offerings in this space, such as Informatica with its Big Data adaptors, that can track the lineage of information across the information lifecycle. Similar capabilities are being developed by various Big Data technology providers such as Cloudera, Hortonworks, and MapR. Lineage tracking of this kind enables effective governance around information architecture and handling.


Figure 17: High-level architecture of Apache Atlas

Let's quickly run through the layers of Atlas from the bottom up. The Atlas core has four blocks:

  • Ingest/Export: As the name suggests, this is the component that ingests and exports metadata. As shown in the previous figure, the metadata is kept in the metadata store.
  • Type System: Atlas lets you define a model for how metadata is to be stored. The model is expressed as so-called types, and this block is where those types are defined and managed.
  • Graph Engine: Atlas stores metadata using a graph model, and this is the block that makes that possible.
  • Titan: Atlas uses Titan (http://titan.thinkaurelius.com/) as the graph database to store the metadata.
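To make the relationship between the Type System and the Graph Engine more concrete, here is a minimal, self-contained sketch in Python. It is a toy model for illustration only and is not the Atlas API: the names `TypeDef`, `MetadataGraph`, `add_entity`, `link`, and `downstream`, as well as the `table` type and the `feeds` edge label, are all hypothetical. It shows a type acting as a schema for metadata entities, entities stored as graph vertices, and lineage recovered by following edges.

```python
from dataclasses import dataclass, field


@dataclass
class TypeDef:
    """A hypothetical type: a name plus the attributes every entity must carry."""
    name: str
    attributes: dict  # attribute name -> expected Python type


@dataclass
class MetadataGraph:
    """Toy stand-in for a typed metadata graph (not the Atlas implementation)."""
    types: dict = field(default_factory=dict)
    vertices: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def register_type(self, typedef):
        self.types[typedef.name] = typedef

    def add_entity(self, entity_id, type_name, **attrs):
        # Validate the entity against its type before storing it as a vertex.
        typedef = self.types[type_name]  # raises KeyError for an unknown type
        for attr_name, expected in typedef.attributes.items():
            if not isinstance(attrs.get(attr_name), expected):
                raise TypeError(f"{type_name}.{attr_name} must be {expected.__name__}")
        self.vertices[entity_id] = {"type": type_name, **attrs}

    def link(self, src, label, dst):
        # Relationships between entities become labelled edges.
        if src not in self.vertices or dst not in self.vertices:
            raise KeyError("both ends of an edge must be existing entities")
        self.edges.append((src, label, dst))

    def downstream(self, entity_id, label):
        # Follow edges with the given label to collect everything derived
        # from the starting entity (a simple lineage query).
        seen, stack = [], [entity_id]
        while stack:
            current = stack.pop()
            for src, edge_label, dst in self.edges:
                if src == current and edge_label == label and dst not in seen:
                    seen.append(dst)
                    stack.append(dst)
        return seen


graph = MetadataGraph()
graph.register_type(TypeDef("table", {"name": str, "owner": str}))
graph.add_entity("t1", "table", name="raw_orders", owner="etl")
graph.add_entity("t2", "table", name="clean_orders", owner="etl")
graph.add_entity("t3", "table", name="orders_report", owner="bi")
graph.link("t1", "feeds", "t2")
graph.link("t2", "feeds", "t3")
lineage = graph.downstream("t1", "feeds")
```

In this sketch, `lineage` holds the identifiers of the tables derived from `raw_orders`. A real graph database such as Titan would answer the same kind of question with a graph traversal rather than a scan over an edge list.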

The next layer, Integration, is the layer that connects Atlas with external components. There are two ways in which Atlas can be contacted:

  • API: Most of the functions in Atlas are exposed through a REST API, and this component makes that possible.
  • Messaging: Atlas can also be integrated using classic messaging, and it uses Kafka topics to do this.
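As a rough illustration of the REST integration path described above, the following Python sketch builds (but does not send) an HTTP request that asks an Atlas server for its type definitions. Everything specific here is an assumption rather than something stated in this chapter: the host and port (`localhost:21000`), the `admin`/`admin` credentials, and the path `/api/atlas/v2/types/typedefs`, which comes from later releases of the Atlas REST API and may differ in the incubating version discussed here. The helper names `build_typedefs_request` and `fetch_typedefs` are hypothetical.

```python
import base64
import json
import urllib.request

# Assumed location of a local Atlas server; adjust for your environment.
ATLAS_BASE_URL = "http://localhost:21000"


def build_typedefs_request(base_url=ATLAS_BASE_URL, user="admin", password="admin"):
    """Build a GET request for the (assumed) type-definitions endpoint."""
    # HTTP Basic authentication: base64 of "user:password".
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url}/api/atlas/v2/types/typedefs",  # assumed path
        headers={"Accept": "application/json", "Authorization": f"Basic {token}"},
        method="GET",
    )


def fetch_typedefs(request):
    """Send the request and decode the JSON body (needs a running server)."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


request = build_typedefs_request()
```

Calling `fetch_typedefs(request)` against a running server would return the decoded JSON response. Keeping request construction separate from sending makes it easy to inspect the URL and headers without needing a live Atlas instance.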

Out of the box, Apache Atlas supports a variety of sources from which to collect metadata. The following are the ones supported as of now:

A number of applications serve as windows into Atlas. They are:

  • Admin UI: A web application through which Atlas can be administered
  • Ranger Tag-based Policies: Ranger can be integrated with Atlas for security policy governance
  • Business Taxonomy: A component that allows business objects to be connected with the metadata stored in Atlas