Summary

Chapter 8, Spark Databricks, and Chapter 9, Databricks Visualization, have provided an introduction to Databricks, covering cloud installation and the use of Notebooks and folders. Account and cluster management have been examined, along with job creation, remote library creation, and library importing. The functionality of the Databricks dbutils package and the Databricks file system was explained in Chapter 8, Spark Databricks. Tables and an example of data import were also shown, so that SQL could be run against a dataset.

The idea of data visualization has been examined, and a variety of graphs have been created. Dashboards have been built to show how easy it is to both create and share this kind of data presentation. The Databricks REST interface has been demonstrated via worked examples, as an aid to using a Databricks cloud instance remotely and integrating it with external systems. Finally, the data and library movement options have been examined in terms of workspaces, folders, and tables.
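As a reminder of the remote-access approach, the sketch below builds an authenticated request for the Databricks REST API's clusters/list endpoint. The host name and token shown are placeholders, not real credentials, and the API version and token-based Authorization header are assumptions about your Databricks instance; check your own cloud's API documentation before use.

```python
# A minimal sketch of preparing a Databricks REST API call.
# The host and token values are placeholders; the /api/2.0 path and
# Bearer-token auth are assumptions to verify against your instance.
import urllib.request


def list_clusters_request(host, token):
    """Build an authenticated GET request for the clusters/list endpoint."""
    url = "https://{0}/api/2.0/clusters/list".format(host)
    req = urllib.request.Request(url)
    req.add_header("Authorization", "Bearer {0}".format(token))
    return req


# Sending the request (urllib.request.urlopen(req)) would return a JSON
# list of clusters; here we only show how the request is assembled.
req = list_clusters_request("example.cloud.databricks.com", "dapiPLACEHOLDER")
print(req.full_url)
```

The same pattern extends to the other endpoints shown in Chapter 8; only the path and, for POST calls, a JSON body change.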

You might ask why I have committed two chapters to a cloud-based service such as Databricks. The reason is that Databricks seems to be a logical, cloud-based progression from Apache Spark. It is supported by the people who originally developed Apache Spark and, although still in its infancy as a service and subject to change, it is capable of providing a Spark-based cloud production service. This means that a company wishing to use Spark could adopt Databricks, grow its cloud as demand grows, and have access to dynamic Spark-based machine learning, graph processing, SQL, streaming, and visualization functionality.

As ever, these Databricks chapters have just scratched the surface of the functionality available. The next step will be to create an AWS and Databricks account yourself, and use the information provided here to gain practical experience.

As this is the last chapter, I will provide my contact details again. I would be interested in the ways that people are using Apache Spark, the size of the clusters you are creating, and the data that you are processing. Are you using Spark as a processing engine, or are you building systems on top of it? You can connect with me on LinkedIn at: linkedin.com/profile/view?id=73219349.

You can contact me via my website at semtech-solutions.co.nz or by email at: .

Finally, I maintain a list of open-source-software-related presentations when I have the time. Anyone is free to use and download them. They are available on SlideShare at: http://www.slideshare.net/mikejf12/presentations.

If you have any challenging opportunities or problems, please feel free to contact me using the previous details.
