Data preparation

We will use the same data set as before, that is, 2 million customer records, addresses, and contacts.

Before we proceed, however, let's clean up the data created in the previous chapters by following the steps explained here. Make sure the processes required for the cleanup are up and running: Hue, DFS, HiveServer2, ZooKeeper, and Kafka.
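One quick way to confirm that these processes are reachable before starting the cleanup is to test whether each one accepts a TCP connection. The following is only a minimal sketch: the host name and port numbers are assumptions based on common default settings (8888 for Hue, 8020 for the HDFS NameNode, 10000 for HiveServer2, 2181 for ZooKeeper, 9092 for Kafka) and may differ in your installation, and the helper names `is_port_open` and `check_services` are made up for this illustration. An open port shows only that something is listening, not that the service is healthy.

```python
import socket

# Assumed host and default ports; adjust these to match your own setup.
DEFAULT_SERVICES = {
    "Hue": ("localhost", 8888),
    "HDFS NameNode": ("localhost", 8020),
    "HiveServer2": ("localhost", 10000),
    "ZooKeeper": ("localhost", 2181),
    "Kafka": ("localhost", 9092),
}


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


def check_services(services):
    """Map each service name to True (reachable) or False (not reachable)."""
    return {
        name: is_port_open(host, port)
        for name, (host, port) in services.items()
    }


if __name__ == "__main__":
    for name, up in check_services(DEFAULT_SERVICES).items():
        print(f"{name}: {'up' if up else 'NOT reachable'}")
```

If any service is reported as not reachable, start it (or correct the assumed port) before running the cleanup steps.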
