We are now equipped with the skills to install and bring up a secure Apache Hadoop cluster running CDH5 and Cloudera Manager. In this chapter, we will learn the different techniques to manage the cluster by covering the following topics:
Configuring Hadoop services using Cloudera Manager
Role management in Cloudera Manager
Managing hosts using Cloudera Manager
Managing multiple clusters with Cloudera Manager
Rebalancing an HDFS cluster from Cloudera Manager
Configuring Hadoop services using Cloudera Manager
Cloudera Manager provides a user-friendly interface to add, remove, and configure services in a cluster. In this section, we will cover how to add a service to a cluster and how to remove one.
Adding a service to the cluster
The following are the steps to add a service to the cluster:
Log in to Cloudera Manager. The Home screen lists all the services that are currently installed on the cluster as shown in the following screenshot:
In the preceding screenshot, there are only two services in Cluster 1. Let's now add the Hive service to this cluster. To add Hive, click on the drop-down button for Cluster 1 and select the Add a Service option as shown in the following screenshot:
On the next screen, a list of service types along with their descriptions is displayed, as shown in the following screenshot:
To add the Hive service (the data warehouse system for Hadoop), select Hive from the list and click on Continue.
Next, select the dependencies for the Hive service as shown in the following screenshot; in this case, select hdfs and click on Continue:
On the next screen, you are given the option to select hosts for the Hive Metastore and the Hive Gateway. The Hive Metastore is responsible for storing the metadata of the Hive schemas, tables, and partitions. The Hive Gateway is the host from which you run the Hive shell client. By default, Cloudera Manager selects the hosts automatically.
The role assignment screen is shown in the following screenshot. Click on Continue.
The next screen, as shown in the following screenshot, provides the options to select the database for the Hive Metastore:
There are two types of databases you can create for the Hive Metastore:
Embedded: On selecting Use Embedded Database, a PostgreSQL database will be automatically created and managed by Cloudera Manager to store the Hive Metastore data
Custom: On selecting Use Custom Database, all the database details have to be provided by the administrator
For this demonstration, let's select Use Embedded Database. Note down the generated password for future reference and click on the Test Connection button to test the database connection. After the connection test completes, click on Continue.
The next screen, as shown in the following screenshot, displays the default configuration changes for review. Click on Continue to proceed.
In the next step, Cloudera Manager performs all the actions required to set up the Hive service on the cluster. The following screenshot shows these actions. Once the steps are complete, click on Continue.
Once the service is successfully set up, you should see a message as shown in the following screenshot. Click on the Finish button to complete the setup.
You should now see the newly configured Hive service on the Home page as shown in the following screenshot:
Using the preceding steps, we have successfully set up Hive on the cluster. The steps to add any other service to the cluster are almost identical.
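The same operation can also be scripted. The following is a minimal sketch, not the wizard's own implementation, that builds the HTTP request for registering a service through the Cloudera Manager REST API. The endpoint path, the payload shape, and the API version v6 are assumptions to verify against the API documentation of your Cloudera Manager release. The host cm.example.com, the admin credentials, and the function names are hypothetical. The request is only constructed, not sent.

```python
import base64
import json
import urllib.parse
import urllib.request


def basic_auth_header(user, password):
    """Return the value of an HTTP Basic Authorization header."""
    token = base64.b64encode("{}:{}".format(user, password).encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_add_service_request(base_url, cluster, name, service_type,
                              user, password, api_version="v6"):
    """Build, but do not send, a request that registers a service on a cluster.

    The endpoint path and payload shape are assumptions based on the
    Cloudera Manager REST API; check them against your CM version.
    """
    url = "{}/api/{}/clusters/{}/services".format(
        base_url.rstrip("/"), api_version,
        urllib.parse.quote(cluster, safe=""))  # "Cluster 1" -> "Cluster%201"
    payload = {"items": [{"name": name, "type": service_type}]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": basic_auth_header(user, password),
        },
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical host and credentials; replace with your own.
    request = build_add_service_request(
        "http://cm.example.com:7180", "Cluster 1", "hive", "HIVE",
        "admin", "admin")
    print(request.get_method(), request.full_url)
    # To actually send the request: urllib.request.urlopen(request)
```

Note that this sketch only registers the service. The role assignment, the Hive Metastore database setup, and the service startup that the wizard performs would need separate calls.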
Removing a service from the cluster
Removing a service from a cluster is straightforward with Cloudera Manager. The following are the steps to remove a service from the cluster:
Navigate to Cloudera Manager's Home screen. For this demonstration, let's remove the Hive service from the cluster.
Click on the drop-down button for the Hive service as shown in the following screenshot and select Stop to stop the Hive service:
Once the service has stopped, click on the drop-down button for the Hive service and select Delete to delete the service as shown in the following screenshot:
A pop-up message to confirm the deletion of the service is displayed as shown in the following screenshot. Click on Delete to confirm the action.
Once confirmed, the service is deleted from the cluster and no longer appears in the list of services.
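The stop-then-delete sequence above can be sketched against the Cloudera Manager REST API as well. As before, this is an illustration under assumptions: the endpoint paths and the API version v6 should be checked against your Cloudera Manager release, and the host, credentials, and function names are hypothetical. The requests are built but not sent.

```python
import base64
import urllib.parse
import urllib.request


def basic_auth_header(user, password):
    """Return the value of an HTTP Basic Authorization header."""
    token = base64.b64encode("{}:{}".format(user, password).encode("utf-8"))
    return "Basic " + token.decode("ascii")


def service_url(base_url, cluster, service, api_version="v6"):
    """Return the assumed REST URL of one service in one cluster."""
    return "{}/api/{}/clusters/{}/services/{}".format(
        base_url.rstrip("/"), api_version,
        urllib.parse.quote(cluster, safe=""),
        urllib.parse.quote(service, safe=""))


def build_stop_request(base_url, cluster, service, user, password):
    """Build a POST request for the service's stop command (assumed path)."""
    return urllib.request.Request(
        service_url(base_url, cluster, service) + "/commands/stop",
        headers={"Authorization": basic_auth_header(user, password)},
        method="POST",
    )


def build_delete_request(base_url, cluster, service, user, password):
    """Build a DELETE request that removes the service (assumed path)."""
    return urllib.request.Request(
        service_url(base_url, cluster, service),
        headers={"Authorization": basic_auth_header(user, password)},
        method="DELETE",
    )


if __name__ == "__main__":
    # Hypothetical host and credentials; replace with your own.
    args = ("http://cm.example.com:7180", "Cluster 1", "hive",
            "admin", "admin")
    for request in (build_stop_request(*args), build_delete_request(*args)):
        print(request.get_method(), request.full_url)
    # Send each with urllib.request.urlopen(request), and wait for the
    # stop command to finish before issuing the delete.
```

As in the user interface, the service should be stopped before it is deleted, so a real script would wait for the stop command to complete before sending the delete request.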