Chapter 7. Managing an Apache Hadoop Cluster

We are now equipped with the skills to install and bring up a secure Apache Hadoop cluster running CDH5 and Cloudera Manager. In this chapter, we will learn the different techniques to manage the cluster by covering the following topics:

  • Configuring Hadoop services using Cloudera Manager
  • Role management in Cloudera Manager
  • Managing hosts using Cloudera Manager
  • Managing multiple clusters with Cloudera Manager
  • Rebalancing an HDFS cluster from Cloudera Manager

Configuring Hadoop services using Cloudera Manager

Cloudera Manager provides a web-based interface to add, remove, and configure services in a cluster. In this section, we will cover the addition and removal of services in a cluster.

Adding a service to the cluster

The following are the steps to add a service to the cluster:

  1. Log in to Cloudera Manager. The Home screen lists all the services that are currently installed on the cluster as shown in the following screenshot:
    Adding a service to the cluster
  2. In the preceding screenshot, there are only two services in Cluster 1. Let's now add the Hive service to this cluster. To add Hive, click on the drop-down button for Cluster 1 and select the Add a Service option as shown in the following screenshot:
    Adding a service to the cluster

    On the next screen, a list of service types along with their descriptions is displayed, as shown in the following screenshot:

    Adding a service to the cluster
  3. To add the Hive service (the data warehouse system for Hadoop), select Hive from the list and click on Continue.
  4. Next, select the dependencies for the Hive service as shown in the following screenshot; in this case, select hdfs and click on Continue:
    Adding a service to the cluster
  5. On the next screen, you are given the option to select hosts for the Hive Metastore and the Hive Gateway. The Hive Metastore is responsible for storing the metadata of the Hive schemas, tables, and partitions. The Hive Gateway is where you can host the Hive shell client. By default, Cloudera Manager selects the hosts automatically.

    The role assignment screen is shown in the following screenshot. Click on Continue.

    Adding a service to the cluster
  6. The next screen, as shown in the following screenshot, provides the options to select the database for the Hive Metastore:
    Adding a service to the cluster

    There are two types of databases you can create for the Hive Metastore:

    • Embedded: When you select Use Embedded Database, Cloudera Manager automatically creates and manages a PostgreSQL database to store the Hive Metastore data
    • Custom: When you select Use Custom Database, the administrator has to provide all the database details

    For this demonstration, let's select Use Embedded Database. Note down the generated password for future reference and click on the Test Connection button to test the database connection. After the test completes, click on Continue.

  7. The next screen, as shown in the following screenshot, displays the default configuration changes for review. Click on Continue to proceed.
    Adding a service to the cluster
  8. In the next step, Cloudera Manager performs all the required actions to set up the Hive service on the cluster, as shown in the following screenshot. Once the steps are complete, click on Continue.
    Adding a service to the cluster
  9. Once the service is successfully set up, you should see a message as shown in the following screenshot. Click on the Finish button to complete the setup.
    Adding a service to the cluster
  10. You should now see the newly configured Hive service on the Home page as shown in the following screenshot:
    Adding a service to the cluster

Using the preceding steps, we have successfully set up Hive on the cluster. The steps for adding any other service to the cluster are almost identical.
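The same operation can also be scripted against the Cloudera Manager REST API instead of the wizard. The following is a minimal sketch, not a complete replacement for the preceding steps: it only builds the request that registers a service in a cluster and does not send it. The hostname, port, and API version in CM_BASE_URL, the service name hive, and the helper build_add_service_request are illustrative assumptions; check the API documentation for your Cloudera Manager release before relying on the endpoint. Role assignment and the Hive Metastore database setup that the wizard handles are not covered here.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Assumption: placeholder host, port, and API version; adjust for your deployment.
CM_BASE_URL = "http://cm-host.example.com:7180/api/v10"


def build_add_service_request(cluster_name, service_name, service_type,
                              base_url=CM_BASE_URL):
    """Build (but do not send) a request that registers a service in a cluster.

    Hypothetical helper for illustration; sending the request would also
    require Cloudera Manager credentials (HTTP basic authentication).
    """
    # Cluster names such as "Cluster 1" contain spaces and must be URL-encoded.
    url = f"{base_url}/clusters/{quote(cluster_name)}/services"
    payload = {"items": [{"name": service_name, "type": service_type}]}
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_add_service_request("Cluster 1", "hive", "HIVE")
print(request.get_method(), request.full_url)
```

Building the request separately from sending it makes it easy to inspect the URL and payload before anything changes on the cluster.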

Removing a service from the cluster

Removing a service from a cluster takes only a few clicks in Cloudera Manager. The following are the steps to remove a service from the cluster:

  1. Navigate to Cloudera Manager's Home screen. For this demonstration, let's remove the Hive service from the cluster.
  2. Click on the drop-down button for the Hive service as shown in the following screenshot and select Stop to stop the Hive service:
    Removing a service from the cluster
  3. Once the service has stopped, click on the drop-down button for the Hive service and select Delete to delete the service as shown in the following screenshot:
    Removing a service from the cluster
  4. A pop-up message to confirm the deletion of the service is displayed as shown in the following screenshot. Click on Delete to confirm the action.
    Removing a service from the cluster
  5. Once confirmed, the service is deleted from the cluster and no longer appears in the list of services.
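The stop-then-delete order above can be sketched as two Cloudera Manager REST API requests. This is a minimal illustration under assumptions: the hostname, port, and API version in CM_BASE_URL, the service name hive, and the helper build_remove_service_requests are hypothetical, and the requests are only built, not sent. In a real script, the stop command would have to finish before the delete request is issued, which this sketch does not handle.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumption: placeholder host, port, and API version; adjust for your deployment.
CM_BASE_URL = "http://cm-host.example.com:7180/api/v10"


def build_remove_service_requests(cluster_name, service_name,
                                  base_url=CM_BASE_URL):
    """Return the requests, in order, that stop and then delete a service.

    Hypothetical helper for illustration; nothing is sent to the server.
    """
    service_url = (
        f"{base_url}/clusters/{quote(cluster_name)}"
        f"/services/{quote(service_name)}"
    )
    # Step 1: stop the service (mirrors the Stop menu option).
    stop = Request(f"{service_url}/commands/stop", method="POST")
    # Step 2: delete the stopped service (mirrors the Delete menu option).
    delete = Request(service_url, method="DELETE")
    return [stop, delete]


for req in build_remove_service_requests("Cluster 1", "hive"):
    print(req.get_method(), req.full_url)
```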