Installing the CDH components

With a basic Hadoop cluster up and running, we can now install some of the important CDH components.

Installing Apache Flume

To install Apache Flume, log in as hduser and execute the following commands:

$ sudo yum install flume-ng
$ sudo yum install flume-ng-agent

You can configure Apache Flume using the configuration files present under /etc/flume-ng/conf.
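
Once installed, a quick way to verify the setup is to run a simple agent. The following configuration is only a sketch; the agent name agent1, the netcat source listening on port 44444, and the HDFS target path are assumptions chosen for illustration:

agent1.sources = src1
agent1.channels = ch1
agent1.sinks = sink1
agent1.sources.src1.type = netcat
agent1.sources.src1.bind = localhost
agent1.sources.src1.port = 44444
agent1.sources.src1.channels = ch1
agent1.channels.ch1.type = memory
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = hdfs:///user/hduser/flume
agent1.sinks.sink1.channel = ch1

You can then start the agent in the foreground with the flume-ng command:

$ flume-ng agent --conf /etc/flume-ng/conf --conf-file /etc/flume-ng/conf/flume.conf --name agent1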

Installing Apache Sqoop

To install Apache Sqoop, log in as hduser and execute the following command:

$ sudo yum install sqoop

You can configure Apache Sqoop using the configuration files present under /etc/sqoop/conf.
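
To confirm that Sqoop can reach both your database and HDFS, you can try a simple import. The command below is a sketch; the MySQL host, database, username, and table name are placeholders you will need to replace, and the matching JDBC driver must be available to Sqoop (typically placed under /var/lib/sqoop in CDH packaging):

$ sqoop import --connect jdbc:mysql://dbhost/salesdb --username dbuser -P --table customers --target-dir /user/hduser/customers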

Installing Apache Sqoop 2

Sqoop 2 splits its functionality into two packages: sqoop2-server and sqoop2-client.

To install sqoop2-server, log in as hduser and execute the following command on one of the nodes in the Hadoop cluster:

$ sudo yum install sqoop2-server

You can configure the Apache Sqoop2 server using the configuration files present under /etc/sqoop2/conf.
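
After installation, the server needs to be started before any clients can connect. With the CDH packaging, the server is typically managed as a system service; the commands below are a sketch assuming the default service name and the default server port, 12000:

$ sudo service sqoop2-server start
$ sudo service sqoop2-server status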

To install sqoop2-client, log in as hduser and execute the following command on any server that you wish to use as a client:

$ sudo yum install sqoop2-client
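
The client package provides the interactive sqoop2 shell, from which you point the client at the server and run commands. The session below is a sketch; the hostname master1 and the default port 12000 are assumptions:

$ sqoop2
sqoop:000> set server --host master1 --port 12000 --webapp sqoop
sqoop:000> show version --all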

Installing Apache Pig

To install Apache Pig, log in as hduser and execute the following command:

$ sudo yum install pig

You can configure Apache Pig using the configuration files present under /etc/pig/conf.
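
You can verify the installation by starting the Grunt shell and running a trivial script. The example below is only a sketch; the input path /user/hduser/input/sample.txt is a placeholder for a file you have already copied to HDFS:

$ pig
grunt> lines = LOAD '/user/hduser/input/sample.txt' AS (line:chararray);
grunt> first = LIMIT lines 5;
grunt> DUMP first;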

Installing Apache Hive

To install Apache Hive, log in as hduser and execute the following command:

$ sudo yum install hive

You can configure Apache Hive using the configuration files present under /etc/hive/conf.
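
A simple way to check the Hive installation is to create a table and query it from the Hive shell. The statements below are a sketch; the table name and the HDFS input path are assumptions for illustration:

$ hive
hive> CREATE TABLE logs (line STRING);
hive> LOAD DATA INPATH '/user/hduser/input/sample.txt' INTO TABLE logs;
hive> SELECT COUNT(*) FROM logs;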

Installing Apache Oozie

To install Apache Oozie, log in as hduser and execute the following command:

$ sudo yum install oozie

You can configure Apache Oozie using the configuration files present under /etc/oozie/conf.
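
After installation, the Oozie server has to be started before any workflows can be submitted. The commands below are a sketch assuming the default CDH service name and that the Oozie web console listens on its default port, 11000; if the server is healthy, the status command should report System mode: NORMAL:

$ sudo service oozie start
$ oozie admin -oozie http://localhost:11000/oozie -status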

Installing Apache ZooKeeper

To install Apache ZooKeeper, log in as hduser and execute the following command:

$ sudo yum install zookeeper-server

You can configure Apache ZooKeeper using the configuration files present under /etc/zookeeper/conf.
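
The zookeeper-server package installs the service but does not start it. With the CDH packaging, you typically initialize the data directory once and then start the service; the commands below are a sketch assuming the default service name:

$ sudo service zookeeper-server init
$ sudo service zookeeper-server start

You can then connect with the bundled zookeeper-client command to confirm that the server is answering on its default port, 2181.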

With these components installed, you are now ready to use the cluster for data processing. You can use Flume to ingest streaming data from external sources into HDFS, Sqoop or Sqoop 2 to import data from external relational databases, Pig and Hive to write scripts and queries, and Oozie to schedule them as required.

There are several other CDH components that can be installed alongside the ones covered here. We will leave those for now and see how they can be installed through Cloudera Manager in Chapter 5, Using Cloudera Manager.
