With a basic Hadoop cluster up and running, we can now install some of the important CDH components.
To install Apache Flume, log in as hduser and execute the following commands:
$ sudo yum install flume-ng
$ sudo yum install flume-ng-agent
You can configure Apache Flume using the configuration files present under /etc/flume-ng/conf.
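As a quick sanity check, you can define a simple agent in /etc/flume-ng/conf/flume.conf. The following sketch reads lines from a local network port and writes them to the agent's log; the agent name a1, the netcat source, and the port number are illustrative choices, not CDH defaults:

```
# Hypothetical agent "a1": netcat source -> in-memory channel -> logger sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: listens for newline-terminated text on localhost:44444
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

# Sink: logs events at INFO level (useful only for testing)
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1

# Channel: buffers events in memory between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
```

You would then start the agent with flume-ng, pointing it at the configuration directory and file:
$ flume-ng agent -n a1 -c /etc/flume-ng/conf -f /etc/flume-ng/conf/flume.conf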
To install Apache Sqoop, log in as hduser and execute the following command:
$ sudo yum install sqoop
You can configure Apache Sqoop using the configuration files present under /etc/sqoop/conf.
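Once installed, Sqoop imports are run directly from the command line. The following is a sketch only; the database server, database name mydb, table employees, user dbuser, and target directory are placeholders for your own environment:

```shell
# Import the "employees" table from a hypothetical MySQL database into HDFS;
# -P prompts for the database password interactively
$ sqoop import \
    --connect jdbc:mysql://dbserver/mydb \
    --username dbuser -P \
    --table employees \
    --target-dir /user/hduser/employees
```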
Under Sqoop 2, the services are divided into two parts: sqoop2-client and sqoop2-server.
To install sqoop2-server, log in as hduser and execute the following command on one of the nodes in the Hadoop cluster:
$ sudo yum install sqoop2-server
You can configure the Apache Sqoop2 server using the configuration files present under /etc/sqoop2/conf.
To install sqoop2-client, log in as hduser and execute the following command on any server that you wish to use as a client:
$ sudo yum install sqoop2-client
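Unlike Sqoop 1, the Sqoop2 server runs as a daemon and clients talk to it over HTTP. A minimal check might look like the following; the hostname sqoop2host and port 12000 are assumptions to be adjusted for your deployment:

```shell
# On the server node: start the Sqoop2 server daemon
$ sudo service sqoop2-server start

# On a client node: open the Sqoop2 shell and point it at the server
$ sqoop2
sqoop:000> set server --host sqoop2host --port 12000 --webapp sqoop
sqoop:000> show version --all
```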
To install Apache Pig, log in as hduser and execute the following command:
$ sudo yum install pig
You can configure Apache Pig using the configuration files present under /etc/pig/conf.
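A quick way to verify the installation is to start Pig's Grunt shell. Pig can run either against the local filesystem or against the cluster; the -x flag selects the execution mode:

```shell
# Local mode: runs on the local filesystem, handy for testing scripts
$ pig -x local

# Default MapReduce mode: runs jobs on the Hadoop cluster
$ pig
```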
To install Apache Hive, log in as hduser and execute the following command:
$ sudo yum install hive
You can configure Apache Hive using the configuration files present under /etc/hive/conf.
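By default, Hive uses an embedded Derby metastore, which supports only a single active session; for a shared cluster you would typically point the metastore at an external database in /etc/hive/conf/hive-site.xml. The snippet below is a sketch; the hostname metastorehost, the database name metastore, and the choice of MySQL are assumptions:

```xml
<!-- Hypothetical external metastore on a MySQL server -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://metastorehost/metastore</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
```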
To install Apache Oozie, log in as hduser and execute the following command:
$ sudo yum install oozie
You can configure Apache Oozie using the configuration files present under /etc/oozie/conf.
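Oozie workflows are submitted through its command-line client against the Oozie server's URL. The following is a sketch; the server URL and the contents of job.properties are specific to your environment:

```shell
# Submit and start a workflow job described by job.properties
$ oozie job -oozie http://localhost:11000/oozie -config job.properties -run

# Check the status of a running job (replace <job-id> with the ID returned above)
$ oozie job -oozie http://localhost:11000/oozie -info <job-id>
```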
To install Apache ZooKeeper, log in as hduser and execute the following command:
$ sudo yum install zookeeper-server
You can configure Apache ZooKeeper using the configuration files present under /etc/zookeeper/conf.
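On CDH, the zookeeper-server package installs the service, but the data directory must be initialized before the first start. The myid value below assumes this is the first (or only) ZooKeeper node in the ensemble:

```shell
# Initialize the ZooKeeper data directory and assign this node's ID
$ sudo service zookeeper-server init --myid=1

# Start the ZooKeeper server
$ sudo service zookeeper-server start
```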
With these components installed, you are now ready to use the cluster for data processing. You could use Flume to ingest streaming data from external sources into HDFS, Sqoop or Sqoop 2 to import data from external databases, Pig and Hive to write scripts and queries, and Oozie to schedule them as required.
There are several other CDH components that can be installed along with the previously mentioned components. However, we will leave the rest and see how they can be installed while going through Cloudera Manager in Chapter 5, Using Cloudera Manager.