Monitoring cluster health using Ganglia

Ganglia is a monitoring system designed for use with clusters and grids. Hadoop can be configured to send periodic metrics to the Ganglia monitoring daemon, which is useful for diagnosing and monitoring the health of the Hadoop cluster. This recipe will explain how to configure Hadoop to send metrics to the Ganglia monitoring daemon.

Getting ready

Ensure that you have Ganglia version 3.1 or later installed on all of the nodes in the Hadoop cluster. The Ganglia monitoring daemon (gmond) should be running on every worker node in the cluster. You will also need the Ganglia meta daemon (gmetad) running on at least one node, and another node running the Ganglia web frontend.

The following is an example of a modified gmond.conf file that can be used by the gmond daemon:

cluster {
  name = "Hadoop Cluster"
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}

host {
  location = "my datacenter"
}

udp_send_channel {
  host = mynode.company.com
  port = 8649
  ttl = 1
}

udp_recv_channel {
  port = 8649
}

tcp_accept_channel {
  port = 8649
}

Also, ensure that the Ganglia meta daemon configuration file includes your cluster as a data source. For example, modify the gmetad.conf configuration file to add the Hadoop cluster as a data source:

data_source "Hadoop Cluster" mynode1.company.com:8649 mynode2.company.com:8649 mynode3.company.com:8649

How to do it...

Perform the following steps to use Ganglia to monitor cluster metrics:

  1. Edit the hadoop-metrics.properties file found in the Hadoop configuration folder. If the file does not exist, create it. This properties file must be updated on every node in the cluster:

    $ vi /path/to/hadoop/hadoop-metrics.properties
    dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
    dfs.period=10
    dfs.servers=mynode1.company.com:8649
    
    mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
    mapred.period=10
    mapred.servers=mynode1.company.com:8649
    
    jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
    jvm.period=10
    jvm.servers=mynode1.company.com:8649
    
    rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
    rpc.period=10
    rpc.servers=mynode1.company.com:8649
  2. Restart the Ganglia meta daemon service.
  3. Restart the Hadoop cluster:
    $ cd /path/to/hadoop
    $ bin/stop-all.sh
    $ bin/start-all.sh
  4. Verify that Ganglia is collecting Hadoop metrics via the Ganglia web frontend.
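Because the properties file must be edited by hand on every node, a quick check before restarting the cluster can catch malformed server entries. The following Python sketch is one way to do that. The function names parse_properties and find_bad_servers are hypothetical, and the check assumes that every servers entry should be written as one or more comma-separated host:port pairs with an explicit port, which is the convention used in this recipe rather than a rule enforced by Hadoop:

```python
import re

# Loose match for "host:port" with a numeric port (assumed convention for this recipe).
HOST_PORT = re.compile(r"^[A-Za-z0-9.-]+:\d{1,5}$")


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def find_bad_servers(props):
    """Return {key: value} for every *.servers entry that is not host:port."""
    bad = {}
    for key, value in props.items():
        if not key.endswith(".servers"):
            continue
        # Assumption: multiple servers are separated by commas.
        addresses = [a.strip() for a in value.split(",")]
        if not all(HOST_PORT.match(a) for a in addresses):
            bad[key] = value
    return bad
```

To use it, read hadoop-metrics.properties into a string, pass the result of parse_properties to find_bad_servers, and fix any keys it returns. An entry such as "mynode1.company.com 8649" (a space instead of a colon) would be reported.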

How it works...

The Ganglia monitoring daemon (gmond) collects metric information from the node on which it is installed. With the properties shown in this recipe, the Hadoop daemons also send their dfs, mapred, jvm, and rpc metrics to the gmond address listed in each servers property, once every period seconds. Next, the Ganglia meta daemon (gmetad) polls the gmond daemons listed as its data sources and aggregates their metrics. Finally, the Ganglia web frontend requests the aggregated metrics from the gmetad daemon in the form of XML and reports them to users via the web interface.
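The XML that the daemons exchange can also be inspected directly, which is a handy way to confirm that Hadoop metrics are arriving without opening the web frontend. The following Python sketch connects to a Ganglia daemon's TCP port, reads the XML dump, and lists the Hadoop-related metric names reported for each host. The function names fetch_xml and metrics_by_host are hypothetical. The sketch assumes that Hadoop metric names begin with the context names configured in hadoop-metrics.properties (dfs, mapred, jvm, rpc), and that the port is the gmond tcp_accept_channel port from this recipe (8649); gmetad normally serves its aggregated XML on a separate port set by xml_port in gmetad.conf:

```python
import socket
import xml.etree.ElementTree as ET


def fetch_xml(host, port=8649, timeout=5.0):
    """Read the XML dump a Ganglia daemon writes when a TCP client connects."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # the daemon closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def metrics_by_host(xml_bytes, prefixes=("dfs.", "mapred.", "jvm.", "rpc.")):
    """Map each host name to the sorted metric names that match the prefixes.

    Assumption: prefixes is a tuple of Hadoop metric context prefixes.
    Hosts with no matching metrics are left out of the result.
    """
    root = ET.fromstring(xml_bytes)
    result = {}
    for host in root.iter("HOST"):
        names = [
            metric.get("NAME")
            for metric in host.iter("METRIC")
            if metric.get("NAME", "").startswith(prefixes)
        ]
        if names:
            result[host.get("NAME")] = sorted(names)
    return result
```

For example, metrics_by_host(fetch_xml("mynode1.company.com")) should return a non-empty dictionary once the Hadoop daemons have been restarted and have reported at least once. An empty result suggests that the servers entries or the gmond channels need another look.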
