Configuring Impala after installation

After Impala is installed, you must apply a few mandatory and recommended configuration settings for Impala to run smoothly. Cloudera Manager applies some of these settings automatically; however, a few of them must be completed by hand whichever installation method you used. The following is a list of post-installation configurations:

  • On CDH 4.2 or newer, you must enable short-circuit reads on each DataNode, whichever installation method you used. To enable short-circuit reads, follow these steps on your Cloudera Hadoop cluster:
    1. Configure hdfs-site.xml on each DataNode as follows:
      <property>
        <name>dfs.client.read.shortcircuit</name>
        <value>true</value>
      </property>
      <property>
        <name>dfs.domain.socket.path</name>
        <value>/var/run/hadoop-hdfs/dn._PORT</value>
      </property>
      <property>
        <name>dfs.client.file-block-storage-locations.timeout</name>
        <value>3000</value>
      </property>
    2. If /var/run/hadoop-hdfs/ is group writable, make sure its group is root.
    3. Copy core-site.xml and hdfs-site.xml from the Hadoop configuration folder to the Impala configuration folder at /etc/impala/conf.
    4. Restart all DataNodes.
  • Cloudera Manager enables "block location tracking" and "native checksumming" for optimum performance; however, if you installed Impala without Cloudera Manager, you must enable both yourself. With block location metadata enabled, Impala knows which disk each data block is stored on, which lets it make better use of the underlying disks. Both "block location tracking" and "native checksumming" are described in more detail in later chapters. To enable block location tracking, follow these steps:
    1. hdfs-site.xml on each DataNode must have the following setting:
      <property>
        <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
        <value>true</value>
      </property>
    2. Make sure the updated hdfs-site.xml file is placed in the Impala configuration folder at /etc/impala/conf.
    3. Restart all DataNodes.
  • Enabling native checksumming causes Impala to use an optimized native library for computing checksums, if that library is available. If Impala is installed using Cloudera Manager, "native checksumming" is configured automatically and no action is needed. However, to enable native checksumming on a self-installed Impala cluster, you must build and install the Hadoop native library, libhadoop.so. If this library is not available, you might see the message "Unable to load native-hadoop library for your platform... using builtin-java classes where applicable" in the Impala logs, indicating that native checksumming is not enabled.
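
As a quick sanity check after editing the configuration files, the hdfs-site.xml settings listed above can be verified with a short script. The following is only a minimal sketch in Python, not part of Impala or Cloudera Manager: the function names read_properties and check_hdfs_site are hypothetical, and the script assumes the file must contain exactly the values shown in the steps above (the socket path is only checked for presence, since it may differ on your cluster).

```python
import xml.etree.ElementTree as ET

# Property names and values copied from the steps above.
# A value of None means "must be present, any non-empty value is accepted"
# (an assumption made here because the socket path can vary per cluster).
REQUIRED = {
    "dfs.client.read.shortcircuit": "true",
    "dfs.domain.socket.path": None,
    "dfs.client.file-block-storage-locations.timeout": "3000",
    "dfs.datanode.hdfs-blocks-metadata.enabled": "true",
}


def read_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def check_hdfs_site(xml_text, required=REQUIRED):
    """Return a list of problems; an empty list means every setting matches."""
    props = read_properties(xml_text)
    problems = []
    for name, expected in required.items():
        actual = props.get(name)
        if not actual:
            problems.append(f"missing: {name}")
        elif expected is not None and actual != expected:
            problems.append(f"{name}: expected {expected!r}, found {actual!r}")
    return problems
```

To use it, read the copy of the file that Impala sees, for example the text of /etc/impala/conf/hdfs-site.xml, and pass it to check_hdfs_site; any returned entries name the properties that still need attention. The check covers only the file contents, so the DataNodes must still be restarted for the changes to take effect.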