Time for action – installing and configuring Flume

Let's get Flume downloaded and installed.

  1. Retrieve the most recent Flume NG binary from http://flume.apache.org/ and save it to the local filesystem. The commands below use version 1.2.0; substitute your version as needed.
  2. Move the file to the desired location and uncompress it:
    $ mv apache-flume-1.2.0-bin.tar.gz /opt
    $ tar -xzf /opt/apache-flume-1.2.0-bin.tar.gz -C /opt
    
  3. Create a symlink to the installation:
    $ ln -s /opt/apache-flume-1.2.0 /opt/flume
    
  4. Define the FLUME_HOME environment variable:
    $ export FLUME_HOME=/opt/flume
    
  5. Add the Flume bin directory to your path:
    $ export PATH=${FLUME_HOME}/bin:${PATH}
    
  6. Verify that JAVA_HOME is set:
    $ echo ${JAVA_HOME}
    
  7. Verify that the Hadoop libraries are in the classpath:
    $ echo ${CLASSPATH}
    
  8. Create the directory that will act as the Flume conf directory:
    $ mkdir -p /home/hadoop/flume/conf
    
  9. Copy the needed files into the conf directory:
    $ cp /opt/flume/conf/log4j.properties /home/hadoop/flume/conf
    $ cp /opt/flume/conf/flume-env.sh.sample /home/hadoop/flume/conf/flume-env.sh
    
  10. Edit /home/hadoop/flume/conf/flume-env.sh and set JAVA_HOME.
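Step 10 can also be scripted. The following is a minimal sketch, not part of the original procedure: the helper name set_flume_java_home is hypothetical, and it assumes JAVA_HOME is already set in the current shell. It removes any existing (including commented-out) JAVA_HOME assignment from flume-env.sh and appends one matching the current shell, so rerunning it leaves exactly one definition.

```shell
# Hypothetical helper (a sketch of step 10): record the current JAVA_HOME in
# flume-env.sh. Any existing, possibly commented-out, JAVA_HOME assignment is
# removed first so that rerunning the helper leaves exactly one definition.
set_flume_java_home() {
  local env_file="${1:-/home/hadoop/flume/conf/flume-env.sh}"
  local tmp_file

  [ -n "${JAVA_HOME}" ] || { echo "JAVA_HOME is not set" >&2; return 1; }
  [ -f "${env_file}" ] || { echo "No such file: ${env_file}" >&2; return 1; }

  tmp_file="$(mktemp)" || return 1
  # grep -v exits with 1 when every line was removed; only 2+ is a real error.
  grep -v -E '^[[:space:]]*#?[[:space:]]*(export[[:space:]]+)?JAVA_HOME=' \
    "${env_file}" > "${tmp_file}" || [ $? -eq 1 ] || { rm -f "${tmp_file}"; return 1; }
  printf 'export JAVA_HOME="%s"\n' "${JAVA_HOME}" >> "${tmp_file}"

  # Write back through the original file so its permissions are kept.
  cat "${tmp_file}" > "${env_file}"
  rm -f "${tmp_file}"
}
```

With no argument it edits /home/hadoop/flume/conf/flume-env.sh, the file created in step 9; pass a different path if your working directory differs.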

What just happened?

The Flume installation is straightforward and has prerequisites similar to those of the previous tools we installed.

First, we retrieved the latest version of Flume NG (any version of 1.2.x or later will do) and saved it to the local filesystem. We moved it to the desired location, uncompressed it, and created a convenience symlink to the installation directory.

We needed to define the FLUME_HOME environment variable and add the bin directory within the installation directory to our PATH. As before, this can be done directly on the command line or within convenience scripts.
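As a sketch of such a convenience script (for example, lines sourced from ~/.bashrc), the following sets FLUME_HOME and adds the Flume bin directory to PATH only if it is not already there, so sourcing the script repeatedly does not keep lengthening PATH. The path_prepend helper is a hypothetical name, and /opt/flume is assumed to be the symlink created in step 3.

```shell
# Hypothetical helper: prepend a directory to PATH unless it is already listed.
path_prepend() {
  case ":${PATH}:" in
    *":$1:"*) ;;                        # already present: leave PATH unchanged
    *) PATH="$1${PATH:+:${PATH}}" ;;    # otherwise put it at the front
  esac
  export PATH
}

# Assumes the /opt/flume symlink created during installation.
export FLUME_HOME=/opt/flume
path_prepend "${FLUME_HOME}/bin"
```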

Flume requires JAVA_HOME to be defined, and we confirmed that it is. It also requires the Hadoop libraries, so we checked that the Hadoop classes are on the classpath.
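The two checks can be combined into a small function that reports problems instead of relying on reading echo output by eye. This is a sketch under stated assumptions: check_flume_prereqs is a hypothetical name, and treating any CLASSPATH entry containing "hadoop" as evidence of the Hadoop libraries is a heuristic that you may need to adjust for your installation.

```shell
# Minimal prerequisite check: reports problems but changes nothing.
# Assumption: at least one Hadoop entry on CLASSPATH contains "hadoop" in its
# path (e.g. a hadoop-core JAR); adjust the pattern for your installation.
check_flume_prereqs() {
  local status=0

  if [ -z "${JAVA_HOME}" ]; then
    echo "JAVA_HOME is not set" >&2
    status=1
  elif [ ! -x "${JAVA_HOME}/bin/java" ]; then
    echo "JAVA_HOME does not contain an executable bin/java: ${JAVA_HOME}" >&2
    status=1
  fi

  case "${CLASSPATH}" in
    *hadoop*) ;;
    *) echo "No Hadoop entries found on CLASSPATH" >&2
       status=1 ;;
  esac

  return "${status}"
}
```

Running `check_flume_prereqs && echo OK` prints OK only when both checks pass; otherwise each missing prerequisite is reported on standard error and the function returns a non-zero status.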

The last steps are not strictly necessary for this demonstration, but they will be needed in production. Flume looks for a configuration directory containing the files that define the default logging properties and environment setup variables (such as JAVA_HOME). We find that Flume behaves most predictably when this directory is set up properly, so we did this now and will need to change little of it later.

We assumed /home/hadoop/flume is the working directory within which the Flume configuration and other files will be stored; change this based on what's appropriate for your system.

Using Flume to capture network data

Now that we have Flume installed, let's use it to capture some network data.
