Creating Docker images

Docker images are ready-to-use, pre-configured systems. The docker run command downloads (if necessary) and runs the images available on Docker Hub (https://hub.docker.com/), a web service where package maintainers upload ready-to-use images to test and deploy various applications.
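For instance, running an image hosted on Docker Hub is a single command; the image is downloaded automatically on first use. The following sketch uses the small, publicly available hello-world image purely as an illustration:

# Download (if not cached locally) and run an image from Docker Hub
$ docker run hello-world

# An image can also be downloaded explicitly, without running it
$ docker pull hello-world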

One way to create a Docker image is with the docker commit command, which takes a reference to an existing container and the name of the output image as arguments:

$ docker commit <container_id> <new_image_name>

This method is useful for saving snapshots of a container but, if the image is removed from the system, the steps needed to recreate it are lost as well.
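As a sketch of this workflow (the container and image names below are placeholders, not taken from the text), one might start a container from a base image, modify it interactively, and then save the result:

# Start an interactive container from a base image (ubuntu is used as an example)
$ docker run -it --name mycontainer ubuntu /bin/bash
# ... install packages and make changes inside the container, then exit ...

# Save the container's current state as a new image
$ docker commit mycontainer myimage:v1

# The snapshot can now be run like any other image
$ docker run -it myimage:v1 /bin/bash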

A better way to create an image is to build it using a Dockerfile. A Dockerfile is a text file that provides instructions on how to build an image, starting from another image. As an example, we will illustrate the contents of the Dockerfile used in the previous chapter to set up PySpark with Jupyter Notebook support. The complete file is reported below.

Each Dockerfile needs a starting image, which is declared with the FROM instruction. In our case, the starting image is jupyter/scipy-notebook, which is available on Docker Hub (https://hub.docker.com/r/jupyter/scipy-notebook/).

Once we have defined our starting image, we can issue shell commands to install packages and perform other configuration steps through a series of RUN and ENV instructions. In the following example, you can recognize the installation of the Java Runtime Environment (openjdk-7-jre-headless), the download of Spark, and the setup of the relevant environment variables. The USER instruction specifies which user executes the subsequent commands:

FROM jupyter/scipy-notebook
MAINTAINER Jupyter Project <[email protected]>
USER root

# Spark dependencies
ENV APACHE_SPARK_VERSION 2.0.2
RUN apt-get -y update && \
    apt-get install -y --no-install-recommends openjdk-7-jre-headless && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
RUN cd /tmp && \
    wget -q http://d3kbcqa49mib13.cloudfront.net/spark-${APACHE_SPARK_VERSION}-bin-hadoop2.6.tgz && \
    echo "ca39ac3edd216a4d568b316c3af00199b77a52d05ecf4f9698da2bae37be998a *spark-${APACHE_SPARK_VERSION}-bin-hadoop2.6.tgz" | sha256sum -c - && \
    tar xzf spark-${APACHE_SPARK_VERSION}-bin-hadoop2.6.tgz -C /usr/local && \
    rm spark-${APACHE_SPARK_VERSION}-bin-hadoop2.6.tgz
RUN cd /usr/local && ln -s spark-${APACHE_SPARK_VERSION}-bin-hadoop2.6 spark

# Spark and Mesos config
ENV SPARK_HOME /usr/local/spark
ENV PYTHONPATH $SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.3-src.zip
ENV SPARK_OPTS --driver-java-options=-Xms1024M --driver-java-options=-Xmx4096M --driver-java-options=-Dlog4j.logLevel=info

USER $NB_USER

An image can be built from a Dockerfile by running the following command from the directory where the Dockerfile is located. The -t option specifies the tag under which the image will be stored; here, we create an image named pyspark from the preceding Dockerfile:

$ docker build -t pyspark .

The command will automatically retrieve the starting image, jupyter/scipy-notebook, and produce a new image named pyspark.
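Once the build completes, the image behaves like any other local image. As a minimal sketch, assuming the base jupyter/scipy-notebook image still exposes the notebook server on port 8888, the result can be verified and started as follows:

# List the newly created image
$ docker images pyspark

# Start a container, mapping the notebook port to the host
$ docker run -p 8888:8888 pyspark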
