Chapter 9. Programming a Parallel Cluster

At its most basic level, a parallel cluster is designed to make use of several processors at once by distributing work among them. Assembling the nodes, the network, and the monitoring software that ties them together is only one aspect of running a parallel cluster.

Remember that a parallel cluster is designed to handle a larger workload than a single-processor system: where one CPU can handle only so much work, many processors can handle correspondingly more. In actual practice, however, the code has to be written to take advantage of the extra processors. Although Linux easily handles systems with multiple processors, it isn’t designed to handle clusters right out of the gate, and similarly, relatively little off-the-shelf code is written for parallel environments, because each environment is different and has different needs.

Putting several nodes together and making sure that they can communicate is only the first step. The nodes are essentially useless without a way for the programs running on them to exchange data, and that’s where message-passing libraries come into play. Libraries such as the Message Passing Interface (MPI) and the Parallel Virtual Machine (PVM) build on C and Fortran, enabling work to be distributed across the different nodes of a cluster. These libraries grew out of the need for standardization across platforms. Before these standards were put in place, each vendor had its own implementation of parallel programming code, and each environment had to be programmed differently. With the growth of Linux and the maturity of these libraries, parallel computing has taken off as a viable alternative for supercomputer performance.

This chapter isn’t intended to be an exhaustive resource on programming MPI applications. However, by using the material in this chapter, you will be able to install and use MPI in clustered environments suitable for problem solving.

Coarse Granularity in a Finely Granular World

The application to be run on the cluster defines the cluster itself, and therefore must be taken into account when you design the cluster and the environment it runs in. If you’re simply designing an environment to run generic and varied applications, a generic cluster full of single-CPU machines does nicely; it might be preferable to spend the money on cheaper systems rather than budget extra for dedicated symmetric multiprocessor (SMP) machines. The truth is that different types of problems work better with different hardware configurations. Which configuration suits which problem is determined by what’s known as granularity.

Granularity comes in fine, medium, and coarse grades, much like sandpaper. It is determined by the amount of possible parallelism in the code rather than by the code as a whole. You must consider three factors when determining granularity: the structure of the problem, the size of the problem, and the number of CPUs available to handle it.

Parallel programs are composed of tasks that are distributed across the processors. The larger each task is relative to the communication it performs, the coarser the granularity. If you assign your cluster the simple job of having each processor add up 10 numbers on its own, the processors barely need to talk to each other, and the job is said to have coarse granularity. Problems arise when the code becomes more complex and the tasks start to interact with one another: the more interaction, the finer the granularity.

If there’s a lot of interaction between CPUs, you generally want them physically closer together. Remember that it’s much quicker to communicate over the fast bus of an SMP system than among a multitude of single-processor machines connected over a network. The more finely grained the code, the faster you want the nodes to be able to talk to each other, which is why finely grained problems argue for an SMP system with more processors. A code base in which the nodes rarely talk to each other is said to be very coarse, and it does just fine on multiple single-CPU nodes rather than on multiple CPUs in one box.

By the same token, you want to tailor your code to the pre-existing environment. In the best case, you’re given a task and the opportunity to design your cluster around the problem. In the worst case, you’re given a cluster and have to shape your code around the hardware. Either way, you must write code to fit the environment, which is why prewritten software rarely fits well into a parallel cluster: it’s hard to optimize code for every possible scenario.

Programming in a Clustered Environment

When you write programs for parallel and high-performance clusters, each program has to be evaluated against the type of cluster it will run on and the problem to be solved.

You might be asking, “Why can’t I just install a package with the program I want to run across my cluster?” It’s not totally unheard of, but the truth is that the problems people want to solve typically haven’t been solved already. The companies and universities that use parallel clusters use them for research such as weather prediction, genome and pharmaceutical research, and even nuclear simulation. It’s unlikely that a program written for a machine such as ASCI White is going to be ported to an elementary school cluster. Each application is best served by intimate knowledge of the cluster itself. By writing these applications from scratch, you can make the best use of your own cluster: you know where the bottlenecks lie and can work around them effectively. Knowing where your Network File System (NFS) server sits, for example, lets you write your application to take full advantage of it.

Programming for parallel systems falls into two categories, shared memory and distributed memory, depending on the architecture you’re targeting. Clusters of SMP machines, however, mix elements of both. Message passing is the preferred method for these clusters because of its standardization and the ease of the programming environment. Message passing is a form of interprocess communication in which processes send and receive messages between processors; like TCP/IP packets, these messages can be received in any order. The processors, even those within SMP systems, talk to each other by passing messages, and the resulting programs consist of cooperating processes, each with its own memory.

MPI seeks to standardize message passing by introducing a layer of library extensions to Fortran 77 and C, which lets people write portable message-passing programs across parallel machines. MPI is a library that you program against, rather than a virtual distributed operating system such as the environment provided by Scyld Linux or Mosix. Many implementations of MPI exist, including LAM/MPI and MPICH, with MPICH being the most widely used. The standard itself has evolved in two phases: MPI-1 covers basic message passing, and MPI-2 adds remote memory access, parallel input/output (I/O), and dynamic process management.

Environments such as PVM also provide message-passing libraries, but they are built around the idea of a virtual machine. PVM includes functions for C and C++ and subroutines for Fortran, and it presents a single, homogeneous framework on top of heterogeneous computers running concurrent or parallel jobs. Like MPI, PVM relies on message-passing routines for the CPUs to communicate.
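
To give a flavor of the PVM programming model, the following is a minimal sketch in C, assuming the standard pvm3 interface; the program name hello_pvm and the message tag are illustrative, the program must be linked against libpvm3, and PVM must be able to find the spawned executable in its search path. The parent task spawns one copy of itself and waits for the child to report back.

#include <stdio.h>
#include "pvm3.h"

int main( void )
{
    int mytid = pvm_mytid();    /* enroll this process in PVM */
    int ptid  = pvm_parent();   /* task id of the task that spawned us, if any */

    if ( ptid == PvmNoParent ) {
        /* Parent: spawn one copy of this program and wait for its reply. */
        int child, childtid;
        if ( pvm_spawn( "hello_pvm", (char **)0, PvmTaskDefault, "", 1, &child ) == 1 ) {
            pvm_recv( -1, 1 );               /* wait for a message with tag 1 */
            pvm_upkint( &childtid, 1, 1 );   /* unpack the child's task id */
            printf( "Hello from PVM task t%x\n", childtid );
        }
    } else {
        /* Child: pack our task id and send it back to the parent. */
        pvm_initsend( PvmDataDefault );
        pvm_pkint( &mytid, 1, 1 );
        pvm_send( ptid, 1 );
    }

    pvm_exit();                              /* leave the virtual machine */
    return 0;
}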

Although MPI and PVM are similar in that they both provide libraries for parallel computing, they’re not interchangeable. PVM was the de facto standard for message passing until MPI was standardized, but it has largely been superseded by MPI. This isn’t to say that PVM isn’t still in use, only that you might not choose PVM over MPI when programming a parallel cluster.

MPI

The goal of MPI is to provide a standard for message passing across heterogeneous platforms. When the interface was developed, the aim was a best-practices approach: take the best features of the existing message-passing libraries and merge them into a single structure for parallel programming and applications.

You can get MPI for just about any platform that runs parallel code, and there are almost as many implementations of MPI as there are distributions of Linux. There’s MPICH, a winmpich for Windows NT, and the Illinois High Performance Virtual Machine, which is based on MPICH. Local Area Multicomputer MPI (LAM/MPI) is also popular. There are MPI implementations for the IBM SP and OS/390, and for SGI and Digital machines. There’s even rumor of an MPI for the Palm Pilot.

Getting and Installing MPI

You can download the MPICH implementation of MPI from http://www-unix.mcs.anl.gov/mpi/index.html. MPICH is fully compliant with the MPI 1.2 standard and aims for MPI-2 compliance.

Download the program, uncompress it, and install it with the usual ./configure, make, and make install. If you want other users to be able to run it, install it in a public directory by passing the -prefix option to configure, as in the following:

./configure -prefix=/usr/local/mpi 

Typing ./configure -usage lists all the command-line options available for configuring MPICH. The make install step is optional, but it’s what makes MPICH available to other users on the system.
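
For example, a typical build session might look like the following; the archive and directory names are illustrative and depend on the version you download:

$ gunzip -c mpich.tar.gz | tar xf - 
$ cd mpich-1.2.x 
$ ./configure -prefix=/usr/local/mpi 
$ make 
$ make install 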

Next, set your path to run programs from the mpi/bin directory:

$ export PATH=$PATH:/usr/local/mpi/bin 

Examples of MPI programs to run are in /usr/local/mpi/examples/.

Following is an example of a basic Hello World program that uses MPI:

#include <stdio.h> 
#include "mpi.h" 

int main( int argc, char **argv ) 
{
    int rank, size; 

    MPI_Init( &argc, &argv );                 /* start up the MPI environment */
    MPI_Comm_size( MPI_COMM_WORLD, &size );   /* how many processes are running */
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );   /* which process this one is */
    printf( "Hello world from process %d of %d\n", rank, size ); 
    MPI_Finalize();                           /* shut down the MPI environment */
    return 0; 
} 

Following are a few things to note about the program:

  • MPI_Init initializes the MPI execution environment; it takes a pointer to the argument count (argc) and a pointer to the argument vector (argv).

  • MPI_Comm_size determines the size of the group associated with a communicator.

  • MPI_Comm_rank determines the rank of the calling process in the communicator.

  • MPI_Finalize terminates the MPI environment.

After you finish writing your Hello World program, you compile and link it with the MPI compiler wrappers, which you use just as you would a normal compiler such as gcc. The first example is for C, the second for Fortran 77:

$ mpicc -o helloworld hello.c 

$ mpif77 -o helloworld hello.f 

You execute the resulting programs with mpirun. Typically, you’ll use

$ mpirun -np 2 helloworld 

where 2 can be replaced by however many processes you want to start. Type mpirun -help for a list of options.
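
The Hello World program starts several processes, but it never actually passes a message between them. The following is a minimal sketch of point-to-point message passing using MPI_Send and MPI_Recv; the value being passed and the file name sendrecv.c are just for illustration, and the program builds and runs with mpicc and mpirun exactly like the previous example.

#include <stdio.h> 
#include "mpi.h" 

int main( int argc, char **argv ) 
{
    int rank, size, token; 
    MPI_Status status; 

    MPI_Init( &argc, &argv ); 
    MPI_Comm_size( MPI_COMM_WORLD, &size ); 
    MPI_Comm_rank( MPI_COMM_WORLD, &rank ); 

    if ( rank == 0 && size > 1 ) {
        /* Process 0 sends one integer, with message tag 0, to process 1. */
        token = 42; 
        MPI_Send( &token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD ); 
    } else if ( rank == 1 ) {
        /* Process 1 receives the integer from process 0. */
        MPI_Recv( &token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status ); 
        printf( "Process 1 received %d from process 0\n", token ); 
    }

    MPI_Finalize(); 
    return 0; 
} 

Compile it with mpicc -o sendrecv sendrecv.c and run it with mpirun -np 2 sendrecv.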

So now you’re saying, “That’s all fine and good, but this isn’t really a clustered application, is it?” No, it’s not. Not yet. For that, you need either rsh (with .rhosts) or ssh to carry the messages between machines. Enter the list of workstations in /usr/local/mpi/share/machines.<arch>, one hostname per line.
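
For example, on Linux nodes the file is typically machines.LINUX, and for a four-node cluster it might contain nothing more than the following (the hostnames are illustrative):

node1 
node2 
node3 
node4 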

Make sure that the master node (the node on which MPI is installed) has .rhosts rights to run programs on the other nodes in the cluster. You can test MPI connectivity with /usr/local/mpi/sbin/tstmachines <architecture>, which verifies that MPI programs can run across all nodes in the cluster. The tstmachines script attempts to run rsh on each node; if rsh fails, the test fails. You can substitute Secure Shell for rsh by setting the P4_RSHCOMMAND environment variable to ssh in your .profile:

export P4_RSHCOMMAND=ssh 

To set up rsh, you need to create an /etc/hosts.equiv file on every machine in the cluster that lists the local host, the machine’s own hostname, and the master node; without it, the nodes can’t connect to each other.
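
A minimal /etc/hosts.equiv for a small cluster might look like the following, with the hostnames adjusted to match your own machines:

localhost 
master 
node1 
node2 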

The ssh command is more involved to set up than .rhosts, but hey, it’s secure. You can install it by following the instructions in Chapter 2, “Preparing Your Linux Cluster”; just be sure to configure MPICH with -rsh=ssh so that MPI uses ssh instead of rsh. The trade-off is that ssh puts a little more load on the CPU because it has to encrypt the messages as they cross the network, whereas rsh has no such overhead but is insecure. Then again, you might not need to secure transmissions that cross an already private network.

Other libraries and programs that use MPI are being built all the time. You can find numerical libraries for the parallel solution of sparse linear systems and nonlinear equations, as well as graphics libraries, multidimensional algebra libraries, and thermal fluid simulations.

Summary

Message-passing libraries such as PVM and MPI give programmers a standardized interface for writing portable code across heterogeneous machines. Such code is designed to let the programmer harness all the nodes in the cluster to solve large computational problems.

With a solid knowledge of C or Fortran, the programming extensions aren’t much different from those of single-processor, single-machine environments. For a working environment, you can either use a dedicated system, such as Scyld Linux, to run your cluster, or set up rsh (with .rhosts) or ssh on your own distribution and add a message-passing library on top.
