Component tour
In this chapter, we describe the client and server components that comprise the IBM Parallel Environment Developer Edition. We also describe the IBM Parallel Environment Runtime Edition, which is a companion to the Developer Edition.
This chapter contains the following sections:
2.1 Parallel Environment Runtime Edition components
IBM Parallel Environment (PE) Runtime Edition is a capability-rich development and execution environment for parallel applications. It provides parallel application programming interfaces together with the runtime environment that is needed to launch and manage the applications that use them.
Parallel Environment Runtime Edition includes the following components:
The Parallel Operating Environment (POE) for submitting and managing jobs.
The IBM MPI, PAMI, and LAPI libraries for communication between parallel tasks.
A parallel debugger (pdb) for debugging parallel programs.
2.1.1 Parallel Operating Environment (POE)
The IBM Parallel Operating Environment (POE) enables users to develop and execute parallel applications across multiple operating system images (nodes). POE includes parallel application compile scripts for programs written in C, C++, and Fortran, and a command-line interface to submit commands and applications in parallel. POE also provides an extensive set of options and additional functions to fine-tune the application environment to suit the execution of the application and system environment.
POE is used to set up the environment for the user’s parallel program and to control and monitor job execution.
The Parallel Operating Environment provides:
OS jitter mitigation through coscheduling capability: POE provides services for periodically adjusting the dispatch priority of a user’s task between set boundaries, giving the tasks improved execution priority. As an alternative to the priority adjustment coscheduler, POE provides a run queue-based coscheduler for reducing operating system jitter (OS jitter). OS jitter is operating system interference that is caused by the scheduling of daemon processes and the handling of asynchronous events, such as interrupts. This interference can have adverse effects on parallel applications running on large-scale systems.
Affinity support of processes to CPU and memory: IBM PE Runtime Edition lets you control the placement of the tasks of a parallel job by setting the affinity of each task to specific CPUs or memory during execution. As a result, applications might see improved performance when the processor that a task runs on, the memory it uses, and the I/O adapter it connects to are in close proximity.
User-controlled workflow/subjob support: IBM PE Runtime Edition provides support for launching and managing multiple parallel dynamic jobs (subjobs) using a single scheduler or resource management allocation of cluster resources.
Lightweight core file support: The lightweight core file format is designed to save the CPU time, network bandwidth, and disk space that are required to generate standard core files. The lightweight format also makes it easy to examine the state of all threads in a parallel program at the time of the event that caused the core file to be written.
Serial and parallel job launch: Allows POE to be used not only for launching traditional MPI or other message passing programs, but also as a distributed shell to quickly obtain information about all nodes in the cluster, such as disk space, currently running jobs, and so on.
Support for running SPMD and MPMD programs: Single Program, Multiple Data (SPMD) and Multiple Program, Multiple Data (MPMD) support gives users the flexibility to run the same program on all nodes (SPMD, most common) or different programs on each node (MPMD). The MPMD function is useful for master/worker programs where the master program coordinates and synchronizes the execution of all worker tasks, and where neither program can run without the other.
Resource management: POE supports running without a separate scheduler or resource manager. If a scheduler or resource manager is not used or not available, POE can manage the node and adapter resources itself. The network resource tables are also loaded by POE to establish the communication mechanism needed for message-passing programs.
You can use the resource manager of your choice for submitting and managing batch or interactive parallel jobs. IBM PE Runtime Edition includes a set of resource management interfaces and data areas for configuring your resource manager to interact with POE.
Integration with IBM Tivoli® Workload Scheduler LoadLeveler: IBM PE Runtime Edition allows users to run POE jobs in interactive or batch mode with LoadLeveler managing the node, network, CPU, memory, and other key resources for optimum throughput and resource utilization.
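The difference between the SPMD and MPMD models described in the list above can be illustrated with a small Python sketch. This is not POE code and uses no PE interfaces: threads stand in for tasks, and the names launch, spmd_program, master, and worker are hypothetical names chosen for this illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def spmd_program(rank, size):
    """SPMD: every task runs this same program; only the rank differs."""
    # Each task sums its own slice of the range 0..99.
    lo = rank * 100 // size
    hi = (rank + 1) * 100 // size
    return sum(range(lo, hi))


def master(rank, size):
    """MPMD: the coordinating program, run by one task."""
    return f"task {rank}: master coordinating {size - 1} workers"


def worker(rank, size):
    """MPMD: a different program, run by the remaining tasks."""
    return f"task {rank}: worker"


def launch(programs):
    """Run programs[i] as task i (hypothetical stand-in for a job launcher)."""
    size = len(programs)
    with ThreadPoolExecutor(max_workers=size) as pool:
        futures = [pool.submit(prog, rank, size)
                   for rank, prog in enumerate(programs)]
        return [f.result() for f in futures]


# SPMD: the same program on all four tasks.
spmd_results = launch([spmd_program] * 4)

# MPMD: one master program and three worker programs.
mpmd_results = launch([master, worker, worker, worker])
```

In the SPMD case the per-task partial sums add up to the sum of the whole range, while in the MPMD case each task reports the role of the program it was given.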
2.1.2 IBM Message Passing Interface (IBM MPI)
The IBM MPI is a complete MPI 2.2 implementation, designed to comply with the requirements of the MPI standard. IBM MPI supports the MPI-2.1 process creation and management scheme. The IBM design is enabled using static resources allocated at job launch time.
The IBM MPI provides a number of nonblocking collective communications subroutines that are available for parallel programming. These subroutines are extensions of the MPI standard. Collective communications routines for 64-bit programs were enhanced to use shared memory for better performance. The IBM MPI collective communication is designed to use an optimized communication algorithm according to job and data size.
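The idea behind a nonblocking collective is that a task starts the operation, continues with independent work, and waits for the result only when it is needed. The following Python sketch illustrates that pattern only. It does not call IBM MPI, and the function name iallreduce_sum is a hypothetical name used for this illustration; a background thread stands in for the communication library.

```python
from concurrent.futures import ThreadPoolExecutor


def iallreduce_sum(pool, contributions):
    """Start a sum-allreduce in the background and return a request object.

    contributions holds one value per task; every task receives the total.
    """
    return pool.submit(lambda: [sum(contributions)] * len(contributions))


with ThreadPoolExecutor(max_workers=1) as pool:
    # Start the collective; the call returns immediately.
    request = iallreduce_sum(pool, [1, 2, 3, 4])

    # Independent computation overlaps with the collective.
    overlapped = sum(i * i for i in range(10))

    # Block only at the point where the result is required.
    result = request.result()
```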
The IBM MPI provides a highly scalable implementation with low memory usage. The IBM MPI library minimizes its own memory usage so that an application program can use as many system resources as possible. It is architected to support parallel job sizes of up to one million tasks.
IBM MPI runs over PAMI. This is achieved through exploiting the PAMI APIs.
 
Note: For Linux, PE Runtime Edition also includes an MPICH2 MPI implementation that can be used as an alternative to the IBM MPI implementation.
2.1.3 IBM Parallel Active Messaging Interface (PAMI)
IBM PAMI is a converged messaging API that covers both point-to-point and collective communications. PAMI exploits the low-level user space interface to the Host Fabric Interface (HFI) and TCP/IP using UDP sockets.
PAMI has a rich set of collective operations designed to support MPI and PGAS semantics, multiple algorithm selection, and nonblocking operation. It supports nonblocking and ad hoc geometry (group/communicator) creation and nonblocking collective allreduce, reduce, broadcast, gather(v), scatter(v), alltoall(v), reduce scatter, and (ex)scan operations. The geometry can support multiple algorithms, including hardware-accelerated (through HFI Collective Acceleration Unit, or Barrier Service Register) versions of broadcast, barrier, allreduce, and reduce.
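For readers unfamiliar with the collective operations named above, the following Python sketch shows what several of them compute, following the semantics that the MPI standard defines for them. It is a single-process model, not PAMI code: each list element stands for the value held by one task, and the function names are chosen for this illustration.

```python
from functools import reduce
from itertools import accumulate
import operator


def allreduce(values, op=operator.add):
    """Every task receives the reduction of all tasks' values."""
    total = reduce(op, values)
    return [total] * len(values)


def scan(values, op=operator.add):
    """Inclusive scan: task i receives the reduction of tasks 0..i."""
    return list(accumulate(values, op))


def exscan(values, op=operator.add):
    """Exclusive scan: task i receives the reduction of tasks 0..i-1.

    The result on task 0 is undefined in MPI; None marks it here.
    """
    return [None] + list(accumulate(values[:-1], op))


def reduce_scatter(vectors, op=operator.add):
    """Each task contributes a vector with one element per task;
    task i receives element i of the element-wise reduction."""
    return [reduce(op, column) for column in zip(*vectors)]


per_task = [1, 2, 3, 4]
allreduce_out = allreduce(per_task)
scan_out = scan(per_task)
exscan_out = exscan(per_task)
reduce_scatter_out = reduce_scatter([[1, 2], [3, 4]])
```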
2.1.4 Low-level application programming interface (LAPI)
The low-level application programming interface (LAPI) is a message-passing API that provides a one-sided communication model. In this model, one task initiates a communication operation to a second task. The completion of the communication does not require the second task to take complementary action.
The LAPI library provides basic operations to “put” data to and “get” data from one or more virtual addresses of a remote task. LAPI also provides an active message infrastructure. With active messaging, programmers can install a set of handlers that are called and run in the address space of a target task on behalf of the task originating the active message. Among other uses, these handlers can be used to dynamically determine the target address (or addresses) where data from the originating task must be stored. You can use this generic interface to customize LAPI functions for your environment.
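The one-sided model and the role of an active message handler can be sketched in a few lines of Python. This is a conceptual model only and does not use the LAPI API: the Task class, the put, get, and send_active_message functions, and the slot_handler handler are hypothetical names, and a Python list stands in for the address space of a task.

```python
class Task:
    """A toy task with a small address space and a handler table."""

    def __init__(self, rank, memory_size=8):
        self.rank = rank
        self.memory = [0] * memory_size
        self.handlers = {}

    def register_handler(self, handler_id, fn):
        self.handlers[handler_id] = fn


def put(target, address, data):
    """One-sided put: the origin writes into the target's memory.
    The target takes no complementary action."""
    target.memory[address:address + len(data)] = data


def get(target, address, length):
    """One-sided get: the origin reads from the target's memory."""
    return target.memory[address:address + length]


def send_active_message(target, handler_id, header, data):
    """Active message: a handler installed on the target runs on behalf
    of the origin and decides where the incoming data is stored."""
    address = target.handlers[handler_id](target, header, len(data))
    target.memory[address:address + len(data)] = data
    return address


def slot_handler(task, header, length):
    """Hypothetical handler: map a slot number in the header to an address."""
    return header["slot"] * length


target = Task(rank=1)
put(target, 0, [7, 8])
fetched = get(target, 0, 2)

target.register_handler("store_in_slot", slot_handler)
stored_at = send_active_message(target, "store_in_slot", {"slot": 2}, [5, 6])
```

In this model the origin never needs to know the final address of the data sent with the active message; the handler on the target computes it from the message header.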
Some of LAPI’s other general characteristics include:
Flow control
Support for large messages
Support for generic non-contiguous messages
Non-blocking calls
Interrupt and polling modes
Efficient exploitation of interconnect functions
Event monitoring support (to simulate blocking calls, for example) for various types of completion events
LAPI is meant to be used by programming libraries and by power programmers for whom performance is more important than code portability.
MPI and LAPI provide communications between parallel tasks, enabling application programs to be parallelized.
MPI provides message passing capabilities that enable parallel tasks to communicate data and coordinate execution. The message passing routines call communication subsystem library routines to handle communication among the processor nodes.
LAPI differs from MPI in that it is based on an active message style mechanism that provides a one-sided communications model in which one process initiates an operation and the completion of that operation does not require any other process to take a complementary action. LAPI is also the common transport layer for MPI and is packaged as part of the AIX RSCT component.
2.1.5 Command line parallel debugger (pdb)
The parallel debugger (pdb) streamlines debugging of parallel applications, presenting the user with a single command line interface that supports most dbx/gdb execution control commands and provides the ability to examine running tasks. To simplify management of large numbers of tasks, pdb allows tasks to be grouped so that the user can examine any subset of the debugged tasks.
The pdb allows users to invoke a POE job or attach to a running POE job and place it under debug control. It starts a remote dbx/gdb session for each task of the POE job put under debugger control.
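The task grouping idea can be pictured with the following Python sketch. It is a conceptual model only and does not reflect the pdb command syntax or implementation: the DebugSession class and its define_group and run methods are hypothetical, and a formatted string stands in for forwarding a command to the per-task dbx/gdb session.

```python
class DebugSession:
    """Toy model of one front end driving one debugger session per task."""

    def __init__(self, task_count):
        self.tasks = list(range(task_count))
        # A default group that contains every task.
        self.groups = {"all": set(self.tasks)}

    def define_group(self, name, task_ids):
        """Name a subset of tasks so that it can be addressed as a unit."""
        unknown = set(task_ids) - set(self.tasks)
        if unknown:
            raise ValueError(f"unknown tasks: {sorted(unknown)}")
        self.groups[name] = set(task_ids)

    def run(self, group, command):
        """Forward one command to every task in the group."""
        return {task: f"[task {task}] {command}"
                for task in sorted(self.groups[group])}


session = DebugSession(task_count=4)
session.define_group("evens", [0, 2])
output = session.run("evens", "where")
```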
The pdb provides these advanced features:
Dynamic tasking support
Multiple console display
Output filtering
2.2 Parallel Environment Developer Edition components
IBM Parallel Environment Developer Edition is an Eclipse-based integrated set of application development tools that will help you develop, debug, and tune your parallel applications. It includes a set of standard Eclipse components and additional support for IBM environments.
2.2.1 Eclipse, PTP, CDT and Photran
IBM PE Developer Edition is based on the Eclipse 4.2 (Juno) platform and includes the following open-source components:
PTP
The Parallel Tools Platform (PTP) is an Eclipse-based application development environment that contains an integrated set of tools to help you edit, compile, run, debug, and analyze your parallel application written in C, C++, and Fortran. Advanced tools included with PTP include static analysis tools to locate errors before the code is compiled, refactoring tools to modify code while preserving behavior, and an integrated parallel debugger. PTP supports a broad range of architectures and job schedulers and provides the ability to easily add support for additional systems.
PTP also provides:
Support for MPI, OpenMP, OpenACC, and UPC parallel programming models
Support for a wide range of batch systems and runtime systems, including IBM LoadLeveler, IBM Parallel Environment, Open MPI, and MPICH2
IBM PE Developer Edition includes additional support for PAMI, LAPI, and OpenSHMEM libraries.
CDT
The C/C++ Development Tooling (CDT) provides a fully functional C and C++ IDE based on the Eclipse platform. Features include:
Support for project creation and managed build for various toolchains
Standard make build
Source navigation
Various source knowledge tools, such as type hierarchy
Call graph
Include browser
Macro definition browser
Code editor with syntax highlighting
Folding and hyperlink navigation
Source code refactoring and code generation
Visual debugging tools, including memory, registers, and disassembly viewers
Photran
Photran is an IDE and refactoring tool for Fortran based on Eclipse and the CDT. Features include:
Refactorings, such as rename, extract procedure, and loop transformations
Syntax-highlighting editor
Outline view
Content assist
Open declaration
Declaration view and hover tips
Fortran language-based searching
Interactive debugger (gdb GUI)
Makefile-based compilation
Optional makefile generation
Recognition of error messages from most popular Fortran compilers
2.2.2 IBM specific add-ons in the IBM PE Developer Edition
This section describes IBM specific add-ons in the IBM PE Developer Edition.
IBM HPC Toolkit
The IBM PE Developer Edition also includes the IBM HPC Toolkit (HPCT), which is a collection of tools that you can use to analyze the performance of parallel and serial applications that are written in C or Fortran and that run on the AIX or Linux operating systems on IBM Power Systems servers. Applications running on Red Hat Enterprise Linux 6 on IBM System x with the Intel microarchitecture code name Nehalem, Westmere, and Sandy Bridge families of processors are also supported. The Xprof GUI also supports C++ applications. These tools perform the following functions:
Provide access to hardware performance counters for performing low-level analysis of an application, including analyzing cache usage and floating-point performance.
Profile and trace an MPI application for analyzing MPI communication patterns and performance problems.
Profile an OpenMP application for analyzing OpenMP performance problems and to help you determine if an OpenMP application properly structures its processing for best performance.
Profile application I/O for analyzing an application’s I/O patterns and whether you can improve the application’s I/O performance.
Profile an application’s execution for identifying hotspots in the application and for locating relationships between functions in your application to help you better understand the application’s performance.
The IBM HPC Toolkit provides three primary interfaces. The first is the IBM HPC Toolkit Eclipse plug-in. This plug-in is an extension to the Eclipse IDE that you can use to run hardware performance counter analysis, MPI profiling and tracing, OpenMP profiling, and I/O profiling. The plug-in allows you to select the parts of your application that are to be instrumented, instrument those parts of the application, run the instrumented application, and view the resulting performance data. The plug-in allows you to sort and filter the data to help you find the performance problems in the application.
The second interface is peekperf, which is an AIX or Linux-based GUI that you can use to run hardware performance counter analysis, MPI profiling and tracing, OpenMP profiling, and I/O profiling. Peekperf allows you to select the parts of your application that are to be instrumented, instrument those parts of the application, run the instrumented application, and view the resulting performance data. Peekperf allows you to sort and filter the data to help you find the performance problems in the application.
The third interface is Xprof, which you can use to view low-level profiling data for your application. Xprof allows you to view the performance data in gmon.out files, generated by compiling your application using the -pg compiler flag. You can view the profiling data, identify hotspots in the application, view relationships between functions in the application, zoom into areas of the application of greater interest, and sort and filter the data to help identify hotspots in the application.
The IBM HPC Toolkit also provides the hpcInst utility, which you can use to instrument the application without using the peekperf GUI. You can specify the types of instrumentation you want to use and the locations within the application that are to be instrumented. The hpcInst utility rewrites the application binary with the instrumentation you selected. Then, you can run the instrumented executable to obtain the same types of performance measurements that you can obtain using peekperf.
Finally, the IBM HPC Toolkit provides commands to get an overview of hardware performance counters for an application and libraries that allow you to control the performance data obtained using hardware performance counters and by MPI profiling and tracing.
On x86 class machines running Linux, only the hardware performance counter tool, the MPI profiling tool, and the I/O profiling tool within the IBM HPC Toolkit are supported. The only supported model for these tools is where the application is linked with the supporting IBM HPC Toolkit runtime libraries. Accessing the hardware performance counters on x86 class machines is only supported on the Intel microarchitecture Nehalem, Westmere, and Sandy Bridge families of processors. There is no support for dynamic instrumentation using the hpcInst command, the peekperf GUI, or the IBM HPC Toolkit plug-in for Eclipse.
IBM PAMI and LAPI user assistance features
IBM PE Developer Edition provides features that augment the C/C++ Eclipse editor for ease of using PAMI and LAPI APIs, locating them in the source code, getting information about API usage and arguments, and so on.
IBM XLC and XLF compiler transformation report feedback viewer
IBM PE Developer Edition provides a source-linked view of items identified by the IBM XLC/XLF Compiler Transformation reports.