Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

12 Advanced parallelism with teams, events, and collectives

This chapter covers

Forming teams of parallel images for different tasks
Synchronizing execution by posting and waiting for events
Exchanging data across images using collectives

Parallel programming is ubiquitous in many applications in science and engineering, such as aerodynamics, weather and ocean prediction, and machine learning. Parallel programming lets you distribute work between many CPUs, allowing the program to finish sooner. Distributing the work also reduces the amount of memory needed by the program, so parallelism allows running large programs that otherwise wouldn’t fit into the memory of a single computer. Fortran is natively parallel, which means that the syntax used to express parallel programs is built into the language itself.

In chapter 7, your first foray into parallel Fortran programming was through coarrays. They allowed you to distribute the work among multiple CPUs, exchange data between them, and perform the computations faster. In this chapter, we’ll take it a step further and explore three new parallel concepts: teams, events, and collectives. We’ll use these new features toward the final implementation of the tsunami simulator that we’ve been developing in this book.

Teams and events provide advanced means for controlling program flow and synchronization. Collectives allow you to implement common parallel patterns across images without directly invoking coarrays. At the end of the chapter, you’ll walk away with the working knowledge to implement advanced parallel patterns in Fortran from scratch, or use them to augment an existing Fortran application. Together, images, coarrays, teams, events, and collectives provide a comprehensive toolbox to express any parallel algorithm that you can think of. This chapter will show you how.

12.1 From coarrays to teams, events, and collectives

Chapter 7 introduced the parallel programming concepts in Fortran, including images, synchronization, and data exchange using coarrays. I strongly recommend that you read that chapter before starting this one. Nevertheless, let’s refresh our memory on these concepts before we build further on them.

Fortran refers to any parallel process as an image. Under the hood, an image can be a process running on a dedicated CPU core or a thread implemented by the operating system. A parallel Fortran program runs on all images, and each image loads its own copy of the program in RAM. The built-in functions this_image and num_images are available. The former returns the number of the current image, and the latter returns the total number of images that are running the program. Each image runs the program independently from all other images until they’re synchronized using the sync all statement. These concepts allow us to inquire about images and synchronize them. However, they don’t help us regarding exchanging data between images. To do this, Fortran has a special data structure called a coarray. A coarray can be coindexed to access data on remote images--we can copy data to and from other images by indexing a coarray with the target image number.

Teams, events, and collectives build directly on these concepts. Teams let you separate groups of images by different roles, while events make communicating status updates between teams (or just images) simple. Consider a weather prediction model, for example. The simulation can’t start without the initial data coming in, and the team that writes data to disk needs to wait for the simulation team to finish their part of the job. Posting and waiting for events from different teams is how we can synchronize them. Finally, collectives will allow you to perform common parallel calculations, such as sum, minimum, or maximum, without directly invoking coarrays.

As we work on implementing these features in the tsunami simulator, we’ll focus mainly on monitoring the time stepping progress of the simulation and extracting some useful statistics about the simulated water height field. Although a real-world application is likely to employ teams, events, and collectives for more complex tasks, such as downloading and processing remote data, writing model output to disk, and serving data to clients, focusing on a simple and minimal task will help us learn and better understand in detail how these features work.

In case you opened this chapter before working through the earlier ones in the book, make sure you have your Fortran compiler ready to build parallel code. You’ll need recent builds of the GNU Fortran compiler (gfortran) and the OpenCoarrays library. Refer to appendix A for instructions on setting up gfortran and OpenCoarrays. Otherwise, if you’re working on a system with access to Intel or Cray Fortran compilers, you’re good to go. In that case, specific compile commands and options will be a bit different than presented here. Refer to user documentation of your Fortran compiler for help on how to use it.

12.2 Grouping images into teams with common tasks

Fortran 2018 introduced teams to allow the programmer to assign different tasks to groups of images. For example, if you’re computing a weather simulation on 16 images, you could assign them different roles (figure 12.1).

Figure 12.1 A weather model workflow, with parallel images distributed in different teams and each box with a number in it representing one image

In this specific example, the images are distributed in the following setup:

One image queries a remote server and downloads satellite data when available.
Another is in charge of monitoring the progress of the simulation and logging appropriate information to a text file.
Two images are responsible for writing simulation output files to disk.
The remaining 12 images are churning away with the heavy task of simulation, without getting distracted by other chores.

Let’s apply a subset of this pattern to the tsunami simulator we’ve been developing.

12.2.1 Teams in the tsunami simulator

In this section, we’ll use teams to augment our tsunami simulator and assign different roles to parallel images working concurrently. For brevity and to not get bogged down in the details of what the specific roles could be in real-world simulation software, we’ll create only two teams: the compute team and the logging team. While the compute team is churning away at the heavy task of number-crunching, the logging team will monitor and report the progress of the simulator. Logging is a relatively lightweight task, so we’ll assign only one image to the logging team, and the rest will go to the compute team. Thus, if we run the program on four parallel images, one will be logging progress, while the remaining three will be crunching numbers. This is a simplified variant of the approach illustrated in figure 12.1.

The updated tsunami program that uses teams will look as shown in listing 12.1. This listing shows only the added code relative to where we left off with the tsunami simulator in chapter 10. Don’t worry about coding this up just yet; here, I’m merely giving you an overview of what’s coming later in the chapter.

Listing 12.1 Introducing teams to the tsunami simulator

program tsunami
 
  use iso_fortran_env, only: team_type    ❶
  ...
  type(team_type) :: new_team             ❷
  integer :: team_num                     ❸
  ...
  team_num = 1                            ❹
  if (this_image() == 1) team_num = 2     ❺
  form team(team_num, new_team)           ❻
 
  change team(new_team)                   ❼
    if (team_num == 1) then
      ...                                 ❽
    else if (team_num == 2) then
      ...                                 ❾
    end if
  end team                                ❿
 
end program tsunami

❶ Imports team_type from the iso_fortran_env module

❷ Declares a new team_type instance

❸ Team number variable that we’ll use to identify sibling teams

❹ All images will go to team 1 by default.

❺ Only the first image will go to team 2.

❻ Forms two new teams

❼ Changes the current team for each image

❽ The original simulator code is assigned to team 1.

❾ The logging code for team 2 goes here.

❿ Closes the change team construct

This listing summarizes the concepts of forming new teams and switching the execution context between them. First, a team is modeled using a new built-in type, team_ type, available from the the iso_fortran_env module. To begin working with teams, we import team_type and declare an instance of it, in this case new_team. We also need a unique integer scalar to refer to different teams by their number, in this case team_num. This variable is used to assign images to different teams. In the form team statement in this example, we assign all images to team 1, except the first one, which we assign to team 2. The form team statement only creates new teams; it doesn’t affect the execution.

This is where the change team construct comes in--it instructs all images that execute it to switch to a new team--in this case, new_team. Note that change team is a construct, like an if block or a do loop, and is paired with a matching end team statement.

Within the change team construct, the images are now running in their new teams. We can assign code to be executed to each team by checking the value of the team number. Teams will work on different tasks, and will also need to synchronize and exchange data from time to time.

Figure 12.2 illustrates this process, albeit with a bit different team organization.

The key concepts introduced here are forming new teams (form team statement) and changing the current team (change team construct). The form team statement creates new teams and encodes the information about which image on the current team will belong to which new team. The change team construct moves images to the newly created teams. Within the change team construct, the images have new image numbers assigned to them. Teams can work independently from one another, synchronize, and even exchange data.

You may also wonder why we need separate statements for forming and changing teams. We need them because these two operations are fundamentally different in nature: form team instructs the compiler to define new teams and assign images to them, analogous to defining a new function; change team, on the other hand, switches the execution context between already created teams, which is analogous to calling a function. Don’t worry if this seems like a lot and not everything is clear yet. We’ll go over each element in detail as we work through this section.

Figure 12.2 Forming and changing teams

Note In case you’re familiar with MPI programming (discussed earlier in the book), whether in C or Fortran, teams are analogous to MPI communicators.

12.2.2 Forming new teams

Before we dive into the implementation of teams in the tsunami simulator, let’s look at the syntax of forming new teams, which will apply to any parallel program that uses them. In the beginning of the program, there’s only one team, and we’ll refer to it as the initial team. All images that run the program start in the initial team by default. If you intend to work with teams at all, the first thing you’ll do is form new teams within the initial team using the form team statement. You can make as many new teams as you want. In this example, we’ll create two new teams--one for the first half of all images and the other for the rest--as shown in the following listing.

Listing 12.2 Forming two new teams with equal numbers of images

program form_team
 
  use iso_fortran_env, only: team_type                  ❶
  implicit none
 
  type(team_type) :: new_team                           ❷
  integer :: team_num                                   ❸
 
  team_num = 1                                          ❹
  if (this_image() > num_images() / 2) team_num = 2     ❺
 
  form team(team_num, new_team)                         ❻
 
end program form_team

❶ Imports team_type from the iso_fortran_env module

❷ Declares a new team_type instance

❸ Team number variable that we’ll use to identify sibling teams

❹ All images will go to team 1 by default.

❺ The rest will go to team 2.

❻ Forms two new teams

Besides the basic housekeeping, like importing the team_type from the iso_fortran _env module and declaring the team and team number variables, there are two key elements here. First, we decide how many new teams to create and which images will go to each team. We do this by assigning values to the integer variable team_num on every image. Second, we execute the form team statement, which creates new teams and internally assigns the images to them. If you compile and run this program, there will be no output. This is expected, as a form team statement on its own doesn’t emit any output.

A form_team statement must be executed by all images on the current team. The first form team statement in the program is thus always executed by all images in the program. This statement also synchronizes all the images on the team, implying a sync all under the hood. (See chapter 8 for a refresher on synchronizing images.) This is the syntax of the form team statement:

form team(team_num, team_variable[, new_index, stat, errmsg])

where

team_num is a positive, scalar, integer constant or expression that uniquely identifies the team to be created.
team_variable is a scalar variable of type team_type.
new_index is a scalar integer that allows you to specify an image number that this image will have on the new team.
stat is the integer status code, with a zero value in case of success and nonzero otherwise
errmsg is the character string with an informative error message, if stat returns a nonzero value.

team_num and team_variable are required input parameters. The value of team_num across images determines how many teams will be created with the form team statement and which image will belong to which new team. If multiple new teams are created, their numbers don’t need to be contiguous, but they need to be unique positive integers. new_index is an optional input parameter that you can use to specify the number of the image on the new team, which is otherwise compiler-dependent. If provided, the values of new_index must be unique and less than or equal to the number of images being assigned to the new team. stat and errmsg, both optional output parameters, have the same meaning and behavior as they do in the allocate and deallocate statements in chapter 5. As you’ll see throughout the remainder of this chapter, all parallel features introduced by Fortran 2018 have error handling built in.

12.2.3 Changing execution between teams

Now that we have two new teams, how do we instruct images on each team to do certain kinds of work? Recall that by default, all images start on the same team. We need to switch each image to a new team to get it to work on a different task. We do this with the change team construct. Following the form_team statement in listing 12.2, we’ll add this snippet:

change team(new_team)                                       ❶
  print *, 'Image', this_image(), 'of', num_images(), &     ❷
           'is on team', team_number()                      ❷
end team                                                    ❸

❶ Switches execution to a new team

❷ Reports the image and team number

❸ Returns to the parent team

Now you understand why change team is a construct--every change team statement must be paired with a matching end team statement. A change team statement instructs all images that execute it to switch to the team specified in parentheses. The code inside the change team construct executes on the new (child) team until the end team statement, when the images return to the original (parent) team. Similar to this_image, which returns the image number, team_number returns a scalar integer value of the current team.

Let’s save this program in a file called change_team.f 90, compile it, and run it on five images:

caf change_team.f90 -o change_team                          ❶
cafrun -n 5 --oversubscribe ./change_team                   ❷
  Image           1 of           2 is on team          1    ❸
  Image           2 of           2 is on team          1    ❸
  Image           1 of           3 is on team          2    ❸
  Image           2 of           3 is on team          2    ❸
  Image           3 of           3 is on team          2    ❸

❶ Compiles the program using the OpenCoarrays compiler wrapper

❷ Runs the program on five parallel processes

❸ The output of the program

What’s happening here? Each image prints three numbers to the screen: its own image number (this_image()), the total number of images (num_images()), and its team number (team_number()). Let’s look at the values in reverse, from right to left. First, we see that there are two images on team 1 and three on team 2. This is what we expected, as we instructed form team to first assign two images (out of five total) to team 1, and the rest to team 2. So far, so good. Second, notice that two of the images report a total number of images of 2, and the remaining three report a total number of images of 3. This means that when executed within the change team construct, num_images() now doesn’t represent the total number of images running the whole program, but the total number of images within the current team. Finally, looking at the current image number, it seems that our original images 3, 4, and 5 now have numbers 1, 2, and 3 on their new team. Conclusion: when executed within the change team construct, functions this_image and num_images operate in the context of the current team.

Note that the Fortran Standard doesn’t prescribe what the new image numbers on the newly formed teams will be, and leaves the numbering of images on new teams as implementation- (compiler-) dependent. If you need to ensure specific image indices on new teams (or preserve the ones from the initial team), use the new_index argument in the form team statement, described in the previous subsection.

The syntax for the change team construct is

[name:] change team(team_value[, stat, errmsg])     ❶
  ...                                               ❷
end team [name]                                     ❸

❶ Switches all images to the new team with team_value

❷ All code here is executed in the context of the new team.

❸ Synchronizes images and returns them to the original team

where

team_value is an input scalar variable or expression of type team_type.
stat and errmsg are optional output parameters that have the same meaning as in the form team statement.
name is an optional label for the construct, much like a labeled do loop.

At the beginning of a change team construct, all images that execute it switch to the team provided in parentheses. Inside the construct, these images execute within the new team. When they reach the end team statement, the images automatically synchronize and return to the original (parent) team that they were on immediately before the change team statement.

Write a parallel program that models a tribe of hunter-gatherers using teams. Form the teams such that this is how they operate:

Gatherers comprise 1/2 of all villagers, and they go foraging for fruit and vegetables. When they reach their destination, they split into teams of 2 for more efficient foraging.
Hunters comprise 1/3 of all villagers, and they go hunting. When they reach their destination, they split into teams of 3 for more efficient hunting.
The remaining 1/6 of villagers are elders, who stay together in the village and rest by the fire pit.

For this exercise, make each team report to the screen:

How many villagers are in each team
When they leave the village
When they engage in an activity

Hint: use a form team statement within a change team construct to create new teams within teams.

The solution to this exercise is given in the “Answer key” section near the end of the chapter.

12.2.4 Synchronizing teams and exchanging data

We’ve learned so far, both from coarrays in chapter 7 and from developing the parallel tsunami simulator, that synchronizing images is crucial for writing correct parallel programs. Recall that when we have data dependency between parallel images, one image must wait for data from another image before proceeding with its own calculation. This subsection explains how synchronization of images works within teams, and how to synchronize multiple teams as a whole.

Synchronizing images within a team

The essential synchronization mechanism you learned in chapter 7 was the sync all statement, which placed a barrier in the code at which every image had to wait for all others before proceeding. At the point of a sync all statement, we considered all images to be synchronized. Another option that’s available to us, when we need to synchronize the current image with some but not all other images, is the sync images statement. For example, we used sync all in the sync_edges method of the Field type in the tsunami simulator (see section 10.4) to synchronize every image with all other images. Using sync images, we can instead synchronize each image only with its four neighbors, in mod_field.f 90, subroutine sync_edges:

...
sync images(set(neighbors))                                ❶
 
edge(1:je-js+1,1)[neighbors(1)] = self % data(is,js:je)    ❷
edge(1:je-js+1,2)[neighbors(2)] = self % data(ie,js:je)    ❷
edge(1:ie-is+1,3)[neighbors(3)] = self % data(is:ie,js)    ❷
edge(1:ie-is+1,4)[neighbors(4)] = self % data(is:ie,je)    ❷
 
sync images(set(neighbors))                                ❸
 
self % data(is-1,js:je) = edge(1:je-js+1,2)                ❹
self % data(ie+1,js:je) = edge(1:je-js+1,1)                ❹
self % data(is:ie,js-1) = edge(1:ie-is+1,4)                ❹
self % data(is:ie,je+1) = edge(1:ie-is+1,3)                ❹
...

❶ Synchronizes with neighbors before copy into buffer

❷ Copies data into the coarray buffer, edge

❸ Synchronizes with neighbors again before copying out of buffer

❹ Copies data from coarray buffer into the field array

The same behavior holds in the context of teams: sync all and sync images statements now operate within the team in which they’re executed. For example, if you have two teams and you’ve switched the images to them using the change team construct, issuing sync all synchronizes the images within each team, but not the teams themselves. Ditto for sync images. Although this may be confusing at first, you’ll get used to it over time as you practice working with teams. Just remember: sync all and sync images statements always operate only within the current team and can’t affect the images outside of the team. In the next subsection, you’ll see how you can synchronize between teams.

In the sync images snippet, set(neighbors) ensures that we pass unique values of neighbors to sync images. We’ll define set in the same module in mod_field.f 90, as shown in the following listing.

Listing 12.3 Function set to return unique elements of an array

pure recursive function set(a) result(res)                ❶
  integer, intent(in) :: a(:)
  integer, allocatable :: res(:)
  if (size(a) > 1) then
    res = [a(1), set(pack(a(2:), .not. a(2:) == a(1)))]   ❷
  else
    res = a
  end if
end function set

❶ The recursive attribute allows a function to call itself.

❷ Eliminates nonunique elements from the array, one at a time

This is the first time we encounter the recursive attribute. This attribute allows a function or subroutine to invoke itself. The crux of this function is in the fifth line of the listing, where we recursively reduce the array by removing duplicate elements, one by one, using the built-in function pack. For a refresher on pack, see section 5.4, where we used it for the first time. Note that Fortran 2018--the latest iteration of the language as of this writing--makes all procedures recursive by default, so specifying the recursive attribute won’t be necessary anymore. I still include it here because most Fortran compilers have yet to catch up with this recent development.

Synchronizing whole teams

Having established that sync all and sync image statements operate only within the current team and can’t affect the images outside of it, we need a mechanism to synchronize between the teams. Back to our working tsunami example from listing 12.1, where we began incorporating teams for the simulation and logging tasks:

change team(new_team)
  if (team_num == 1) then
    ...                          ❶
  else if (team_num == 2) then
    ...                          ❷
  end if
end team

❶ Simulation code goes here.

❷ Logging code goes here.

As logging depends on the data from the simulation team, we need a way to synchronize images between different teams. This is where the new sync team statement comes in, as shown in the following listing.

Listing 12.4 Synchronizing images within the initial team using the sync team statement

use iso_fortran_env, only: initial_team, team_type     ❶
...
change team(new_team)
  if (team_num == 1) then
    ...                                                ❷
    sync team(get_team(initial_team))                  ❸
  else if (team_num == 2) then
    sync team(get_team(initial_team))                  ❸
    ...                                                ❹
  end if
end team

❶ Imports the initial_team constant from the module

❷ Simulation code

❸ Synchronizes with all images that belong to the initial team

❹ Logging code

sync team has been introduced to the language to allow synchronizing images within the parent team without leaving the change team construct. To use it, we need to provide it a team value over which to synchronize. In practice, this will typically be a parent team or some other ancestor team (see the “Exercise 1” sidebar for an example of multiple levels of teams), but can also be the current team or the child team. To refer to a team such as the initial team, which we never defined as a variable, we use the get _team built-in function, and pass it the initial_team constant available from the iso_fortran_env module. Besides the initial_team integer constant, iso_fortran _env also provides the parent_team and current_team constants.

For brevity, we won’t get bogged down with the exact code that the logging team will execute. In practice, it could be monitoring the time stepping progress of the simulation team, checking and processing files written to disk, printing simulation statistics to the screen, and perhaps even serving them as a web server. An important element to most of these activities is getting the data from the simulation team.

Exchanging data between teams

I mentioned in the previous subsection that one of the activities the logging team could be performing is monitoring the time stepping of the simulation team. If they’re operating independently and concurrently, how can the logging team know each time the simulation team steps forward? To demonstrate the exchange of data between teams, let’s send the time step count from the simulation team to the logging team. To do this, we’ll make our time step count variable a coarray, and we’ll use the team number in the image selector when referencing that coarray, as shown in the following listing.

Listing 12.5 Exchanging data between teams using image selectors

integer(ik) :: time_step_count[*]                                          ❶
...
change team(new_team)
  if (team_num == 1) then
    ...
    time_loop: do n = 1, num_time_steps
      ...
      time_step_count[1, team_number=2] = n                                ❷
    end do time_loop
  else if (team_num == 2) then
    n = 0
    time_step_count = 0
    do                                                                     ❸
      if (time_step_count > n) then                                        ❹
        n = time_step_count
        print *, 'tsunami logger: step ', n, 'of', num_time_steps, 'done'
        if (n == num_time_steps) exit                                      ❺
      end if
    end do
  end if
end team

❶ Declares time step count as a coarray

❷ Copies n into time_step_count on image 1 of team 2

❸ Loops indefinitely

❹ Runs this code if time_step_count has been updated

❺ Leaves the loop if we’ve reached the end

In listing 12.5, we’ve declared the time_step_count integer coarray, which we’ll use to exchange the time step count between the simulation team and the logging team. To send the data, we’ll use the usual coarray indexing syntax from chapter 7, with a twist: here, we also specify the team number in the image selector (the values between square brackets). When we write time_step_count[1, team_number=2] = n, we’re saying “Copy the value of n into the time_step_count variable on image 1 of team number 2.” This means that the image number is relative to the team in question--image 1 on team 1 is different from image 1 on team 2. On the logging team, we initialize the local value of time_step_count to zero, loop indefinitely, and check for its value in each iteration. Every time time_step_count is incremented by the simulation team, we print its value to the screen.

While this is a somewhat trivial example--printing a single integer to the screen is not that much work--it illustrates how to effectively offload heavy compute work to other teams. In a real-world app, while the simulation team is busy crunching numbers, one team could be writing the output files to disk, while another could be serving them as a web server. The results of the tsunami simulator won’t change with introduction of teams into the code, because they affect only how the code and its order of execution are organized. The simulation part of the code, which is responsible for producing numerical results, is now running in its dedicated team rather than on all images. While teams don’t necessarily unlock any new capability relative to original image control and synchronization mechanisms, they allow you to more cleanly express distribution of work among images. This becomes especially important for larger, more complex apps.

12.3 Posting and waiting for events

In the previous section, we used teams to distribute work among groups of images. Teams allow us to express some parallel patterns and synchronization more elegantly than we otherwise could by controlling individual images directly. Fortran 2018 introduces another new parallel concept called events, provided through the built-in derived type called event_type. In a nutshell, you can post events from one or more images, and query or wait for those events from others. Figure 12.3 illustrates how events are implemented in Fortran.

You can read this diagram in any order. An alert event is an instance of event_ type. Image 1 triggers the alert on image 2 by issuing event post(alert[2]). This statement is nonblocking, which means that image 1 can immediately move on with whatever code follows. All instances of event_type keep a count of posted events internally. This count is incremented on every event post statement, from any image. Image 2 issues event wait(alert). This is a blocking statement, which means that image 2 will wait until the alert is posted. When it finally happens, event wait decrements the internal event count. Alternatively, image 2 can also poll the number of alerts in a nonblocking fashion with the built-in subroutine event_query.

Figure 12.3 Fortran events, where solid and dashed arrows denote blocking and nonblocking operations, respectively

That’s all there is to it! Let’s first tinker with posting and waiting for events in an example of sending a notification, and then we’ll dive into the syntax and rules of events.

12.3.1 A push notification example

In this section, we’ll build from our tsunami teams example and use events to post updates from the simulation team to the logging team about data being written to disk. While this is technically doable with coarrays alone, you’ll see that events are a perfect candidate for such parallel patterns. Before we jump back into the tsunami, let’s see how events work from a simple push notification example.

Sending a notification from one process to another will be important in any scenario in which you have data dependency between processes. Examples include a long-running data mining job by a worker process, waited on by a process whose role is to write a report for the user (see figure 12.1), or waiting for data to become available on a remote server.

This example will demonstrate using events to wait for another image to complete a long-running job. It doesn’t matter what the actual job is--here, we’ll emulate it by making the image wait for five seconds. When the time is up, the image will send a notification to another image that’s waiting for it. The following listing shows the complete program.

Listing 12.6 A push notification example using events

program push_notification
 
  use iso_fortran_env, only: event_type                             ❶
  implicit none
  type(event_type) :: notification[*]                               ❷
 
  if (num_images() /= 2) error stop &                               ❸
    'This program must be run on 2 images'                          ❸
 
  if (this_image() == 1) then
    print *, 'Image', this_image(), 'working a long job'
    call execute_command_line('sleep 5')                            ❹
    print *, 'Image', this_image(), 'done and notifying image 2'
    event post(notification[2])                                     ❺
  else
    print *, 'Image', this_image(), 'waiting for image 1'
    event wait(notification)                                        ❻
    print *, 'Image', this_image(), 'notified from image 1'
  end if
 
end program push_notification

❶ Imports event_type from the built-in module

❷ Declares an instance of event_type as a coarray

❸ Requires running on two images

❹ Simulates a long job by waiting for five seconds

❺ Posts the event to notification on image 2

❻ On image 2, waits for notification

First, we import event_type and declare a coarray instance of it. Like team_type, event_type is also provided by the iso_fortran_env module. An event variable must either be declared as a coarray or be a component of a coarray derived type. Then, from image 1, we post the event by executing the event post statement on the notification variable, with image 2 as the target. This increments the event count in the notification variable, which can now be queried or waited for on image 2. On the other side, image 2 issues the matching event wait statement. This statement blocks the execution on image 2 until image 1 has posted the event.

If you compile and run this program, you’ll get

caf push_notification.f90 -o push_notification     ❶
cafrun -n 2 ./push_notification                    ❷
 Image           1 working a long job
 Image           2 waiting for image 1
 Image           1 done and notifying image 2
 Image           2 notified from image 1

❶ Compiles the program using the OpenCoarrays compiler

❷ Runs the program on two images

Notice the order of printed lines in the output. The sequence of operations is set by the event post and event wait statements. Because image 1 is working on a long job (here emulated by sleeping for five seconds), image 2 will announce that it’s waiting for image 1 before it receives the notification and will print that the message was received only after event wait has executed. The following two subsections describe the general syntax of event post and event wait statements.

In listing 12.6, I used a built-in subroutine, execute_command_line, to run an external command and simulate a long job. On Linux, sleep 5 means “wait for five seconds.” You can run any external command by calling execute_command_line. The subroutine will block until the command completes. In general, this is useful for loosely integrating your Fortran programs with external (system) tools and scripts. Fortran itself, however, doesn’t provide a way to capture the output (or error message) of the external command. To do this, you’d have to redirect the output of the command into a file that you’d then read from your Fortran program (see chapter 6).

12.3.2 Posting an event

The first step to any work with events is to post them using the event post statement, which takes the general form

event post(event_var[, stat, errmsg])

where event_var is a variable of event_type, and stat and errmsg have the same meaning as they do in the form team and change team statements.

While not strictly required by the language, you’d always want to post to an event variable on another image by coindexing it (indexing a coarray); for example

type(event_type) :: notification[*]
event post(notification[this_image() + 1])     ❶

❶ Posts a notification to the next image

You can post to an event variable as many times and as frequently as you want, with or without matching event wait statements. Every time you do, an internal event count for that event variable is incremented. You can also post to an event from more than one image. You’ll see soon how this mechanism can be used to make multiple event posts and wait for them only on some occasions.

12.3.3 Waiting for an event

Images posting events is just one side of the transaction. For an image to wait for the event that it owns, it needs to execute the event wait statement. This statement has the syntax

event wait(event_var[, until_count, stat, errmsg)

where

event_var is a scalar variable of event_type and has the same meaning as in event post.
until_count is an optional integer expression that’s the number of posted events for which to wait, with a default value of 1.
stat and errmsg are optional output parameters for error handling and have the same meaning as before.

In a nutshell, event wait blocks the image that executes it until some other image posts an event to it. If until_count is provided and greater than 1, the image will wait until that many events have been posted. On successful execution of event wait, the internal event count is decremented by until_count, if provided, and by 1 otherwise. For example, this statement

event wait(notification, until_count=100)

blocks the executing image until 100 events have been posted to the notification variable from any other image. Once executed, the internal event count is decremented by exactly 100. Note that this doesn’t mean that the event count is always reduced to zero, because remote event posts can keep incrementing the event count before event wait has time to return.

Using event wait together with the until_count parameter allows you to not block on every posted event, but only on some number of events. However, it also illustrates a restriction to event wait: it’s impossible for the image that listens for events to know how many have been posted without explicitly blocking execution with event wait. This is indeed rather limiting. To poll events without blocking the current image, Fortran provides a built-in subroutine event_query, which we’ll explore in the next subsection.

12.3.4 Counting event posts

As you work with events, you’ll soon find it useful to query an event variable to find out how many times an event has been posted. The built-in subroutine event_query does exactly this

call event_query(event_var, count[, stat])

where event_var is the input variable of type event_type, and count is the output integer number of events posted. Unlike the event wait statement, calling event _query doesn’t block execution but simply returns the count of posted events. event_query is a read-only operation, so it doesn’t decrement the event count like event wait does. This makes it more suitable for implementation of nonblocking parallel algorithms, as you’ll find out in the “Exercise 2” sidebar.

In the previous section, we used the coarray time_step_count to communicate the number of time steps between the simulation and logging teams. In this exercise, use events to keep track of the simulation team’s progress and print it to screen from the logging team. For bonus points, implement two solutions, one using an event wait statement, and another using an event_query subroutine.

The solution to this exercise is given in the “Answer key” section near the end of the chapter.

12.4 Distributed computing using collectives

In chapter 7, you learned how to use coarrays and their square bracket syntax to exchange values between parallel images. This mechanism for data exchange is simple and to the point--you as the programmer explicitly instruct the computer to send and receive data between images. For common calculations across many images, such as a global sum or maximum and minimum values of distributed arrays, implementing such parallel algorithms using coarrays directly can be tedious and prone to errors. Fortran 2018 introduced collective subroutines to perform common parallel operations on distributed data.

Take, for example, a climate model that predicts the air temperature over the globe far into the future. As a climate scientist or a policy maker, you’d be interested in finding out what the global minimum, maximum, and average value of air temperature or mean sea level was over time. However, if the climate model was running in parallel (almost all of them are!), calculating the global temperature statistics would not be trivial, because every CPU would have the data only for the region that it was computing for. In the simplest implementation, you’d have to do the following:

Calculate minimum, maximum, and average values on each CPU for its region.
Gather the regional statistics to one CPU.
Calculate the global statistics on one CPU based on arrays of regional statistics.

We went through this exercise with a simple dataset back in chapter 7 when we were first introduced to coarrays. Now, collective subroutines (I’ll refer to them as collectives) can do some of the heavy lifting for you.

12.4.1 Computing the minimum and maximum of distributed arrays

Let’s try this out in the tsunami simulator. In our working version of the simulator so far, for every time step, we were reporting the time step count to the screen, while the program was writing raw data into files in the background:

program tsunami
  ...                                      ❶
  time_loop: do n = 1, num_time_steps      ❷
    if (this_image() == 1) &               ❸
      print *, 'Computing time step', &    ❸
               n, '/', num_time_steps      ❸
    ...                                    ❹
  end do time_loop                         ❺
 
end program tsunami

❶ Initialization part of the program

❷ Iterates through simulation time steps

❸ If image 1, reports time step count to the screen

❹ The simulation calculation goes here.

❺ End of the simulation time loop

At the beginning of each time step, we print the current time step count and the total number of time steps to the screen. We do this only from one image to avoid printing the same message from all images. Let’s augment this short report by adding the minimum, maximum, and average water height value to each print statement. Like in the thought experiment of a parallel climate model, the water height values here are also distributed across parallel images. The following listing shows how we’d calculate global minimum and maximum values using standard collectives co_min and co_max, respectively.

Listing 12.7 Calculating global minimum and maximum values of the water height array

...
real(ik) :: hmin, hmax                                   ❶
...
time_loop: do n = 1, num_time_steps
  ...
  hmin = minval(h % data)                                ❷
  call co_min(hmin, 1)                                   ❸
 
  hmax = maxval(h % data)                                ❹
  call co_max(hmax, 1)                                   ❺
 
  if (this_image() == 1) print '(a, i5, 2(f10.6))', &    ❻
    'step, min(h), max(h):', n, hmin, hmax               ❻
 
end do time_loop

❶ Declares temporary variables

❷ Calculates the local minimum on each image

❸ Calculates the collective minimum from hmin on each image and stores it into hmin on image 1

❹ Calculates the local maximum on each image

❺ Calculates the collective maximum from hmax on each image and stores it into hmax on image 1

❻ Prints the current time step and global minimum and maximum to the screen

To compute the global minimum of water height, we first calculate the local minimum on each image using the minval function and store it into the temporary variable hmin. Recall that h is a type(Field) instance, so we access the raw values through its component h % data. Second, we use the collective subroutine co_min to calculate the minimum value of hmin across all images. The first argument to co_min is an intent(in out) scalar, and the second argument (optional) is the number of the image on which to store the result. In this case, all images invoke co_min, and only the value of hmin on image 1 is modified in-place. If the image number were not specified (call co_min(hmin)), the value of hmin would be updated in-place on all images. This implies that invoking the collective subroutine will inevitably overwrite the value of the input on at least one image.

We repeat the same procedure to compute the global maximum using co_max. Finally, we report the current time step and minimum and maximum values to the screen using a modified print statement. Here’s the sample output:

step, min(h), max(h):    1  0.000000  1.000000
step, min(h), max(h):    2  0.000000  0.996691
step, min(h), max(h):    3  0.000000  0.990097
...
step, min(h), max(h):  998 -0.072596  0.186842
step, min(h), max(h):  999 -0.072279  0.188818
step, min(h), max(h): 1000 -0.071815  0.190565

This was an introduction to the co_min and co_max subroutines by example. In the next section, I’ll describe the rest of the collectives and provide their general syntax.

Note Collective subroutines are built into the language and are available out of the box, just like the regular functions min, max, and sum.

12.4.2 Collective subroutines syntax

Fortran 2018 defines a total of five collective subroutines:

co_broadcast--Sends the value of a variable from the current image to all others
co_max--Computes the maximum value of a variable over all images
co_min--Computes the minimum value of a variable over all images
co_sum--Computes the sum of all values of a variable across all images
co_reduce--Applies a reduction function across all images

These cover most collective operations that you’ll likely encounter in your work. However, the language won’t stop you from implementing your own custom collectives using coarrays and synchronization, should you ever need them. The rest of this section describes co_sum and co_broadcast in more detail. To learn more about co_reduce, the most complex collective subroutine, see section 12.7 for reference.

Note If you’re familiar with parallel programming using MPI, Fortran 2018 collective subroutines will look familiar, as they’re analogs to their MPI counterparts.

Figure 12.4 illustrates how co_sum works when invoked on four images.

Figure 12.4 A collective sum invoked over four parallel images, with arrows indicating possible data flow directions

In this example, we invoke co_sum(a) on each image, which triggers a summation of values of a across all images. The exact data exchange pattern may vary depending on compilers and underlying libraries, but the point is that you can use this built-in subroutine and not worry about explicitly copying data via coarrays and synchronizing images to avoid race conditions. By default, the result of the collective sum is made available on all images, and the value of a is updated on each image to the global sum value. However, if you need this value on only one image, you can specify it as an argument; for example, call co_sum(a, 3) would compute a sum over all images but update the value of a only on image 3.

The full syntax for invoking co_sum is

call co_sum(a[, result_image, stat, errmsg])

where

a is a variable that has the same type across all images. It doesn’t need to be declared as a coarray. This is an intent(in out) argument, so its value may be modified in-place.
result_image is an optional integer scalar indicating on which image to store the result. If omitted, the result is stored on all images.
stat and errmsg, both optional, are scalar integer and character variables, respectively. They have the same meaning as in allocate and deallocate statements, and allow for explicit error handling.

As you might guess, co_sum, co_min, and co_max are implemented for numeric types only (integer, real, and complex).

In almost all applications of computational fluid dynamics, it’s an important property of the simulation code to conserve fundamental physical properties, such as mass and energy. In this exercise, do the following:

Use the collective subroutine co_sum to calculate the global average of the water height.
Print the mean water height value to the screen, like we did for the minimum and maximum value.
Confirm that the tsunami simulator conserves mass by making sure that the average water height (and thus the total water mass) stays constant throughout the simulation.

The solution to this exercise is given in the “Answer key” section near the end of the chapter.

12.4.3 Broadcasting values to other images

While all images must execute the call to co_broadcast, the specified image acts as the sender, and all others act as receivers. Figure 12.5 illustrates an example of this functioning.

Figure 12.5 A collective broadcast from image 1 to the other three images, with arrows indicating the possible data flow direction

The inner workings of this procedure, including copying of data and synchronization of images, are implemented by the compiler and underlying libraries, so you and I don’t have to worry about them.

The full syntax for invoking co_broadcast is similar to co_min, co_max, and co_sum, except that the broadcast variable isn’t limited to numeric data types. A subtle but important point about collective subroutines is that the variables they operate on don’t have to be declared as coarrays. This allows you to write some parallel algorithms without declaring a single coarray. For an example, take a look at the source code of a popular Fortran framework for neural networks and deep learning at https://github.com/modern-fortran/neural-fortran. It implements parallel network training with co_broadcast and co_sum, without explicitly declaring any coarrays.

Congratulations, you made it to the end! Having now covered teams, events, and collectives, it’s a wrap. Work through the exercises, make a few parallel toy apps of your own, and you’re off to the races. You should have enough Fortran experience under your belt to start new Fortran programs and libraries, as well as to contribute to other open source projects out there. If you’d like to return to our main example, appendix C provides a recap and complete code of the tsunami simulator. It also offers ideas on where to go from here, as well as tips for learning more about Fortran. The amazing world of parallel Fortran programming is waiting for you.

12.5 Answer key

This section contains solutions to exercises in this chapter. Skip ahead if you haven’t worked through the exercises yet.

12.5.1 Exercise 1: Hunters and gatherers

Solving this exercise will require creating three new teams at the beginning of the program: hunters, gatherers, and elders. Furthermore, we’ll need to create a yet-to-be-determined number of subteams on each of the hunter and gatherer teams. Let’s tackle the first step first, as shown in the following listing.

Listing 12.8 Forming teams for hunters, gatherers, and elders

program hunters_gatherers
 
  use iso_fortran_env, only: team_type
  implicit none
 
  type(team_type) :: new_team
  integer :: team_num
  integer, parameter :: elders_team_num = 1             ❶
  integer, parameter :: hunters_team_num = 2            ❶
  integer, parameter :: gatherers_team_num = 3          ❶
 
  real :: image_fraction                                ❷
  image_fraction = this_image() / real(num_images())    ❷
 
  team_num = elders_team_num                            ❸
  if (image_fraction > 1 / 6.) &                        ❸
    team_num = hunters_team_num                         ❸
  if (image_fraction > 1 / 2.) &                        ❸
    team_num = gatherers_team_num                       ❸
 
  form team(team_num, new_team)                         ❹
 
end program hunters_gatherers

❶ Sets the team numbers as compile-time parameters

❷ Calculates the fraction of the image number relative to the total number of images

❸ Based on the image number fraction, sets the team number for each image

❹ Forms three new teams

In this part, we’re not doing anything new relative to what we learned in section 12.2, except that we’re creating three new teams instead of two. The image_fraction variable here is used as a convenience to easily assign 1/6, 1/3, and 1/2 to the elders, hunters, and gatherers, respectively.

Now, let’s change the team to new_team and print a message from one image on each team, as shown in the following listing.

Listing 12.9 Changing to a new team and reporting on the designated activity

...
  change team(new_team)                                                  ❶
    if (team_number() == elders_team_num) then                           ❷
      if (this_image() == 1) &
        print *, num_images(), 'elders stayed in the village to rest'
    else if (team_number() == hunters_team_num) then                     ❸
      if (this_image() == 1) &
        print *, num_images(), 'hunters went hunting'
    else if (team_number() == gatherers_team_num) then                   ❹
      if (this_image() == 1) &
        print *, num_images(), 'gatherers went foraging'
    end if
  end team                                                               ❺
 
end program hunters_gatherers

❶ Changes context to new_team

❷ Branch that will be executed by the elders

❸ Branch that will be executed by the hunters

❹ Branch that will be executed by the gatherers

❺ Returns context to the original team

Like we learned in section 12.2, we change the team for all images to new_team. Depending on the image number, this will be the elder, hunter, or gatherer team. Inside the change team construct, we check which team we’re on by comparing the value of team_number to our compile-time constants for team number. At this point, we only report the activity for each team.

Next, we’ll create subteams from each of the hunter and gatherer teams. Specifically for hunters, we’ll have the following snippet inside the hunters if branch:

form team ((this_image() - 1) / 3 + 1, hunters)     ❶
change team(hunters)                                ❷
print *, 'Hunter', this_image(), 'in team', &       ❸
  team_number(), 'hunting for game'                 ❸
end team                                            ❹

❶ Places hunters in subteams of 3

❷ Changes context to the new subteam

❸ Each image reports from its subteam.

❹ Returns back to the hunters team

The code to create and change to subteams for gatherers is similar to that for hunters:

form team ((this_image() - 1) / 2 + 1, gatherers)
change team(gatherers)
  print *, 'Gatherer', this_image(), 'in team', &
    team_number(), 'gathering fruits and veggies'
end team

Place this code inside the if branch for the gatherers team, and there you have it. If you now compile this program and run it on, say, 12 images, you’ll get output similar to this:

caf hunters_gatherers.f90 -o hunters_gatherers                ❶
cafrun -n 12 --oversubscribe ./hunters_gatherers              ❷
           2 elders stayed in the village to rest             ❸
           4 hunters went hunting                             ❸
           6 gatherers went foraging                          ❸
Hunter           1 in team           2 hunting for game       ❹
Hunter           2 in team           1 hunting for game
Hunter           1 in team           1 hunting for game
Gatherer           1 in team           1 gathering fruits and veggies
Hunter           3 in team           1 hunting for game
Gatherer           2 in team           3 gathering fruits and veggies
Gatherer           1 in team           3 gathering fruits and veggies
Gatherer           2 in team           1 gathering fruits and veggies
Gatherer           2 in team           2 gathering fruits and veggies
Gatherer           1 in team           2 gathering fruits and veggies

❶ Compiles using the OpenCoarrays compiler, caf

❷ Runs on 12 parallel images

❸ Group activity report from each team

❹ Individual activity reports from each villager on their subteams

In this example, I chose only 12 images for brevity, but this example will work with any number of images (well, up to the limit of your computer’s RAM, as each image runs its own copy of the program). Notice that the individual hunter and gatherer activity reports aren’t in order, and they shouldn’t be--all images execute completely asynchronously, except at form team and end team statements, where they synchronize (and only with images in their own team). For example, in the outer change team construct, the elders, hunters, and gatherers teams run in parallel to one another, and this is the beauty of parallel programming in Fortran.

You can run this program on many images on a single-core computer, and it will run like a traditional concurrent program, which in other languages is accomplished with, say, threading or async/await. You can also run this program (unchanged!) on many distributed-memory servers in parallel, and even on computers around the world.

12.5.2 Exercise 2: Tsunami time step logging using events

Let’s start with the simulation team that’s stepping forward through the computation. Recall that in listing 12.5, we used a coarray to copy the time step count from one team to another:

if (this_image() == 1) time_step_count[1, team_number=2] = n

In this snippet, we sent the value of the local time step n to the time_step_count coarray on image 1 of team 2. We did that only from one image, as all images on the simulation team have the same value for the time step count. Now, if we’re implementing this using events, this first part is easy. We’ll just declare an event variable and use it in the event post statement from team 1 to post an event from the simulation team to the logging team:

type(event_type) :: time_step_event[*]              ❶
...
if (this_image() == 1) &
  event post(time_step_event[1, team_number=2])     ❷

❶ Declares the event variable

❷ Posts the event from image 1 on the current team to image 1 on team 2

That’s it as far as posting the event from the simulation team goes. Let’s see how we can receive this information from the logging team.

Implementation using event wait

On the logging team, we’ll run in an infinite loop and have an event wait statement to block the execution. On each event intercepted, we’ll increment the counter, print the time step count to the screen, and exit the loop only if we’ve reached the end of the simulation:

...
else if (team_num == 2) then
  n = 0
  do                                           ❶
    event wait(time_step_event)                ❷
    n = n + 1                                  ❸
    print *, 'tsunami logger: step ', n, &     ❹
             'of', num_time_steps, 'done'      ❹
    if (n == num_time_steps) exit              ❺
  end do
end if
...

❶ Loops indefinitely

❷ Blocks until the event is posted

❸ Increments the counter

❹ Prints the time step count to the screen

❺ Exits if we’ve reached the end of the simulation

The advantage to the approach using event wait is that we’re guaranteed to catch every event that’s posted. The downside is that we need to do the counting outselves (n = n + 1), and that event wait is blocking the execution. This is fine if counting time steps is the only thing the logging team needs to do. The event wait approach thus makes the logging team tightly coupled to the simulation team. Now let’s take a look at the alternative solution using event_query.

Implementation using event_query

Here’s the solution to the exercise using event_query. Rather than blocking execution until each event is posted, we’re simply going to query the event count and print it to the screen if its value changed from the previous iteration:

...
else if (team_num == 2) then
  n = 0
  do                                                      ❶
    call event_query(time_step_event, time_step_count)    ❷
    if (time_step_count > n) then                         ❸
      n = time_step_count
      print *, 'tsunami logger: step ', n, &              ❹
               'of', num_time_steps, 'done'               ❹
    end if
    if (n == num_time_steps) exit                         ❺
  end do
end if
...

❶ Loops indefinitely

❷ Blocks until the event is posted

❸ Increments the counter

❹ Prints the time step count to the screen

❺ Exits if we’ve reached the end of the simulation

The advantage to this approach is that the counting is handled automatically inside the time_step_event variable. This approach is also not blocking, unlike the event wait approach. If we needed to, we could carry out some other tasks on the logging team, and in each iteration, the event_query subroutine would return whatever the current value of the time_step_count was. This approach is thus asynchronous, and some time steps may be skipped if the simulation iterations are faster than the logging.

12.5.3 Exercise 3: Calculating the global mean of water height

We’ll begin with our existing code in listing 12.7 that computes the global minimum and maximum of water height:

hmin = minval(h % data)
call co_min(hmin, 1)
 
hmax = maxval(h % data)
call co_max(hmax, 1)
if (this_image() == 1) print '(a, i5, 2(f10.6))', &
  'step, min(h), max(h):', n, hmin, hmax

To calculate the global average, we’ll follow the same procedure. However, considering that we don’t have a collective average function available out of the box, we’ll get creative with the collective sum function co_sum. First, to calculate the local average, we’ll take the sum of the local array and divide it by the total number of elements. Your first instinct may be to do something like this:

hmean = sum(h % data) / size(h % data)

Although this is the correct approach, recall that the data component of the Field type is allocated with one extra row and column on each side of the array, to facilitate halo exchange with neighboring images. From the Field type constructor function in mod_field.f 90

allocate(self % data(self % lb(1)-1:self % ub(1)+1,&    ❶
                     self % lb(2)-1:self % ub(2)+1))    ❶

❶ Allocates the data array with an extra index on each end

Thus, if we were to compute the sum of h % data as a whole, we’d also be including values from the edges of neighbor images, which isn’t what we’re looking for. Instead, we’ll slice the array to go exactly from the lower bound (lb) to the upper bound (ub) in each axis:

hmean = sum(h % data(h % lb(1):h % ub(1),h % lb(2):h % ub(2))) &
     / size(h % data(h % lb(1):h % ub(1),h % lb(2):h % ub(2)))

At this point, hmean is the local average value of water height on each parallel image. Of course, don’t forget to declare hmean in the declaration section of the program. Like with the collective minimum and maximum, we now apply co_sum to hmean to store the sum on image 1, and divide the result by the total number of images to arrive at the average value:

call co_sum(hmean, 1)            ❶
hmean = hmean / num_images()     ❷

❶ Computes the collective sum of hmean and stores the result on image 1

❷ Divides hmean by the total number of images to get the average value

Finally, let’s add hmean to the print statement and modify the format string accordingly:

if (this_image() == 1) print '(a, i5, 3(f10.6))', &
  'step, min(h), max(h), mean(h):', n, hmin, hmax, hmean

If you now recompile and rerun the tsunami simulator, you’ll get output like this:

step, min(h), max(h), mean(h):    1  0.000000  1.000000  0.003888
step, min(h), max(h), mean(h):    2  0.000000  0.996691  0.003888
step, min(h), max(h), mean(h):    3  0.000000  0.990097  0.003888
...
step, min(h), max(h), mean(h):  998 -0.072596  0.186842  0.003888
step, min(h), max(h), mean(h):  999 -0.072279  0.188818  0.003888
step, min(h), max(h), mean(h): 1000 -0.071815  0.190565  0.003888

The rightmost column in the output is our newly added water height average. Its values are constant throughout the simulation, which serves as evidence that our simulator conserves water volume.

12.6 New Fortran elements, at a glance

Teams, a mechanism to group images by common task:
- team_type--A new type for working with teams, available from the iso_ fortran_env module
- form team--A statement for creating new teams
- change team/end team--A construct to switch images to a new team
- team_number--A built-in function to get the current team number
- get_team--A built-in function to get the team variable, current or otherwise
- sync team--A statement to synchronize images across a common, typically parent team
Events, a mechanism to organize the flow of your parallel programs around discrete events:
- event_type--A new type for working with events, available from the iso_ fortran_env module
- event post--A statement to post an event to a remote image
- event wait--A statement to block execution until an event is posted from another image
- event_query--A subroutine to asynchronously count the number of posted events
Collective subroutines co_broadcast, co_max, co_min, co_reduce, and co_sum, which implement some common parallel operations
recursive--A procedure attribute that allows a procedure to invoke itself
execute_command_line--A built-in subroutine to run a command from the host operating system

12.7 Further reading

“The new features of Fortran 2018,” by John Reid (PDF download): http:// mng.bz/EdaX
“A parallel Fortran framework for neural networks and deep learning,” by Milan Curcic: https://arxiv.org/abs/1902.06714

Summary

Fortran 2018 introduces new concepts for advanced parallel programming: teams, events, and collectives.
Teams and events are mechanisms for distribution of work and synchronization, whereas collective subroutines are used for parallel reduction operations, such as sum, minimum, and maximum.
Teams are used to form distinct groups of images and assign them different tasks.
At the beginning of the program, all parallel images start in the initial team, and you can create as many teams as you want.
When you switch images to new teams, all teams run independently from one another until explicitly synchronized.
Events allow you to express the flow of your parallel program in a more elegant, and, ahem, event-driven style: post events from one or more images, wait for events from others, or just count them asynchronously.
Collective subroutines allow you to perform some common parallel patterns without directly invoking coarrays.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for 12 Advanced parallelism with teams, events, and collectives

Create new playlist

Sign In

Sign Up

12 Advanced parallelism with teams, events, and collectives

12.1 From coarrays to teams, events, and collectives

12.2 Grouping images into teams with common tasks

12.2.1 Teams in the tsunami simulator

12.2.2 Forming new teams

12.2.3 Changing execution between teams

12.2.4 Synchronizing teams and exchanging data

Synchronizing images within a team

Synchronizing whole teams

Exchanging data between teams

12.3 Posting and waiting for events

12.3.1 A push notification example

12.3.2 Posting an event

12.3.3 Waiting for an event

12.3.4 Counting event posts

12.4 Distributed computing using collectives

12.4.1 Computing the minimum and maximum of distributed arrays

12.4.2 Collective subroutines syntax

12.4.3 Broadcasting values to other images

12.5 Answer key

12.5.1 Exercise 1: Hunters and gatherers

12.5.2 Exercise 2: Tsunami time step logging using events

Implementation using event wait

Implementation using event_query

12.5.3 Exercise 3: Calculating the global mean of water height

12.6 New Fortran elements, at a glance

12.7 Further reading

Summary

Table of Contents for
12 Advanced parallelism with teams, events, and collectives