CHAPTER 3

Pointers and Arrays

The concept of pointers is a core element of the ISO C programming language and has proven to be a powerful programming tool. UPC defines a number of pointer classes that follow directly from C pointers and the UPC memory model. As in C, pointers are variables that contain the address of another variable. However, under the UPC memory model such pointers can reside in either the private space or the shared space. UPC pointers can also reference the private space or the shared space. These different variants of the UPC pointer semantics serve distinct programming scenarios. Pointers often lead to compact programs and result in efficient implementations. Pointers and array indexes are closely related in C, and they remain that way in UPC. Under UPC, pointers can be declared to have blocking factors similar to those in the case of arrays. Thus, given any array layout in the shared space, a pointer can be declared to traverse the elements of such an array in the same sequence that C pointers traverse array elements. In many cases, casting is also allowed from one pointer type to another.

3.1 UPC POINTERS

The syntax of pointer declarations in UPC is similar to those of C. However, with a memory model that has shared and private spaces, there are four major UPC pointer classes: private pointers pointing to the private space, private pointers pointing to the shared space, shared pointers pointing to the shared space, and shared pointers pointing to the private space (see Figure 3.1).

Consider the following pointer declarations:

int *p1;                // private to private
shared int *p2;         // private to shared
int *shared p3;         // shared to private
shared int *shared p4;  // shared to shared

The first statement declares a private pointer p1, which points to the private space and resides in the private space; we notice that p1 is clearly a typical C pointer. p2 is declared as a private pointer that points to the shared space; therefore, each thread has one independent and private instance of p2. p4 is a shared pointer pointing to the shared space; thus, it has one instance with affinity to thread 0. p3 is a shared pointer to the private space and therefore should be avoided, as it would defeat the purpose of having shared space visible to all threads and private data space visible only to its associated thread.

images

Figure 3.1 UPC Pointer Classes

One general observation that can help us to understand these pointer class declarations quickly is to look at the relative position of the * character in the declaration syntax. For example, *p is a private pointer, whereas *shared p is a shared pointer. Note that the right part of the declaration is for the pointer type, whereas the left part is for the type of the pointed to variable. So for

shared int *p2;   // private to shared

the right part, *p2, says that p2 is a private pointer, and the left part, shared int, indicates that the variable pointed to is of type shared int. Similarly, in the declaration

shared int *shared p4;  // shared to shared

*shared p4 indicates that p4 is a shared pointer, and the left part shared int indicates that the variable pointed to is a shared int.

Figure 3.2 demonstrates where the pointers of the previous four declarations are located and to where they are pointing. This figure shows that one instance of the private pointers, p1 and p2, exists for each thread. Both of these two pointers are created in the private space of every thread instance. Only one instance of p3 and p4 is created in the shared space. Each of the p1 pointers points to its associated private space and can also point to the shared space that has affinity to that pointer. Each of the p2 pointers can point anywhere in the shared data space. The pointer p4 can also point anywhere in the shared space. As a shared pointer, p3 has only one instance created in the shared space of thread 0. However, p3 has no useful semantic behavior and therefore should be avoided.

images

Figure 3.2 UPC Pointer Scenarios

3.2 POINTER ARITHMETIC

Pointers in UPC advance (are incremented) in a way similar to that of C pointers. When an expression is added to or subtracted from a pointer-to-shared, the pointer moves by a number of elements as given by the expression, where the elements are of the same type as the object pointed to. We first demonstrate this in cases where arrays are distributed by default, one element per thread. Then we show how pointers traverse the elements of blocked arrays. Consider, for example, the sequence of statements in Example 3.1.

Example 3.1: pointer1.upc
#define N 16
shared int x [N];
shared int *dp, *dp1;

dp = &x [5];
dp1 = dp + 9;

In this code, the first three statements define x[] as a shared array of 16 elements with the default distribution layout, one per thread in round-robin fashion. The third statement,

shared int *dp, *dp1;

declares both dp and dp1 as private pointers to the shared array x[], while the last two statements initialize dp to point to the element x[5] in array x[] and dp1 to point at the ninth element after x[5] (i.e., element x [14]), shown in Figure 3.3 for the case of four threads. One thing to note here is that since dp and dp1 are private pointers to the shared array, there will be one instance of dp and one instance of dp1 at each of the threads. To show this more clearly, consider the code in Example 3.2.

Example 3.2: pointer2.upc
#define N 16
shared int x [N];

shared int *shared dp;
shared int *dp1;

dp = & x [5];
dp1 = x+MYTHREAD;

images

Figure 3.3

In this code, dp is now a shared pointer to the shared array, so only one instance of dp would be created, whereas dp1 is a private pointer to the shared data space, so each thread would have an independent instance of dp1. The code sets each instance of dp1 at each thread to point at the element with the offset of MYTHREAD from the beginning of the x[] array. Thus, although all threads have access to the pointer dp, each thread accesses only its own private dp1. If the code were to execute on four threads, the scenario shown in Figure 3.4 would result. The same scenario could also have resulted by replacing

dp1 = x+MYTHREAD;

with

dp1 = &x [MYTHREAD];

In the latter case, each dp1 instance is set to point at the address of its first element of the array in each thread.

images

Figure 3.4

images

Figure 3.5 Shared Pointer Format

To keep track of shared data, a UPC shared pointer (pointer to a shared object) is composed of three fields: thread, phase, and virtual address (Figure 3.5). The thread information clearly indicates the thread affinity of the data pointed to. On the other hand, Block Address indicates the block address, and Phase indicates the location of the data item within that block. Such typical pointer implementation makes traversing blocked arrays quite easy. This also implies that to traverse the elements of a given blocked array, a pointer with the same blocking factor is needed. Consider the following example.

Example 3.3: pointer3.upc
#define N 16
shared [3] int x[N], *dp, *dp1;
shared int *dp2;
…

dp = &x [5];
dp1 = dp + 9;
dp2 = dp + 3;

Assuming four threads, the code in Example 3.3 would result in the layout and pointer locations shown in Figure 3.6. In Example 3.3, dp and dp1 were declared with blocking factors of 3, which is also the blocking factor for the array x[]. These pointers point to the shared data space and view the shared space as blocks of three elements of that type. Pointer p was not advanced, as it was only set to &x[5]. This element, &x[5], is in the third position of a block of 3. When dp1 was incremented by 9 beyond the location of p, it immediately started with the next thread and followed the order of the array elements. This kind of blocked pointer is typically used to easily access the elements of blocked arrays, using the same blocking factor as the array. Pointer p2 shows what happens when a pointer of mismatched blocking factor is used. Since no layout qualifier was specified for p2 in its declaration, p2 follows the default distribution regardless of how the array is laid out. Thus, p2 moves to a new element in the next thread each time it is incremented.

images

Figure 3.6

3.3 POINTER CASTING AND USAGE PRACTICES

UPC allows the casting of one pointer type to another. However, shared pointer representations typically hold the thread and phase information (see Figure 3.5). Therefore, casting a shared pointer to a private pointer may result in the loss of the thread information. Although this is not generally advisable, it could be useful when it is desired to use a private pointer to point in its own local shared data space. On the other hand, casting a private pointer to a shared pointer is an error and would produce unknown results.

The following code demonstrates the syntax for casting a shared pointer to private data space:

shared int x [THREADS];
int *p;
p = (int *) &x [MYTHREAD]; /* p points to x [MYTHREAD] */

The construct &x [MYTHREAD] will return in each thread a different shared address pointing to its local element of x[]. (int*) is a cast to private, which transforms the shared pointer to a private one. Thus, the previous statement sets each of the private pointers, p, to point at the x[] element, which has affinity with its thread (i.e., MYTHREAD). There are three points here to reinforce: (1) local shared data is stored contiguously and can also be pointed at by private pointers; (2) a cast to private may result in losing the thread information in the shared pointer (thus, whereas casting from shared to private is allowed, the opposite is not true); and (3) as shared pointer arithmetic can be more involving than private pointer arithmetic, using private pointers instead of shared pointers when applicable may be more efficient in some implementations.

The following example shows how private and shared pointers may advance in shared data.

Example 3.4: pointer4.upc
#define N 4
shared int x[N][N];
shared int *dp1;
int *dp2;

dp1 = x+MYTHREAD;
dp2 = (int *)dp1+MYTHREAD;

images

Figure 3.7

In this code, dp1 is a set of private pointers to shared and dp2 is a set of private pointers. In the first statement,

dp1 = x+MYTHREAD;

each of the dp1 pointer instances will advance following the array default distribution, by MYTHREAD positions, from the first location in the array. Each of the dp1 pointers will end up pointing at the first element of any array column. Each of the dp2 pointers will be pointing MYTHREAD positions from the local dp1 pointer. The private pointers, however, will not follow the order of the array elements but will advance only in their local shared space. In the case of running this code with four threads, the dp1 pointer set will be pointing at the elements of the first row, while the dp2 pointers will be pointing at the diagonal elements as shown in Figure 3.7.

A layout qualifier is a part of a shared pointer type. It is therefore possible to traverse the elements of a shared array with one blocking factor using a pointer with a different blocking factor. Consider the following code.

Example 3.5: pointer5.upc
shared int x [N];
shared [3] int *dp=&x[5], *dp1;
dp1 = dp + 9;

This code assigns to dp1 a value that is nine positions beyond dp. The dp1 pointer will follow its own blocking and not the one of the array. When executed on four threads, the pointer locations shown in Figure 3.8 result.

As a layout specifier is considered a part of a shared pointer type, casting a shared pointer with one blocking factor to a pointer that has a different blocking factor is also possible. Given the declarations

shared [3] int *p;
shared [5] int *q;

the assignment

p = q;

images

Figure 3.8

is acceptable. However, some implementations may require explicit casting in the form

p = (shared [3] int *)q;

Pointer p will, however, obey pointer arithmetic for blocks of three, not five. It should also be noted that pointer casting always sets the phase of the shared pointer to zero.

3.4 POINTER INFORMATION AND MANIPULATION FUNCTIONS

Pointer-to-shared implementations may vary from one compiler to another. However, such pointers all represent the three fields that make up such pointers-to-shared (see Figure 3.5): the thread number, the virtual address, and the phase information. Therefore, a number of special functions are included to access and manipulate pointer-to-shared information in an implementation-independent manner. These functions are often used by compiler developers, but they are available for application developers as well, if needed. UPC provides three such functions to deliver pointer information. The first,

size_t upc_threadof (shared void *ptr);

returns the number of the thread that has affinity to the shared object pointed to by ptr. The pointer ptr is a private pointer-to-shared pointing to an object of any shared type. The phase of the pointer-to-shared can be also determined by a similar call:

size_t upc_phaseof (shared void *ptr);

and finally,

size_t upc_addrfieldof (shared void *ptr);

can be used in the same way to return the local address of the object pointed to by the pointer-to-shared. It should be noted, however, that the value returned by this function is implementation dependent.

In the following example these functions are used by thread 0 to print three elements of a pointer. The address field will be printed in 10 hexadecimal digits, while the thread number and the phase numbers will be printed in two decimal digits each.

Example 3.6: addresses.upc
//PrintPointerElements
#include <stdio.h>
#include <upc.h>
#include <upc_relaxed.h>

#define BLOCKING_FACTOR 3
#define SIZE BLOCKING_FACTOR*THREADS

shared [BLOCKING_FACTOR] int buffer [SIZE];

int main(void)
{
  int i;
  shared [BLOCKING_FACTOR] int *buffer_ptr;
  if(MYTHREAD == 0)
   {
   buffer_ptr = buffer + 5;
   for(i=0; i<SIZE; i++, buffer_ptr++)
    {
      printf(“&buffer [%d]: ”, i);
      printf(“THREAD: %02d 	 ”,
        upc_threadof(buffer_ptr));
      printf(“ADDRESS: %010Xh 	 ”,
        upc_addrfield(buffer_ptr));
      printf(“PHASE: %02d
”,
        upc_phaseof(buffer_ptr));
    }
  }
 return 0;
}

The statement

buffer_ptr = buffer + 5;

in Example 3.6 sets buffer_ptr to point to the last element in the second block buffer[5]. Therefore, the printed thread number should be 1 and the phase should be 2. The address value is implementation-dependent, however.

In addition to the functions that return the three different elements of a pointer-to-shared, the function

shared void *upc_resetphase (shared void *ptr);

returns a pointer-to-shared which is identical to ptr except that it has a phase of zero. Thus, this function can be used to reset the phase of a pointer to make it point to the first element of the current block. For example, if the statement

buffer_ptr = upc_resetphase(buffer_ptr);

is used before the printing in the previous PrintPointerFields example, the printed phase would have been zero instead of 2.

In the rest of this section we consider a UPC pointer-to-shared implementation example (Figure 3.4) along with a possible implementation of some of the pointer information and manipulation functions. This example implementation is included for illustrative purposes only. Different [CAR99] or better implementations may be sought by compiler developers. First, assume the representation shown in Figure 3.9 for the UPC shared pointer. Then, possible definitions for the upc_threadof(), upc_phaseof(), and upc_addrfieldof() follow:

typedef uint64_t upc_shared_ptr;
size_t upc_threadof(upc_shared_ptr sh_ptr)
{
  size_t t;
  t = (sh_ptr >> 38) & 0x7FF;
  return t;
}
size_t upc_phaseof(upc_shared_ptr sh_ptr)
{
  size_t t;
  t = (sh_ptr >> 49) & 0x7FFF;
  return t;
}
size_t upc_addrfield(upc_shared_ptr sh_ptr)
{

  size_t t;
  t = sh_ptr & 0x3FFFFFFFFFULL;
  return t;
}

images

Figure 3.9 Possible UPC Pointer-to-Shared Implementation

In the case of upc_threadof(), the pointer representation is shifted to the right 38 positions to bring the thread information to the least significant part of the pointer, and the remaining bits to the left of the phase information are masked out using a bitwise logical-and operation with a mask of 15 consecutive bits, 7FF in hexadecimal. Similar shifting and masking is also used to produce the phase information, in upc_phaseof(), and masking only is used to generate the address in upc_addrfieldof(), where address information of the element in this case already occupies the lower 38 bits of the pointer representation

A simpler but possibly less efficient implementation could use a structure with three fields to store the three elements of the pointer. The equivalent implementation of each of the three previous functions would simply amount to extracting the corresponding field from the structure.

3.5 MORE POINTER EXAMPLES

Consider a program for squaring a matrix A. If the matrix is shared and follows the default blocking, one can simply access the rows of the matrix using an array of pointers-to-shared. On the other hand, using the fact that local shared data is stored contiguously and can be accessed by private pointers, private pointers can be used to access the columns of the matrix. The code for this matrix-squaring example follows.

Example 3.7: matsqr1.upc
// Matrix Squaring
shared double A [THREADS][THREADS], A_Sqr [THREADS][THREADS];

int main(void)
{
  shared double *rows_ptrs [THREADS]; // Points to the rows
  double        *cols_ptr; // Points to the local shared columns
  int           i, j, k;
  double        d;

  //Initialize the pointers
  for(i=0; i < THREADS; i++)
  {
    rows_ptrs [i] = &A [i][0];

  }
  // Privatize (local shared to private)
  cols_ptr = (double *)&A [0][MYTHREAD];

  // Matrix squaring computation
  for(i=0; i<THREADS; i++)            // for each row
    upc_forall(j=0; j<THREADS; j++; j)  // each thread
    { // computes one element
     d = 0;
     for(k=0; k<THREADS; k++)
       d += *(rows_ptrs [i]+k) * *(cols_ptr+k);
       A_Sqr [i][j] = d;   // write to A_sqr [row i][col j]
    }
   return 0;
}

In this example,

rows_ptrs [i] = &A [i][0];

initializes the pointer row_ptrs [i] to point at the beginning of row i in array A[][]. Since row_ptrs is declared as an array of private pointers-to-shared, each thread will have its own private array of pointers that can be manipulated independently by each thread. When incremented, rows_ptrs[i] will advance through the elements of the ith row of array A[][]. Meanwhile, cols_ptrs is a private pointer-to-shared. Thus, in each thread

cols_ptr = (double *)&A [0][MYTHREAD];

casts the address of the ith column to private and assigns it to cols_ptrs, so that it points at the beginning of column i. The cast is used since cols_ptrs is a private pointer, whereas &A [0][i] is a shared reference. When incremented, cols_ptr of thread i will now advance through the elements of the ith column of array A[][]. Since each of the threads now accesses one of the columns, the algorithm proceeds as follows. Elements of the result matrix, A_sqr[][], are computed one row at a time. Thus, for each row of the matrix, each thread computes one result that goes into the column position corresponding to that thread number. Therefore, the outer loop

for(i=0; i<THREADS; i++)

sequences through the rows. Then, for each row,

upc_forall(j=0; j<THREADS; j++; j)

causes each of the threads to compute one of the elements in that row of A_sqr[][], specifically the element whose column index is equal to the thread number. The actual computations of an A_sqr[][] element are performed within the body of the innermost loop.

for(k=0; k<THREADS; k++)

Instead of using an array of private pointers-to-shared for addressing the rows, one could have used a private pointer-to-shared that traverse the array elements as if it were a one-dimensional array.

Example 3.8: matsqr2.upc
// Matrix Squaring (single pointer)
shared double A [THREADS][THREADS], A_Sqr [THREADS][THREADS];

int main(void)
{
  shared double *rows_ptrs;
  double        *cols_ptr;
  int           i, j, k;
  double        d;

  // Initialize the pointers
  rows_ptrs = &A [0][0];
  cols_ptr = (double *)&A [0][MYTHREAD];

  // for each row
  for(i=0; i<THREADS; i++, rows_ptrs+=THREADS)
    upc_forall(j=0; j<THREADS; j++; j)    // for each local
                                          // shared column
    {
      d = 0;
        for(k=0; k<THREADS; k++)
          d += *(rows_ptrs+k) * *(cols_ptr+k);
        A_Sqr [i][j] = d; // write to A_sqr [row i][col j]
    }
    return 0;
}

As in C, pointer and array index notations may be interchanged, and pointers to matrixes can be passed to a function that may perform the matrix squaring. Making these changes to Example 3.7, the following program results.

Example 3.9:  matsqr3.upc
// Matrix Squaring
shared double A [THREADS][THREADS], A_Sqr [THREADS][THREADS];
void mat_squaring(shared double (*dst)[THREADS],
                  shared double (*src)[THREADS])
{
  shared double *rows_ptrs [THREADS];
  double        *cols_ptr;
  int           i, j, k;
  double        d;

  // Initialize the pointers
  for(i=0; i<THREADS; i++)
    rows_ptrs[i] = &src[i][0];


  // Privatize (local shared to private)
  cols_ptr = (double *)&src[0][MYTHREAD];


  // Matrix squaring computation
  for(i=0; i<THREADS; i++)               // for each row
    upc_forall(j=0; j<THREADS; j++; j)   // each thread
    {
      d = 0;
      for(k=0; k<THREADS; k++)
        d += *(rows_ptrs[i]+k) * *(cols_ptr+k);
      dst[i][j] = d;
    }
}


int main(void)
{
  mat_squaring(A_Sqr, A);

  return 0;
}

This example touches again on an important fact: that shared objects cannot have a dynamic scope. Therefore, the shared arrays, A[][] and A_sqr[][], are declared externally. Furthermore, only private pointers-to-shared can be declared and used inside the function itself.

3.6 SUMMARY

UPC embodies many powerful pointer concepts. UPC pointers include private pointers, private pointers-to-shared, and shared pointers-to-shared. Shared pointers-to-private are also possible to declare, but using them is not advised, as their behavior is not specified. Pointers-to-shared can have blocking factors, just like shared arrays. Such a layout specifier is part of the type. Pointer-to-shared representations embody three important elements: thread number, phase number, and address. Pointer information and manipulation functions can be used to extract this information from a pointer. One type of pointer can be cast into another. For example, pointers-to-shared can be cast to private pointers. In this case, the thread number information is lost. Local shared data is stored contiguously and therefore can easily be addressed by a private pointer. All these different types of pointers available in UPC can be extremely useful to developers.

EXERCISES

3.1 Write a function to multiply two shared matrixes, A × B, with the default blocking using pointers to access the elements of the matrixes. Use one single private pointer-to-shared to point at the rows of each matrix and another to point at the columns of each matrix.

3.2 Modify the first matrix-squaring example of Section 3.5 (Example 3.7) such that the elements of one column of A_sqr[][] are computed first, then the next column, and so on. Use only private pointers-to-shared.

3.3 Write a function to copy the elements of a THREADS × THREADS matrix with a blocking factor of 5 to another with a blocking factor of 3 using blocked pointers-to-shared arithmetic.


UPC: Distributed Shared Memory Programming, by Tarek El-Ghazawi, William Carlson, Thomas Sterling, and Katherine Yelick
Copyright © 2005 John Wiley & Sons, Inc.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.116.69.244