Using Aparapi

Aparapi (https://github.com/aparapi/aparapi) is a Java library that supports data-parallel operations. The API supports code running on either GPUs or CPUs: GPU operations are executed using OpenCL, while CPU operations use Java threads. The user can specify which computing resource to use; however, if GPU support is not available, Aparapi will fall back to Java threads.

The API converts Java bytecode to OpenCL at runtime. This makes the API largely independent of the graphics card used. The API was initially developed by AMD but has since been released as open source. This is reflected in the base package name, com.amd.aparapi. Aparapi offers a higher level of abstraction than that provided by OpenCL.

Aparapi code is placed in a class derived from the Kernel class. Its execute method starts the operations. This results in internal calls to a run method, which needs to be overridden. It is within the run method that the concurrent code is placed. The run method is executed multiple times, on different processors.

Due to OpenCL limitations, we are unable to use inheritance or method overloading in kernel code. In addition, calls such as println are not supported within the run method, since the code may be running on a GPU. Aparapi supports only one-dimensional arrays. Arrays with two or more dimensions need to be flattened into a one-dimensional array, as shown in the sketch that follows. The support for double values depends on the OpenCL version and the GPU configuration.
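A simple way to perform this flattening is to copy the elements in row-major order. The following flatten method is a hypothetical helper shown only for illustration; it is not part of the Aparapi API:

public static float[] flatten(float[][] matrix) { 
    int rows = matrix.length; 
    int cols = matrix[0].length; 
    float[] flat = new float[rows * cols]; 
    for (int i = 0; i < rows; i++) { 
        for (int j = 0; j < cols; j++) { 
            // The element in row i and column j is stored at index i * cols + j 
            flat[i * cols + j] = matrix[i][j]; 
        } 
    } 
    return flat; 
} 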

When a Java thread pool is used, one thread is allocated per CPU core. The kernel containing the Java code is cloned, one copy per thread, which avoids the need to share data across threads. Each thread has access to information, such as a global ID, to assist in the code execution. The kernel will wait for all of the threads to complete.

Aparapi downloads can be found at https://github.com/aparapi/aparapi/releases.

Creating an Aparapi application

The basic framework for an Aparapi application is shown next. It consists of a class derived from Kernel in which the run method is overridden. In this example, the run method will perform scalar multiplication. This operation involves multiplying each element of a vector by some value.

The ScalarMultiplicationKernel extends the Kernel class. It possesses two instance variables used to hold the matrices for input and output. The constructor will initialize the matrices. The run method will perform the actual computations, and the displayResult method will show the results of the multiplication:

public class ScalarMultiplicationKernel extends Kernel { 
    float[] inputMatrix; 
    float[] outputMatrix; 
 
    public ScalarMultiplicationKernel(float inputMatrix[]) { 
        ... 
    } 
 
    @Override 
    public void run() { 
        ... 
    } 
 
    public void displayResult() { 
        ... 
    } 
} 

The constructor is shown here:

public ScalarMultiplicationKernel(float inputMatrix[]) { 
    this.inputMatrix = inputMatrix; 
    outputMatrix = new float[this.inputMatrix.length]; 
} 

In the run method, we use a global ID to index into the matrix. This code is executed on each computation unit, for example, a GPU or thread. A unique global ID is provided to each computational unit, allowing the code to access a specific element of the matrix. In this example, each element of the input matrix is multiplied by 2 and then assigned to the corresponding element of the output matrix:

public void run() { 
    int globalID = this.getGlobalId(); 
    outputMatrix[globalID] = 2.0f * inputMatrix[globalID]; 
} 

The displayResult method simply displays the contents of the outputMatrix array:

public void displayResult() { 
    out.println("Result"); 
    for (float element : outputMatrix) { 
        out.printf("%.4f ", element); 
    } 
    out.println(); 
} 

To use this kernel, we need to declare variables for the inputMatrix and its size. The size will be used to control how many kernels to execute:

float inputMatrix[] = {3, 4, 5, 6, 7, 8, 9}; 
int size = inputMatrix.length; 

The kernel is then created using the input matrix, followed by the invocation of the execute method. This method starts the process and will result in the Kernel class' run method being invoked once for each element of the range passed as the execute method's argument. An overloaded version of execute takes a second argument, the number of passes; while it is not used in this example, we will use it in the next section. When the process is complete, the resulting output matrix is displayed and the dispose method is called to release the kernel's resources:

ScalarMultiplicationKernel kernel =  
        new ScalarMultiplicationKernel(inputMatrix); 
kernel.execute(size); 
kernel.displayResult(); 
kernel.dispose(); 

When this application is executed, we will get the following output:

Result
6.0000 8.0000 10.0000 12.0000 14.0000 16.0000 18.0000

We can specify the execution mode using the Kernel class' setExecutionMode method, as shown here:

kernel.setExecutionMode(Kernel.EXECUTION_MODE.GPU); 

However, it is best to let Aparapi determine the execution mode. The following table summarizes the execution modes available:

Execution mode                    Meaning
Kernel.EXECUTION_MODE.NONE        Does not specify mode
Kernel.EXECUTION_MODE.CPU         Use CPU
Kernel.EXECUTION_MODE.GPU         Use GPU
Kernel.EXECUTION_MODE.JTP         Use Java threads
Kernel.EXECUTION_MODE.SEQ         Use single loop (for debugging purposes)
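After a kernel has executed, the Kernel class' getExecutionMode method can be used to query the kernel's execution mode. The following snippet is a brief sketch that assumes the kernel variable from the scalar multiplication example:

// Display the execution mode in effect for this kernel 
out.println("Execution mode: " + kernel.getExecutionMode()); 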

Next, we will demonstrate how we can use Aparapi to perform dot product matrix multiplication.

Using Aparapi for matrix multiplication

We will use the same matrices as used in the Implementing basic matrix operations section. We start with the declaration of the MatrixMultiplicationKernel class, which contains the vector declarations, a constructor, the run method, and a displayResults method. The matrices A and B have been flattened into the one-dimensional arrays vectorA and vectorB, stored row by row, so the element in row i and column j of a matrix with c columns is stored at index i * c + j:

class MatrixMultiplicationKernel extends Kernel { 
    float[] vectorA = { 
        0.1950f, 0.0311f, 0.3588f,  
        0.2203f, 0.1716f, 0.5931f,  
        0.2105f, 0.3242f}; 
    float[] vectorB = { 
        0.0502f, 0.9823f, 0.9472f,  
        0.5732f, 0.2694f, 0.916f}; 
    float[] vectorC; 
    int n; 
    int m; 
    int p; 
 
    @Override 
    public void run() { 
        ... 
    } 
 
    public MatrixMultiplicationKernel(int n, int m, int p) { 
        ... 
    } 
 
    public void displayResults () { 
        ... 
    } 
} 

The MatrixMultiplicationKernel constructor assigns values for the matrices' dimensions and allocates memory for the result stored in vectorC, as shown here:

public MatrixMultiplicationKernel(int n, int m, int p) { 
    this.n = n; 
    this.p = p; 
    this.m = m; 
    vectorC = new float[n * p]; 
} 

The run method uses a global ID and a pass ID to perform the matrix multiplication. The number of passes is specified as the second argument of the Kernel class' execute method, as we will see shortly, and the pass ID identifies the current pass. This value allows us to advance the column index for vectorC. The vector indexes map to the corresponding row and column positions of the original matrices:

public void run() { 
    int i = getGlobalId(); 
    int j = this.getPassId(); 
    float value = 0; 
    // Sum over the shared dimension m: row i of vectorA times column j of vectorB 
    for (int k = 0; k < m; k++) { 
        value += vectorA[i * m + k] * vectorB[k * p + j]; 
    } 
    vectorC[i * p + j] = value; 
} 
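For comparison, the same computation can be expressed as a plain sequential nested loop. The following multiply method is a hypothetical helper, not part of the kernel; it produces the same values as vectorC and can be used to check the kernel's results:

static float[] multiply(float[] a, float[] b, int n, int m, int p) { 
    float[] c = new float[n * p]; 
    for (int i = 0; i < n; i++) {             // row of A 
        for (int j = 0; j < p; j++) {         // column of B 
            float value = 0; 
            for (int k = 0; k < m; k++) {     // shared dimension 
                value += a[i * m + k] * b[k * p + j]; 
            } 
            c[i * p + j] = value; 
        } 
    } 
    return c; 
} 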

The displayResults method is shown as follows:

public void displayResults() { 
    out.println("Result"); 
    for (int i = 0; i < n; i++) { 
        for (int j = 0; j < p; j++) { 
            out.printf("%.4f  ", vectorC[i * p + j]); 
        } 
        out.println(); 
    } 
} 

The kernel is started in the same way as in the previous section. The execute method is passed the number of kernels that should be created, one for each of the n rows of the result, and an integer indicating the number of passes to make, one for each of the p columns. The pass ID is used to control the column index into the vectorB and vectorC arrays:

MatrixMultiplicationKernel kernel =  
        new MatrixMultiplicationKernel(n, m, p); 
kernel.execute(n, p); 
kernel.displayResults(); 
kernel.dispose(); 

When this example is executed, you will get the following output:

Result
0.0276  0.1999  0.2132  
0.1443  0.4118  0.5417  
0.3486  0.3283  0.7058  
0.1964  0.2941  0.4964

Next, we will see how Java 8 additions can contribute to solving math-intensive problems in a parallel manner.
