Example 11-6 shows a SerialMatrixMultiply that makes no use of Threading Building Blocks or any other parallelization, whereas Example 11-7 shows the corresponding ParallelMatrixMultiply that uses a blocked_range2d to specify a two-dimensional iteration space. As far as the rest of the program is concerned, the two functions behave identically. Obviously, we expect ParallelMatrixMultiply to run faster on a machine with more than one processor core.
Example 11-6. Matrix multiply serial code
const size_t L = 150;
const size_t M = 225;
const size_t N = 300;

void SerialMatrixMultiply( float c[M][N], float a[M][L], float b[L][N] ) {
    for( size_t i=0; i<M; ++i ) {
        for( size_t j=0; j<N; ++j ) {
            float sum = 0;
            for( size_t k=0; k<L; ++k )
                sum += a[i][k]*b[k][j];
            c[i][j] = sum;
        }
    }
}
Example 11-7. Equivalent matrix multiply with blocked_range2d
#include "tbb/parallel_for.h" #include "tbb/blocked_range2d.h" using namespace tbb; const size_t L = 150; const size_t M = 225; const size_t N = 300; class MatrixMultiplyBody2D { float (*my_a)[L]; float (*my_b)[N]; float (*my_c)[N]; public: void operator()( const blocked_range2d<size_t>& r ) const { float (*a)[L] = my_a; // a,b,c used in example to emphasize float (*b)[N] = my_b; // commonality with serial code float (*c)[N] = my_c; for( size_t i=r.rows().begin(); i!=r.rows().end(); ++i ){ for( size_t j=r.cols().begin(); j!=r.cols().end(); ++j ) { float sum = 0; for( size_t k=0; k<L; ++k ) sum += a[i][k]*b[k][j]; c[i][j] = sum; } } } MatrixMultiplyBody2D( float c[M][N], float a[M][L], float b[L][N] ) : my_a(a), my_b(b), my_c(c) {} }; void ParallelMatrixMultiply(float c[M][N], float a[M][L], float b[L][N]){parallel_for( blocked_range2d<size_t>(0, M, 16, 0, N, 32), MatrixMultiplyBody2D(c,a,b) ); }
The blocked_range2d enables the two outermost loops of the serial version to become parallel loops. The parallel_for recursively splits the blocked_range2d until the pieces are no larger than 16 × 32. It invokes MatrixMultiplyBody2D::operator() on each piece.
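For completeness, a minimal driver, not from the book, that exercises both versions and checks that they agree is sketched below. It assumes the definitions from Examples 11-6 and 11-7 share one translation unit (with the duplicate L, M, N constants removed); the array names A, B, CSerial, and CParallel are illustrative. The task_scheduler_init object matches the TBB releases contemporary with this text; current oneTBB versions initialize the scheduler automatically and no longer ship that header.

#include "tbb/task_scheduler_init.h"
#include <cstdio>
#include <cstdlib>

static float A[M][L], B[L][N], CSerial[M][N], CParallel[M][N];

int main() {
    tbb::task_scheduler_init init;   // automatic thread count; omit with oneTBB

    // Fill the inputs with arbitrary but deterministic values.
    for( size_t i=0; i<M; ++i )
        for( size_t k=0; k<L; ++k )
            A[i][k] = float(i+k);
    for( size_t k=0; k<L; ++k )
        for( size_t j=0; j<N; ++j )
            B[k][j] = float(k+2*j);

    SerialMatrixMultiply( CSerial, A, B );
    ParallelMatrixMultiply( CParallel, A, B );

    // Each c[i][j] is computed by a single thread using the same
    // inner-loop summation order in both versions, so the results
    // should match exactly, not merely to within rounding error.
    for( size_t i=0; i<M; ++i )
        for( size_t j=0; j<N; ++j )
            if( CSerial[i][j] != CParallel[i][j] ) {
                std::printf( "mismatch at (%zu,%zu)\n", i, j );
                return EXIT_FAILURE;
            }
    std::printf( "results match\n" );
    return EXIT_SUCCESS;
}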