Now we will try NVBLAS with the R language, using the following steps:
- First, let's write an sgemm.R file that performs a matrix-matrix multiplication:
```r
set.seed(2019)
for(i in 1:5) {
    N = 512*(2^i)
    # Generate two random N x N matrices
    A = matrix(rnorm(N^2, mean=0, sd=1), nrow=N)
    B = matrix(rnorm(N^2, mean=0, sd=1), nrow=N)
    # Time the matrix-matrix multiplication (elapsed wall-clock time, in seconds)
    elapsedTime = system.time({C = A %*% B})[3]
    # A GEMM on N x N matrices performs 2*N^3 floating-point operations
    gFlops = 2*N*N*N/(elapsedTime * 1e+9)
    print(sprintf("Elapsed Time [%d]: %3.3f sec, %.3f GFlops", N, elapsedTime, gFlops))
}
```
- Execute the R script using the following command and compare the performance:
```shell
$ LD_PRELOAD=libnvblas.so Rscript sgemm.R
```
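Note that NVBLAS reads its settings from a configuration file (nvblas.conf in the working directory, or the path given by the NVBLAS_CONFIG_FILE environment variable), which must name a CPU BLAS library to fall back to for calls NVBLAS does not accelerate. A minimal sketch follows; the CPU BLAS path is an assumption and depends on your installation:

```text
# nvblas.conf -- minimal sketch; the CPU BLAS path below is an
# assumption and must point at your actual BLAS installation
NVBLAS_CPU_BLAS_LIB /usr/lib/x86_64-linux-gnu/libopenblas.so
NVBLAS_GPU_LIST ALL
NVBLAS_AUTOPIN_MEM_ENABLED
NVBLAS_LOGFILE nvblas.log
```

The log file is a convenient way to confirm that NVBLAS is actually intercepting the BLAS calls from R.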
The sample code runs the multiplication several times, doubling the matrix size on each iteration. The following table shows the output of the previous commands:
| CPU | GPU V100 |
|---|---|
From the results, we can see the performance gap between the CPU and the GPU. We can also see that the GPU's performance advantage grows as the matrix size increases.
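The GFLOPS figure reported by the script comes from the standard GEMM operation count: multiplying two N x N matrices takes 2*N^3 floating-point operations. A quick sanity check of that arithmetic, written in Python with a made-up elapsed time (the function name and timing value are illustrative, not part of the original script):

```python
def gemm_gflops(n, elapsed_sec):
    """GFLOPS for an n x n GEMM: 2*n^3 operations over the elapsed time."""
    return 2 * n**3 / (elapsed_sec * 1e9)

# Hypothetical example: a 1024 x 1024 multiplication taking 0.5 s
print(round(gemm_gflops(1024, 0.5), 3))  # → 4.295
```

Doubling N multiplies the operation count by eight, which is why the GPU's deep arithmetic pipelines pull further ahead of the CPU at the larger sizes.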
If you are interested in accelerating R with GPUs, please visit the NVIDIA developer blog: https://devblogs.nvidia.com/accelerate-r-applications-cuda/