Now we will try NVBLAS with the R language, using the following steps:
- First, let's write an sgemm.R file that performs a matrix-matrix multiplication:
```r
set.seed(2019)
for(i in 1:5) {
    N = 512*(2^i)
    # Generate two random N x N matrices
    A = matrix(rnorm(N^2, mean=0, sd=1), nrow=N)
    B = matrix(rnorm(N^2, mean=0, sd=1), nrow=N)
    # Time the matrix-matrix multiplication (elapsed wall-clock time, in seconds)
    elapsedTime = system.time({C = A %*% B})[3]
    # A GEMM on N x N matrices performs 2*N^3 floating-point operations
    gFlops = 2*N*N*N/(elapsedTime * 1e+9)
    print(sprintf("Elapsed Time [%d]: %3.3f sec, %.3f GFlops", N, elapsedTime, gFlops))
}
```
- Execute the R script using the following command and compare the performance:
```shell
$ LD_PRELOAD=libnvblas.so Rscript sgemm.R
```
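Note that NVBLAS reads its settings from a configuration file (nvblas.conf in the working directory, or the path given by the NVBLAS_CONFIG_FILE environment variable), which must name a CPU BLAS library to fall back to for calls NVBLAS does not accelerate. A minimal sketch follows; the CPU BLAS path is an assumption and depends on your installation:

```text
# nvblas.conf -- minimal sketch; the CPU BLAS path below is an
# assumption and must point at your actual BLAS installation
NVBLAS_CPU_BLAS_LIB /usr/lib/x86_64-linux-gnu/libopenblas.so
NVBLAS_GPU_LIST ALL
NVBLAS_AUTOPIN_MEM_ENABLED
NVBLAS_LOGFILE nvblas.log
```

The log file is a convenient way to confirm that NVBLAS is actually intercepting the BLAS calls from R.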
The sample code runs the multiplication several times, doubling the matrix size on each iteration. The following table shows the output of the previous commands:
| CPU | GPU V100 |
|---|---|
From the results, we can see the performance gap between the CPU and the GPU. We can also see that the GPU's performance advantage grows as the matrix size increases.
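The GFLOPS figure reported by the script comes from the standard GEMM operation count: multiplying two N x N matrices takes 2*N^3 floating-point operations. A quick sanity check of that arithmetic, written in Python with a made-up elapsed time (the function name and timing value are illustrative, not part of the original script):

```python
def gemm_gflops(n, elapsed_sec):
    """GFLOPS for an n x n GEMM: 2*n^3 operations over the elapsed time."""
    return 2 * n**3 / (elapsed_sec * 1e9)

# Hypothetical example: a 1024 x 1024 multiplication taking 0.5 s
print(round(gemm_gflops(1024, 0.5), 3))  # → 4.295
```

Doubling N multiplies the operation count by eight, which is why the GPU's deep arithmetic pipelines pull further ahead of the CPU at the larger sizes.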
If you are interested in accelerating R with GPUs, please visit the NVIDIA developer blog: https://devblogs.nvidia.com/accelerate-r-applications-cuda/