Now, let's see how cuda-memcheck detects memory errors and how it works together with CUDA-GDB. To make this concrete, we will introduce a deliberate error into working code and see how cuda-memcheck reports it. Let's begin with clean code; you can use the sample code in 05_debug/08_cuda_memcheck for this. First, build the code and confirm that it passes cuda-memcheck:
$ nvcc -m64 -g -G -Xcompiler -rdynamic -gencode arch=compute_70,code=sm_70 -I/usr/local/cuda/samples/common/inc -o simple_sgemm ./simple_sgemm.cu
$ cuda-memcheck simple_sgemm
========= CUDA-MEMCHECK
Application finished successfully.
========= ERROR SUMMARY: 0 errors
Now, let's introduce an error into the kernel function. For instance, you may add one to the row value, as follows. You can inject a different error if you prefer:
__global__ void sgemm_kernel(const float *A, const float *B, float *C, int N, int M, int K, float alpha, float beta)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    row += 1; // deliberate bug: the last row of threads now indexes past the end of A and C
    float sum = 0.f;
    for (int i = 0; i < K; ++i)
        sum += A[row * K + i] * B[i * M + col]; // B is K x M, so its row stride is M (same as K for square matrices)
    C[row * M + col] = alpha * sum + beta * C[row * M + col];
}
Let's compile and launch the code. The kernel now fails at runtime, and checkCudaErrors() reports the resulting CUDA error, as follows:
CUDA error at simple_sgemm_oob.cu:78 code=77(cudaErrorIllegalAddress) "cudaDeviceSynchronize()"
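The message points at cudaDeviceSynchronize() rather than at the kernel because kernel launches are asynchronous, so a fault inside a kernel only surfaces at a later synchronizing call. The following minimal sketch illustrates the checking pattern. CHECK_CUDA, fill_kernel, and run_fill_example are hypothetical names, and the macro is only a simplified stand-in for checkCudaErrors() from the CUDA samples' helper_cuda.h, not its actual implementation:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical stand-in for checkCudaErrors(): print the failing call with
// its file and line, then abort. Not the helper_cuda.h implementation.
#define CHECK_CUDA(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "CUDA error at %s:%d code=%d(%s) \"%s\"\n",   \
                    __FILE__, __LINE__, static_cast<int>(err_),           \
                    cudaGetErrorName(err_), #call);                       \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

// Hypothetical, well-behaved kernel used only to exercise the macro.
__global__ void fill_kernel(float *data, int n, float value)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        data[idx] = value;
}

// Returns 0 if every CUDA call succeeded (CHECK_CUDA exits otherwise).
int run_fill_example(int n)
{
    float *d_data = nullptr;
    CHECK_CUDA(cudaMalloc(&d_data, n * sizeof(float)));

    fill_kernel<<<(n + 255) / 256, 256>>>(d_data, n, 1.f);
    CHECK_CUDA(cudaGetLastError());      // catches invalid launch configurations
    CHECK_CUDA(cudaDeviceSynchronize()); // catches faults raised while the kernel ran

    CHECK_CUDA(cudaFree(d_data));
    return 0;
}

int main()
{
    return run_fill_example(1 << 20);
}
```

In this pattern, an illegal access inside the kernel would be reported on the cudaDeviceSynchronize() line, which matches the location quoted in the message above.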
However, the checkCudaErrors() message alone is insufficient if we wish to identify which line in the kernel code is the root cause of the problem. Using cuda-memcheck, we can identify which CUDA thread and memory address triggered the error, along with a stack backtrace:
$ cuda-memcheck simple_sgemm_oob
The output is as follows:
The preceding screenshot shows part of the output from a standalone run of cuda-memcheck, which lists the errors it detected in the kernel where the failure occurred. In this case, cuda-memcheck reports a memory violation at line 27. By default, cuda-memcheck stops the application's execution when an error is detected.
In this situation, we can find the root cause easily by inspecting the related variables using cuda-gdb. To do this, we need to launch the application with cuda-gdb and enable cuda-memcheck, as follows:
$ cuda-gdb simple_sgemm_oob
(cuda-gdb) set cuda memcheck on
(cuda-gdb) run
With this setting, cuda-gdb stops and reports the illegal memory access that cuda-memcheck detects:
The preceding screenshot shows a report from cuda-gdb with cuda-memcheck enabled. The developer can easily identify that line 27 in simple_sgemm_oob.cu triggered the reported error. From the given information, we can start to investigate which memory access touched an invalid address, as follows:
(cuda-gdb) print A[row * K + i]
Error: Failed to read generic memory at address 0x7fffc7600000 on device 0 sm 41 warp 20 lane 16, error=CUDBG_ERROR_INVALID_MEMORY_SEGMENT(0x7).
(cuda-gdb) print row * K + i
$1 = 4194304
From this, we can determine that the read of A[row * K + i] triggers the error, because the requested index lies beyond the space allocated for A in global memory. In this manner, you can narrow down the root cause without much effort.
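The printed index is consistent with 2048 x 2048 matrices, although the text does not state the dimensions, so treat that as an assumption. Under it, A holds 2048 * 2048 = 4194304 floats with valid indices 0 to 4194303, and row = 2048 with i = 0 gives exactly 4194304, one element past the end. The sketch below repeats that arithmetic and shows one possible defensive guard that keeps out-of-range threads from touching memory. The names in_bounds and sgemm_kernel_guarded are hypothetical and not part of the sample code:

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true when (row, col) addresses an element of C (N x M).
__host__ __device__ inline bool in_bounds(int row, int col, int N, int M)
{
    return row >= 0 && row < N && col >= 0 && col < M;
}

// Sketch of the kernel with a bounds guard (A is N x K, B is K x M, C is N x M).
__global__ void sgemm_kernel_guarded(const float *A, const float *B, float *C,
                                     int N, int M, int K, float alpha, float beta)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (!in_bounds(row, col, N, M))
        return; // threads outside C perform no memory accesses

    float sum = 0.f;
    for (int i = 0; i < K; ++i)
        sum += A[row * K + i] * B[i * M + col];
    C[row * M + col] = alpha * sum + beta * C[row * M + col];
}

int main()
{
    // Index arithmetic under the assumed size N = K = 2048.
    const long long n_assumed = 2048, k_assumed = 2048;
    const long long elems_in_A = n_assumed * k_assumed;    // valid indices: 0 .. elems_in_A - 1
    const long long first_bad = n_assumed * k_assumed + 0; // row == N, i == 0
    assert(first_bad >= elems_in_A);
    printf("A holds %lld floats; index %lld is out of range\n", elems_in_A, first_bad);

    // Small run: 4 x 4 matrices with one 16 x 16 block, so most threads are out of range.
    const int N = 4, M = 4, K = 4;
    float *A = nullptr, *B = nullptr, *C = nullptr;
    cudaMallocManaged(&A, N * K * sizeof(float));
    cudaMallocManaged(&B, K * M * sizeof(float));
    cudaMallocManaged(&C, N * M * sizeof(float));
    for (int i = 0; i < N * K; ++i) A[i] = 1.f;
    for (int i = 0; i < K * M; ++i) B[i] = 1.f;
    for (int i = 0; i < N * M; ++i) C[i] = 0.f;

    sgemm_kernel_guarded<<<1, dim3(16, 16)>>>(A, B, C, N, M, K, 1.f, 0.f);
    cudaError_t err = cudaDeviceSynchronize();
    printf("status: %s, C[0] = %.1f\n", cudaGetErrorString(err), C[0]);

    cudaFree(A);
    cudaFree(B);
    cudaFree(C);
    return err == cudaSuccess ? 0 : 1;
}
```

A guard like this only prevents the invalid access. If the row offset bug were still present, the kernel would run cleanly under cuda-memcheck but compute the wrong rows, so the guard complements the debugging workflow rather than replacing it.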