Detecting memory out of bounds

Now, let's see how cuda-memcheck detects memory exceptions and works with CUDA-GDB. To demonstrate this, we will introduce an error into otherwise working code and see how cuda-memcheck reports it. Let's begin with clean code; you can use the sample code provided in 05_debug/08_cuda_memcheck for this. Test the code with cuda-memcheck to validate that it runs without errors:

$ nvcc -m64 -g -G -Xcompiler -rdynamic -gencode arch=compute_70,code=sm_70 -I/usr/local/cuda/samples/common/inc -o simple_sgemm ./simple_sgemm.cu
$ cuda-memcheck ./simple_sgemm

========= CUDA-MEMCHECK
Application finished successfully.
========= ERROR SUMMARY: 0 errors
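
For reference, the host side of simple_sgemm follows the usual allocate, launch, and synchronize pattern. The following is a minimal sketch of that pattern, not the book's exact source; the matrix sizes, scaling factors, and launch configuration are assumptions, and checkCudaErrors() comes from the CUDA samples' helper_cuda.h (already on the include path above):

#include <helper_cuda.h>   // checkCudaErrors(), from the CUDA samples

__global__ void sgemm_kernel(const float *A, const float *B, float *C,
                             int N, int M, int K, float alpha, float beta);

int main()
{
    int N = 2048, M = 2048, K = 2048;   // assumed square matrices
    float alpha = 2.f, beta = 1.f;
    float *A, *B, *C;

    checkCudaErrors(cudaMalloc(&A, N * K * sizeof(float)));
    checkCudaErrors(cudaMalloc(&B, K * M * sizeof(float)));
    checkCudaErrors(cudaMalloc(&C, N * M * sizeof(float)));

    dim3 block(16, 16);
    dim3 grid((M + block.x - 1) / block.x, (N + block.y - 1) / block.y);
    sgemm_kernel<<<grid, block>>>(A, B, C, N, M, K, alpha, beta);

    // Kernel launches are asynchronous: an illegal access inside the
    // kernel only surfaces at the next synchronizing API call.
    checkCudaErrors(cudaDeviceSynchronize());

    checkCudaErrors(cudaFree(A));
    checkCudaErrors(cudaFree(B));
    checkCudaErrors(cudaFree(C));
    return 0;
}

Keeping the error check on cudaDeviceSynchronize() is what allows a failure inside the kernel to be caught and reported by checkCudaErrors().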

Now, let's put some erroneous code into the kernel function; for instance, adding one to the row value, as follows. You can introduce a different error if you prefer:
__global__ void sgemm_kernel(const float *A, const float *B, float *C,
                             int N, int M, int K, float alpha, float beta)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    row += 1;   // the injected error: shifts every access one row down

    float sum = 0.f;
    for (int i = 0; i < K; ++i)
        sum += A[row * K + i] * B[i * K + col];
    C[row * M + col] = alpha * sum + beta * C[row * M + col];
}

Let's compile and launch the code. Because every thread's row index is now shifted by one, the threads in the last row of the grid read and write past the end of the allocated matrices. The kernel returns a CUDA error and checkCudaErrors() reports an error message, as follows:

CUDA error at simple_sgemm_oob.cu:78 code=77(cudaErrorIllegalAddress) "cudaDeviceSynchronize()"

However, this information is insufficient if we wish to identify which line in the kernel code is the root cause of the problem. Using cuda-memcheck, we can identify which CUDA thread and memory space triggered the error, along with the faulting address:

$ cuda-memcheck ./simple_sgemm_oob

In its standalone execution, cuda-memcheck reports every detected error together with the kernel in which it occurred. In this case, cuda-memcheck reports that it detected a memory violation at line 27 of the kernel. By default, cuda-memcheck stops the application's execution when an error is detected.
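
Note that cuda-memcheck can only map an error back to a source line when the binary carries line information. The -G flag used above provides this (while also disabling device code optimization); for optimized builds, nvcc's -lineinfo flag keeps the line mapping instead, as in this sketch of the compile command:

$ nvcc -m64 -lineinfo -Xcompiler -rdynamic -gencode arch=compute_70,code=sm_70 -I/usr/local/cuda/samples/common/inc -o simple_sgemm_oob ./simple_sgemm_oob.cu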

In this situation, we can find the root cause easily by inspecting the related variables using cuda-gdb. To do this, we need to launch the application with cuda-gdb and enable cuda-memcheck, as follows:

$ cuda-gdb simple_sgemm_oob
(cuda-gdb) set cuda memcheck on
(cuda-gdb) run

This procedure makes cuda-gdb report the illegal memory access detected by cuda-memcheck. From that report, the developer can easily identify that line 27 in simple_sgemm_oob.cu triggered the error. With this information, we can start to investigate which memory access touched the invalid space, as follows:

(cuda-gdb) print A[row * K + i]
Error: Failed to read generic memory at address 0x7fffc7600000 on device 0 sm 41 warp 20 lane 16, error=CUDBG_ERROR_INVALID_MEMORY_SEGMENT(0x7).
(cuda-gdb) print row * K + i
$1 = 4194304
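
If you need more context, cuda-gdb's focus commands let you list the kernel's threads and switch to a particular one before printing variables; the coordinates below are illustrative, not taken from this run:

(cuda-gdb) info cuda threads
(cuda-gdb) cuda thread (16,4,0)
(cuda-gdb) print row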

From this, we can determine that accessing A[row * K + i] triggers the error: the requested index exceeds the bounds of the global memory allocated for A. In this manner, you can narrow down the root cause of an invalid memory access without much effort.
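
The fix here is simply to remove the injected offset. More generally, a bounds guard protects the kernel whenever the launch grid overshoots the matrix dimensions; the following is a minimal sketch of the corrected kernel with such a guard added (the guard itself is not part of the original sample):

__global__ void sgemm_kernel(const float *A, const float *B, float *C,
                             int N, int M, int K, float alpha, float beta)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;

    // Skip threads that fall outside the output matrix.
    if (row >= N || col >= M)
        return;

    float sum = 0.f;
    for (int i = 0; i < K; ++i)
        sum += A[row * K + i] * B[i * K + col];
    C[row * M + col] = alpha * sum + beta * C[row * M + col];
}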
