Profiling shows where our code spends most of its running time and thereby identifies its bottlenecks. Because large programs usually have many layers of function calls, it is not straightforward to find functions that, in turn, call other time-consuming functions, and such calls may sit several layers down in the code. Profiling lets us efficiently determine which parts of the code are responsible for them, which makes it an essential step in planning optimization. In this chapter, we examine the MATLAB built-in profiler for finding bottlenecks in m-files; C/C++ profiling methods for c-mex code; CUDA code profiling using Visual Studio and NVIDIA Visual Profiler; and the environment settings for the c-mex debugger.
MATLAB profiling; profile viewer; c-mex debugging; Microsoft Visual Studio
Fortunately, MATLAB provides a decent easy-to-use profiler. We are going to use the 2D convolution examples in the previous chapter again for profiling demonstrations.
You can invoke the MATLAB profiler in two ways. First, select Profiler in MATLAB Desktop as shown in Figure 3.1. Second, simply type profile viewer in the command window.
>> profile viewer
Then, we obtain a profiler window as in Figure 3.2.
To use the 2D convolution examples, change the current folder in the main MATLAB window before using the profiler window (Figure 3.3). Then, type the command you want to run at Run This Code:. We use convQuarterImage.m as an example here (Figure 3.4).
You get the profile results as in Figure 3.5. You can click on the column headers Function Name, Calls, Total Time, and Self Time to sort the results by that column.
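The same kind of summary can also be collected from the command line with the profile function. The following is only a minimal sketch: the random image is a stand-in for the chapter's image file, and the variable names (img, kernel, info, ft) are illustrative assumptions rather than names from the examples. Depending on the MATLAB version, built-in functions such as conv2 may not appear as separate entries in the table.

```matlab
% Minimal sketch: profile a workload from the command line instead of the GUI.
% The random image is a placeholder for the chapter's image file (an assumption).
img    = single(rand(512));                   % placeholder input image
kernel = single([1 2 1; 0 0 0; -1 -2 -1]);    % 3x3 mask used in this chapter

profile on                                    % start collecting statistics
for k = 1:20
    H = conv2(img, kernel);                   % code under test
end
profile off                                   % stop collecting

info = profile('info');                       % struct with a FunctionTable field
ft = info.FunctionTable;
if ~isempty(ft)
    % Sort by total time, as clicking the Total Time column does in the viewer.
    [~, order] = sort([ft.TotalTime], 'descend');
    for i = order(1:min(5, numel(order)))
        fprintf('%-30s calls=%d total=%.4f s\n', ...
            ft(i).FunctionName, ft(i).NumCalls, ft(i).TotalTime);
    end
end
% profile viewer   % uncomment to open the same results in the profiler window
```

Replacing the loop body with a call to your own m-file gives a table comparable to the one in the profiler window.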
If you click convQuarterImage within the Function Name column, you get more specific information on every line within the convQuarterImage file (Figure 3.6).
When you scroll down this window, you can see the code highlighted in color according to the selected category (time, numcalls, coverage, etc.). According to this profiling result (Figure 3.7), imagesc() and imread() take most of the running time of convQuarterImage. Since imread() only reads the input image and imagesc() only scales and displays it, we can set them aside and focus on the pure computation part, which consumes more running time than the remaining lines:
H=conv2(single(quarters), single(kernel));
In the next section, we see the profiling result when we replace this line with the c-mex version of it and the C/C++ profiling method in c-mex.
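To cross-check a single line such as the conv2 call outside the profiler, a simple tic/toc loop is often enough. This is a hedged sketch, not part of the original example: the random image stands in for the quarters image, and nRuns and tAvg are illustrative names.

```matlab
% Sketch: time the pure computation line by itself with tic/toc.
% The random image is a placeholder for the quarters image (an assumption).
img    = single(rand(512));
kernel = single([1 2 1; 0 0 0; -1 -2 -1]);    % 3x3 mask used in this chapter

nRuns = 50;                                   % number of repetitions to average
H = conv2(img, kernel);                       % warm-up call, not timed

t0 = tic;
for k = 1:nRuns
    H = conv2(img, kernel);
end
tAvg = toc(t0) / nRuns;                       % average seconds per call

fprintf('conv2: %.3f ms per call (average of %d runs)\n', 1e3 * tAvg, nRuns);
```

Averaging over several runs smooths out one-off costs, which makes the number easier to compare with a later c-mex version.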
Multicore CPU machines are now common, and a program's running speed depends directly on the number of available cores and how well they are used. If we only need ballpark figures for the number of function calls and the time they consume, without precise information on CPU usage, the previous profiler setting is enough. In some cases, however, we need a more accurate analysis. To get it, we manually set the number of cores that MATLAB may use from outside MATLAB and then profile the code we want to measure. On Windows, it is easy to set the number of CPU cores to use, as shown in Figure 3.8.
You get the Start Task Manager menu when you right-click on the taskbar. After clicking on Start Task Manager, you get the Windows Task Manager window shown in Figure 3.9.
For the MATLAB process, right-click and select Set Affinity…. Then, you can access the selection window for each processor (Figure 3.10). You can select a specific processor to use for profiling your code. After selecting one processor and turning off the others for MATLAB.exe, you can profile your code more accurately (Figure 3.11).
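A rough in-MATLAB complement to the Task Manager setting is to limit MATLAB's computational threads with maxNumCompThreads while timing. Note the assumption here: this restricts MATLAB's own multithreading and is not the same as operating-system processor affinity, so treat it as a sketch of a related technique rather than a replacement for the steps above. The random image and variable names are placeholders.

```matlab
% Sketch: restrict MATLAB to one computational thread while timing, then restore.
% This limits MATLAB's own threading; it is not OS-level processor affinity.
img    = single(rand(512));                   % placeholder image (assumption)
kernel = single([1 2 1; 0 0 0; -1 -2 -1]);    % 3x3 mask used in this chapter

oldN = maxNumCompThreads(1);                  % set to 1; returns previous setting
t0 = tic;
for k = 1:20
    H = conv2(img, kernel);
end
tSingle = toc(t0) / 20;                       % average seconds per call
maxNumCompThreads(oldN);                      % restore the previous thread count

fprintf('single-thread average: %.3f ms per call\n', 1e3 * tSingle);
```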
NVIDIA Nsight, Visual Studio Edition, is a free development environment for CUDA installed within Microsoft Visual Studio. NVIDIA Nsight, Visual Studio Edition, provides strong debugging and profiling functions, which are very efficient for CUDA code development. To download and install the NVIDIA Nsight, please refer to Appendix 2.
Let us revisit our convolution example using CUDA. In Chapter 2, we created a convolution function using CUDA functions and ran it in the MATLAB command window as
>> quarters=single(imread('eight.tif'));
>> mask=single([1 2 1; 0 0 0; -1 -2 -1]);
>> H3=conv2MexCuda(quarters, mask);
>> imagesc(H3);
>> colormap(gray);
Open Visual Studio as described in the previous section (Figure 3.12).
Go to Nsight in the menu and select Start Performance Analysis…. It may ask you to connect unsecurely (Figure 3.13). Selecting Connect unsecurely brings your Visual Studio to the screen shown in Figure 3.14.
In Application:, click on the folder browser button to select the MATLAB executable, which can be found where your MATLAB is installed. You have to select the one that matches your system architecture. For example, in Windows 7, the 64-bit version of MATLAB can be found at the location shown in Figure 3.15.
Select MATLAB.exe and click on Open to close the dialog. Now, scroll down a little bit to Activity Type. Select the Profile CUDA Application button (Figure 3.16). After you select this option, the Launch button in Application Control is enabled.
Click on Launch. After you click, you see that MATLAB starts running alongside Visual Studio (Figure 3.17).
In the MATLAB command window, run the CUDA-based convolution as shown in Figure 3.18, and then go back to Visual Studio and click on Stop in Capture Control (Figure 3.19). After the capture stops, you see CUDA Overview (Figure 3.20). Select the Launches link in the CUDA Overview title bar. It now reveals the CUDA function and all the kernel details and time profiles (Figure 3.21). If you select conv2MexCuda [CUDA Kernel], you see what grid and block sizes we specified and how much time it took to complete the task (Figure 3.22).
You can repeat this profiling by going back to the activity tab and clicking on Start in Capture Control (Figure 3.23). Once you are done, close the MATLAB window or click on Kill in Application Control. That closes down the whole profiling session in Visual Studio.
NVIDIA Visual Profiler provides a rich graphic user environment to give more insight into how CUDA works under the hood. In addition to giving us time profiles for each CUDA function call, it also tells us how the kernel was called, memory usage, and the like. It helps locate where possible bottlenecks are and explains how kernels were invoked in great detail.
In this section, we show how this wonderful tool can be used with MATLAB and CUDA. NVIDIA Visual Profiler is located where your CUDA toolkit is installed (Figure 3.24). On Windows, it can usually be found at C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\libnvvp. For Mac OS X, see Figure 3.25. On Linux distributions, go to /usr/local/cuda/libnvvp (Figure 3.26). Start nvvp; you get an empty window at the beginning.
First, open NVIDIA Visual Profiler. Then, create a new session from the main menu, File -> New Session (Figure 3.27).
Click on the Browse… button and select your MATLAB executable (Figure 3.28) as we did previously. The actual MATLAB executable is found by going to the MATLAB installed bin directory. The actual binary depends on your system architecture:
For Windows 64, C:\Program Files\MATLAB\R2012a\bin\win64\MATLAB.exe.
For Windows 32, C:\Program Files\MATLAB\R2012a\bin\win32\MATLAB.exe.
For Mac OS X, /Applications/MATLAB_R2012a.app/bin/maci64/MATLAB.
After selecting MATLAB executable for your architecture, click on Open in the file selection dialog box. This brings you back to the Create New Session dialog box (Figure 3.29). Click on Next in the Create New Session dialog box. Then, move on to the next step, where you can select executable properties (Figure 3.30). For now, leave all the default values as they are and click on Finish to complete creating a new session. As soon as this is done, NVIDIA Visual Profiler launches the MATLAB executable (Figure 3.31). Then, it waits until MATLAB is closed.
In the MATLAB command Window, run the CUDA-based convolution as
>> quarters=single(imread('eight.tif'));
>> mask=single([1 2 1; 0 0 0; -1 -2 -1]);
>> H3=conv2MexCuda(quarters, mask);
After you run these, close the MATLAB window, and the profiler starts generating profile data. If you encounter a warning message like the one in Figure 3.32, slightly modify the code by adding cudaDeviceReset() at the end of the c-mex function to ensure that all profile data is flushed, as stated in the message box:
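#include "mex.h"
#include "conv2Mex.h"
#include <cuda_runtime.h>
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, mxArray *prhs[])
{
    // Input validation is omitted here for brevity.
    float* image = (float*)mxGetData(prhs[0]);    // input image (single)
    float* kernel = (float*)mxGetData(prhs[1]);   // convolution mask (single)
    int numRows = (int)mxGetM(prhs[0]);
    int numCols = (int)mxGetN(prhs[0]);
    // Allocate the output matrix before taking a pointer to its data.
    plhs[0] = mxCreateNumericMatrix(numRows, numCols, mxSINGLE_CLASS, mxREAL);
    float* out = (float*)mxGetData(plhs[0]);
    conv2Mex(image, out, numRows, numCols, kernel);
    cudaDeviceReset();  // flush all profile data before MATLAB exits
}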
and recompile c-mex with an additional include option:
>> mex conv2MexCuda.cpp conv2Mex.obj -lcudart -L"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\lib\x64" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include"
Rerun the convolution and close the MATLAB window. Now NVIDIA Visual Profiler presents all the information as shown in Figure 3.33.
You will obtain a pretty good idea about how much time each CUDA function took, the utilization of GPUs, and so on. If you click on the Details tab at the bottom, it even shows you thread sizes in the grid and block (Figure 3.34).
To debug C/C++ code within a c-mex file, we need a debugger other than MATLAB's, because MATLAB provides only an m-file editor and m-file-related tools. Still, it is very easy to use another debugger for c-mex files associated with a MATLAB m-file. We used the conv2d3x3.cpp file in the previous chapter. This C++ file is called by the convQuarterImageCmex.m file.
To debug the conv2d3x3.cpp c-mex file associated with the convQuarterImageCmex.m m-file, we compile the conv2d3x3.cpp file in the MATLAB command window with the -g option:
>> mex -g conv2d3x3.cpp
On success, this creates a new file, conv2d3x3.mexw64 (or conv2d3x3.mexw32), in the same directory. Then, start Visual Studio while keeping your MATLAB session open and select Attach to Process… on the Tools menu (Figure 3.35).
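The debug build and the check for the resulting file can also be scripted. The following is a sketch under assumptions: it expects conv2d3x3.cpp in the current folder and simply reports when the file is missing; the variable names src, target, and built are illustrative only. The mexext function returns the MEX-file extension for the current platform, for example mexw64 on 64-bit Windows.

```matlab
% Sketch: build the debug MEX-file if the source is present and report the result.
src    = 'conv2d3x3.cpp';                 % source file named in this chapter
target = ['conv2d3x3.' mexext];           % expected output, e.g. conv2d3x3.mexw64

if exist(src, 'file') == 2
    clear mex                             % unload MEX-files already in memory
    mex('-g', src);                       % -g builds with debugging symbols
    built = ~isempty(dir(target));        % true if the MEX-file now exists
else
    built = false;
    fprintf('%s not found in the current folder; nothing was built.\n', src);
end

if built
    fprintf('Debug build ready: %s\n', target);
end
```

Calling clear mex before rebuilding avoids trying to overwrite a MEX-file that MATLAB still has loaded.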
In the Attach to Process box, you see the processes currently running on your PC (Figure 3.36). If MATLAB is not running, MATLAB.exe does not appear in the list of available processes. Select MATLAB.exe and click on Attach. Then, Visual Studio shows an empty window with Solution1 (Running) at its top, as in Figure 3.37. Open the conv2d3x3.cpp c-mex source file through File… under Open on the File menu in Visual Studio (Figure 3.38). Next, set a breakpoint on any line you want by clicking the right mouse button (Figure 3.39). You then see an inactive breakpoint and a warning message, which you can ignore (Figure 3.40). Once you correctly set the breakpoint, you can use all the functions on the Debug menu with no limitations (Figure 3.41).
Now, let us go back to MATLAB. In the MATLAB command window, run the convQuarterImageCmex.m file, which calls the conv2d3x3.cpp c-mex file (Figure 3.42). The debugging mode in Visual Studio is then activated automatically, as in Figure 3.43, and execution stops at the breakpoint you set.
From now on, you can freely use any Debug menu command in Visual Studio, such as Step Into (F11) and Step Over (F10), to track how variables change. The boxed Autos window in Figure 3.44 shows an example in which we inspect the variable values while navigating with Step Into (F11) and Step Over (F10).