3

Optimization Planning through Profiling

Profiling shows where our code spends its running time and identifies its bottlenecks. Because large programs usually have multiple layers, it is not straightforward to find the functions that, in turn, call other time-consuming functions, and this situation may occur several layers down in the code. Profiling lets us efficiently determine which code is responsible for such calls, which makes it an essential step in optimization planning. In this chapter, we examine the MATLAB built-in profiler for finding bottlenecks in m-files, C/C++ profiling methods for c-mex code, CUDA code profiling with Visual Studio and the NVIDIA Visual Profiler, and the environment settings for the c-mex debugger.

Keywords

MATLAB profiling; profile viewer; c-mex debugging; Microsoft Visual Studio

3.1 Chapter Objectives

Profiling shows where our code spends its running time and identifies its bottlenecks. Because large programs usually have multiple layers, it is not straightforward to find the functions that, in turn, call other time-consuming functions, and this situation may occur several layers down in the code. Profiling lets us efficiently determine which parts of our code are responsible for such calls, which makes it an essential step in optimization planning. In this chapter, we examine the following:

• Employing the MATLAB built-in profiler to find the bottlenecks in m-files.

• Employing C/C++ profiling methods for c-mex codes.

• Employing CUDA code profiling methods using Visual Studio and NVIDIA Visual Profiler.

• Setting up the environment for the c-mex debugger.

3.2 MATLAB Code Profiling to Find Bottlenecks

Fortunately, MATLAB provides a decent, easy-to-use profiler. We use the 2D convolution examples from the previous chapter again to demonstrate profiling.

You can invoke the MATLAB profiler in two ways. First, select Profiler in MATLAB Desktop as shown in Figure 3.1. Second, simply type profile viewer in the command window.

image

Figure 3.1 Select Profiler from the MATLAB desktop menu.

>> profile viewer

Then, we obtain a profiler window as in Figure 3.2.

image

Figure 3.2 MATLAB profiler window.

To use the 2D convolution examples, first change the current folder in the main MATLAB window to the folder containing them, before using the profiler window (Figure 3.3). Then, type the command you want to run in the Run this code: field. We use convQuarterImage.m as an example here (Figure 3.4).

image

Figure 3.3 Change the current folder in the main MATLAB window.

image

Figure 3.4 Run this code in the MATLAB profiler.

The profile results appear as in Figure 3.5. You can click on any column header (Function Name, Calls, Total Time, or Self Time) to sort the results by that column.
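The difference between the Total Time and Self Time columns is worth making concrete: total time counts everything that happens while a function is active, including its callees, whereas self time excludes the time spent in child functions. The C++ sketch below illustrates that bookkeeping by hand. It is not how MATLAB's profiler is implemented, and timedParent() and timedChild() are hypothetical functions whose sleeps stand in for real work.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;
using Ms = std::chrono::duration<double, std::milli>;

struct CallTiming {
    double totalMs;  // time inside the parent, callees included
    double childMs;  // time spent inside the callee
    double selfMs;   // totalMs - childMs: the parent's own work
};

// Hypothetical callee; the sleep stands in for real work such as a convolution.
static double timedChild() {
    auto t0 = Clock::now();
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    return Ms(Clock::now() - t0).count();
}

// Hypothetical caller: does some work of its own, then calls the child.
CallTiming timedParent() {
    auto t0 = Clock::now();
    std::this_thread::sleep_for(std::chrono::milliseconds(10));  // own work
    double childMs = timedChild();
    double totalMs = Ms(Clock::now() - t0).count();
    return {totalMs, childMs, totalMs - childMs};
}
```

Calling timedParent() should report a self time of roughly 10 ms and a total time of roughly 30 ms. A function with a large total time but a small self time is a hint to look further down the call tree, which is exactly the multi-layer situation described at the start of this chapter.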

image

Figure 3.5 Profile Summary from the MATLAB profiler.

If you click convQuarterImage within the Function Name column, you get more specific information on every line within the convQuarterImage file (Figure 3.6).

image

Figure 3.6 More detail in profiling results.

When you scroll down this window, you can see the code highlighted in color according to the selected category (time, numcalls, coverage, etc.). According to this profiling result (Figure 3.7), imagesc() and imread() take most of the running time of convQuarterImage. Since imread() reads the input image and imagesc() scales the image data and displays it, neither is part of the computation we want to accelerate. We therefore focus on the pure computation part, the most time-consuming of the remaining lines:

image

Figure 3.7 Execution timing information in each line of profiling results.

H=conv2(single(quarters), single(kernel));

In the next section, we look at the profiling result after replacing this line with its c-mex version, as well as at C/C++ profiling methods for c-mex code.
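Inside a c-mex file, MATLAB's profiler essentially sees only the MEX call as a whole, so timing a particular region of the C/C++ code means instrumenting it yourself. A minimal sketch, assuming a C++11 (or later) compiler: ScopedTimer is a hypothetical RAII helper that adds its own lifetime to an accumulator, much like tic/toc around a block, and timeComputeRegion() is a hypothetical routine in which a sleep stands in for the convolution work.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical RAII timer: adds the object's lifetime (in milliseconds)
// to an accumulator, similar to wrapping a block in tic/toc.
class ScopedTimer {
public:
    explicit ScopedTimer(double& accumulatorMs)
        : acc_(accumulatorMs), start_(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() {
        std::chrono::duration<double, std::milli> elapsed =
            std::chrono::steady_clock::now() - start_;
        acc_ += elapsed.count();
    }
    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    double& acc_;
    std::chrono::steady_clock::time_point start_;
};

// Times only the "computation" region of each iteration; setup work
// (the analog of imread/imagesc in the m-file) stays outside the timer.
double timeComputeRegion(int repeats) {
    double computeMs = 0.0;
    for (int i = 0; i < repeats; ++i) {
        // untimed setup work would go here
        ScopedTimer timer(computeMs);
        std::this_thread::sleep_for(std::chrono::milliseconds(5));  // stand-in for conv work
    }
    return computeMs;
}
```

Because the timer stops in its destructor, the measured region ends at the closing brace, so early returns are still counted. In a real c-mex file you would print the accumulated value with mexPrintf rather than return it.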

3.2.1 More Accurate Profiling with Multiple CPU Cores

Nowadays, multicore CPU machines are common, and program speed depends directly on the number of available cores and how effectively they are used. If we only need ballpark figures for the number of function calls and their running times, without precise information on CPU usage, the default profiler setting is sufficient. In some cases, however, we need a more accurate analysis. To get it, we manually set, outside of MATLAB, the number of cores MATLAB may use, and then profile the code we want to measure. On Windows, it is easy to change the number of CPU cores a process uses, as shown in Figure 3.8.

image

Figure 3.8 To manipulate the number of CPU cores to use, open the Start Task Manager.

Right-clicking the taskbar brings up a menu containing Start Task Manager. Clicking on Start Task Manager opens the Windows Task Manager window shown in Figure 3.9.

image

Figure 3.9 To manipulate the number of CPU cores to use, click on Set Affinity.

Right-click the MATLAB process (MATLAB.exe) and select Set Affinity…. This opens the Processor Affinity window, where you can select each processor individually (Figure 3.10). You can choose a specific processor for profiling your code. After selecting one processor and deselecting all the others for MATLAB.exe, you can profile your code more accurately (Figure 3.11).
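The Task Manager route above is Windows-specific. For readers on Linux, a rough programmatic analog (an assumption about your platform, outside this chapter's Windows workflow) is the glibc sched_setaffinity call, which restricts a thread to a chosen set of cores. The sketch below pins the calling thread, and therefore a single-threaded process, to the first CPU it is currently allowed to run on. The function names are hypothetical.

```cpp
#include <cassert>
#include <sched.h>  // sched_getaffinity, sched_setaffinity, CPU_* macros (glibc, _GNU_SOURCE)

// Restricts the calling thread to the first CPU in its current affinity mask.
// Returns the CPU index it was pinned to, or -1 on failure.
int pinToFirstAllowedCpu() {
    cpu_set_t allowed;
    CPU_ZERO(&allowed);
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)  // pid 0: calling thread
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &allowed)) {
            cpu_set_t one;
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            return sched_setaffinity(0, sizeof(one), &one) == 0 ? cpu : -1;
        }
    }
    return -1;
}

// Number of CPUs the calling thread may currently run on.
int allowedCpuCount() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    return CPU_COUNT(&set);
}
```

From a shell, launching MATLAB as taskset -c 0 matlab achieves a similar effect without writing any code.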

image

Figure 3.10 To manipulate the number of CPU cores to use, click on the Processor Affinity window.

image

Figure 3.11 To manipulate the number of CPU cores to use, select processors we want to use.

3.3 c-mex Code Profiling for CUDA

3.3.1 CUDA Profiling Using Visual Studio

NVIDIA Nsight, Visual Studio Edition, is a free CUDA development environment that integrates into Microsoft Visual Studio. It provides powerful debugging and profiling functions that make CUDA code development much more efficient. To download and install NVIDIA Nsight, please refer to Appendix 2.

Let us revisit our convolution example using CUDA. In Chapter 2, we created a convolution function using CUDA and ran it in the MATLAB command window as follows:

>> quarters=single(imread('eight.tif'));
>> mask=single([1 2 1; 0 0 0; -1 -2 -1]);
>> H3=conv2MexCuda(quarters, mask);
>> imagesc(H3);
>> colormap(gray);

Open Visual Studio as described in the previous section (Figure 3.12).

image

Figure 3.12 Nsight installed in Microsoft Visual Studio.

Go to Nsight in the menu and select Start Performance Analysis…. It may ask you to connect unsecurely (Figure 3.13). Selecting Connect unsecurely brings your Visual Studio to the screen shown in Figure 3.14.

image

Figure 3.13 Unsecure connection window for connecting Nsight with MATLAB.

image

Figure 3.14 Application connection window from Visual Studio.

In Application:, click on the folder browser button to select the MATLAB executable, which is located in your MATLAB installation folder. Be sure to select the one that matches your system architecture. For example, on Windows 7, the 64-bit version of MATLAB can be found at the location shown in Figure 3.15.

image

Figure 3.15 Select MATLAB as a connected application.

Select MATLAB.exe and click on Open to close the dialog. Now, scroll down a little bit to Activity Type. Select the Profile CUDA Application button (Figure 3.16). After you select this option, the Launch button in Application Control is enabled.

image

Figure 3.16 Select the “Profile CUDA Application” as an activity type.

Click on Launch. MATLAB then starts running alongside Visual Studio (Figure 3.17).

image

Figure 3.17 The application after selecting Launch.

In the MATLAB command window, run the CUDA-based convolution as shown in Figure 3.18, then go back to Visual Studio and click on Stop in Capture Control (Figure 3.19). Once capturing stops, you see the CUDA Overview (Figure 3.20). Select the Launches link in the CUDA Overview title bar. This reveals the CUDA functions along with all the kernel details and time profiles (Figure 3.21). If you select conv2MexCuda [CUDA Kernel], you see the grid and block sizes we specified and how long the kernel took to complete (Figure 3.22).
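The grid and block sizes that Nsight reports follow from how the kernel launch was configured. A common pattern, sketched here in plain C++ so it compiles without nvcc, assigns one thread per output pixel and rounds the grid size up. The 16-by-16 block, the Dim3 stand-in for CUDA's dim3, and the mapping of x to columns and y to rows are assumptions, not necessarily what the Chapter 2 kernel uses.

```cpp
#include <cassert>

// Plain-C++ stand-in for CUDA's dim3, so the sketch builds with any compiler.
struct Dim3 {
    unsigned x, y, z;
};

// Ceiling division: the smallest n such that n * b >= a.
constexpr unsigned ceilDiv(unsigned a, unsigned b) {
    return (a + b - 1) / b;
}

// Grid size for a 2D launch with one thread per output pixel.
// Assumption: x runs over columns and y over rows.
Dim3 gridFor(unsigned numRows, unsigned numCols, Dim3 block) {
    return Dim3{ceilDiv(numCols, block.x), ceilDiv(numRows, block.y), 1};
}
```

For a 100-row, 50-column image and a 16-by-16 block, gridFor returns a 4-by-7 grid. Because the grid is rounded up, the last blocks along each axis can contain threads that fall outside the image, which is why such kernels normally check their indices against numRows and numCols before writing.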

image

Figure 3.18 Running MATLAB as an application of profiling.

image

Figure 3.19 Finishing MATLAB profiling within Visual Studio.

image

Figure 3.20 CUDA Overview after finishing profiling.

image

Figure 3.21 CUDA kernel details and time profiles in the Nsight window.

image

Figure 3.22 More detailed information on CUDA kernel and time profiles in the Nsight window.

You can repeat the profiling by going back to the activity tab and clicking on Start in Capture Control (Figure 3.23). When you are done, close the MATLAB window or click on Kill in Application Control. This shuts down the whole profiling session in Visual Studio.

image

Figure 3.23 Capture Control in Nsight.

3.3.2 CUDA Profiling Using NVIDIA Visual Profiler

NVIDIA Visual Profiler provides a rich graphical user interface that gives more insight into how CUDA works under the hood. In addition to time profiles for each CUDA function call, it shows how each kernel was launched, memory usage, and the like. It helps locate possible bottlenecks and explains in great detail how kernels were invoked.

In this section, we show how this tool can be used with MATLAB and CUDA. NVIDIA Visual Profiler is located where CUDA is installed (Figure 3.24). On Windows, it can usually be found at C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\libnvvp. For Mac OS X, see Figure 3.25. For Linux distributions, go to /usr/local/cuda/libnvvp (Figure 3.26). Start nvvp; you get an empty window at first.

image

Figure 3.24 NVIDIA Visual Profiler is where CUDA is installed.

image

Figure 3.25 NVIDIA Visual Profiler for Mac OS X.

image

Figure 3.26 NVIDIA Visual Profiler for Linux.

First, open NVIDIA Visual Profiler. Then, create a new session from the main menu, File -> New Session (Figure 3.27).

image

Figure 3.27 New Session in NVIDIA Visual Profiler.

Click on the Browse… button and select your MATLAB executable (Figure 3.28), as we did previously. The actual MATLAB executable is in the bin directory of your MATLAB installation, and the exact binary depends on your system architecture:

For Windows 64, C:\Program Files\MATLAB\R2012a\bin\win64\MATLAB.exe.

For Windows 32, C:\Program Files\MATLAB\R2012a\bin\win32\MATLAB.exe.

For Mac OS X, /Applications/MATLAB_R2012a.app/bin/maci64/MATLAB.

For Linux, /usr/local/MATLAB/R2012a/bin/glnxa64/MATLAB.

image

Figure 3.28 Selecting MATLAB executable for NVIDIA Visual Profiler.

After selecting MATLAB executable for your architecture, click on Open in the file selection dialog box. This brings you back to the Create New Session dialog box (Figure 3.29). Click on Next in the Create New Session dialog box. Then, move on to the next step, where you can select executable properties (Figure 3.30). For now, leave all the default values as they are and click on Finish to complete creating a new session. As soon as this is done, NVIDIA Visual Profiler launches the MATLAB executable (Figure 3.31). Then, it waits until MATLAB is closed.

image

Figure 3.29 Create New Session dialog window.

image

Figure 3.30 Executable Properties in the Create New Session dialog window.

image

Figure 3.31 Launching MATLAB for profiling in NVIDIA Visual Profiler.

In the MATLAB command window, run the CUDA-based convolution as follows:

>> quarters=single(imread('eight.tif'));
>> mask=single([1 2 1; 0 0 0; -1 -2 -1]);
>> H3=conv2MexCuda(quarters, mask);

After running these commands, close the MATLAB window; the profiler then starts generating profile data. However, if you encounter a warning message as in Figure 3.32, slightly modify the code by adding cudaDeviceReset() at the end of the c-mex function to ensure that all profile data is flushed, as the message box suggests:

image

Figure 3.32 Warning message on the incomplete application.

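#include "mex.h"
#include "conv2Mex.h"
#include <cuda_runtime.h>

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
   // argument checks omitted for brevity
   float* image = (float*)mxGetData(prhs[0]);
   float* kernel = (float*)mxGetData(prhs[1]);
   int numRows = (int)mxGetM(prhs[0]);
   int numCols = (int)mxGetN(prhs[0]);
   plhs[0] = mxCreateNumericMatrix(numRows, numCols, mxSINGLE_CLASS, mxREAL);
   float* out = (float*)mxGetData(plhs[0]);
   conv2Mex(image, out, numRows, numCols, kernel);
   cudaDeviceReset();   // tear down the context so all profile data is flushed
}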

and recompile c-mex with an additional include option:

>> mex conv2MexCuda.cpp conv2Mex.obj -lcudart -L"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\lib\x64" -I"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.0\include"

Rerun the convolution and close the MATLAB window. Now NVIDIA Visual Profiler presents all the information as shown in Figure 3.33.

image

Figure 3.33 NVIDIA Visual Profiler result window.

This gives you a good idea of how much time each CUDA function took, how heavily the GPU was utilized, and so on. If you click on the Details tab at the bottom, it even shows the grid and block sizes of each kernel launch (Figure 3.34).

image

Figure 3.34 Timing information from the Details tab in the NVIDIA Visual Profiler.

3.4 Environment Setting for the c-mex Debugger

To debug the C/C++ code in a c-mex file, we need a debugger other than MATLAB's, because MATLAB provides only an m-file editor and m-file-related tools. Still, it is easy to use another debugger for c-mex files called from a MATLAB m-file. We used the conv2d3x3.cpp file in the previous chapter. This C++ file is called by the convQuarterImageCmex.m file as follows:

The conv2d3x3.cpp File

#include "mex.h"
#include "conv2d3x3.h"

// 3x3 convolution over the interior pixels of a column-major (MATLAB) image.
// dst must be zero-initialized; mxCreateNumericMatrix guarantees this.
void conv2d3x3(float* src, float* dst, int numRows, int numCols, float* mask)
{
   int boundCol = numCols - 1;
   int boundRow = numRows - 1;
   for (int c = 1; c < boundCol; c++)
   {
      for (int r = 1; r < boundRow; r++)
      {
         int dstIndex = c * numRows + r;
         int mskIndex = 8;   // walk the mask backwards: the kernel is flipped
         for (int kc = -1; kc < 2; kc++)
         {
            int srcIndex = (c + kc) * numRows + r;
            for (int kr = -1; kr < 2; kr++)
               dst[dstIndex] += mask[mskIndex--] * src[srcIndex + kr];
         }
      }
   }
}

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
   if (nrhs != 2)
      mexErrMsgTxt("Invalid number of input arguments");
   if (nlhs != 1)
      mexErrMsgTxt("Invalid number of outputs");
   // both the image and the mask must be single precision
   if (!mxIsSingle(prhs[0]) || !mxIsSingle(prhs[1]))
      mexErrMsgTxt("input image and mask type must be single");
   float* image = (float*)mxGetData(prhs[0]);
   float* mask = (float*)mxGetData(prhs[1]);
   int numRows = (int)mxGetM(prhs[0]);
   int numCols = (int)mxGetN(prhs[0]);
   int numKRows = (int)mxGetM(prhs[1]);
   int numKCols = (int)mxGetN(prhs[1]);
   if (numKRows != 3 || numKCols != 3)
      mexErrMsgTxt("Invalid mask size. It must be 3x3");
   plhs[0] = mxCreateNumericMatrix(numRows, numCols, mxSINGLE_CLASS, mxREAL);
   float* out = (float*)mxGetData(plhs[0]);
   conv2d3x3(image, out, numRows, numCols, mask);
}

The convQuarterImageCmex.m File

quarters = imread('eight.tif');
imagesc(quarters);
colormap(gray);
mask = [1 2 1; 0 0 0; -1 -2 -1];
single_q = single(quarters);
single_k = single(mask);
H = conv2d3x3(single_q, single_k);   % Call C-Mex file here
figure;
imagesc(H);
colormap(gray);
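Before attaching Visual Studio to MATLAB, it can also help to exercise the pure C++ part of a c-mex file on its own, where any debugger can step through it. A minimal sketch, assuming the loop body of conv2d3x3() is copied into a standalone file without mex.h; conv2d3x3Standalone() and runRowRampTest() are hypothetical names. The input ramp (each pixel equals its row index) makes the expected result easy to derive by hand: with the mask [1 2 1; 0 0 0; -1 -2 -1], every interior pixel should come out as 8, and border pixels stay 0.

```cpp
#include <cassert>
#include <vector>

// Standalone version of the 3x3 convolution loop, free of mex.h, so it can be
// compiled and stepped through in any debugger. Arrays are column-major, as in MATLAB.
void conv2d3x3Standalone(const float* src, float* dst, int numRows, int numCols,
                         const float* mask) {
    for (int c = 1; c < numCols - 1; c++) {
        for (int r = 1; r < numRows - 1; r++) {
            int mskIndex = 8;  // walk the mask backwards: true convolution
            float sum = 0.0f;
            for (int kc = -1; kc < 2; kc++) {
                int srcIndex = (c + kc) * numRows + r;
                for (int kr = -1; kr < 2; kr++)
                    sum += mask[mskIndex--] * src[srcIndex + kr];
            }
            dst[c * numRows + r] = sum;
        }
    }
}

// Small driver: fills each pixel with its row index and convolves it with
// the mask [1 2 1; 0 0 0; -1 -2 -1], stored column-major as MATLAB would pass it.
std::vector<float> runRowRampTest(int numRows, int numCols) {
    std::vector<float> src(numRows * numCols), dst(numRows * numCols, 0.0f);
    for (int c = 0; c < numCols; c++)
        for (int r = 0; r < numRows; r++)
            src[c * numRows + r] = static_cast<float>(r);
    const float mask[9] = {1, 0, -1, 2, 0, -2, 1, 0, -1};
    conv2d3x3Standalone(src.data(), dst.data(), numRows, numCols, mask);
    return dst;
}
```

Checking the last interior row in particular (for example, row 3 of a 5-row image) is a quick way to catch an off-by-one in the loop bounds.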

To debug the conv2d3x3.cpp c-mex file called from the convQuarterImageCmex.m m-file, compile conv2d3x3.cpp in the MATLAB command window with the -g option, which builds the MEX file with debugging symbols:

>> mex -g conv2d3x3.cpp

On success, this creates a new file, conv2d3x3.mexw64 (or conv2d3x3.mexw32), in the same directory. Then, start Visual Studio while keeping your MATLAB session open, and select Attach to Process… on the Tools menu (Figure 3.35).

image

Figure 3.35 Attach to Process… menu for the Microsoft Visual Studio debugger.

In the Attach to Process dialog, you see the processes running on your PC (Figure 3.36). MATLAB.exe appears in this list only while MATLAB is running. Select MATLAB.exe and click on Attach. Visual Studio then shows an empty window with Solution1 (Running) at the top, as in Figure 3.37. Open the c-mex source file conv2d3x3.cpp through File > Open > File… in Visual Studio (Figure 3.38). Next, set a breakpoint on any line you want by right-clicking it (Figure 3.39). The breakpoint may appear inactive, with a warning message; you can ignore this (Figure 3.40), as it typically means the MEX module and its symbols have not been loaded yet. Once the breakpoint is set, you can use all the functions on the Debug menu without limitation (Figure 3.41).

image

Figure 3.36 Attaching MATLAB to the Microsoft Visual Studio debugger.

image

Figure 3.37 An empty window with Solution1 (Running) in the Microsoft Visual Studio debugger.

image

Figure 3.38 Open source code in the Microsoft Visual Studio debugger.

image

Figure 3.39 Inserting a breakpoint in the debugger.

image

Figure 3.40 An inactivated breakpoint.

image

Figure 3.41 Various functions in the Microsoft Visual Studio debugger.

Now, let us go back to MATLAB. In the MATLAB command window, run convQuarterImageCmex.m, which calls the conv2d3x3.cpp c-mex file (Figure 3.42). Visual Studio automatically switches to debugging mode, as in Figure 3.43, and execution stops at the breakpoint you set.

image

Figure 3.42 Running the MATLAB main module for debugging.

image

Figure 3.43 Automatically activated debugging mode after running the MATLAB main module.

From now on, you can freely use any debugging command in Visual Studio, such as Step Into (F11) and Step Over (F10), to track how variables change. The boxed Autos window in Figure 3.44 shows an example in which we inspect variable values while navigating with Step Into (F11) and Step Over (F10).

image

Figure 3.44 Example of debugging tools in the Microsoft Visual Studio debugger with MATLAB.
