Analyzing data directly on a shared array

Using a shared array allows us to perform parallel operations on data in a single memory space. As long as we do not mutate the data, these operations can run independently without conflicts. This type of problem is called embarrassingly parallel.
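As a minimal sketch of this idea (the array size and worker count here are illustrative, not part of the example that follows):

```julia
using Distributed
addprocs(4)                      # start 4 worker processes
@everywhere using SharedArrays

# A SharedArray lives in shared memory visible to all local workers
data = SharedArray{Float64}(1_000)
data .= rand(1_000)

# Workers only read the shared data, so the iterations are independent
total = @distributed (+) for i in 1:length(data)
    data[i]
end
```

Because no iteration writes to memory that another iteration reads, the loop body can be farmed out to workers in any order.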

To illustrate the power of multi-processing, let's first benchmark a simple function that calculates the standard deviation of the returns across all securities:

using Statistics: std

# Find standard deviation of each attribute for each security
function std_by_security(valuation)
    (nstates, nattr, n) = size(valuation)
    result = zeros(n, nattr)
    for i in 1:n
        for j in 1:nattr
            result[i, j] = std(valuation[:, j, i])
        end
    end
    return result
end
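To try the function out, we can build a small random valuation cube; the dimensions below (100 market states, 3 return attributes, 10 securities) are illustrative only and assume std_by_security as defined above:

```julia
using Statistics: std

# Illustrative dimensions: 100 states × 3 attributes × 10 securities
valuation = rand(100, 3, 10)

result = std_by_security(valuation)
size(result)    # one row per security, one column per attribute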

The value of n represents the number of securities, and the value of nattr represents the number of sources of return. Let's see how much time it takes on a single process. The best timing was 5.286 seconds.

The @benchmark macro provides statistics about the performance benchmark. Sometimes it is useful to see the timing distribution and get an idea of how much garbage collection (GC) impacts performance.

The seconds=30 parameter was specified because this function takes several seconds to run. The default value of 5 seconds would not allow the benchmark to collect enough samples for reporting.
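The invocation might look like the following (a sketch, assuming the BenchmarkTools package is installed and valuation holds the data cube; the $-interpolation avoids measuring global-variable access overhead):

```julia
using BenchmarkTools

@benchmark std_by_security($valuation) seconds=30
```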

We are now ready to run the program in parallel. First, we need to make sure that all child processes have the dependent packages loaded:

@everywhere using Statistics: std
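If the worker processes have not been started yet, the full setup might look like this (16 workers matches the benchmark runs in this section; adjust to your CPU):

```julia
using Distributed
addprocs(16)                     # start 16 worker processes

@everywhere using SharedArrays   # needed to access SharedArray on workers
@everywhere using Statistics: std
```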

Then, we can define a distributed function, as follows:

function std_by_security2(valuation)
    (nstates, nattr, n) = size(valuation)
    result = SharedArray{Float64}(n, nattr)
    @sync @distributed for i in 1:n
        for j in 1:nattr
            result[i, j] = std(valuation[:, j, i])
        end
    end
    return result
end

This function looks very similar to the previous one, with some exceptions:

  1. We have allocated a new shared array, result, to store the computed data. This array is 2-dimensional because we are reducing the third dimension into a single standard deviation value. This array is accessible by all worker processes.
  2. The @distributed macro in front of the for loop automatically distributes the work, that is, the body of the for loop, across the worker processes.
  3. The @sync macro in front of the for loop makes the system wait until all of the work is done.
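As a quick sanity check, the distributed function should produce the same answer as the serial one. This sketch assumes both functions from above are defined and the workers are running:

```julia
valuation = rand(100, 3, 10)

# The SharedArray result compares elementwise against the plain Matrix
@assert std_by_security(valuation) ≈ std_by_security2(valuation)
```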

We can now benchmark the performance of this new function using the same 16 worker processes:

This is approximately 6x faster than the single-process version.
