Preparing data for the example

To follow the subsequent code for this pattern, we can prepare some test data. Before you run the code here, make sure that you have enough disk space for the test data. You will need approximately 22 GB of free space.
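
The 22 GB figure can be checked with a little arithmetic. The sketch below assumes each file holds a 10,000 × 3 matrix of Float64 values written as raw bytes, which is what the generate_test_data function later in this section produces. The variable names are illustrative only.

```julia
# Back-of-the-envelope disk usage estimate (illustrative variable names)
rows, cols = 10_000, 3
nfiles = 100_000

bytes_per_file = rows * cols * sizeof(Float64)   # 8 bytes per Float64
total_bytes = bytes_per_file * nfiles

@show bytes_per_file
@show total_bytes
@show total_bytes / 2^30    # size expressed in GiB
```

Each file comes to 240,000 bytes, so the full set is 24,000,000,000 bytes, or roughly 22.35 GiB, before any file system overhead.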

Rather than putting 100,000 files in a single directory, we can split them across 100 sub-directories. Let's create the directories first, using a simple function:

# Create sub-directories named "0" through "99" in the current working directory
function make_data_directories()
    for i in 0:99
        mkdir("$i")
    end
end
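
One practical detail: mkdir throws an error if the directory already exists, so calling make_data_directories a second time in the same location fails. A possible variant, shown here only as a sketch, uses mkpath, which leaves existing directories alone. The name make_data_directories_safe and the root argument are hypothetical and not part of the original code.

```julia
# Hypothetical re-runnable variant: mkpath does nothing if the directory exists
function make_data_directories_safe(root = ".")
    for i in 0:99
        mkpath(joinpath(root, string(i)))
    end
end

root = mktempdir()                  # scratch location for the demonstration
make_data_directories_safe(root)
make_data_directories_safe(root)    # second call succeeds instead of throwing

@show length(readdir(root))
```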

We can assume that every security is identified by a numerical index value between 1 and 100,000. Let's define a function that generates the path to find the file:

function locate_file(index)
    id = index - 1                   # zero-based id used in the filename
    dir = string(id % 100)           # sub-directory "0" through "99"
    joinpath(dir, "sec$(id).dat")
end

The function is designed to hash the file into one of the 100 sub-directories. Let's see how it works:

julia> locate_file.(vcat(1:2, 100:101))
4-element Array{String,1}:
 "0/sec0.dat"
 "1/sec1.dat"
 "99/sec99.dat"
 "0/sec100.dat"

So, the first 100 securities are located in directories called 0, 1, ..., 99. With the 101st security, the numbering wraps around to directory 0. For consistency with the zero-based directory names, the filename contains the security index minus 1.
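
To confirm that this scheme spreads the files evenly, we can count how many securities map to each directory. The following is a minimal sketch; locate_file is repeated so that the snippet runs on its own, and the counts dictionary is an illustrative name.

```julia
# locate_file as defined above, repeated so the snippet is self-contained
function locate_file(index)
    id = index - 1
    dir = string(id % 100)
    joinpath(dir, "sec$(id).dat")
end

# Tally how many of the 100,000 securities fall into each sub-directory
counts = Dict{String,Int}()
for i in 1:100_000
    dir = dirname(locate_file(i))
    counts[dir] = get(counts, dir, 0) + 1
end

@show length(counts)             # number of distinct directories used
@show extrema(values(counts))    # smallest and largest directory size
```

All 100 directories are used, and each one receives exactly 1,000 files.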

Now we are ready to generate the test data. Let's define a function as follows:

function generate_test_data(nfiles)
    for i in 1:nfiles
        A = rand(10000, 3)           # 10,000 x 3 matrix of random Float64 values
        file = locate_file(i)
        open(file, "w") do io
            write(io, A)             # raw binary, no header
        end
    end
end
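
Because write(io, A) stores only the raw bytes of the matrix, a file carries no information about its element type or dimensions. Anyone reading it back has to supply both. The sketch below writes one matrix to a scratch file and reads it back with read!; the scratch path is a hypothetical location chosen for the demonstration.

```julia
# Write one matrix the same way generate_test_data does
A = rand(10000, 3)
file = joinpath(mktempdir(), "sec0.dat")   # hypothetical scratch location
open(file, "w") do io
    write(io, A)
end

# The reader must already know the element type and shape
B = Array{Float64}(undef, 10000, 3)
read!(file, B)

@show filesize(file)    # 10000 * 3 * 8 bytes
@show A == B
```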

To generate all the test files, call make_data_directories() once to create the sub-directories, and then call generate_test_data with nfiles set to 100,000, that is, generate_test_data(100_000). By the end of this exercise, you should have test files scattered across all 100 sub-directories. Note that generate_test_data will take a few minutes to write all the test data.
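
Before committing to the full run, it can be worth trying the same steps at a small scale. The following sketch repeats the three functions from this section so that it runs on its own, and then generates only 200 files (about 48 MB) in a temporary directory. The scratch directory and the file count of 200 are assumptions made for this dry run.

```julia
function make_data_directories()
    for i in 0:99
        mkdir("$i")
    end
end

function locate_file(index)
    id = index - 1
    dir = string(id % 100)
    joinpath(dir, "sec$(id).dat")
end

function generate_test_data(nfiles)
    for i in 1:nfiles
        A = rand(10000, 3)
        file = locate_file(i)
        open(file, "w") do io
            write(io, A)
        end
    end
end

# Dry run in a scratch directory so the real data location stays untouched
scratch = mktempdir()
cd(scratch) do
    make_data_directories()
    generate_test_data(200)
end

# Collect every generated file and inspect the count and the file sizes
files = [joinpath(r, f) for (r, _, fs) in walkdir(scratch) for f in fs]
@show length(files)
@show unique(filesize.(files))
```

For the real data set, the same two calls are made in the intended data directory with generate_test_data(100_000).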

When it is done, take a quick look at the data files in a terminal. Each of the 100 sub-directories should contain 1,000 files, and each file should be 240,000 bytes in size.

We're now ready to tackle the problem using the shared array pattern. Let's get started.
