Getting ready

Open a terminal and create a data set consisting of several files with the dsetmkr.sh script:

#!/bin/bash
# dsetmkr.sh: build a small data set in which some files share the same content
BDIR="files_galore"
rm -rf "${BDIR}"
mkdir -p "${BDIR}"

# The redirection creates each file, so a separate touch is not needed
echo "1111111111111111111111111111111" > "${BDIR}/file1"
echo "2222222222222222222222222222222" > "${BDIR}/file2"
echo "3333333333333333333333333333333" > "${BDIR}/file3"
echo "4444444444444444444444444444444" > "${BDIR}/file4"
echo "4444444444444444444444444444444" > "${BDIR}/file5"
echo "4444444444444444444444444444444" > "${BDIR}/sameas5"
echo "1111111111111111111111111111111" > "${BDIR}/sameas1"

Before jumping into scripting, one core concept needs to be discussed: whether arrays are static or dynamic. Knowing how an array implementation works at its core matters whenever performance is an objective.

Arrays can be really helpful, but a Bash script will often perform worse than a compiled program, or than a script written in a language with more appropriate data structures. In Bash, arrays are dynamic and implemented as linked lists, which means that resizing an array does not carry a massive performance penalty.
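As a minimal sketch of what "dynamic" means in practice, the following snippet grows and shrinks an indexed array without ever declaring a size. The array name files and its values are purely illustrative and are not part of the recipe's script:

```shell
#!/bin/bash
# Start with an empty indexed array; no size is declared up front.
files=()

# Append elements one at a time; the array grows as needed.
files+=("file1")
files+=("file2")
files+=("file3")
echo "Element count: ${#files[@]}"

# Remove the middle element. The remaining elements keep their
# original indices, so the array is now sparse.
unset 'files[1]'
echo "Element count: ${#files[@]}"
echo "Indices in use: ${!files[*]}"
```

Note that removing an element does not renumber the rest, so a loop over "${files[@]}" is usually safer than counting from 0 to the element count.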

For our purposes, we are going to create a dynamic array. Once the array becomes quite large, searching it will become the performance bottleneck. This naive iterative approach usually works well up to some arbitrary number of elements (let's say, N), beyond which the benefits of using another mechanism may outweigh the simplicity of the current approach. For those who want to know more about data structures and their performance, check out Big O notation and complexity theory.
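To make the trade-off concrete, here is a hedged sketch of the naive iterative search next to one possible alternative. The helper contains, the array seen, and the map seen_map are hypothetical names chosen for illustration, and the associative array is only an assumption about what "another mechanism" could be (it requires Bash 4 or later):

```shell
#!/bin/bash
# Hypothetical helper: returns 0 if the first argument equals any of
# the remaining arguments. Every call walks the list, so the cost
# grows linearly with the number of elements.
contains() {
    local needle="$1"
    shift
    local item
    for item in "$@"; do
        [[ "$item" == "$needle" ]] && return 0
    done
    return 1
}

seen=("1111" "2222" "3333")

if contains "2222" "${seen[@]}"; then
    echo "linear search: found"
fi

# Assumed alternative (Bash 4+): an associative array keyed by value,
# so a membership check is a single lookup instead of a loop.
declare -A seen_map=()
for item in "${seen[@]}"; do
    seen_map["$item"]=1
done

if [[ -n "${seen_map[2222]:-}" ]]; then
    echo "keyed lookup: found"
fi
```

For a handful of files, the loop is perfectly adequate and easier to read; the keyed lookup only starts to pay off once the list is large enough for repeated scans to dominate the run time.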
