Minimizing loop operations

When you develop a script, you often place steps inside a loop because it was convenient at the time and made little difference to performance against a small test dataset. For example, for any of the scripts in this book I would normally extract a small set of rows to work with, which gives me confidence that I am accessing the data correctly. Along the way I may have added a flagging operation to the loop. On a dataset of 20 rows or so this has no noticeable effect. However, when I switch to the true dataset, which may have millions of rows, setting that flag on every row adds to the overall running time of the operation.

Once you are working with larger data, those minor operations that run every time the loop executes become expensive. Usually some of them can be pulled out of the loop and executed once. For example, if we were looking for the largest number in a dataset, we might initialize our result to an unrealistic value outside the loop and, inside the loop, replace it with the first record whenever we see that unrealistic value. That test then runs on every iteration, even though it is needed only for the first record. Instead, we could set the result from the first record before entering the loop.
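The largest-number example can be sketched in Python as follows. This is only an illustration, not code from the book: the function names and the sample list are hypothetical, and `None` stands in for the "unrealistic value". The first function repeats the initialization test on every pass; the second takes the first record before the loop, so the loop body is left with a single comparison.

```python
from typing import Iterable, Optional


def largest_with_check_in_loop(values: Iterable[float]) -> Optional[float]:
    """Sentinel version: the 'first record?' test runs on every pass."""
    result = None  # assumption: None plays the role of the unrealistic value
    for value in values:
        if result is None:      # evaluated on every iteration
            result = value
        elif value > result:
            result = value
    return result


def largest_with_init_outside(values: Iterable[float]) -> Optional[float]:
    """Hoisted version: initialize from the first record before the loop."""
    iterator = iter(values)
    try:
        result = next(iterator)  # one-time initialization from the first record
    except StopIteration:
        return None              # empty input: there is no largest value
    for value in iterator:       # loop body now holds a single comparison
        if value > result:
            result = value
    return result


# Hypothetical sample data, used only to show that both versions agree.
sample = [12, 7, 31, 4, 31, 19]
print(largest_with_check_in_loop(sample))
print(largest_with_init_outside(sample))
```

Both functions return the same answer. The saving from the second version is one test per row, which is negligible on 20 rows and adds up over millions.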
