Performance benefits of stacked data

Finally, we will examine one reason to stack data like this: scalar lookup on a stacked structure can be more efficient than a lookup through a single-level index followed by a column lookup, and even more efficient than an .iloc lookup that specifies the row and column by position. The following demonstrates this:

In [53]:
   # stacked scalar access can be a lot faster than 
   # column access

   # time the different methods
   import timeit
   r1 = timeit.timeit(lambda: stacked1.loc[('one', 'a')], 
                      number=10000)
   r2 = timeit.timeit(lambda: df.loc['one']['a'], 
                      number=10000)
   r3 = timeit.timeit(lambda: df.iloc[1, 0], 
                      number=10000)

   # and the results are... yes, the stacked lookup is the fastest of the three
   r1, r2, r3

Out[53]:
   (0.5598540306091309, 1.0486528873443604, 1.2129769325256348)

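This can noticeably improve application performance if we need to repeatedly access a large number of scalar values out of a DataFrame. In the run above, the stacked lookup took roughly half the time of either alternative, although exact timings will vary with the machine and the pandas version.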
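
To try the comparison outside the original session, here is a minimal self-contained sketch. The sample data and the helper time_lookup are assumptions made for illustration, since the construction of df and stacked1 is not shown in this section. The sketch checks that all three lookups return the same cell before timing them. No particular ordering of the timings is guaranteed.

```python
import timeit

import pandas as pd

# Hypothetical sample data: this section does not show how df was built.
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}, index=['one', 'two'])

# stack() moves the column labels into an inner index level, producing
# a Series indexed by (row label, column label) pairs.
stacked1 = df.stack()


def time_lookup(fn, number=10000):
    """Return the total seconds taken by `number` calls of fn (illustrative helper)."""
    return timeit.timeit(fn, number=number)


# Three ways to read the cell at row 'one', column 'a'.
lookups = {
    'stacked .loc': lambda: stacked1.loc[('one', 'a')],
    'chained .loc': lambda: df.loc['one']['a'],
    '.iloc': lambda: df.iloc[0, 0],
}

# Confirm each method reads the same value before comparing speed.
values = {name: fn() for name, fn in lookups.items()}

timings = {name: time_lookup(fn) for name, fn in lookups.items()}
for name, secs in timings.items():
    print(f"{name}: {secs:.4f} s")
```

Checking the values first matters because a positional lookup such as .iloc only addresses the same cell as the label-based lookups when the row and column positions match the labels being compared.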
