Arithmetic on a DataFrame

Arithmetic operations using scalar values will be applied to every element of a DataFrame. To demonstrate, we will use a DataFrame object initialized with random values:

In [94]:
   import numpy as np
   import pandas as pd
   # set the seed to allow reproducible results
   np.random.seed(123456)
   # create the DataFrame
   df = pd.DataFrame(np.random.randn(5, 4),
                     columns=['A', 'B', 'C', 'D'])
   df

Out[94]:
             A         B         C         D
   0  0.469112 -0.282863 -1.509059 -1.135632
   1  1.212112 -0.173215  0.119209 -1.044236
   2 -0.861849 -2.104569 -0.494929  1.071804
   3  0.721555 -0.706771 -1.039575  0.271860
   4 -0.424972  0.567020  0.276232 -1.087401

By default, an arithmetic operation with a scalar is applied across all rows and columns of a DataFrame and returns a new DataFrame with the results, leaving the original unchanged:

In [95]:
   # multiply everything by 2
   df * 2

Out[95]:
             A         B         C         D
   0  0.938225 -0.565727 -3.018117 -2.271265
   1  2.424224 -0.346429  0.238417 -2.088472
   2 -1.723698 -4.209138 -0.989859  2.143608
   3  1.443110 -1.413542 -2.079150  0.543720
   4 -0.849945  1.134041  0.552464 -2.174801
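As a quick check of the two claims above (element-wise application and an unchanged original), the following sketch uses a tiny integer-valued DataFrame. The data is illustrative, not the random frame from the text. It also compares the `*` operator with `DataFrame.mul()`, the method form of the same operation.

```python
import numpy as np
import pandas as pd

# Small frame with easy-to-check values (illustrative data only)
df = pd.DataFrame(np.arange(6).reshape(2, 3), columns=['A', 'B', 'C'])
original = df.copy()

# The scalar is applied to every element; a new DataFrame is returned
doubled = df * 2
print(doubled)

# The operator and the equivalent method produce the same result
assert doubled.equals(df.mul(2))

# The original DataFrame has not been modified
assert df.equals(original)
```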

By default, when performing an arithmetic operation between a DataFrame and a Series, pandas aligns the index of the Series with the columns of the DataFrame, performing what is referred to as a row-wise broadcast.

The following example retrieves the first row of the DataFrame and then subtracts it from each row of the DataFrame. pandas broadcasts the Series to each row of the DataFrame, aligning each Series element with the DataFrame column that has the same label, and then applies the minus operator to the matched values:

In [96]:
   # get first row 
   s = df.iloc[0] 
   # subtract first row from every row of the DataFrame
   diff = df - s 
   diff

Out[96]:
             A         B         C         D
   0  0.000000  0.000000  0.000000  0.000000
   1  0.743000  0.109649  1.628267  0.091396
   2 -1.330961 -1.821706  1.014129  2.207436
   3  0.252443 -0.423908  0.469484  1.407492
   4 -0.894085  0.849884  1.785291  0.048232
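To see that this broadcast is driven by labels rather than positions, the sketch below recreates the seeded frame from In [94] and compares the pandas result with plain NumPy broadcasting of the underlying arrays. The reordered Series `s[['D', 'C', 'B', 'A']]` is an illustrative addition: pandas still matches each value to its column by label, while NumPy, which only sees positions, does not.

```python
import numpy as np
import pandas as pd

np.random.seed(123456)
df = pd.DataFrame(np.random.randn(5, 4), columns=['A', 'B', 'C', 'D'])
s = df.iloc[0]

# pandas: match s's index labels (A, B, C, D) to df's columns, then subtract per row
diff = df - s

# NumPy: shapes (5, 4) and (4,) broadcast purely by position
raw = df.to_numpy() - s.to_numpy()

# The labels are already in the same order, so both approaches agree here
assert np.allclose(diff.to_numpy(), raw)

# Reordering the Series does not change the pandas result (labels still match)...
reordered = s[['D', 'C', 'B', 'A']]
assert diff.equals(df - reordered)

# ...but positional NumPy broadcasting now pairs the wrong values
assert not np.allclose(df.to_numpy() - reordered.to_numpy(), raw)
```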

This also works when the order is reversed, subtracting the DataFrame from the Series object:

In [97]:
   # subtract DataFrame from Series
   diff2 = s - df
   diff2

Out[97]:
             A         B         C         D
   0  0.000000  0.000000  0.000000  0.000000
   1 -0.743000 -0.109649 -1.628267 -0.091396
   2  1.330961  1.821706 -1.014129 -2.207436
   3 -0.252443  0.423908 -0.469484 -1.407492
   4  0.894085 -0.849884 -1.785291 -0.048232
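pandas also exposes this reversed form as a method. The sketch below, again rebuilding the seeded frame, checks that `s - df` matches `df.rsub(s)` (the reflected subtraction, which computes `s - df`) and that swapping the operands simply negates the earlier result.

```python
import numpy as np
import pandas as pd

np.random.seed(123456)
df = pd.DataFrame(np.random.randn(5, 4), columns=['A', 'B', 'C', 'D'])
s = df.iloc[0]

# Series on the left: each row of df is subtracted from s
diff2 = s - df

# rsub computes other - df, so it matches s - df
assert diff2.equals(df.rsub(s))

# Reversing the operands negates the row-wise difference df - s
assert np.allclose(diff2, -(df - s))
```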

The set of columns returned will be the union of the labels in the index of the Series and the column index of the DataFrame object. If a label in the result is missing from either the Series or the DataFrame, the values in that column will be NaN. The following code demonstrates this by creating a Series whose index contains a subset of the DataFrame's columns plus an additional label:

In [98]:
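   # take the B and C entries (positions 1 and 2) as a copy
   s2 = s.iloc[1:3].copy()
   # add an E label, which is not a column of df
   s2['E'] = 0
   # see how alignment is applied in math
   df + s2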

Out[98]:
       A         B         C   D   E
   0 NaN -0.565727 -3.018117 NaN NaN
   1 NaN -0.456078 -1.389850 NaN NaN
   2 NaN -2.387433 -2.003988 NaN NaN
   3 NaN -0.989634 -2.548633 NaN NaN
   4 NaN  0.284157 -1.232826 NaN NaN

pandas aligns the column labels of df with the index labels of s2. Since s2 has no A or D label, the result contains NaN in those columns. Since df has no E column, the E column is also NaN.
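The union rule can be checked directly on a small example. In this sketch the frame and the Series `s2` are illustrative stand-ins (small whole-number floats rather than the random data above), and `Index.union()` is used to compute the expected set of result columns.

```python
import numpy as np
import pandas as pd

# Illustrative 2 x 4 frame of floats 0.0 .. 7.0
df = pd.DataFrame(np.arange(8.0).reshape(2, 4), columns=['A', 'B', 'C', 'D'])
# Series covering only B and C, plus an extra label E
s2 = pd.Series({'B': 10.0, 'C': 20.0, 'E': 0.0})

result = df + s2
print(result)

# The result's columns are the union of df.columns and s2.index
assert list(result.columns) == list(df.columns.union(s2.index))

# Labels present on only one side (A and D from df, E from s2) give all-NaN columns
for col in ['A', 'D', 'E']:
    assert result[col].isna().all()
```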

An arithmetic operation between two DataFrame objects will align by both the column and index labels. The following extracts a small portion of df and subtracts it from df. The result demonstrates that the aligned values subtract to 0, while the others are set to NaN:

In [99]:
   # get rows 1 through 3, and only the B and C columns
   subframe = df[1:4][['B', 'C']]
   # we have extracted a small block from the middle of df
   subframe

Out[99]:
             B         C
   1 -0.173215  0.119209
   2 -2.104569 -0.494929
   3 -0.706771 -1.039575

In [100]:
   # demonstrate the alignment of the subtraction
   df - subframe

Out[100]:
       A   B   C   D
   0 NaN NaN NaN NaN
   1 NaN   0   0 NaN
   2 NaN   0   0 NaN
   3 NaN   0   0 NaN
   4 NaN NaN NaN NaN
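The sketch below repeats this experiment on a small frame of known values (an assumption made so the results are easy to verify) and counts the NaN cells. It also shows the `fill_value` parameter of the method form `DataFrame.sub()`, which goes slightly beyond the text above: labels missing from one operand are treated as 0 instead of producing NaN.

```python
import numpy as np
import pandas as pd

# Illustrative 5 x 4 frame of floats 1.0 .. 20.0
df = pd.DataFrame(np.arange(1.0, 21.0).reshape(5, 4), columns=['A', 'B', 'C', 'D'])
# Rows labeled 1, 2, 3 and only columns B and C
subframe = df[1:4][['B', 'C']]

# Operator form: aligned on both row and column labels
diff = df - subframe
print(diff)

# Only the overlapping 3 x 2 block is computed (and is 0); everything else is NaN
assert diff.loc[1:3, ['B', 'C']].eq(0).all().all()
assert diff.isna().sum().sum() == 20 - 6

# Method form: fill_value=0 treats cells missing from subframe as 0
filled = df.sub(subframe, fill_value=0)
assert filled.loc[0, 'A'] == df.loc[0, 'A']
assert filled.loc[2, 'B'] == 0.0
```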

Additional control over an arithmetic operation can be gained by using the arithmetic methods provided by the DataFrame object. These methods accept an axis parameter that specifies which axis of the DataFrame the Series is aligned against. The following uses the DataFrame object's .sub() method with axis=0, so that the Series is aligned with the row index, to subtract the A column from every column:

In [101]:
   # get the A column
   a_col = df['A']
   df.sub(a_col, axis=0)

Out[101]:
      A         B         C         D
   0  0 -0.751976 -1.978171 -1.604745
   1  0 -1.385327 -1.092903 -2.256348
   2  0 -1.242720  0.366920  1.933653
   3  0 -1.428326 -1.761130 -0.449695
   4  0  0.991993  0.701204 -0.662428
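One way to think about `axis=0` is as a transpose, a row-wise broadcast, and a transpose back. The following sketch, rebuilding the seeded frame from In [94], checks that equivalence; it illustrates the alignment only and is not a claim about how pandas implements `.sub()` internally.

```python
import numpy as np
import pandas as pd

np.random.seed(123456)
df = pd.DataFrame(np.random.randn(5, 4), columns=['A', 'B', 'C', 'D'])
a_col = df['A']

# axis=0 (also spelled axis='index') matches a_col's index to df's row labels
by_rows = df.sub(a_col, axis=0)

# Same result: transpose, broadcast row-wise against the columns, transpose back
via_transpose = (df.T - a_col).T
pd.testing.assert_frame_equal(by_rows, via_transpose)

# Column A minus itself is zero in every row
assert (by_rows['A'] == 0).all()
```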