Arithmetic operations using scalar values will be applied to every element of a DataFrame. To demonstrate, we will use a DataFrame object initialized with random values:
In [94]: # set the seed to allow replicatable results
         np.random.seed(123456)
         # create the DataFrame
         df = pd.DataFrame(np.random.randn(5, 4),
                           columns=['A', 'B', 'C', 'D'])
         df
Out[94]:
          A         B         C         D
0  0.469112 -0.282863 -1.509059 -1.135632
1  1.212112 -0.173215  0.119209 -1.044236
2 -0.861849 -2.104569 -0.494929  1.071804
3  0.721555 -0.706771 -1.039575  0.271860
4 -0.424972  0.567020  0.276232 -1.087401
By default, any arithmetic operation will be applied across all rows and columns of a DataFrame and will return a new DataFrame with the results (leaving the original unchanged):
In [95]: # multiply everything by 2
         df * 2
Out[95]:
          A         B         C         D
0  0.938225 -0.565727 -3.018117 -2.271265
1  2.424224 -0.346429  0.238417 -2.088472
2 -1.723698 -4.209138 -0.989859  2.143608
3  1.443110 -1.413542 -2.079150  0.543720
4 -0.849945  1.134041  0.552464 -2.174801
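As a minimal sketch of the "new DataFrame, original unchanged" behavior, the following uses a small hand-made frame rather than the random one above. The names `small`, `doubled`, and `via_method` are illustrative only and do not appear in the text.

```python
import pandas as pd

# small illustrative frame (values chosen for this sketch, not from the text)
small = pd.DataFrame({'A': [1.0, 2.0], 'B': [3.0, 4.0]})

# the operator returns a new DataFrame; `small` itself is not modified
doubled = small * 2

# .mul() is the method form of the same element-wise multiplication
via_method = small.mul(2)

print(doubled)
print(small)
```

Because the result is a new object, it has to be assigned to a name (or back to the original name) if it is needed later.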
When performing an operation between a DataFrame and a Series, pandas will align the Series index along the DataFrame columns, performing what is referred to as a row-wise broadcast.
The following example retrieves the first row of the DataFrame and then subtracts it from each row of the DataFrame. pandas broadcasts the Series to each row of the DataFrame, aligning each Series item with the DataFrame column of the same label and then applying the subtraction operator to the matched values:
In [96]: # get first row
         s = df.iloc[0]
         # subtract first row from every row of the DataFrame
         diff = df - s
         diff
Out[96]:
          A         B         C         D
0  0.000000  0.000000  0.000000  0.000000
1  0.743000  0.109649  1.628267  0.091396
2 -1.330961 -1.821706  1.014129  2.207436
3  0.252443 -0.423908  0.469484  1.407492
4 -0.894085  0.849884  1.785291  0.048232
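The row-wise broadcast matches on labels, not on position. The sketch below, using a hypothetical frame named `frame` with integer values, shows that reordering the Series labels does not change the matched results, and that the operator form agrees with `.sub()` called with `axis='columns'`.

```python
import pandas as pd

# illustrative data, not taken from the text
frame = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

row0 = frame.iloc[0]            # Series indexed by the column labels 'A', 'B'
reordered = row0[['B', 'A']]    # same labels, different order

by_operator = frame - row0
by_reordered = frame - reordered             # still matched label by label
by_method = frame.sub(row0, axis='columns')  # method form of the same broadcast

print(by_operator)
```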
This also works when reversing the order by subtracting the DataFrame from the Series object:
In [97]: # subtract DataFrame from Series
         diff2 = s - df
         diff2
Out[97]:
          A         B         C         D
0  0.000000  0.000000  0.000000  0.000000
1 -0.743000 -0.109649 -1.628267 -0.091396
2  1.330961  1.821706 -1.014129 -2.207436
3 -0.252443  0.423908 -0.469484 -1.407492
4  0.894085 -0.849884 -1.785291 -0.048232
The set of columns returned will be the union of the labels in the index of the Series and the labels in the columns index of the DataFrame object. If a label in the result is missing from either the Series or the DataFrame object, then the values in that column will be NaN filled. The following code demonstrates this by creating a Series with an index representing a subset of the columns in the DataFrame, but also with an additional label:
In [98]: # B, C (copy so that adding a label does not touch s)
         s2 = s[1:3].copy()
         # add E
         s2['E'] = 0
         # see how alignment is applied in math
         df + s2
Out[98]:
    A         B         C   D   E
0 NaN -0.565727 -3.018117 NaN NaN
1 NaN -0.456078 -1.389850 NaN NaN
2 NaN -2.387433 -2.003988 NaN NaN
3 NaN -0.989634 -2.548633 NaN NaN
4 NaN  0.284157 -1.232826 NaN NaN
pandas aligns the column labels of df with the index labels of s2. Since s2 does not have an A or D label, the result contains NaN in those columns. Since df has no E column, the E column in the result is also NaN.
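One way to avoid the all-NaN columns is to restrict both operands to the labels they share before operating. This is only a sketch of that idea, not something the text prescribes; `frame`, `ser`, `shared`, and `trimmed` are hypothetical names with made-up values.

```python
import pandas as pd

# illustrative data, not taken from the text
frame = pd.DataFrame({'A': [1.0, 2.0], 'B': [3.0, 4.0], 'C': [5.0, 6.0]})
ser = pd.Series({'B': 10.0, 'C': 20.0, 'E': 0.0})

# default behavior: result columns are the union A, B, C, E
total = frame + ser

# labels present in both the DataFrame columns and the Series index
shared = frame.columns.intersection(ser.index)

# operate only on the shared labels, so no NaN columns are produced
trimmed = frame[shared] + ser[shared]

print(total)
print(trimmed)
```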
An arithmetic operation between two DataFrame objects will align by both the column and index labels. The following extracts a small portion of df and subtracts it from df. The result demonstrates that the aligned values subtract to 0, while the others are set to NaN:
In [99]: # get rows 1 through 3, and only the B and C columns
         subframe = df[1:4][['B', 'C']]
         # we have extracted a little square in the middle of the df
         subframe
Out[99]:
          B         C
1 -0.173215  0.119209
2 -2.104569 -0.494929
3 -0.706771 -1.039575

In [100]: # demonstrate the alignment of the subtraction
          df - subframe
Out[100]:
    A   B   C   D
0 NaN NaN NaN NaN
1 NaN   0   0 NaN
2 NaN   0   0 NaN
3 NaN   0   0 NaN
4 NaN NaN NaN NaN
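When the NaN values produced by alignment are not wanted, the arithmetic methods accept a `fill_value` argument for operations between two DataFrame objects. The sketch below uses made-up data and hypothetical names (`frame`, `part`, `plain`, `filled`) to contrast the operator with `.sub(..., fill_value=0)`.

```python
import pandas as pd

# illustrative data, not taken from the text
frame = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [4.0, 5.0, 6.0]})

# a single cell: row label 1, column B
part = frame.loc[1:1, ['B']]

# operator form: NaN wherever `part` has no matching cell
plain = frame - part

# method form: a cell missing from one operand is treated as 0
filled = frame.sub(part, fill_value=0)

print(plain)
print(filled)
```

With `fill_value=0`, cells that exist only in `frame` keep their original values, and only the overlapping cell subtracts to 0.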
Additional control of an arithmetic operation can be gained using the arithmetic methods provided by the DataFrame object. These methods let you specify the axis along which the operation is aligned. The following demonstrates performing subtraction along the column axis by using the DataFrame object's .sub() method, subtracting the A column from every column:
In [101]: # get the A column
          a_col = df['A']
          df.sub(a_col, axis=0)
Out[101]:
   A         B         C         D
0  0 -0.751976 -1.978171 -1.604745
1  0 -1.385327 -1.092903 -2.256348
2  0 -1.242720  0.366920  1.933653
3  0 -1.428326 -1.761130 -0.449695
4  0  0.991993  0.701204 -0.662428
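To see why the axis matters here, the following sketch compares `.sub(..., axis=0)` with the plain operator on a small made-up frame. The names `frame`, `col_a`, `by_rows`, and `by_operator` are hypothetical. The operator aligns the Series index with the column labels, so a column taken from the frame (indexed by row labels) finds no matches.

```python
import pandas as pd

# illustrative data, not taken from the text
frame = pd.DataFrame({'A': [1, 2], 'B': [10, 20]})
col_a = frame['A']   # Series indexed by the row labels 0, 1

# axis=0 aligns col_a's index with the row index of the frame
by_rows = frame.sub(col_a, axis=0)

# the operator aligns col_a's index with the column labels instead;
# no label is shared, so every value in the result is NaN
by_operator = frame - col_a

print(by_rows)
print(by_operator)
```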