Mapping

Pandas supports element-wise operations just like NumPy (after all, a pd.Series stores its data in a NumPy array). For example, it is easy to apply a transformation to both a pd.Series and a pd.DataFrame:

    np.log(df.sys_initial)  # Logarithm of a series
    df.sys_initial ** 2     # Square of a series
    np.log(df)              # Logarithm of a dataframe
    df ** 2                 # Square of a dataframe

You can also perform element-wise operations between two pd.Series in a way similar to NumPy. An important difference is that the operands are matched by index label (key) rather than by position; if a label appears in only one of the operands, the resulting value is set to NaN. Both scenarios are shown in the following example:

    # Matching index
    a = pd.Series([1, 2, 3], index=["a", "b", "c"])
    b = pd.Series([4, 5, 6], index=["a", "b", "c"])
    a + b
    # Result:
    # a    5
    # b    7
    # c    9
    # dtype: int64

    # Mismatching index
    b = pd.Series([4, 5, 6], index=["a", "b", "d"])
    a + b
    # Result:
    # a    5.0
    # b    7.0
    # c    NaN
    # d    NaN
    # dtype: float64
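
When a NaN for unmatched labels is not what you want, one option is the pd.Series.add method, whose fill_value argument substitutes a default for a label that is missing from one of the operands. The following is a minimal sketch that reuses the a and b series from the mismatching-index case; the choice of 0 as the fill value is an assumption made for illustration:

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["a", "b", "c"])
b = pd.Series([4, 5, 6], index=["a", "b", "d"])

# The + operator aligns on the index and yields NaN for unmatched labels
plain = a + b

# add() with fill_value treats a label missing from one operand as 0
filled = a.add(b, fill_value=0)
print(filled)
# Result:
# a    5.0
# b    7.0
# c    3.0
# d    6.0
# dtype: float64
```

The result is still float64 because the two series are aligned on the union of their labels before the addition takes place.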

For added flexibility, Pandas exposes the map, apply, and applymap methods that can be used to apply specific transformations.

The pd.Series.map method applies a function to each value and returns a pd.Series containing the results. In the following example, we apply the superstar function to each element of a pd.Series:

    a = pd.Series([1, 2, 3], index=["a", "b", "c"])

    def superstar(x):
        return '*' + str(x) + '*'

    a.map(superstar)
    # Result:
    # a    *1*
    # b    *2*
    # c    *3*
    # dtype: object
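
For a simple transformation like this one, the same values can often be produced with whole-series operations instead of a per-element Python function. As a rough sketch (the astype(str) conversion is our choice and not part of the original example), the stars can be added by concatenating strings with the entire series:

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["a", "b", "c"])

def superstar(x):
    return '*' + str(x) + '*'

# Element-by-element application of a Python function
mapped = a.map(superstar)

# Same values built with whole-series string concatenation
vectorized = '*' + a.astype(str) + '*'

print(mapped.tolist())      # ['*1*', '*2*', '*3*']
print(vectorized.tolist())  # ['*1*', '*2*', '*3*']
```

map remains the more general tool, since it accepts any Python callable.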

The pd.DataFrame.applymap method is the equivalent of pd.Series.map for dataframes (in pandas 2.1 and later, applymap is deprecated in favor of pd.DataFrame.map):

    df.applymap(superstar)
    # Result:
    #   dia_final dia_initial sys_final sys_initial
    # a      *70*        *75*     *115*       *120*
    # b      *82*        *85*     *123*       *126*
    # c      *92*        *90*     *130*       *130*
    # d      *87*        *87*     *118*       *115*

Finally, the pd.DataFrame.apply method applies the passed function to each column or each row as a whole, rather than element-wise. The direction is selected with the axis argument, where a value of 0 (the default) passes each column to the function and 1 passes each row. Also, note that because superstar returns a single value for each column or row, the return value of apply in these examples is a pd.Series:

    df.apply(superstar, axis=0)
    # Result:
    # dia_final      *a 70\nb 82\nc 92\nd 87\nName: dia...
    # dia_initial    *a 75\nb 85\nc 90\nd 87\nName: dia...
    # sys_final      *a 115\nb 123\nc 130\nd 118\nName:...
    # sys_initial    *a 120\nb 126\nc 130\nd 115\nName:...
    # dtype: object

    df.apply(superstar, axis=1)
    # Result:
    # a    *dia_final 70\ndia_initial 75\nsys_f...
    # b    *dia_final 82\ndia_initial 85\nsys_f...
    # c    *dia_final 92\ndia_initial 90\nsys_f...
    # d    *dia_final 87\ndia_initial 87\nsys_f...
    # dtype: object
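
Because superstar converts a whole column or row to a string, the preceding output is hard to read. A function that reduces each column or row to a single number makes the role of axis easier to see. The following sketch rebuilds df from the values printed in the applymap example; value_range is a hypothetical helper introduced here only for illustration:

```python
import pandas as pd

# Data reconstructed from the output shown earlier in this section
df = pd.DataFrame(
    {
        "dia_final": [70, 82, 92, 87],
        "dia_initial": [75, 85, 90, 87],
        "sys_final": [115, 123, 130, 118],
        "sys_initial": [120, 126, 130, 115],
    },
    index=["a", "b", "c", "d"],
)

def value_range(values):
    # values is a whole pd.Series: one column (axis=0) or one row (axis=1)
    return values.max() - values.min()

per_column = df.apply(value_range, axis=0)  # indexed by column name
per_row = df.apply(value_range, axis=1)     # indexed by row label

print(per_column)
print(per_row)
```

With axis=0 the result has one entry per column, and with axis=1 it has one entry per row.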

Pandas also supports efficient numexpr-style expressions through the convenient eval method. For example, to calculate the difference between the final and initial blood pressure, we can write the expression as a string, as shown in the following code:

    df.eval("sys_final - sys_initial")
    # Result:
    # a   -5
    # b   -3
    # c    0
    # d    3
    # dtype: int64

It is also possible to create new columns using the assignment operator in the pd.DataFrame.eval expression. Note that, if the inplace=True argument is used, the operation will be applied directly on the original pd.DataFrame; otherwise, the function will return a new dataframe. In the next example, we compute the difference between sys_final and sys_initial, and we store it in the sys_delta column:

    df.eval("sys_delta = sys_final - sys_initial", inplace=False)
    # Result:
    #   dia_final dia_initial sys_final sys_initial sys_delta
    # a        70          75       115         120        -5
    # b        82          85       123         126        -3
    # c        92          90       130         130         0
    # d        87          87       118         115         3
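
As a quick check of the inplace behavior, the sketch below rebuilds df from the values printed above (only the systolic columns, to keep it short). It confirms that a call with inplace=False returns a new dataframe, leaves the original untouched, and matches ordinary column arithmetic:

```python
import pandas as pd

df = pd.DataFrame(
    {"sys_final": [115, 123, 130, 118], "sys_initial": [120, 126, 130, 115]},
    index=["a", "b", "c", "d"],
)

# inplace=False returns a new dataframe with the extra column
result = df.eval("sys_delta = sys_final - sys_initial", inplace=False)

print("sys_delta" in df.columns)      # False: the original is unchanged
print("sys_delta" in result.columns)  # True

# The expression gives the same values as ordinary column arithmetic
expected = df.sys_final - df.sys_initial
print((result.sys_delta == expected).all())  # True
```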