Slicing and selection

In R, we slice objects in the following three ways:

  • [: This always returns an object of the same type as the original and can be used to select more than one element.
  • [[: This is used to extract elements of a list or DataFrame. It can extract only a single element, and the type of the returned element will not necessarily be a list or DataFrame.
  • $: This is used to extract elements of a list or DataFrame by name and is similar to [[.

Here are some slicing examples in R and their equivalents in pandas:

R-matrix and NumPy array compared

Let's see matrix creation and selection in R:

>r_mat<- matrix(2:13,4,3)
>r_mat
     [,1] [,2] [,3]
[1,]    2    6   10
[2,]    3    7   11
[3,]    4    8   12
[4,]    5    9   13

To select the first row, we write:

>r_mat[1,]
[1]  2  6 10

To select the second column, we use the following command:

>r_mat[,2]
[1] 6 7 8 9

Let's now see NumPy array creation and selection:

In [60]: a=np.array(range(2,6))
         b=np.array(range(6,10))
         c=np.array(range(10,14))
In [66]: np_ar=np.column_stack([a,b,c])
         np_ar
Out[66]: array([[ 2,  6, 10],
                [ 3,  7, 11],
                [ 4,  8, 12],
                [ 5,  9, 13]])

To select the first row, write the following command:

In [79]: np_ar[0,]
Out[79]: array([ 2,  6, 10])

Note

Indexing is different in R and pandas/NumPy.

In R, indexing starts at 1, while in pandas/NumPy, it starts at 0. Hence, we have to subtract 1 from all indexes when making the translation from R to pandas/NumPy.

To select the second column, write the following command:

In [81]: np_ar[:,1]
Out[81]: array([6, 7, 8, 9])

Another option is to transpose the array first and then select the column, as follows:

In [80]: np_ar.T[1,]
Out[80]: array([6, 7, 8, 9])
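
As a compact alternative to stacking three separate arrays, the following sketch rebuilds the same array with np.arange and a column-major reshape, which mirrors the way R's matrix() fills values column by column. The variable names first_row and second_col are illustrative only:

```python
import numpy as np

# Fill column by column (Fortran order), as R's matrix(2:13, 4, 3) does.
np_ar = np.arange(2, 14).reshape(4, 3, order='F')

first_row = np_ar[0, :]     # R equivalent: r_mat[1,]
second_col = np_ar[:, 1]    # R equivalent: r_mat[,2]

print(first_row)
print(second_col)
```

Each R index is reduced by 1 in the NumPy version, in line with the note on indexing.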

R lists and pandas series compared

Here is an example of list creation and selection in R:

>cal_lst<- list(weekdays=1:8, mth='jan')
>cal_lst
$weekdays
[1] 1 2 3 4 5 6 7 8

$mth
[1] "jan"

>cal_lst[1]
$weekdays
[1] 1 2 3 4 5 6 7 8

>cal_lst[[1]]
[1] 1 2 3 4 5 6 7 8

>cal_lst[2]
$mth
[1] "jan"

Series creation and selection in pandas is done as follows:

In [92]: cal_df= pd.Series({'weekdays':range(1,8), 'mth':'jan'})
In [93]: cal_df
Out[93]: mth                           jan
         weekdays    [1, 2, 3, 4, 5, 6, 7]
         dtype: object

In [97]: cal_df[0]
Out[97]: 'jan'

In [95]: cal_df[1]
Out[95]: [1, 2, 3, 4, 5, 6, 7]

In [96]: cal_df[[1]]
Out[96]: weekdays    [1, 2, 3, 4, 5, 6, 7]
dtype: object

Here, we see a difference between an R list and a pandas series from the perspective of the [] and [[]] operators. We can see the difference by considering the mth item, which is a character string (it is the second item in the R list and the first item in the pandas series).

In the case of R, the [] operator produces a container type, that is, a list containing the string, while the [[]] operator produces an atomic type, in this case a character vector, as follows:

>typeof(cal_lst[2])
[1] "list"
>typeof(cal_lst[[2]])
[1] "character"

In the case of pandas, the opposite is true: [] produces the atomic type, while [[]] results in a container type, that is, a series, as follows:

In [99]: type(cal_df[0])
Out[99]: str

In [101]: type(cal_df[[0]])
Out[101]: pandas.core.series.Series
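
The positional lookups above rely on [] treating an integer as a position even though the series is indexed by labels. Newer pandas releases deprecate that fallback, so a more version-independent way to write the same comparison is with .iloc. The following is a minimal sketch that assumes Python 3 and a reasonably recent pandas, where the series keeps the dictionary's insertion order (weekdays first, mth second) rather than the sorted order shown above. The name cal is illustrative:

```python
import pandas as pd

# A list is stored explicitly so the element prints the same way on Python 3.
cal = pd.Series({'weekdays': list(range(1, 8)), 'mth': 'jan'})

scalar = cal.iloc[1]      # single position -> the element itself
subset = cal.iloc[[1]]    # list of positions -> a one-element Series

print(type(scalar))
print(type(subset))
```

The single-versus-list distinction is the same as with [] and [[]]; only the accessor makes the positional intent explicit.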

In both R and pandas, the column name can be specified in order to obtain an element.

Specifying column name in R

In R, this can be done with the column name preceded by the $ operator as follows:

>cal_lst$mth
[1] "jan"
> cal_lst$'mth'
[1] "jan"

Specifying column name in pandas

In pandas, we subset elements in the usual way with the column name in square brackets:

In [111]: cal_df['mth']
Out[111]: 'jan'

One area where R and pandas differ is in the subsetting of nested elements. For example, to obtain day 4 from weekdays, we have to use the [[]] operator in R:

>cal_lst[[1]][[4]]
[1] 4

>cal_lst[[c(1,4)]]
[1] 4 

However, in the case of pandas, we can just use a double []:

In [132]: cal_df[1][3]
Out[132]: 4
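
The chained lookup is really two separate steps: pandas first returns the object stored in the series, and ordinary Python list indexing then picks the element. The sketch below spells this out using the label instead of the position, which avoids depending on the order of the items. The names cal, days, and day4 are illustrative:

```python
import pandas as pd

cal = pd.Series({'weekdays': list(range(1, 8)), 'mth': 'jan'})

days = cal['weekdays']    # step 1: pandas returns the stored Python list
day4 = days[3]            # step 2: plain 0-based list indexing

print(day4)
```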

R's DataFrames versus pandas' DataFrames

Selecting data in R DataFrames and pandas DataFrames follows a similar pattern. The following sections explain how we perform multicolumn selection in both.

Multicolumn selection in R

In R, we specify the multiple columns to select by stating them in a vector within square brackets:

>stocks_table[c('Symbol','Price')]
  Symbol  Price
1   GOOG 518.70
2   AMZN 307.82
3     FB  74.90
4   AAPL 109.70
5   TWTR  37.10
6   NFLX 334.48
7  LINKD 219.90

>stocks_table[,c('Symbol','Price')]
  Symbol  Price
1   GOOG 518.70
2   AMZN 307.82
3     FB  74.90
4   AAPL 109.70
5   TWTR  37.10
6   NFLX 334.48
7  LINKD 219.90

Multicolumn selection in pandas

In pandas, we subset elements in the usual way with the column names in square brackets:

In [140]: stocks_df[['Symbol','Price']]
Out[140]:
  Symbol   Price
0   GOOG  518.70
1   AMZN  307.82
2     FB   74.90
3   AAPL  109.70
4   TWTR   37.10
5   NFLX  334.48
6   LNKD  219.90

In [145]: stocks_df.loc[:,['Symbol','Price']]
Out[145]:
  Symbol   Price
0   GOOG  518.70
1   AMZN  307.82
2     FB   74.90
3   AAPL  109.70
4   TWTR   37.10
5   NFLX  334.48
6   LNKD  219.90
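
To try the two forms side by side, the following sketch rebuilds a small DataFrame from only the Symbol and Price values printed above (the original stocks_df may hold further columns that are not shown here). It also contrasts a single column name with a list of names, since the former returns a series and the latter a DataFrame. The names by_brackets, by_loc, and one_col are illustrative:

```python
import pandas as pd

# Only the two columns shown in the text; the original table may have more.
stocks_df = pd.DataFrame({
    'Symbol': ['GOOG', 'AMZN', 'FB', 'AAPL', 'TWTR', 'NFLX', 'LNKD'],
    'Price': [518.70, 307.82, 74.90, 109.70, 37.10, 334.48, 219.90],
})

by_brackets = stocks_df[['Symbol', 'Price']]      # list of names -> DataFrame
by_loc = stocks_df.loc[:, ['Symbol', 'Price']]    # all rows, named columns
one_col = stocks_df['Price']                      # single name -> Series

print(by_brackets.equals(by_loc))
print(type(one_col).__name__)
```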