Chapter 10. R and pandas Compared

This chapter compares pandas with R, the statistical package on which much of pandas' functionality is modeled. It is intended as a guide for R users who wish to use pandas, and for users who wish to replicate in pandas functionality that they have seen in R code. It focuses on some key features available to R users and shows, through illustrative examples, how to achieve similar functionality in pandas. This chapter assumes that you have the R statistical package installed. If not, it can be downloaded and installed from here: http://www.r-project.org/.

By the end of the chapter, readers should have a good grasp of how the data analysis capabilities of R compare with those of pandas, enabling them to transition to or use pandas, should they need to. The topics addressed in this chapter include the following:

  • R data types and their pandas equivalents
  • Slicing and selection
  • Arithmetic operations on datatype columns
  • Aggregation and GroupBy
  • Matching
  • Split-apply-combine
  • Melting and reshaping
  • Factors and categorical data

R data types

R has five commonly used primitive, or atomic, types (a sixth, raw, is rarely needed in data analysis):

  • Character
  • Numeric
  • Integer
  • Complex
  • Logical/Boolean

It also has the following, more complex, container types:

  • Vector: This is similar to numpy.array. It can only contain objects of the same type.
  • List: This is a heterogeneous container. Its closest equivalent in pandas is a Series.
  • DataFrame: This is a heterogeneous 2D container, equivalent to a pandas DataFrame.
  • Matrix: This is a homogeneous 2D version of a vector. It is similar to numpy.matrix.

For this chapter, we will focus on the list and the DataFrame, whose pandas equivalents are the Series and the DataFrame.
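
As a rough illustration of the container mapping listed above, the following sketch builds a NumPy or pandas counterpart for each R container. The variable names and sample values are illustrative assumptions, not taken from the text, and a plain 2D array stands in for numpy.matrix because the NumPy documentation no longer recommends the matrix class for new code.

```python
import numpy as np
import pandas as pd

# R vector -> NumPy array: every element shares a single dtype
vec = np.array([1, 2, 3])

# R list -> pandas Series: mixed values are stored as generic Python objects
lst = pd.Series([23, 'donkey', 5.6])

# R data.frame -> pandas DataFrame: each column keeps its own dtype
df = pd.DataFrame({'name': ['a', 'b'], 'value': [1.5, 2.5]})

# R matrix -> homogeneous 2D NumPy array
mat = np.array([[1, 2], [3, 4]])

print(vec.dtype)    # an integer dtype (exact width depends on the platform)
print(lst.dtype)    # object
print(df.dtypes)    # one dtype per column
print(mat.shape)    # (2, 2)
```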

Note

For more information on R data types, refer to the following document: http://www.statmethods.net/input/datatypes.html.

For NumPy data types, refer to the following documents: http://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html and http://docs.scipy.org/doc/numpy/reference/generated/numpy.matrix.html.

R lists

R lists can be created explicitly by calling list(), as shown here:

> h_lst <- list(23, 'donkey', 5.6, 1+4i, TRUE)
> h_lst
[[1]]
[1] 23

[[2]]
[1] "donkey"

[[3]]
[1] 5.6

[[4]]
[1] 1+4i

[[5]]
[1] TRUE

> typeof(h_lst)
[1] "list"

Here is its Series equivalent in pandas. We first create a Python list and then create a Series from it:

In [8]: h_list = [23, 'donkey', 5.6, 1+4j, True]
In [9]: import pandas as pd
        h_ser = pd.Series(h_list)
In [10]: h_ser
Out[10]: 0        23
         1    donkey
         2       5.6
         3    (1+4j)
         4      True
         dtype: object

Note that indexing starts at 0 in pandas, as the index labels in the preceding output show, whereas in R it starts at 1. We can also confirm the type of the object:

In [11]: type(h_ser)
Out[11]: pandas.core.series.Series
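
To make the difference in indexing concrete, here is a minimal sketch that reads elements of the same Series by position. The variable names first, second, and last are illustrative and do not appear in the original text.

```python
import pandas as pd

h_ser = pd.Series([23, 'donkey', 5.6, 1+4j, True])

# Position 0 is the first element in pandas; the R equivalent is h_lst[[1]]
first = h_ser.iloc[0]

# Position 1 is the second element; the R equivalent is h_lst[[2]]
second = h_ser.iloc[1]

# Negative positions count from the end in pandas,
# whereas a negative index in R drops elements instead
last = h_ser.iloc[-1]

print(first, second, last)
```

Using iloc makes it explicit that the lookup is by position rather than by index label, which matters once a Series has a non-default index.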

R DataFrames

We can construct an R DataFrame by calling the data.frame() constructor and then display it as follows:

> stocks_table <- data.frame(Symbol=c('GOOG','AMZN','FB','AAPL',
                                      'TWTR','NFLX','LNKD'),
                             Price=c(518.7,307.82,74.9,109.7,37.1,
                                     334.48,219.9),
                             MarketCap=c(352.8,142.29,216.98,643.55,
                                         23.54,20.15,27.31))

> stocks_table
  Symbol  Price MarketCap
1   GOOG 518.70    352.80
2   AMZN 307.82    142.29
3     FB  74.90    216.98
4   AAPL 109.70    643.55
5   TWTR  37.10     23.54
6   NFLX 334.48     20.15
7   LNKD 219.90     27.31

Here, we construct a pandas DataFrame and display it:

In [29]: stocks_df = pd.DataFrame({'Symbol': ['GOOG','AMZN','FB','AAPL',
                                              'TWTR','NFLX','LNKD'],
                                   'Price': [518.7,307.82,74.9,109.7,37.1,
                                             334.48,219.9],
                                   'MarketCap($B)': [352.8,142.29,216.98,643.55,
                                                     23.54,20.15,27.31]})
         # Order the columns as Symbol, Price, MarketCap($B).
         # reindex(columns=...) is used because reindex_axis was removed in pandas 1.0.
         stocks_df = stocks_df.reindex(columns=sorted(stocks_df.columns, reverse=True))
         stocks_df
Out[29]:
   Symbol   Price  MarketCap($B)
0    GOOG  518.70         352.80
1    AMZN  307.82         142.29
2      FB   74.90         216.98
3    AAPL  109.70         643.55
4    TWTR   37.10          23.54
5    NFLX  334.48          20.15
6    LNKD  219.90          27.31
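
The reverse alphabetical sort above yields the desired column order only because of how these particular names happen to sort. As an alternative sketch, the columns can be selected in an explicit order, and the result inspected much as dim() would be used on stocks_table in R. The name ordered is an illustrative assumption.

```python
import pandas as pd

stocks_df = pd.DataFrame({
    'Symbol': ['GOOG', 'AMZN', 'FB', 'AAPL', 'TWTR', 'NFLX', 'LNKD'],
    'Price': [518.7, 307.82, 74.9, 109.7, 37.1, 334.48, 219.9],
    'MarketCap($B)': [352.8, 142.29, 216.98, 643.55, 23.54, 20.15, 27.31],
})

# Select the columns in an explicit order instead of relying on a sort
ordered = stocks_df[['Symbol', 'Price', 'MarketCap($B)']]

print(ordered.shape)            # (7, 3): seven rows, three columns
print(list(ordered.columns))    # ['Symbol', 'Price', 'MarketCap($B)']
print(ordered['Price'].dtype)   # float64
```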