How to do it...

Read in the 2016 and 2017 stock datasets, and make their ticker symbol the index:

>>> import pandas as pd
>>> stocks_2016 = pd.read_csv('data/stocks_2016.csv',
                              index_col='Symbol')
>>> stocks_2017 = pd.read_csv('data/stocks_2017.csv',
                              index_col='Symbol')

Place all the stock datasets into a single list, and then call the concat function to concatenate them together:

>>> s_list = [stocks_2016, stocks_2017]
>>> pd.concat(s_list)
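
For readers without the CSV files at hand, the following minimal sketch reproduces the vertical concatenation with two small hand-built DataFrames. The tickers, column names, and values are made up for illustration and are not the contents of the actual datasets:

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files; all values are invented.
stocks_2016 = pd.DataFrame(
    {"Shares": [80, 50, 40], "Low": [95, 80, 30], "High": [110, 130, 70]},
    index=pd.Index(["AAPL", "TSLA", "WMT"], name="Symbol"),
)
stocks_2017 = pd.DataFrame(
    {"Shares": [50, 100, 87], "Low": [120, 30, 75], "High": [140, 40, 95]},
    index=pd.Index(["AAPL", "GE", "TSLA"], name="Symbol"),
)

# Stack the second frame underneath the first; columns are aligned by name.
stacked = pd.concat([stocks_2016, stocks_2017])
print(stacked)

# The row labels are carried over unchanged, so symbols present in both
# years now appear twice in the index.
print(stacked.index.is_unique)
```

Because the index is kept as-is, nothing in the result distinguishes the 2016 row for a symbol from its 2017 row.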

By default, the concat function concatenates DataFrames vertically, stacking one on top of the other. One issue with the preceding DataFrame is that nothing identifies the year of each row. The keys parameter of concat labels each piece of the resulting DataFrame. These labels appear in the outermost index level of the concatenated frame, which forces the creation of a MultiIndex. The names parameter, in turn, names each index level for clarity:

>>> pd.concat(s_list, keys=['2016', '2017'], 
              names=['Year', 'Symbol'])
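
Once the year is part of the index, each piece can be pulled back out by its label. The sketch below illustrates this with small made-up DataFrames (the data is hypothetical). It also shows that, as far as pandas' documented behavior goes, a dictionary can be passed to concat instead of a list plus keys:

```python
import pandas as pd

# Hypothetical stand-ins for the two yearly datasets; all values are invented.
stocks_2016 = pd.DataFrame(
    {"Shares": [80, 50]}, index=pd.Index(["AAPL", "TSLA"], name="Symbol")
)
stocks_2017 = pd.DataFrame(
    {"Shares": [50, 100]}, index=pd.Index(["AAPL", "GE"], name="Symbol")
)

labeled = pd.concat(
    [stocks_2016, stocks_2017],
    keys=["2016", "2017"],
    names=["Year", "Symbol"],
)
print(labeled)

# Select every row that came from the 2017 frame (outer level).
only_2017 = labeled.loc["2017"]

# Select one symbol across all years (inner level).
aapl_by_year = labeled.xs("AAPL", level="Symbol")

# Passing a dict uses its keys as the labels for the outer level.
from_dict = pd.concat(
    {"2016": stocks_2016, "2017": stocks_2017}, names=["Year", "Symbol"]
)
```

Selecting on the outer level with .loc returns a frame indexed by Symbol alone, while xs on the inner level returns one indexed by Year.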

It is also possible to concatenate horizontally by setting the axis parameter to 'columns' or 1:

>>> pd.concat(s_list, keys=['2016', '2017'],
              axis='columns', names=['Year', None])
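
With horizontal concatenation, the keys label the outer level of the columns rather than the index, and rows are aligned by their index labels. A minimal sketch with made-up data (tickers and values are hypothetical):

```python
import pandas as pd

# Hypothetical stand-ins for the two yearly datasets; all values are invented.
stocks_2016 = pd.DataFrame(
    {"Shares": [80, 50, 40], "Low": [95, 80, 30]},
    index=pd.Index(["AAPL", "TSLA", "WMT"], name="Symbol"),
)
stocks_2017 = pd.DataFrame(
    {"Shares": [50, 100, 87], "Low": [120, 30, 75]},
    index=pd.Index(["AAPL", "GE", "TSLA"], name="Symbol"),
)

wide = pd.concat(
    [stocks_2016, stocks_2017],
    keys=["2016", "2017"],
    axis="columns",
    names=["Year", None],
)
print(wide)

# The outer column level holds the year, so one year's columns
# can be selected with a single key.
cols_2016 = wide["2016"]

# A symbol that is absent from one year gets missing values
# in that year's columns.
print(wide.loc["GE"])
```

Here the names list applies to the column levels: the outer level is called Year and the inner level is left unnamed.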

Notice that missing values appear whenever a stock symbol is present in one year but not the other. The concat function, by default, uses an outer join, keeping all rows from each DataFrame in the list. It can also keep only the rows whose index values appear in every DataFrame, which is referred to as an inner join. Set the join parameter to 'inner' to change the behavior:

>>> pd.concat(s_list, join='inner', keys=['2016', '2017'],
              axis='columns', names=['Year', None])
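
To make the difference between the two join types concrete, the following sketch runs both on the same pair of small, made-up DataFrames (the data is hypothetical) and compares which symbols survive:

```python
import pandas as pd

# Hypothetical stand-ins for the two yearly datasets; all values are invented.
stocks_2016 = pd.DataFrame(
    {"Shares": [80, 50, 40]},
    index=pd.Index(["AAPL", "TSLA", "WMT"], name="Symbol"),
)
stocks_2017 = pd.DataFrame(
    {"Shares": [50, 100, 87]},
    index=pd.Index(["AAPL", "GE", "TSLA"], name="Symbol"),
)
s_list = [stocks_2016, stocks_2017]
options = dict(keys=["2016", "2017"], axis="columns", names=["Year", None])

# Outer join (the default): the union of the symbols, with gaps as NaN.
outer = pd.concat(s_list, join="outer", **options)

# Inner join: only the symbols present in both frames, so no gaps.
inner = pd.concat(s_list, join="inner", **options)

print(sorted(outer.index))
print(sorted(inner.index))
print(inner.isna().sum().sum())
```

The inner join drops WMT and GE, since each appears in only one of the two frames.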
