How to do it...

Read in the names dataset, and output it:

>>> names = pd.read_csv('data/names.csv')
>>> names

Let's create a list that contains some new data and use the .loc indexer to set a single row label equal to this new data:

>>> new_data_list = ['Aria', 1]
>>> names.loc[4] = new_data_list
>>> names

The .loc indexer uses labels to refer to the rows. In this case, the row labels exactly match the integer location. It is possible to append more rows with non-integer labels:

>>> names.loc['five'] = ['Zach', 3]
>>> names
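
The row-label behavior described above can be tried without the CSV file. The following is a minimal sketch, assuming a small hand-built DataFrame in place of data/names.csv; the first four rows are hypothetical placeholders, not the contents of the real file:

```python
import pandas as pd

# Hypothetical stand-in for data/names.csv (the real file is not shown here)
names = pd.DataFrame({'Name': ['Ann', 'Bob', 'Cal', 'Dee'],
                      'Age': [70, 69, 4, 2]})

# Label 4 does not exist yet, so .loc enlarges the DataFrame in place
names.loc[4] = ['Aria', 1]

# Labels need not be integers; 'five' becomes a new row label
names.loc['five'] = ['Zach', 3]

# Assigning to a label that already exists overwrites that row
# instead of adding a new one
names.loc[4] = ['Aria', 2]

print(names)
```

Because .loc works on labels, assigning to a label that is already present replaces the row rather than appending, which is worth keeping in mind when the label is chosen by hand.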

To associate each value explicitly with its column, you may use a dictionary. Also, in this step, we can dynamically choose the new index label to be the length of the DataFrame:

>>> names.loc[len(names)] = {'Name':'Zayd', 'Age':2}
>>> names

A Series can hold the new data as well and works exactly the same as a dictionary; its index labels are matched to the column names:

>>> names.loc[len(names)] = pd.Series({'Age':32,
...                                    'Name':'Dean'})
>>> names
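
Unlike a list, whose values are assigned by position, a dictionary or Series is matched to the columns by key. The sketch below illustrates this with the keys deliberately given in a different order from the columns. It again assumes a small hypothetical DataFrame in place of data/names.csv:

```python
import pandas as pd

# Hypothetical stand-in for data/names.csv
names = pd.DataFrame({'Name': ['Ann', 'Bob'], 'Age': [70, 69]})

# Keys are matched to column names, so their order does not matter
names.loc[len(names)] = {'Age': 2, 'Name': 'Zayd'}
names.loc[len(names)] = pd.Series({'Age': 32, 'Name': 'Dean'})

print(names)
```

Using len(names) as the label only yields a fresh label when the existing index runs from 0 to n-1, as it does here.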

The preceding operations all use the .loc indexing operator to modify the names DataFrame in place; no separate copy is returned. In the next few steps, we will look at the append method, which does not modify the calling DataFrame. Instead, it returns a new copy of the DataFrame with the appended row(s).

Let's begin with the original names DataFrame and attempt to append a row. The first argument to append must be another DataFrame, a Series, a dictionary, or a list of these, but not a plain list of values like new_data_list above. Let's see what happens when we attempt to use a dictionary with append:

>>> names = pd.read_csv('data/names.csv')
>>> names.append({'Name':'Aria', 'Age':1})
TypeError: Can only append a Series if ignore_index=True or if the Series has a name

This error message appears to be slightly incorrect. We are passing a dictionary and not a Series, but it nevertheless gives us instructions on how to correct it:

>>> names.append({'Name':'Aria', 'Age':1}, ignore_index=True)

This works but ignore_index is a sneaky parameter. When set to True, the old index will be removed completely and replaced with a RangeIndex from 0 to n-1. For instance, let's specify an index for the names DataFrame:

>>> names.index = ['Canada', 'Canada', 'USA', 'USA']
>>> names

Rerun the append call with ignore_index=True from the previous step and you will get the same result. The original index is completely ignored.
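
DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the calls shown here only run on older versions. As a rough equivalent for newer versions, the sketch below uses pd.concat with ignore_index=True to show the same index-resetting behavior. The data is a hypothetical stand-in for data/names.csv, and new_row and result are illustrative names:

```python
import pandas as pd

# Hypothetical stand-in for data/names.csv, with country labels as the index
names = pd.DataFrame({'Name': ['Ann', 'Bob', 'Cal', 'Dee'],
                      'Age': [70, 69, 4, 2]},
                     index=['Canada', 'Canada', 'USA', 'USA'])

# Wrap the dictionary in a list to get a one-row DataFrame
new_row = pd.DataFrame([{'Name': 'Aria', 'Age': 1}])

# ignore_index=True discards both indexes and builds a fresh RangeIndex
result = pd.concat([names, new_row], ignore_index=True)

print(result)
print(names.index.tolist())  # the original DataFrame is left unchanged
```

As with append, the country labels are gone from the result, while names itself keeps them.
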
Let's continue with this names dataset with these country strings in the index and use a Series that has a name attribute with the append method:

>>> s = pd.Series({'Name': 'Zach', 'Age': 3}, name=len(names))
>>> s
Age        3
Name    Zach
Name: 4, dtype: object

>>> names.append(s)

The append method is more flexible than the .loc indexer. It supports appending multiple rows at the same time. One way to accomplish this is with a list of Series:

>>> s1 = pd.Series({'Name': 'Zach', 'Age': 3}, name=len(names))
>>> s2 = pd.Series({'Name': 'Zayd', 'Age': 2}, name='USA')
>>> names.append([s1, s2])
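
On pandas 2.0 and later, where the append method is no longer available, a list of named Series can still be added in one call. The following is a sketch of one way to do it, not the recipe's own method: the Series are first turned into a DataFrame and then passed to pd.concat. The starting rows are hypothetical placeholders for data/names.csv:

```python
import pandas as pd

# Hypothetical stand-in for data/names.csv, with country labels as the index
names = pd.DataFrame({'Name': ['Ann', 'Bob', 'Cal', 'Dee'],
                      'Age': [70, 69, 4, 2]},
                     index=['Canada', 'Canada', 'USA', 'USA'])

s1 = pd.Series({'Name': 'Zach', 'Age': 3}, name=len(names))
s2 = pd.Series({'Name': 'Zayd', 'Age': 2}, name='USA')

# Each Series becomes one row; its name attribute becomes the row label
new_rows = pd.DataFrame([s1, s2])

# Without ignore_index, the existing and the new row labels are both kept
result = pd.concat([names, new_rows])

print(result)
```

The result keeps the four country labels and adds the labels 4 and 'USA' taken from the name attributes.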

With small DataFrames of only two columns, it is simple enough to write out all the column names and values manually. As the number of columns grows, this process becomes quite painful. For instance, let's take a look at the 2016 baseball dataset:

>>> bball_16 = pd.read_csv('data/baseball16.csv')
>>> bball_16.head()

This dataset contains 22 columns and it would be easy to mistype a column name or forget one altogether if you were manually entering new rows of data. To help protect against these mistakes, let's select a single row as a Series and chain the to_dict method to it to get an example row as a dictionary:

>>> data_dict = bball_16.iloc[0].to_dict()
>>> print(data_dict)
{'playerID': 'altuvjo01', 'yearID': 2016, 'stint': 1, 'teamID': 'HOU', 'lgID': 'AL', 'G': 161, 'AB': 640, 'R': 108, 'H': 216, '2B': 42, '3B': 5, 'HR': 24, 'RBI': 96.0, 'SB': 30.0, 'CS': 10.0, 'BB': 60, 'SO': 70.0, 'IBB': 11.0, 'HBP': 7.0, 'SH': 3.0, 'SF': 7.0, 'GIDP': 15.0}

Clear the old values with a dictionary comprehension that replaces every string value with an empty string and every other value with a missing value. This dictionary can now serve as a template for any new data you would like to enter:

>>> new_data_dict = {k: '' if isinstance(v, str) else
...                     np.nan for k, v in data_dict.items()}
>>> print(new_data_dict)
{'playerID': '', 'yearID': nan, 'stint': nan, 'teamID': '', 'lgID': '', 'G': nan, 'AB': nan, 'R': nan, 'H': nan, '2B': nan, '3B': nan, 'HR': nan, 'RBI': nan, 'SB': nan, 'CS': nan, 'BB': nan, 'SO': nan, 'IBB': nan, 'HBP': nan, 'SH': nan, 'SF': nan, 'GIDP': nan}
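
To see the template idea end to end, here is a minimal sketch that builds a blank template, fills in a copy, and adds it as a new row. It assumes a hypothetical three-column DataFrame in place of data/baseball16.csv; the player IDs and statistics are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical three-column stand-in for the baseball data
bball = pd.DataFrame({'playerID': ['p01', 'p02'],
                      'yearID': [2016, 2016],
                      'RBI': [96.0, 40.0]})

data_dict = bball.iloc[0].to_dict()

# Blank template: empty string for text fields, NaN for everything else
template = {k: '' if isinstance(v, str) else np.nan
            for k, v in data_dict.items()}

# Fill in a copy so the template can be reused; the values are made up
new_row = dict(template, playerID='p03', yearID=2016, RBI=12.0)

# A misspelled key would show up here as an unexpected extra key
assert set(new_row) == set(bball.columns)

# Add the completed row under the next integer label
bball.loc[len(bball)] = new_row
print(bball)
```

Comparing the keys against the column names before inserting is one simple way to catch the mistyped or forgotten columns that the template is meant to prevent.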
