Working on the DataFrame

Let's now move on to getting our DataFrame ready to work with:

df = pd.DataFrame(ipo_list) 
 
df.head() 

The preceding code generates the following output:

The data looks good, so let's now add our column names:

df.columns = ['Date', 'Company', 'Ticker', 'Managers',  
              'Offer Price', 'Opening Price', '1st Day Close', 
              '1st Day % Chg', '$ Chg Open', '$ Chg Close', 
              'Star Rating', 'Performed'] 
 
df.head() 

The preceding code generates the following output:
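As a side note, the column names can also be supplied when the DataFrame is built. The following is a minimal sketch on made-up rows (the names rows and toy are illustrative and not part of the IPO data); in either approach, the number of names has to match the number of columns, or pandas raises a ValueError.

```python
import pandas as pd

# Made-up rows standing in for a list of lists such as ipo_list.
rows = [[1, 'a'], [2, 'b']]

# Name the columns at construction time instead of assigning df.columns later.
toy = pd.DataFrame(rows, columns=['Id', 'Name'])

print(toy)
```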

Let's now convert that Date column from a float to a proper date. The xlrd library has some functionality that can help us with that. We'll use it in a function to get our dates in the proper format:

def to_date(x): 
    return xlrd.xldate.xldate_as_datetime(x, wb.datemode) 
df['Date'] = df['Date'].apply(to_date) 
df 

The preceding code generates the following output:
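For intuition, Excel stores a date as a serial day count, with the fractional part holding the time of day. The sketch below approximates the conversion with the standard library only. It assumes the workbook uses the 1900 date system (a datemode of 0) and serial values of 61 or later; the helper name excel_serial_to_datetime is hypothetical and is not part of xlrd.

```python
from datetime import datetime, timedelta

# Assumption: 1900 date system and serials >= 61,
# for which serial 0 corresponds to 1899-12-30.
EXCEL_EPOCH_1900 = datetime(1899, 12, 30)

def excel_serial_to_datetime(serial):
    """Hypothetical helper: convert an Excel serial date to a datetime."""
    return EXCEL_EPOCH_1900 + timedelta(days=serial)

print(excel_serial_to_datetime(43831.0))  # 2020-01-01 00:00:00
print(excel_serial_to_datetime(43831.5))  # 2020-01-01 12:00:00
```

In practice, the xlrd function shown above is the safer choice, since it also handles the 1904 date system and early serial values.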

Now that we have dates that we can work with, let's add some additional date-related columns that can help us work with the data better:

df['Year'], df['Month'], df['Day'], df['Day of Week'] = (
    df['Date'].dt.year, df['Date'].dt.month,
    df['Date'].dt.day, df['Date'].dt.weekday)
df 

The preceding code generates the following output:
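To make the numbering of these columns concrete, here is a small sketch on two made-up dates (the toy DataFrame is illustrative only). Note that .dt.weekday counts from Monday as 0 through Sunday as 6.

```python
import pandas as pd

# Toy data, not taken from the IPO spreadsheet.
toy = pd.DataFrame({'Date': pd.to_datetime(['2020-01-01', '2020-01-06'])})

# One assignment per derived column.
toy['Year'] = toy['Date'].dt.year
toy['Month'] = toy['Date'].dt.month
toy['Day'] = toy['Date'].dt.day
toy['Day of Week'] = toy['Date'].dt.weekday  # Monday=0 ... Sunday=6

print(toy)
```

January 1, 2020 was a Wednesday, so its Day of Week value is 2, while January 6, 2020 was a Monday and gets 0.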

Now that we've completed those steps, let's check our data in the DataFrame against the data in the original spreadsheet:

by_year_cnt = df.groupby('Year')[['Ticker']].count() 
 
by_year_cnt 

The preceding code generates the following output:

Comparing this to the same values in the spreadsheet shows us that we have nearly identical values, so we should be good to continue.
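One possible source of small differences in a check like this is that count() only tallies non-missing values. The sketch below uses made-up tickers (the toy DataFrame and by_year_rows are illustrative names) to contrast it with size(), which counts every row in each group.

```python
import pandas as pd

# Made-up data; one ticker is deliberately missing.
toy = pd.DataFrame({
    'Year': [2019, 2019, 2020],
    'Ticker': ['AAA', None, 'BBB'],
})

by_year_cnt = toy.groupby('Year')[['Ticker']].count()  # non-missing tickers only
by_year_rows = toy.groupby('Year').size()              # every row per year

print(by_year_cnt)
print(by_year_rows)
```

Here 2019 has a count of 1 but a size of 2, because one of its rows has no ticker.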

We'll take one additional step here to eliminate what are sometimes referred to as penny stocks, or particularly low-priced stocks, by dropping any offering with an offer price below $5. Then, we'll check the data types to ensure they look appropriate:

df.drop(df[df['Offer Price'] < 5].index, inplace=True) 
 
df.reset_index(drop=True, inplace=True) 
 
df.dtypes 

The preceding code generates the following output:
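The same filter can also be written with a Boolean mask rather than drop(). This is a minimal sketch on made-up prices (the toy DataFrame is illustrative only); the mask is negated so that, as with the drop() call above, a row with a missing offer price would be kept.

```python
import pandas as pd

# Made-up tickers and prices.
toy = pd.DataFrame({'Ticker': ['AAA', 'BBB', 'CCC'],
                    'Offer Price': [4.0, 5.0, 12.5]})

# Keep every row that is not priced under 5, then renumber the index.
toy = toy[~(toy['Offer Price'] < 5)].reset_index(drop=True)

print(toy)
```

This version returns a new DataFrame instead of modifying the existing one in place, which some readers find easier to follow.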

This looks to be in line with what we expect, with the exception of the 1st Day % Chg column. We'll correct that now by changing the data type to a float:

df['1st Day % Chg'] = df['1st Day % Chg'].astype(float) 
df.dtypes 

The preceding code generates the following output:
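If a column like this ever contains entries that cannot be parsed as numbers, astype(float) raises a ValueError. One alternative, sketched here on made-up values (the series s is illustrative and does not reflect the actual IPO data), is pd.to_numeric with errors='coerce', which turns unparseable entries into NaN.

```python
import pandas as pd

# Made-up values: a numeric string, an integer, and a placeholder.
s = pd.Series(['1.5', 2, 'N/A'], dtype=object)

# Unparseable entries become NaN instead of raising an error.
as_float = pd.to_numeric(s, errors='coerce')

print(as_float.dtype)  # float64
```

Whether silently converting bad entries to NaN is appropriate depends on the data, so it is worth counting the resulting missing values before continuing.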
