Adding features to influence the performance of an IPO

One measure of demand that could be informative is the opening gap. This is the difference between the offer price and the opening price of the issue, expressed relative to the offer price. Let's add that to our DataFrame:

df['Opening Gap % Chg'] = (df['Opening Price'] - df['Offer Price'])/df['Offer Price'] 
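To make the calculation concrete, here is a minimal sketch on a tiny, made-up DataFrame. The prices are hypothetical and chosen only for illustration; the column names are the ones used above. Note that, as written, the result is a fraction of the offer price, so 0.2 means the issue opened 20% above its offer price:

```python
import pandas as pd

# Hypothetical prices, invented for illustration (not from the IPO dataset)
toy = pd.DataFrame({
    'Offer Price': [10.0, 20.0, 15.0],
    'Opening Price': [12.0, 19.0, 15.0],
})

# Same formula as above: (open - offer) / offer
toy['Opening Gap % Chg'] = (
    (toy['Opening Price'] - toy['Offer Price']) / toy['Offer Price']
)

gaps = toy['Opening Gap % Chg'].tolist()
print(toy)
```

A positive value means the issue opened above its offer price, a negative value means it opened below, and zero means it opened at the offer price.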

Next, let's get a count of the number of underwriters on the offering. Perhaps having more banks involved leads to better marketing of the issue? This is demonstrated in the following code block:

def get_mgr_count(x): 
    return len(x.split('/')) 
 
df['Mgr Count'] = df['Managers'].apply(get_mgr_count) 
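As a quick sanity check, the following sketch applies the same function to a few made-up manager strings. The bank names are placeholders, and the sketch assumes, as the code above does, that managers are separated by a forward slash. It also shows a vectorized alternative using pandas string methods, which should give the same counts when there are no missing values:

```python
import pandas as pd

def get_mgr_count(x):
    # One manager per slash-separated entry
    return len(x.split('/'))

# Placeholder manager strings, invented for illustration
toy = pd.DataFrame({
    'Managers': ['Bank A', 'Bank A/Bank B', 'Bank A/Bank B/Bank C'],
})

toy['Mgr Count'] = toy['Managers'].apply(get_mgr_count)
counts = toy['Mgr Count'].tolist()

# Vectorized alternative: number of slashes plus one
alt_counts = (toy['Managers'].str.count('/') + 1).tolist()

print(counts)
print(alt_counts)
```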

Let's quickly see whether there might be anything to this hypothesis by means of a visualization:

df.groupby('Mgr Count')['1st Day Open to Close % Chg'].mean().to_frame().style.bar(align='mid', color=['#d65f5f', '#5fba7d']) 

The preceding code generates the following output:

It's not apparent what the relationship might be from this chart, but clearly nine bankers is the sweet spot!
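One reason to be cautious about reading too much into any single bar is that a group mean says nothing about how many offerings sit behind it. The following sketch, which uses made-up numbers rather than the actual IPO data, shows one way to put the group size next to the mean so that thinly populated groups stand out:

```python
import pandas as pd

# Invented values for illustration; the real DataFrame will differ
toy = pd.DataFrame({
    'Mgr Count': [1, 1, 2, 2, 2, 9],
    '1st Day Open to Close % Chg': [0.01, 0.03, -0.02, 0.00, 0.05, 0.40],
})

# Mean return per manager count, alongside the number of offerings in each group
summary = (
    toy.groupby('Mgr Count')['1st Day Open to Close % Chg']
       .agg(['mean', 'count'])
)

print(summary)
```

In this toy data, the group with nine managers has the highest mean but contains a single offering, which is exactly the kind of result that should not be taken at face value.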

Next, let's move on to extracting the first underwriter in the list. This would be the lead, and perhaps the prestige of this bank is important to the first-day gains:

df['Lead Mgr'] = df['Managers'].apply(lambda x: x.split('/')[0])

Next, let's take a quick peek at the data in the new column that we have created:

df['Lead Mgr'].unique() 

The preceding code generates the following output:

Even a cursory examination of the preceding output shows us that we have some genuine issues with the data. Many names are replicated with different spellings and punctuation. We could, at this point, stop and attempt to clean up the data, and this would be the proper course of action if we were going to rely on our model for anything serious. As this is just a toy project, however, we'll forge ahead and hope that the impact is minimal.
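For readers who do want a first pass at cleanup, the following is a rough sketch of one possible approach, not something the rest of this walkthrough relies on. It builds a comparison key by lowercasing each name and dropping everything except letters and digits, so that variants differing only in case, spacing, or punctuation collapse together. The helper name normalize_key and the sample spellings are assumptions made for illustration; genuinely different spellings or abbreviations would still need a manual mapping:

```python
import re
import pandas as pd

def normalize_key(name):
    # Hypothetical helper: lowercase, then keep only letters and digits
    return re.sub(r'[^a-z0-9]', '', name.lower())

# Illustrative spelling variants, not taken from the actual output
names = pd.Series([
    'J.P. Morgan',
    'JP Morgan',
    'J.P.Morgan',
    'Morgan Stanley',
    'Morgan Stanley ',
])

keys = names.map(normalize_key)
n_raw = names.nunique()
n_clean = keys.nunique()

print(keys.tolist())
print(n_raw, n_clean)
```

Applied to the real column, the same idea would be df['Lead Mgr'].map(normalize_key), with the caveat that an aggressive key like this could also merge names that are truly distinct.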
