Creating our pipeline

To review, we have transformed the columns in our dataset in the following ways thus far:

boolean, city: dummy encoding
ordinal_column: label encoding
quantitative_column: bucketed (binned) into ordinal-level data

Since we now have transformations for all of our columns, let's put everything together in a pipeline.

Start by importing the Pipeline class from scikit-learn:

from sklearn.pipeline import Pipeline

We will bring together each of the custom transformers that we have created. Here is the order we will follow in our pipeline:

First, we will utilize the imputer to fill in missing values
Next, we will dummify our categorical columns
Then, we will encode the ordinal_column
Finally, we will bucket the quantitative_column
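The order matters because each step receives the output of the step before it. The following is a simplified, pure-Python stand-in for that chaining, not scikit-learn's actual implementation. The names MiniPipeline, FillWithMean, and EqualWidthBucket are hypothetical, and mean imputation with three equal-width buckets is an assumption chosen for illustration:

```python
class MiniPipeline:
    """Illustrative stand-in for a pipeline; not scikit-learn's implementation."""

    def __init__(self, steps):
        self.steps = steps  # list of (name, step) pairs, applied in order

    def fit(self, values):
        # Fit each step on the output of the steps before it.
        last = len(self.steps) - 1
        for i, (_, step) in enumerate(self.steps):
            step.fit(values)
            if i < last:
                values = step.transform(values)
        return self

    def transform(self, values):
        # Pass the data through every step in order.
        for _, step in self.steps:
            values = step.transform(values)
        return values


class FillWithMean:
    """Hypothetical imputer: replaces None with the mean of the known values."""

    def fit(self, values):
        known = [v for v in values if v is not None]
        self.mean_ = sum(known) / len(known)
        return self

    def transform(self, values):
        return [self.mean_ if v is None else v for v in values]


class EqualWidthBucket:
    """Hypothetical cutter: assigns each value to one of n_bins equal-width bins."""

    def __init__(self, n_bins):
        self.n_bins = n_bins

    def fit(self, values):
        self.low_ = min(values)
        self.width_ = (max(values) - self.low_) / self.n_bins
        return self

    def transform(self, values):
        top = self.n_bins - 1  # the maximum value belongs to the last bin
        return [min(int((v - self.low_) / self.width_), top) for v in values]


# The quantitative_column values from our dataset, with None for the missing entry.
quantitative = [1.0, 11.0, -0.5, 10.0, None, 20.0]

mini = MiniPipeline([("imputer", FillWithMean()), ("cut", EqualWidthBucket(3))])
buckets = mini.fit(quantitative).transform(quantitative)
```

In Python 3, reversing the two steps would raise a TypeError, because EqualWidthBucket.fit would try to compare None with a number. That is the reason the imputer comes first in our pipeline.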

Let's set up our pipeline as follows:

# 1. fill in missing values with our initial imputer
# 2. dummify the categorical columns
# 3. encode the ordinal column
# 4. bucket (bin) the quantitative column
pipe = Pipeline([("imputer", imputer), ("dummify", cd), ("encode", ce), ("cut", cc)])

To see the full effect of our pipeline, let's first take a look at our data before any transformations:

# take a look at our data before fitting our pipeline
print(X)

This is our original data, with its missing values still in place:

	boolean	city	ordinal_column	quantitative_column
0	yes	tokyo	somewhat like	1.0
1	no	None	like	11.0
2	None	london	somewhat like	-0.5
3	no	seattle	like	10.0
4	no	san francisco	somewhat like	NaN
5	yes	tokyo	dislike	20.0

We can now fit our pipeline:

# now fit our pipeline
pipe.fit(X)

>>>>
Pipeline(memory=None,
     steps=[('imputer', Pipeline(memory=None,
     steps=[('quant', <__main__.CustomQuantitativeImputer object at 0x128bf00d0>), ('category', <__main__.CustomCategoryImputer object at 0x13666bf50>)])), ('dummify', <__main__.CustomDummifier object at 0x128bf0ed0>), ('encode', <__main__.CustomEncoder object at 0x127e145d0>), ('cut', <__main__.CustomCutter object at 0x13666bc90>)])

Now that our pipeline is fitted, let's use it to transform our DataFrame:

pipe.transform(X)

Here is what our final dataset looks like after undergoing all of the appropriate transformations by column:

	ordinal_column	quantitative_column	boolean_no	boolean_yes	city_london	city_san francisco	city_seattle	city_tokyo
0	1	0	0	1	0	0	0	1
1	2	1	1	0	0	0	0	1
2	1	0	1	0	1	0	0	0
3	2	1	1	0	0	0	1	0
4	1	1	1	0	0	1	0	0
5	0	2	0	1	0	0	0	1
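To reproduce the table above without the custom classes defined earlier, here is a minimal, self-contained sketch. The Sketch* classes are hypothetical stand-ins, not the transformers built in the previous sections. They assume most-frequent imputation for the categorical columns, mean imputation for quantitative_column, the ordering dislike < somewhat like < like, and three equal-width bins. Unlike the nested imputer pipeline shown in the fitted output, the sketch uses a single imputer, and it learns the bin edges during fit:

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline

# The six rows shown in this section.
X = pd.DataFrame({
    "boolean": ["yes", "no", None, "no", "no", "yes"],
    "city": ["tokyo", None, "london", "seattle", "san francisco", "tokyo"],
    "ordinal_column": ["somewhat like", "like", "somewhat like",
                       "like", "somewhat like", "dislike"],
    "quantitative_column": [1.0, 11.0, -0.5, 10.0, np.nan, 20.0],
})


class SketchImputer(TransformerMixin, BaseEstimator):
    """Assumed strategy: most frequent category, mean of the numeric column."""

    def fit(self, X, y=None):
        self.fill_values_ = {
            "boolean": X["boolean"].mode()[0],
            "city": X["city"].mode()[0],
            "quantitative_column": X["quantitative_column"].mean(),
        }
        return self

    def transform(self, X):
        return X.fillna(value=self.fill_values_)


class SketchDummifier(TransformerMixin, BaseEstimator):
    """One-hot (dummy) encodes the given columns."""

    def __init__(self, cols=None):
        self.cols = cols

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return pd.get_dummies(X, columns=self.cols, dtype=int)


class SketchEncoder(TransformerMixin, BaseEstimator):
    """Maps each label to its position in an explicit ordering."""

    def __init__(self, col, ordering=None):
        self.col = col
        self.ordering = ordering

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        X = X.copy()
        codes = {label: i for i, label in enumerate(self.ordering)}
        X[self.col] = X[self.col].map(codes)
        return X


class SketchCutter(TransformerMixin, BaseEstimator):
    """Buckets a numeric column into equal-width bins learned during fit."""

    def __init__(self, col, n_bins=3):
        self.col = col
        self.n_bins = n_bins

    def fit(self, X, y=None):
        low, high = X[self.col].min(), X[self.col].max()
        self.edges_ = np.linspace(low, high, self.n_bins + 1)
        return self

    def transform(self, X):
        X = X.copy()
        X[self.col] = pd.cut(X[self.col], bins=self.edges_,
                             labels=False, include_lowest=True)
        return X


sketch_pipe = Pipeline([
    ("imputer", SketchImputer()),
    ("dummify", SketchDummifier(cols=["boolean", "city"])),
    ("encode", SketchEncoder("ordinal_column",
                             ordering=["dislike", "somewhat like", "like"])),
    ("cut", SketchCutter("quantitative_column", n_bins=3)),
])
result = sketch_pipe.fit(X).transform(X)
```

Learning the bin edges in fit, rather than recomputing them in transform, means that new data passed to transform later is bucketed with the same edges as the training data.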
