TFLearn is a library that wraps much of the TensorFlow API in a nice, familiar scikit-learn-style API.
TensorFlow is all about building and executing graphs. This is a very powerful concept, but it is also cumbersome to start with.
To install TFLearn, the easiest way is to run the following command:
pip install git+https://github.com/tflearn/tflearn.git
For the latest stable version, use this command:
pip install tflearn
Otherwise, you can also install it from source by running the following (from the source folder):
python setup.py install
In this tutorial, we will learn to use TFLearn and TensorFlow to model the chance of survival of passengers on the Titanic using their personal information (such as gender and age). To tackle this classic ML task, we are going to build a DNN classifier.
Let's take a look at the dataset (TFLearn will automatically download it for you).
For each passenger, the following information is provided:
survived  Survived (0 = No; 1 = Yes)
pclass    Passenger Class (1 = 1st; 2 = 2nd; 3 = 3rd)
name      Name
sex       Sex
age       Age
sibsp     Number of Siblings/Spouses Aboard
parch     Number of Parents/Children Aboard
ticket    Ticket Number
fare      Passenger Fare
Here are some examples from the dataset:
survived | pclass | name | sex | age | sibsp | parch | ticket | fare
1 | 1 | Aubart, Mme. Leontine Pauline | female | 24 | 0 | 0 | PC 17477 | 69.3000
0 | 2 | Bowenur, Mr. Solomon | male | 42 | 0 | 0 | 211535 | 13.0000
1 | 3 | Baclini, Miss. Marie Catherine | female | 5 | 2 | 1 | 2666 | 19.2583
0 | 3 | Youseff, Mr. Gerious | male | 45.5 | 0 | 0 | 2628 | 7.2250
There are two classes in our task: not survived (class = 0) and survived (class = 1). Each passenger is described by 8 features. The Titanic dataset is stored in a CSV file, so we can use the TFLearn load_csv() function to load the data from the file into a Python list. We specify the target_column argument to indicate that our labels (survived or not) are located in the first column (id: 0). The function returns a tuple: (data, labels).
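Before running the real loader, it can help to see what this call produces. The following is a minimal sketch of load_csv()'s behavior with target_column=0 and categorical_labels=True, re-implemented with only the standard csv module and NumPy; the two inline rows are illustrative samples, not the downloaded file:

```python
import csv
import io

import numpy as np

# Two illustrative rows in the same column order as the Titanic CSV
raw = io.StringIO(
    'survived,pclass,name,sex,age,sibsp,parch,ticket,fare\n'
    '1,1,"Aubart, Mme. Leontine Pauline",female,24,0,0,PC 17477,69.3000\n'
    '0,2,"Bowenur, Mr. Solomon",male,42,0,0,211535,13.0000\n'
)

rows = list(csv.reader(raw))[1:]            # skip the header row
label_ids = [int(r.pop(0)) for r in rows]   # target_column=0: first field is the label
data = rows                                 # the remaining fields stay as strings

# categorical_labels=True one-hot encodes the labels over n_classes=2
labels = np.eye(2, dtype=np.float32)[label_ids]

print(data[0][:3])  # ['1', 'Aubart, Mme. Leontine Pauline', 'female']
print(labels)       # [[0. 1.] [1. 0.]]
```

So data is a list of string rows (labels removed) and labels is a one-hot float array, which matches the shapes the classifier expects later.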
Let's start with importing the NumPy and TFLearn libraries:
import numpy as np
import tflearn as tfl
Download the Titanic dataset:
from tflearn.datasets import titanic
titanic.download_dataset('titanic_dataset.csv')
Load the CSV file, and indicate that the first column represents labels:
from tflearn.data_utils import load_csv
data, labels = load_csv('titanic_dataset.csv', target_column=0,
                        categorical_labels=True, n_classes=2)
Data needs some preprocessing before it is ready to be used in our DNN classifier. We must delete the columns that won't help our analysis. We discard the name and ticket fields, because we assume that a passenger's name and ticket number are not related to their chance of survival:
def preprocess(data, columns_to_ignore):
The preprocessing phase starts by sorting the column ids in descending order and deleting those columns (highest index first, so that the earlier indices stay valid):
    for column_id in sorted(columns_to_ignore, reverse=True):
        [r.pop(column_id) for r in data]
    for i in range(len(data)):
The sex field is converted to a float so that it can be fed to the network:
        data[i][1] = 1. if data[i][1] == 'female' else 0.
    return np.array(data, dtype=np.float32)
As already described, the name and ticket fields will be ignored by the analysis:
to_ignore = [1, 6]
Then we call the preprocess procedure:
data = preprocess(data, to_ignore)
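To check what this step actually does to a row, here is a self-contained run of the same preprocessing logic on two made-up rows (fields ordered pclass, name, sex, age, sibsp, parch, ticket, fare, as they are after the label column was removed):

```python
import numpy as np

def preprocess(data, columns_to_ignore):
    # Drop the ignored columns, highest index first so earlier indices stay valid
    for column_id in sorted(columns_to_ignore, reverse=True):
        [r.pop(column_id) for r in data]
    for i in range(len(data)):
        # After dropping 'name', the 'sex' field sits at index 1
        data[i][1] = 1. if data[i][1] == 'female' else 0.
    return np.array(data, dtype=np.float32)

# Two made-up rows: [pclass, name, sex, age, sibsp, parch, ticket, fare]
sample = [
    [1, 'Aubart, Mme. Leontine Pauline', 'female', 24, 0, 0, 'PC 17477', 69.30],
    [2, 'Bowenur, Mr. Solomon', 'male', 42, 0, 0, '211535', 13.00],
]
processed = preprocess(sample, columns_to_ignore=[1, 6])
print(processed.shape)  # (2, 6): name and ticket are gone, sex is now numeric
```

Each row shrinks from 8 string-ish fields to 6 floats, which is exactly the input shape declared in the next step.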
Next, we specify the shape of our input data. Each input sample has a total of 6 features, and we will process samples in batches to save memory, so our data input shape is [None, 6]. The None parameter means an unknown dimension, so we can change the total number of samples that are processed in a batch:
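In other words, None leaves the first axis unconstrained, so any batch size matches. The helper below is our own illustration of that rule, not part of TFLearn:

```python
import numpy as np

def matches_input_shape(batch, shape=(None, 6)):
    # None acts as a wildcard: only the fixed dimensions are checked
    return batch.ndim == len(shape) and all(
        expected is None or actual == expected
        for actual, expected in zip(batch.shape, shape)
    )

print(matches_input_shape(np.zeros((16, 6))))    # True: a batch of 16 samples
print(matches_input_shape(np.zeros((1309, 6))))  # True: the whole dataset at once
print(matches_input_shape(np.zeros((16, 8))))    # False: wrong feature count
```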
net = tfl.input_data(shape=[None, 6])
Finally, we build a 3-layer neural network with this simple sequence of statements:
net = tfl.fully_connected(net, 32)
net = tfl.fully_connected(net, 32)
net = tfl.fully_connected(net, 2, activation='softmax')
net = tfl.regression(net)
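To make the layer shapes concrete, here is a plain NumPy sketch of the same forward pass with randomly initialized (untrained) weights. It mirrors the structure above under the assumption of TFLearn's default linear activation for the hidden layers; it is an illustration of the tensor shapes, not the trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
batch = rng.normal(size=(16, 6)).astype(np.float32)  # a batch of 16 samples

def fully_connected(x, n_units, rng):
    # Linear layer: x @ W + b, with small random weights for illustration
    w = rng.normal(scale=0.1, size=(x.shape[1], n_units)).astype(np.float32)
    b = np.zeros(n_units, dtype=np.float32)
    return x @ w + b

def softmax(x):
    # Numerically stable row-wise softmax
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

net = fully_connected(batch, 32, rng)        # (16, 32)
net = fully_connected(net, 32, rng)          # (16, 32)
net = softmax(fully_connected(net, 2, rng))  # (16, 2): per-class probabilities

print(net.shape)  # (16, 2)
```

The final softmax layer turns each sample's two logits into probabilities for "not survived" and "survived" that sum to 1.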
TFLearn provides a model wrapper, DNN, that automatically performs neural network classifier tasks:
model = tfl.DNN(net)
We will run it for 10 epochs with a batch size of 16:
model.fit(data, labels, n_epoch=10, batch_size=16, show_metric=True)
When we run the model, we should get the following output:
Training samples: 1309
Validation samples: 0
--
Training Step: 82  | total loss: 0.64003
| Adam | epoch: 001 | loss: 0.64003 - acc: 0.6620 -- iter: 1309/1309
--
Training Step: 164  | total loss: 0.61915
| Adam | epoch: 002 | loss: 0.61915 - acc: 0.6614 -- iter: 1309/1309
--
Training Step: 246  | total loss: 0.56067
| Adam | epoch: 003 | loss: 0.56067 - acc: 0.7171 -- iter: 1309/1309
--
Training Step: 328  | total loss: 0.51807
| Adam | epoch: 004 | loss: 0.51807 - acc: 0.7799 -- iter: 1309/1309
--
Training Step: 410  | total loss: 0.47475
| Adam | epoch: 005 | loss: 0.47475 - acc: 0.7962 -- iter: 1309/1309
--
Training Step: 574  | total loss: 0.48988
| Adam | epoch: 007 | loss: 0.48988 - acc: 0.7891 -- iter: 1309/1309
--
Training Step: 656  | total loss: 0.55073
| Adam | epoch: 008 | loss: 0.55073 - acc: 0.7427 -- iter: 1309/1309
--
Training Step: 738  | total loss: 0.50242
| Adam | epoch: 009 | loss: 0.50242 - acc: 0.7854 -- iter: 1309/1309
--
Training Step: 820  | total loss: 0.41557
| Adam | epoch: 010 | loss: 0.41557 - acc: 0.8110 -- iter: 1309/1309
--
The model accuracy is around 81%, which means that it can predict the correct outcome (that is, whether the passenger survived or not) for 81% of the passengers.
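The acc figure in the log is simply the fraction of samples whose highest-probability class matches the label. Here is how that metric can be computed from softmax outputs and one-hot labels, using toy numbers rather than the model's real predictions:

```python
import numpy as np

# Toy softmax outputs and one-hot labels, not the model's real predictions
predictions = np.array([[0.9, 0.1],
                        [0.3, 0.7],
                        [0.4, 0.6],
                        [0.8, 0.2]])
labels = np.array([[1, 0],
                   [0, 1],
                   [1, 0],
                   [1, 0]])

# A prediction is correct when the argmax class matches the label's argmax
correct = predictions.argmax(axis=1) == labels.argmax(axis=1)
accuracy = correct.mean()
print(accuracy)  # 0.75: 3 of the 4 toy predictions match their labels
```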