Exercises Solutions

Exercise 2.1

Question 1

Which loop should be used when you want to execute a block of code a specific number of times?

A. For loop

B. While loop

C. Both A and B

D. None of the above

Answer: A

Question 2

What is the maximum number of values that a function can return in Python?

A. Single value

B. Double value

C. More than two values

D. None

Answer: C

Question 3

Which of the following membership operators are supported by Python?

A. in

B. out

C. not in

D. Both A and C

Answer: D

Exercise 2.2

Print the multiplication table of the integer 9 using a while loop.

Solution:

j = 1
while j < 11:
    print("9 x " + str(j) + " = " + str(9 * j))
    j = j + 1
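The number of repetitions is known in advance, so the same table can also be written as a for loop over range(). This is the idiom that Question 1 of Exercise 2.1 points to. The sketch below is a minimal version. The helper name times_table is hypothetical and exists only so the generated lines can be checked.

```python
def times_table(n, upto=10):
    """Return the lines of the multiplication table of n, from 1 up to upto."""
    # range(1, upto + 1) yields 1, 2, ..., upto, so no manual counter is needed
    return [f"{n} x {j} = {n * j}" for j in range(1, upto + 1)]


for line in times_table(9):
    print(line)
```

The output matches the while-loop solution line for line. The difference is that range() owns the counter, so the loop cannot run forever if the increment is forgotten.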

Exercise 3.1

Question 1

Which NumPy function is used for the element-wise multiplication of two matrices?

A. np.dot(matrix1, matrix2)

B. np.multiply(matrix1, matrix2)

C. np.elementwise(matrix1, matrix2)

D. None of the above

Answer: B

Question 2

To generate an identity matrix of four rows and four columns, which of the following functions can be used?

A. np.identity(4,4)

B. np.id(4,4)

C. np.eye(4,4)

D. All of the above

Answer: C

Question 3

How can you create the array of numbers 4, 7, 10, 13, 16 with NumPy?

A. np.arange(3, 16, 3)

B. np.arange(4, 16, 3)

C. np.arange(4, 15, 3)

D. None of the above

Answer: D
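Answer D follows from the fact that the stop argument of np.arange is exclusive. The short check below is a sketch and not part of the original solution. It shows what option B actually produces and gives one call that does generate the requested sequence.

```python
import numpy as np

# np.arange(start, stop, step) stops *before* stop, so 16 is excluded here
option_b = np.arange(4, 16, 3)   # 4, 7, 10, 13
# any stop value greater than 16 and at most 19 makes 16 the last element
target = np.arange(4, 17, 3)     # 4, 7, 10, 13, 16

print(option_b)
print(target)
```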

Exercise 3.2

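Create a random NumPy array of five rows and four columns. Using array indexing and slicing, display the items from the third row to the end and from the second column to the end.

Solution:

import numpy as np

# five rows and four columns of uniform random values in [0, 1)
uniform_random = np.random.rand(5, 4)
print(uniform_random)
print("Result")
# rows from index 2 (third row) onward, columns from index 1 (second column) onward
print(uniform_random[2:, 1:])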
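Checking the shape of the result is a quick way to confirm the slice bounds. After the slice, 5 - 2 = 3 rows and 4 - 1 = 3 columns should remain.

The sketch below uses a seeded generator so it is reproducible; the seed value is arbitrary. It also shows that a basic slice is a view that shares memory with the original array.

```python
import numpy as np

# a fixed-seed generator makes the example reproducible; the seed is arbitrary
rng = np.random.default_rng(0)
arr = rng.random((5, 4))          # five rows, four columns

sub = arr[2:, 1:]                 # third row onward, second column onward
print(sub.shape)                  # (3, 3)

# basic slicing returns a view, so writing to it changes the original array
sub[0, 0] = -1.0
print(arr[2, 1])                  # -1.0
```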

Exercise 4.1

Question 1

To concatenate two Pandas dataframes horizontally, the axis parameter should be set to:

A. 0

B. 1

C. 2

D. None of the above

Answer: B

Question 2

Which method is used to sort a Pandas dataframe by the values of a column?

A. sort_dataframe()

B. sort_rows()

C. sort_values()

D. sort_records()

Answer: C

Question 3

To filter columns from a Pandas dataframe, you pass a list of column names to which of the following methods?

A. filter()

B. filter_columns()

C. apply_filter()

D. None of the above

Answer: A
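The three dataframe operations from this exercise can be tried together on two tiny, made-up dataframes. The sketch below shows three things:

- pd.concat with axis=1 puts the frames side by side, while axis=0 stacks them.
- sort_values reorders rows by a column.
- filter keeps only the listed columns.

```python
import pandas as pd

# two small, hypothetical dataframes that share the default index 0..2
left = pd.DataFrame({"name": ["b", "a", "c"], "age": [30, 25, 35]})
right = pd.DataFrame({"city": ["Oslo", "Rome", "Lima"]})

# axis=1 places the frames side by side (horizontal concatenation)
combined = pd.concat([left, right], axis=1)

# sort_values orders rows by the values of one or more columns
by_age = combined.sort_values(by="age")

# filter with a list of column names keeps only those columns
subset = combined.filter(["name", "city"])

print(combined.shape)   # (3, 3)
print(by_age["name"].tolist())
print(subset.columns.tolist())
```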

Exercise 4.2

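Use the apply function to subtract 10 from the Fare column of the Titanic dataset without using a lambda expression.

Solution:

# named function that apply calls on each value of the Fare column
def subt(x):
    return x - 10

# titanic_data is assumed to hold the Titanic dataframe, including its Fare column
updated_fare = titanic_data.Fare.apply(subt)
updated_fare.head()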
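To see what apply does without the full dataset, the sketch below uses a tiny hypothetical stand-in for the Titanic Fare column. It also compares the apply result with plain vectorized subtraction. Vectorized subtraction gives the same values and is usually faster, because it avoids calling a Python function once per row.

```python
import pandas as pd

# a tiny, hypothetical stand-in for the Titanic "Fare" column
titanic_sample = pd.DataFrame({"Fare": [7.25, 71.2833, 53.1]})


def subt(x):
    """Subtract 10 from a single fare value."""
    return x - 10


# apply calls subt once per element of the Series
updated_fare = titanic_sample.Fare.apply(subt)

# vectorized arithmetic produces the same result without a Python-level loop
vectorized = titanic_sample.Fare - 10

print(updated_fare.tolist())
print(updated_fare.equals(vectorized))
```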

Exercise 5.1

Question 1

Which Pandas plotting function is used to draw a horizontal bar plot?

A. horz_bar()

B. barh()

C. bar_horizontal()

D. horizontal_bar()

Answer: B

Question 2

To create a legend, which of the following parameters must be specified?

A. title

B. label

C. axis

D. All of the above

Answer: B
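The legend entries come from the label argument passed to each plotting call. The sketch below assumes the non-interactive Agg backend so it runs without a display. It draws two labeled lines and reads the legend text back.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# the label argument is the text the legend shows for each plotted series
ax.plot([1, 2, 3], [1, 4, 9], label="squares")
ax.plot([1, 2, 3], [1, 8, 27], label="cubes")
legend = ax.legend()

print([text.get_text() for text in legend.get_texts()])
```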

Question 3

How do you show percentage values on a Matplotlib pie chart?

A. autopct = '%1.1f%%'

B. percentage = '%1.1f%%'

C. perc = '%1.1f%%'

D. None of the above

Answer: A
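The autopct string is an old-style format string that Matplotlib fills with each wedge's percentage. In it, '%1.1f' prints one decimal place and '%%' prints a literal percent sign. The sketch below uses made-up wedge sizes and the Agg backend, and reads back the generated labels.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# with autopct set, pie returns a third list holding the percentage text objects
wedges, texts, autotexts = ax.pie([1, 1, 2], labels=["A", "B", "C"], autopct="%1.1f%%")

print([t.get_text() for t in autotexts])
```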

Exercise 5.2

Plot two scatter plots on the same graph using the tips dataset. In the first scatter plot, display values from the total_bill column on the x-axis and from the tip column on the y-axis; the first scatter plot should be green. In the second scatter plot, display values from the total_bill column on the x-axis and from the size column on the y-axis; the second scatter plot should be blue, with x markers.

Solution:

import seaborn as sns

tips_data = sns.load_dataset("tips")
# green points: total_bill vs. tip
sns.scatterplot(x="total_bill", y="tip", data=tips_data, color="g")
# blue x markers on the same axes: total_bill vs. size
sns.scatterplot(x="total_bill", y="size", data=tips_data, color="b", marker="x")


Exercise 6.1

Question 1

Among the following, which one is an example of a regression output?

A. True

B. Red

C. 2.5

D. None of the above

Answer: C

Question 2

Which of the following is a lazy learning algorithm?

A. Random Forest

B. KNN

C. SVM

D. Linear Regression

Answer: B

Question 3

Which of the following is not a regression metric?

A. Accuracy

B. Recall

C. F1 measure

D. All of the above

Answer: D

Exercise 6.2

Using the Diamonds dataset from the Seaborn library, train a regression algorithm of your choice that predicts the price of a diamond. Perform all the preprocessing steps.

Solution:

import pandas as pd
import numpy as np
import seaborn as sns

diamonds_df = sns.load_dataset("diamonds")

# features are every column except the target, price
X = diamonds_df.drop(["price"], axis=1)
y = diamonds_df["price"]

# separate numerical and categorical columns, then one-hot encode the categorical ones
numerical = X.drop(["cut", "color", "clarity"], axis=1)
categorical = X.filter(["cut", "color", "clarity"])
cat_numerical = pd.get_dummies(categorical, drop_first=True)
X = pd.concat([numerical, cat_numerical], axis=1)

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)

# scale features: fit on the training set only, then apply the same scaling to the test set
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# support vector regression with default hyperparameters
from sklearn import svm
svm_reg = svm.SVR()
regressor = svm_reg.fit(X_train, y_train)
y_pred = regressor.predict(X_test)

from sklearn import metrics

print("Mean Absolute Error:", metrics.mean_absolute_error(y_test, y_pred))
print("Mean Squared Error:", metrics.mean_squared_error(y_test, y_pred))
print("Root Mean Squared Error:", np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
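The three error measures printed by the solution are simple averages over the test set. Writing them out with NumPy makes the formulas explicit. The arrays below are made-up values, and the comparison with sklearn.metrics is there only to show that the hand-written definitions agree.

```python
import numpy as np
from sklearn import metrics

# small, hypothetical true and predicted values
y_true = np.array([3.0, 5.0, 2.0])
y_hat = np.array([2.5, 5.0, 4.0])

errors = y_true - y_hat
mae = np.mean(np.abs(errors))   # mean absolute error
mse = np.mean(errors ** 2)      # mean squared error (penalizes large errors more)
rmse = np.sqrt(mse)             # root mean squared error, in the target's units

print(mae, mse, rmse)
print(metrics.mean_absolute_error(y_true, y_hat), metrics.mean_squared_error(y_true, y_hat))
```

RMSE is often the easiest of the three to read, because it is expressed in the same units as the target. For the diamonds model, that unit is the price.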

Exercise 7.1

Question 1

Among the following, which one is not an example of a classification output?

A. True

B. Red

C. Male

D. None of the above

Answer: D

Question 2

Which of the following metrics is used for imbalanced classification datasets?

A. Accuracy

B. F1

C. Precision

D. Recall

Answer: C
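Accuracy is the option to avoid for imbalanced data. A classifier can reach high accuracy while never predicting the minority class at all.

The sketch below uses hypothetical labels (eight negatives, two positives) and a classifier that always predicts the majority class. It shows how precision, recall and F1 expose what accuracy hides.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# hypothetical imbalanced labels: eight negatives, two positives
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# a useless classifier that always predicts the majority class
y_majority = [0] * 10

acc = accuracy_score(y_true, y_majority)                     # looks good
# no positives are predicted, so precision is undefined; zero_division=0 reports 0
prec = precision_score(y_true, y_majority, zero_division=0)
rec = recall_score(y_true, y_majority)                       # no positive is found
f1 = f1_score(y_true, y_majority, zero_division=0)

print(acc, prec, rec, f1)
```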

Question 3

Which of the following functions is used to convert categorical values to one-hot encoded numerical values?

A. pd.get_onehot()

B. pd.get_dummies()

C. pd.get_numeric()

D. All of the above

Answer: B
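pd.get_dummies creates one 0/1 column per category. With drop_first=True, the first category (in sorted order) is dropped, because its value can be inferred from the remaining columns.

The sketch below uses a hypothetical two-column dataframe. Since pandas 2.0 the dummy columns have a boolean dtype, which is why the values are cast to int before printing.

```python
import pandas as pd

# hypothetical dataframe with one categorical and one numerical column
df = pd.DataFrame({"Gender": ["Male", "Female", "Male"], "Age": [40, 35, 28]})

# one indicator column per category; drop_first=True removes the redundant "Female" column
encoded = pd.get_dummies(df, columns=["Gender"], drop_first=True)

print(encoded.columns.tolist())
print(encoded["Gender_Male"].astype(int).tolist())
```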

Exercise 7.2

Using the iris dataset from the Seaborn library, train a classification algorithm of your choice that predicts the species of the iris plant. Perform all the preprocessing steps.

Solution:

import pandas as pd
import numpy as np
import seaborn as sns

iris_df = sns.load_dataset("iris")
iris_df.head()

# features are the four measurements; the label is the species
X = iris_df.drop(["species"], axis=1)
y = iris_df["species"]

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)

# scale features: fit on the training set only, then apply the same scaling to the test set
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# train a random forest of 500 trees
from sklearn.ensemble import RandomForestClassifier
rf_clf = RandomForestClassifier(random_state=42, n_estimators=500)
classifier = rf_clf.fit(X_train, y_train)

y_pred = classifier.predict(X_test)

# evaluate the predictions
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
print(accuracy_score(y_test, y_pred))
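The same steps can be bundled into a scikit-learn Pipeline, which fits the scaler on the training split automatically. This is a variant, not the book's solution.

- It swaps in sklearn.datasets.load_iris for the Seaborn copy. The samples are the same 150 flowers, but the species labels are integers.
- Scaling is not actually required for tree-based models such as random forests. It is kept here only to mirror the original steps.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# load_iris ships with scikit-learn, so no download is needed
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)

# the pipeline fits the scaler on the training data only and reuses it at predict time
clf = make_pipeline(StandardScaler(), RandomForestClassifier(n_estimators=500, random_state=42))
clf.fit(X_train, y_train)

acc = accuracy_score(y_test, clf.predict(X_test))
print(acc)
```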

Exercise 8.1

Question 1

Which of the following is a supervised machine learning algorithm?

A. K-Means Clustering

B. Hierarchical Clustering

C. All of the above

D. None of the above

Answer: D

Question 2

In KMeans clustering, what does the inertia tell us?

A. The distance between data points within a cluster (the sum of squared distances from each point to its cluster's centroid)

B. Output labels for the data points

C. The number of clusters

D. None of the above

Answer: A
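Inertia is the within-cluster sum of squared distances between each point and the centroid of the cluster it belongs to. This is why it shrinks as clusters get tighter, and why it keeps dropping as K grows.

The sketch below uses made-up 2-D points. It reproduces KMeans.inertia_ by hand from cluster_centers_ and labels_.

```python
import numpy as np
from sklearn.cluster import KMeans

# hypothetical 2-D points forming two loose groups
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
              [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# look up each point's own centroid, then sum the squared distances to it
assigned_centers = km.cluster_centers_[km.labels_]
manual_inertia = np.sum((X - assigned_centers) ** 2)

print(km.inertia_, manual_inertia)
```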

Question 3

In hierarchical clustering, in the case of vertical dendrograms, the number of clusters is equal to the number of ____ lines that the ____ line passes through.

A. horizontal, vertical

B. vertical, horizontal

C. None of the above

D. All of the above

Answer: B

Exercise 8.2

Apply KMeans clustering to the banknote.csv dataset available in the Datasets folder in the GitHub repository. Find the optimal number of clusters and then print the clustered dataset. The following script imports the dataset and prints its first five rows.

import pandas as pd

banknote_df = pd.read_csv(r"E:\Hands on Python for Data Science and Machine Learning\Datasets\banknote.csv")
banknote_df.head()

Solution:

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# dividing data into features and labels
features = banknote_df.drop(["class"], axis=1)
labels = banknote_df.filter(["class"], axis=1)
features.head()

# training KMeans on K values from 1 to 10
loss = []
for i in range(1, 11):
    km = KMeans(n_clusters=i, n_init=10).fit(features)
    loss.append(km.inertia_)

# plotting loss against the number of clusters (elbow method)
plt.plot(range(1, 11), loss)
plt.title("Finding Optimal Clusters via Elbow Method")
plt.xlabel("Number of Clusters")
plt.ylabel("loss")
plt.show()

# training KMeans with 2 clusters
features = features.values
km_model = KMeans(n_clusters=2, n_init=10)
km_model.fit(features)

# plotting the data points colored by predicted label
plt.scatter(features[:, 0], features[:, 1], c=km_model.labels_, cmap="rainbow")

# plotting the predicted centroids
plt.scatter(km_model.cluster_centers_[:, 0], km_model.cluster_centers_[:, 1], s=100, c="black")
plt.show()

Exercise 9.1

Question 1

What should be the shape of an input image to a convolutional neural network (in the channels-last format that Keras uses by default)?

A. Width, Height

B. Height, Width

C. Channels, Width, Height

D. Height, Width, Channels

Answer: D

Question 2

We say that a model is overfitting when:

A. Results on the test set are better than the results on the training set

B. Results on both test and training sets are similar

C. Results on the training set are better than the results on the test set

D. None of the above

Answer: C

Question 3

The ReLU activation function is used to introduce:

A. Linearity

B. Non-linearity

C. Quadraticity

D. None of the above

Answer: B
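ReLU outputs max(0, x) element-wise. Because it clips negative inputs to zero, it is not a linear function: the output for a sum of inputs is not, in general, the sum of the outputs.

The sketch below is a plain NumPy version written for illustration, not Keras's implementation. It shows both properties.

```python
import numpy as np


def relu(x):
    """Rectified linear unit: max(0, x), applied element-wise."""
    return np.maximum(0, x)


x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))

# non-linearity check: relu(a + b) differs from relu(a) + relu(b) here
a, b = np.array([-1.0]), np.array([2.0])
print(relu(a + b), relu(a) + relu(b))
```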

Exercise 9.2

Using the CIFAR-10 image dataset, train an image classifier that recognizes the class of each image. The dataset can be loaded as follows:

import tensorflow as tf

cifar_dataset = tf.keras.datasets.cifar10

Solution:

# importing required libraries
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.layers import Input, Conv2D, Dense, Flatten, Dropout, MaxPool2D
from tensorflow.keras.models import Model

# load the data and scale pixel values from 0-255 to 0-1
(training_images, training_labels), (test_images, test_labels) = cifar_dataset.load_data()
training_images, test_images = training_images / 255.0, test_images / 255.0

# labels are loaded with shape (n, 1); flatten them to 1-D arrays
training_labels, test_labels = training_labels.flatten(), test_labels.flatten()
print(training_labels.shape)
print(training_images.shape)
output_classes = len(set(training_labels))
print("Number of output classes is: ", output_classes)

# two strided convolutions with max pooling, followed by a dense classifier head
input_layer = Input(shape=training_images[0].shape)
conv1 = Conv2D(32, (3, 3), strides=2, activation="relu")(input_layer)
maxpool1 = MaxPool2D(2, 2)(conv1)
conv2 = Conv2D(64, (3, 3), strides=2, activation="relu")(maxpool1)
# conv3 = Conv2D(128, (3, 3), strides=2, activation="relu")(conv2)
flat1 = Flatten()(conv2)
drop1 = Dropout(0.2)(flat1)
dense1 = Dense(512, activation="relu")(drop1)
drop2 = Dropout(0.2)(dense1)
output_layer = Dense(output_classes, activation="softmax")(drop2)

model = Model(input_layer, output_layer)
# the labels are integers, so sparse categorical cross-entropy is the matching loss
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model_history = model.fit(training_images, training_labels, epochs=20, validation_data=(test_images, test_labels), verbose=1)
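Conv2D and MaxPool2D default to padding="valid" in Keras, so each layer in this model shrinks the 32 × 32 CIFAR-10 images. The arithmetic below traces the spatial size through the layers using the formula floor((n - k) / s) + 1, where n is the input size, k the kernel (or pool) size and s the stride. It is a hand calculation, not a Keras call; model.summary() should report the same shapes.

```python
def conv_output_size(size, kernel, stride):
    """Spatial output size of a 'valid' (unpadded) convolution or pooling window."""
    return (size - kernel) // stride + 1


side = 32                                  # CIFAR-10 images are 32 x 32 x 3
side = conv_output_size(side, 3, 2)        # Conv2D(32, (3,3), strides=2)
print("after conv1:", side)
side = conv_output_size(side, 2, 2)        # MaxPool2D(2, 2): pool size 2, stride 2
print("after maxpool1:", side)
side = conv_output_size(side, 3, 2)        # Conv2D(64, (3,3), strides=2)
print("after conv2:", side)
flat_features = side * side * 64           # Flatten() over 64 feature maps
print("flattened features:", flat_features)
```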

Exercise 10.1

Question 1

Which of the following are benefits of dimensionality reduction?

A. Data visualization

B. Faster training time for statistical algorithms

C. All of the above

D. None of the above

Answer: C

Question 2

In PCA, dimensionality reduction depends on the:

A. Feature set only

B. Label set only

C. Both feature and label sets

D. None of the above

Answer: A

Question 3

LDA is a ____ dimensionality reduction technique.

A. Unsupervised

B. Semi-supervised

C. Supervised

D. Reinforcement

Answer: C
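The difference behind Questions 2 and 3 shows up directly in the scikit-learn calls:

- PCA is fit on the features alone (unsupervised).
- LinearDiscriminantAnalysis needs the class labels as well (supervised).

The sketch below uses scikit-learn's bundled iris data as an illustrative stand-in. For LDA, n_components can be at most one less than the number of classes, which is 2 for iris.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# PCA is unsupervised: fit_transform receives only the features
X_pca = PCA(n_components=2).fit_transform(X)

# LDA is supervised: fit_transform also needs the class labels
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)
```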

Exercise 10.2

Apply principal component analysis for dimensionality reduction on the customer_churn.csv dataset from the Datasets folder in the GitHub repository. Print the accuracy using the two principal components. Also, plot the results on the test set using the two principal components.

Solution:

import pandas as pd
import numpy as np

churn_df = pd.read_csv(r"E:\Hands on Python for Data Science and Machine Learning\Datasets\customer_churn.csv")
churn_df.head()

# drop identifier columns that carry no predictive information
churn_df = churn_df.drop(["RowNumber", "CustomerId", "Surname"], axis=1)

X = churn_df.drop(["Exited"], axis=1)
y = churn_df["Exited"]

# one-hot encode the categorical columns and recombine them with the numerical ones
numerical = X.drop(["Geography", "Gender"], axis=1)
categorical = X.filter(["Geography", "Gender"])
cat_numerical = pd.get_dummies(categorical, drop_first=True)
X = pd.concat([numerical, cat_numerical], axis=1)
X.head()

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)

# applying scaling on training and test data
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# importing PCA class
from sklearn.decomposition import PCA

# fitting PCA with all components, only to inspect the variance each one explains
pca = PCA()
pca.fit(X_train)

# printing variance ratios
variance_ratios = pca.explained_variance_ratio_
print(variance_ratios)

# keeping two principal components: fit on training data, then transform both sets
pca = PCA(n_components=2)
X_train = pca.fit_transform(X_train)
X_test = pca.transform(X_test)

# making predictions using logistic regression
from sklearn.linear_model import LogisticRegression

# training the logistic regression model
lg = LogisticRegression()
lg.fit(X_train, y_train)

# predicting the test set results
y_pred = lg.predict(X_test)

# evaluating results
from sklearn.metrics import accuracy_score

print(accuracy_score(y_test, y_pred))

from matplotlib import pyplot as plt
# Jupyter notebook magic: render plots inside the notebook
%matplotlib inline

# plotting the actual data points in the space of the two principal components
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap="rainbow")
