Exercise Solutions
Question 1
Which loop should be used when you want to execute a block of code a specific number of times?
A.For Loop
B.While Loop
C.Both A and B
D.None of the above
Answer: A
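A small sketch of why a for loop fits a fixed repetition count: `range()` fixes the number of iterations up front, while the equivalent while loop needs a counter that is initialized and updated by hand. The function names are illustrative only.

```python
def count_with_for(times):
    steps = []
    for i in range(times):      # range(times) yields exactly `times` values
        steps.append(i + 1)
    return steps


def count_with_while(times):
    steps = []
    i = 0                       # the counter must be created manually...
    while i < times:
        steps.append(i + 1)
        i += 1                  # ...and advanced manually, or the loop never ends
    return steps


print(count_with_for(3))    # [1, 2, 3]
print(count_with_while(3))  # [1, 2, 3]
```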
Question 2
What is the maximum number of values that a function can return in Python?
A.Single Value
B.Double Value
C.More than two values
D.None
Answer: C
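Python does not cap the number of return values: a comma-separated `return` packs them into a single tuple, which the caller can unpack. A minimal sketch with a hypothetical `min_max_mean` helper:

```python
def min_max_mean(values):
    """Return three values at once; Python packs them into one tuple."""
    smallest = min(values)
    largest = max(values)
    mean = sum(values) / len(values)
    return smallest, largest, mean


lo, hi, avg = min_max_mean([4, 8, 15, 16, 23, 42])
print(lo, hi, avg)  # 4 42 18.0
```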
Question 3
Which of the following membership operators are supported by Python?
A.in
B.out
C.not in
D.Both A and C
Answer: D
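Python's two membership operators are `in` and `not in`; there is no `out` operator. A quick illustration on a list and on a string:

```python
fruits = ["apple", "banana", "cherry"]

has_banana = "banana" in fruits     # True: the item is in the list
no_mango = "mango" not in fruits    # True: the item is absent
has_substring = "nan" in "banana"   # True: on strings, `in` checks for a substring

print(has_banana, no_mango, has_substring)  # True True True
```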
Print the multiplication table of the integer 9 using a while loop:
j = 1
while j < 11:
    print("9 x " + str(j) + " = " + str(9 * j))
    j = j + 1
Question 1:
Which NumPy function is used for the element-wise multiplication of two matrices?
A.np.dot(matrix1, matrix2)
B.np.multiply(matrix1, matrix2)
C.np.elementwise(matrix1, matrix2)
D.None of the above
Answer: B
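To see the difference between the options, here is a sketch on two small 2 x 2 matrices: `np.multiply` multiplies matching entries, while `np.dot` computes the matrix product.

```python
import numpy as np

m1 = np.array([[1, 2], [3, 4]])
m2 = np.array([[5, 6], [7, 8]])

elementwise = np.multiply(m1, m2)   # same as m1 * m2
matrix_product = np.dot(m1, m2)     # same as m1 @ m2

print(elementwise)     # [[ 5 12] [21 32]]
print(matrix_product)  # [[19 22] [43 50]]
```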
Question 2:
To generate an identity matrix of four rows and four columns, which of the following functions can be used?
A.np.identity(4,4)
B.np.id(4,4)
C.np.eye(4,4)
D.All of the above
Answer: C
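`np.eye(N, M)` takes the row and column counts, whereas the second positional argument of `np.identity` is `dtype`, so `np.identity` is called with the size alone. Both produce the same 4 x 4 identity matrix:

```python
import numpy as np

eye_matrix = np.eye(4, 4)         # 4 rows, 4 columns, 1.0 on the main diagonal
identity_matrix = np.identity(4)  # square identity of size 4

print(eye_matrix)
```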
Question 3:
How can you create the array of numbers 4, 7, 10, 13, 16 with NumPy?
A.np.arange(3, 16, 3)
B.np.arange(4, 16, 3)
C.np.arange(4, 15,3)
D.None of the above
Answer: D
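None of the options work because the `stop` argument of `np.arange` is exclusive: `np.arange(4, 16, 3)` stops at 13. A stop value past 16 fixes it, and `np.linspace`, whose endpoint is inclusive, is an alternative that returns floats.

```python
import numpy as np

option_b = np.arange(4, 16, 3)       # stop=16 is excluded
values = np.arange(4, 17, 3)         # stop just past 16 includes it
same_values = np.linspace(4, 16, 5)  # 5 evenly spaced floats, endpoint included

print(option_b)  # [ 4  7 10 13]
print(values)    # [ 4  7 10 13 16]
```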
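Create a random NumPy array of five rows and four columns. Using array indexing and slicing, display the items from row three to the end and from column two to the end.
import numpy as np
# five rows and four columns of uniform random values in [0, 1)
uniform_random = np.random.rand(5, 4)
print(uniform_random)
print("Result")
# row three onward is index 2:, column two onward is index 1:
print(uniform_random[2:, 1:])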
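Because random values are hard to check by eye, the same slice can be verified on a 5 x 4 array with predictable values (the use of `np.arange(20)` here is just for illustration):

```python
import numpy as np

grid = np.arange(20).reshape(5, 4)  # rows: [0..3], [4..7], [8..11], [12..15], [16..19]
subset = grid[2:, 1:]               # rows 3-5 and columns 2-4

print(subset)  # [[ 9 10 11] [13 14 15] [17 18 19]]
```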
Question 1
In order to horizontally concatenate two Pandas dataframes, the value for the axis attribute should be set to:
A.0
B.1
C.2
D.None of the above
Answer: B
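A sketch on two toy dataframes: `axis=1` places them side by side (adding columns), while `axis=0` stacks them vertically (adding rows).

```python
import pandas as pd

left = pd.DataFrame({"a": [1, 2]})
right = pd.DataFrame({"b": [3, 4]})

side_by_side = pd.concat([left, right], axis=1)  # horizontal: columns a and b
stacked = pd.concat([left, left], axis=0)        # vertical: four rows of column a

print(side_by_side)
```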
Question 2
Which function is used to sort the Pandas dataframe by a column value?
A.sort_dataframe()
B.sort_rows()
C.sort_values()
D.sort_records()
Answer: C
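A minimal `sort_values()` example on made-up data, sorting by one column in descending order:

```python
import pandas as pd

df = pd.DataFrame({"name": ["C", "A", "B"], "score": [70, 90, 80]})
by_score = df.sort_values(by="score", ascending=False)  # highest score first

print(by_score)
```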
Question 3
To filter columns from a Pandas dataframe, you pass a list of column names to which of the following methods?
A.filter()
B.filter_columns()
C.apply_filter()
D.None of the above
Answer: A
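A short illustration of `filter()` with a list of column names, using a toy dataframe:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
subset = df.filter(["a", "c"])  # keep only the listed columns

print(subset)
```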
Use the apply function to subtract 10 from the Fare column of the Titanic dataset, without using a lambda expression.
# assumes titanic_data is the Titanic dataframe loaded earlier, with a Fare column
def subt(x):
    return x - 10

updated_fare = titanic_data.Fare.apply(subt)
updated_fare.head()
Question 1
Which Pandas function is used to draw a horizontal bar plot?
A.horz_bar()
B.barh()
C.bar_horizontal()
D.horizontal_bar()
Answer: B
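A minimal `barh()` sketch on made-up counts; the Agg backend is an assumption so it runs without a display (in a notebook you can drop those two lines).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import pandas as pd

# toy values, made up purely for illustration
counts = pd.Series({"first": 3, "second": 5, "third": 2})
ax = counts.plot.barh(title="Toy counts")  # one horizontal bar per index label
```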
Question 2:
To create a legend, the value of which of the following parameters must be specified?
A.title
B.label
C.axis
D.All of the above
Answer: B
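A sketch showing how `legend()` picks up the `label` of each plotted line (Agg backend assumed so it runs headless):

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9], label="squares")
ax.plot([1, 2, 3], [1, 8, 27], label="cubes")
legend = ax.legend()  # uses the label of every plotted line

legend_texts = [text.get_text() for text in legend.get_texts()]
print(legend_texts)  # ['squares', 'cubes']
```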
Question 3:
How do you show percentage values on a Matplotlib pie chart?
A.autopct = '%1.1f%%'
B.percentage = '%1.1f%%'
C.perc = '%1.1f%%'
D.None of the Above
Answer: A
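With `autopct` set, `pie()` also returns the percentage text objects, so the format string can be checked directly. A sketch on toy values whose shares are 25%, 25% and 50%:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
wedges, labels, pct_texts = ax.pie([1, 1, 2], labels=["A", "B", "C"], autopct="%1.1f%%")

percentages = [t.get_text() for t in pct_texts]
print(percentages)  # ['25.0%', '25.0%', '50.0%']
```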
Plot two scatter plots on the same graph using the tips_dataset. In the first scatter plot, display values from the total_bill column on the x-axis and from the tip column on the y-axis. The color of the first scatter plot should be green. In the second scatter plot, display values from the total_bill column on the x-axis and from the size column on the y-axis. The color of the second scatter plot should be blue, and the markers should be x.
import seaborn as sns
tips_data = sns.load_dataset("tips")
sns.scatterplot(x="total_bill", y="tip", data=tips_data, color="g")
sns.scatterplot(x="total_bill", y="size", data=tips_data, color="b", marker="x")
Question 1
Among the following, which one is an example of a regression output?
A.True
B.Red
C.2.5
D.None of the above
Answer: C
Question 2
Which of the following algorithms is a lazy algorithm?
A.Random Forest
B.KNN
C.SVM
D.Linear Regression
Answer: B
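KNN is called lazy because "training" only stores the data; all distance computations happen at prediction time. A minimal pure-Python sketch (the `LazyKNN` class is illustrative and is not scikit-learn's API):

```python
import math
from collections import Counter


class LazyKNN:
    """Minimal k-nearest-neighbors classifier, for illustration only."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, points, labels):
        # "Training" just memorizes the data; no parameters are learned.
        self.points = [tuple(p) for p in points]
        self.labels = list(labels)
        return self

    def predict(self, query):
        # The real work happens here: distance to every stored point.
        distances = sorted(
            (math.dist(query, point), label)
            for point, label in zip(self.points, self.labels)
        )
        nearest = [label for _, label in distances[: self.k]]
        return Counter(nearest).most_common(1)[0][0]


model = LazyKNN(k=3).fit(
    [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)],
    ["A", "A", "A", "B", "B", "B"],
)
print(model.predict((0.5, 0.5)))  # A
print(model.predict((5.5, 5.5)))  # B
```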
Question 3
Which of the following is not a regression metric?
A.Accuracy
B.Recall
C.F1 Measure
D.All of the above
Answer: D
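Regression metrics measure the size of numeric errors rather than counting correct labels. A pure-Python sketch of MAE, MSE and RMSE, the same three metrics printed in the diamonds solution below (toy values, hypothetical helper name):

```python
import math


def regression_errors(y_true, y_pred):
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)  # mean absolute error
    mse = sum(e * e for e in errors) / len(errors)   # mean squared error
    rmse = math.sqrt(mse)                            # root mean squared error
    return mae, mse, rmse


mae, mse, rmse = regression_errors([3.0, 5.0, 2.0], [2.5, 5.0, 4.0])
print(round(mae, 4), round(mse, 4))  # 0.8333 1.4167
```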
Using the Diamonds dataset from the Seaborn library, train a regression algorithm of your choice that predicts the price of a diamond. Perform all the preprocessing steps.
import pandas as pd
import numpy as np
import seaborn as sns

diamonds_df = sns.load_dataset("diamonds")

# separating features and the price label
X = diamonds_df.drop(["price"], axis=1)
y = diamonds_df["price"]

numerical = X.drop(["cut", "color", "clarity"], axis=1)

categorical = X.filter(["cut", "color", "clarity"])

# one-hot encoding the categorical columns
cat_numerical = pd.get_dummies(categorical, drop_first=True)

X = pd.concat([numerical, cat_numerical], axis=1)

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)

# scaling: fit on the training set only, then apply to the test set
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# training a support vector regressor
from sklearn import svm
svm_reg = svm.SVR()
regressor = svm_reg.fit(X_train, y_train)
y_pred = regressor.predict(X_test)

from sklearn import metrics

print("Mean Absolute Error:", metrics.mean_absolute_error(y_test, y_pred))
print("Mean Squared Error:", metrics.mean_squared_error(y_test, y_pred))
print("Root Mean Squared Error:", np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
Question 1
Among the following, which one is not an example of a classification output?
A.True
B.Red
C.Male
D.None of the above
Answer: D
Question 2
Which of the following metrics is used for unbalanced classification datasets?
A.Accuracy
B.F1
C.Precision
D.Recall
Answer: C
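Why accuracy alone misleads on unbalanced data: in this toy sketch, a model that always predicts the majority class scores 95% accuracy, while precision, recall and F1 (computed by a hypothetical `binary_metrics` helper) reveal that it never finds a positive case.

```python
def binary_metrics(y_true, y_pred, positive=1):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # 0.0 when nothing is predicted positive
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


# 95 negatives, 5 positives; the "model" always predicts the majority class
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
accuracy, precision, recall, f1 = binary_metrics(y_true, y_pred)
print(accuracy, precision, recall, f1)  # 0.95 0.0 0.0 0.0
```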
Question 3
Which of the following functions is used to convert categorical values to one-hot encoded numerical values?
A.pd.get_onehot()
B.pd.get_dummies()
C.pd.get_numeric()
D.All of the above
Answer: B
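A toy `pd.get_dummies()` example; `dtype=int` is passed only to get 0/1 integers instead of booleans, and `drop_first=True` mirrors the solutions in this section, which drop one redundant column per category.

```python
import pandas as pd

colors = pd.DataFrame({"color": ["red", "green", "red"]})

one_hot = pd.get_dummies(colors, dtype=int)                   # one column per category
reduced = pd.get_dummies(colors, drop_first=True, dtype=int)  # first category dropped

print(one_hot)
print(reduced)
```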
Using the iris dataset from the Seaborn library, train a classification algorithm of your choice that predicts the species of the iris plant. Perform all the preprocessing steps.
import pandas as pd
import numpy as np
import seaborn as sns

iris_df = sns.load_dataset("iris")

iris_df.head()

# separating features and the species label
X = iris_df.drop(["species"], axis=1)
y = iris_df["species"]

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)

# scaling: fit on the training set only, then apply to the test set
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# training a random forest classifier
from sklearn.ensemble import RandomForestClassifier
rf_clf = RandomForestClassifier(random_state=42, n_estimators=500)

classifier = rf_clf.fit(X_train, y_train)

y_pred = classifier.predict(X_test)

from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
print(accuracy_score(y_test, y_pred))
Question 1
Which of the following is a supervised machine learning algorithm?
A.K Means Clustering
B.Hierarchical Clustering
C.All of the above
D.None of the above
Answer: D
Question 2
In KMeans clustering, what does the inertia tell us?
A.the distance between data points within a cluster
B.output labels for the data points
C.the number of clusters
D.None of the above
Answer: A
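Inertia, as reported by scikit-learn's `KMeans.inertia_`, is the sum of squared distances from each point to the center of its assigned cluster, so lower values mean tighter clusters. A NumPy sketch of that definition on four toy points (the `inertia` helper is hypothetical):

```python
import numpy as np


def inertia(points, labels, centers):
    # sum of squared distances from each point to its assigned centroid
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    assigned = centers[np.asarray(labels)]
    return float(np.sum((points - assigned) ** 2))


points = [[0, 0], [0, 2], [10, 0], [10, 2]]
labels = [0, 0, 1, 1]
centers = [[0, 1], [10, 1]]
print(inertia(points, labels, centers))  # 4.0
```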
Question 3
In hierarchical clustering, in the case of vertical dendrograms, the number of clusters is equal to the number of ____ lines that the ____ line passes through?
A.horizontal, vertical
B.vertical, horizontal
C.None of the above
D.All of the above
Answer: B
Apply KMeans clustering on the banknote.csv dataset available in the Datasets folder in the GitHub repository. Find the optimal number of clusters and then print the clustered dataset. The following script imports the dataset and prints the first five rows of the dataset.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

banknote_df = pd.read_csv(r"E:\Hands on Python for Data Science and Machine Learning\Datasets\banknote.csv")
banknote_df.head()

# Solution:

# dividing data into features and labels
features = banknote_df.drop(["class"], axis=1)
labels = banknote_df.filter(["class"], axis=1)
features.head()

# training KMeans on K values from 1 to 10
loss = []
for i in range(1, 11):
    km = KMeans(n_clusters=i).fit(features)
    loss.append(km.inertia_)

# plotting loss against number of clusters
plt.plot(range(1, 11), loss)
plt.title("Finding Optimal Clusters via Elbow Method")
plt.xlabel("Number of Clusters")
plt.ylabel("loss")
plt.show()

# training KMeans with 2 clusters
features = features.values
km_model = KMeans(n_clusters=2)
km_model.fit(features)

# plotting the data points with predicted labels
plt.scatter(features[:, 0], features[:, 1], c=km_model.labels_, cmap="rainbow")

# plotting the predicted centroids
plt.scatter(km_model.cluster_centers_[:, 0], km_model.cluster_centers_[:, 1], s=100, c="black")
plt.show()
Question 1
What should be the shape of an input image passed to a convolutional neural network?
A.Width, Height
B.Height, Width
C.Channels, Width, Height
D.Width, Height, Channels
Answer: D
Question 2:
We say that a model is overfitting when:
A.Results on the test set are better than the results on the training set
B.Results on both test and training sets are similar
C.Results on the training set are better than the results on the test set
D.None of the above
Answer: C
Question 3
The ReLU activation function is used to introduce:
A.Linearity
B.Non-linearity
C.Quadraticity
D.None of the above
Answer: B
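A linear function satisfies f(a + b) = f(a) + f(b); ReLU does not, which is what lets stacked layers model non-linear relationships. A two-line check:

```python
def relu(x):
    return max(0.0, x)


a, b = -2.0, 3.0
print(relu(a + b))        # 1.0
print(relu(a) + relu(b))  # 3.0, so relu(a + b) != relu(a) + relu(b)
```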
Using the CIFAR-10 image dataset, perform image classification to recognize the object in each image. Here is the dataset:
import tensorflow as tf
cifar_dataset = tf.keras.datasets.cifar10
Solution:
# importing required libraries
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.layers import Input, Conv2D, Dense, Flatten, Dropout, MaxPool2D
from tensorflow.keras.models import Model

# loading the CIFAR-10 images and labels (cifar_dataset is defined above)
(training_images, training_labels), (test_images, test_labels) = cifar_dataset.load_data()

# scaling pixel values to the range [0, 1]
training_images, test_images = training_images / 255.0, test_images / 255.0

# labels arrive with shape (n, 1); flatten them to shape (n,)
training_labels, test_labels = training_labels.flatten(), test_labels.flatten()
print(training_labels.shape)
print(training_images.shape)
output_classes = len(set(training_labels))
print("Number of output classes is: ", output_classes)

# defining the CNN with the Keras functional API
input_layer = Input(shape=training_images[0].shape)
conv1 = Conv2D(32, (3, 3), strides=2, activation="relu")(input_layer)
maxpool1 = MaxPool2D(2, 2)(conv1)
conv2 = Conv2D(64, (3, 3), strides=2, activation="relu")(maxpool1)
# conv3 = Conv2D(128, (3, 3), strides=2, activation="relu")(conv2)
flat1 = Flatten()(conv2)
drop1 = Dropout(0.2)(flat1)
dense1 = Dense(512, activation="relu")(drop1)
drop2 = Dropout(0.2)(dense1)
output_layer = Dense(output_classes, activation="softmax")(drop2)

model = Model(input_layer, output_layer)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# the test set doubles as validation data here
model_history = model.fit(training_images, training_labels, epochs=20,
                          validation_data=(test_images, test_labels), verbose=1)
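To see how many features reach the `Flatten` layer, the spatial size can be traced by hand. This sketch assumes the Keras default `padding="valid"` for both `Conv2D` and `MaxPool2D` (the code above does not override it), where each axis shrinks to floor((n - k) / s) + 1:

```python
def conv_output_size(size, kernel, stride):
    # output length along one spatial axis with 'valid' (no) padding
    return (size - kernel) // stride + 1


sides = [32]                                     # CIFAR-10 images are 32 x 32 x 3
sides.append(conv_output_size(sides[-1], 3, 2))  # Conv2D(32, (3, 3), strides=2)
sides.append(conv_output_size(sides[-1], 2, 2))  # MaxPool2D(2, 2)
sides.append(conv_output_size(sides[-1], 3, 2))  # Conv2D(64, (3, 3), strides=2)
flattened = sides[-1] * sides[-1] * 64           # Flatten() over 64 feature maps

print(sides, flattened)  # [32, 15, 7, 3] 576
```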
Question 1
Which of the following are the benefits of dimensionality reduction?
A.Data Visualization
B.Faster training time for statistical algorithms
C.All of the above
D.None of the above
Answer: C
Question 2
In PCA, dimensionality reduction depends upon the:
A.Feature set only
B.Label set only
C.Both features and labels sets
D.None of the above
Answer: A
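PCA only needs the feature matrix: it centers the columns and finds the directions of largest variance, with no labels involved. A NumPy sketch via the singular value decomposition (the helper name is hypothetical), on toy points that all lie on the line y = 2x, so one component explains all the variance:

```python
import numpy as np


def pca_explained_variance_ratio(features):
    # center each column, then take the singular values of the centered matrix
    centered = features - features.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


X = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]])
ratios = pca_explained_variance_ratio(X)  # note: no labels are passed
print(ratios)
```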
Question 3
LDA is a ____ dimensionality reduction technique.
A.Unsupervised
B.Semi-Supervised
C.Supervised
D.Reinforcement
Answer: C
Apply principal component analysis for dimensionality reduction on the customer_churn.csv dataset from the Datasets folder in the GitHub repository. Print the accuracy using the two principal components. Also, plot the results on the test set using the two principal components.
import pandas as pd
import numpy as np

churn_df = pd.read_csv(r"E:\Hands on Python for Data Science and Machine Learning\Datasets\customer_churn.csv")
churn_df.head()

# dropping identifier columns
churn_df = churn_df.drop(["RowNumber", "CustomerId", "Surname"], axis=1)

X = churn_df.drop(["Exited"], axis=1)
y = churn_df["Exited"]

# one-hot encoding the categorical columns
numerical = X.drop(["Geography", "Gender"], axis=1)
categorical = X.filter(["Geography", "Gender"])
cat_numerical = pd.get_dummies(categorical, drop_first=True)
X = pd.concat([numerical, cat_numerical], axis=1)
X.head()

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)

# applying scaling on training and test data
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# importing PCA class
from sklearn.decomposition import PCA

# fitting PCA with all components on the training data to inspect the variance ratios
pca = PCA()
pca.fit(X_train)

# printing variance ratios
variance_ratios = pca.explained_variance_ratio_
print(variance_ratios)

# keeping two principal components, fitted on the scaled training data
pca = PCA(n_components=2)
X_train = pca.fit_transform(X_train)
X_test = pca.transform(X_test)

# making predictions using logistic regression
from sklearn.linear_model import LogisticRegression

# training the logistic regression model
lg = LogisticRegression()
lg.fit(X_train, y_train)

# predicting the test set results
y_pred = lg.predict(X_test)

# evaluating results
from sklearn.metrics import accuracy_score

print(accuracy_score(y_test, y_pred))

from matplotlib import pyplot as plt
# Jupyter magic: remove the next line when running as a plain Python script
%matplotlib inline

# plotting the test set on the two principal components, colored by the actual labels
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap="rainbow")
plt.show()