Visualizing multivariate data with a heatmap

A heatmap is a useful way to visualize multivariate data when there are many variables to compare, such as in a big data analysis. It displays values as colors on a grid, with each cell shaded according to a color scale. It is among the most common plots used by bioinformaticians to display hundreds or thousands of gene expression values in one plot.

With Seaborn, drawing a heatmap is just one line away from importing the library. It is done by calling sns.heatmap(df), where df is the Pandas DataFrame to be plotted. We can supply the cmap parameter to specify the color scale ("colormap") to be used. You can revisit the previous chapter for more details on colormap usage.
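As a minimal sketch of that call, the snippet below builds a small made-up DataFrame and passes it to sns.heatmap with a cmap. The DataFrame contents, the label names, and the output filename are all hypothetical and only serve to illustrate the call; the Agg backend is selected so the sketch can run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption so this runs headless
import pandas as pd
import seaborn as sns

# Hypothetical toy data: the index holds row labels, each column is one metric
df = pd.DataFrame(
    {"metric_a": [1.0, 2.0, 3.0], "metric_b": [3.0, 1.0, 2.0]},
    index=["item_1", "item_2", "item_3"],
)

# One cell is drawn per DataFrame value; cmap selects the color scale
ax = sns.heatmap(df, cmap="Blues")

# Save instead of plt.show() because the Agg backend cannot open a window
ax.figure.savefig("toy_heatmap.png")
```

In an interactive session or notebook, the backend line and savefig call can be dropped in favor of plt.show().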

To get a feel for heatmaps, the following example uses the specifications of the 7th and 8th generations of Intel Core CPUs, which cover dozens of models and four chosen metrics. Before looking at the plotting code, let's look at the structure of the Pandas DataFrame that stores the data:

# Data obtained from https://ark.intel.com/#@Processors
import pandas as pd

cpuspec = pd.read_csv('intel-cpu-7+8.csv').set_index('Name')
print(cpuspec.info())
cpuspec.head()

From the following screen capture of the output, we see that the model names serve as the index and each property occupies its own column:

Notice that 16 models do not support boosting and therefore have no Max Frequency value. For our purpose here, it makes sense to treat the Base Frequency as the maximum for those models. We will fill the NA values in 'Max Frequency' with the corresponding 'Base Frequency':

cpuspec['Max Frequency'] = cpuspec['Max Frequency'].fillna(cpuspec['Base Frequency'])
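To see what this fill does, here is a small sketch on a made-up two-row DataFrame. The model names and frequency values are hypothetical, not taken from the Intel data; only the column names match the ones used above.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame; names and numbers are invented for illustration
toy = pd.DataFrame(
    {"Base Frequency": [3.6, 3.9], "Max Frequency": [4.2, np.nan]},
    index=["model_with_boost", "model_without_boost"],
)

# fillna with a Series aligns on the index, so each missing value
# is replaced by the Base Frequency of the same row
toy["Max Frequency"] = toy["Max Frequency"].fillna(toy["Base Frequency"])
print(toy)
```

Rows that already have a Max Frequency are left untouched; only the missing entry takes its row's Base Frequency.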

Now, let's draw the heatmap with the following code:

import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(13,13))
sns.heatmap(cpuspec.drop(['Gen'],axis=1),cmap='Blues')
plt.xticks(fontsize=16)
plt.show()

Simple, isn't it? Only one line of code actually draws the heatmap. This is also an example of how we can use basic Matplotlib code to adjust other fine details of the plot, such as figure dimensions and the xticks font size in this case. Let's look at the result:

From the figure, even if we have absolutely no idea about these CPU models, we can easily infer something from the darker colors at the top among the i7 models: they are designed for higher performance, with more cores and larger caches.
