Understanding logarithmic plots

Newspapers and similar publications often carry charts that misrepresent the facts. A common example is the so-called panic chart: a linear scale is used to plot a value that has grown steadily over a long period (years), where the starting values are several orders of magnitude smaller than the latest ones. Plotted on a logarithmic scale, such values would (and usually should) produce linear or almost linear charts, which takes some of the panic out of the articles they illustrate.

Getting ready

On a logarithmic scale, equally spaced positions on the axis have a constant ratio between consecutive values. This is important when we are trying to read log plots. On a linear (arithmetic) scale, it is the difference between consecutive values that is constant. In other words, equal distances on a logarithmic plot correspond to equal steps in orders of magnitude. We will see this illustrated in the plots that follow. The code that produces them is explained in this recipe.
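To make the constant-ratio versus constant-difference distinction concrete, here is a minimal sketch that places five evenly spaced ticks on an axis running from 1 to 10000, once linearly and once logarithmically. The range and the number of ticks are arbitrary values chosen for illustration.

```python
import math

# Illustrative axis: 5 evenly spaced ticks between 1 and 10000.
n_ticks = 5
lo, hi = 1.0, 10_000.0

# Linear scale: equal steps in the value itself.
linear_ticks = [lo + i * (hi - lo) / (n_ticks - 1) for i in range(n_ticks)]

# Logarithmic scale: equal steps in log10 of the value.
log_lo, log_hi = math.log10(lo), math.log10(hi)
log_ticks = [10 ** (log_lo + i * (log_hi - log_lo) / (n_ticks - 1))
             for i in range(n_ticks)]

# Differences between neighbours on the linear axis, ratios on the log axis.
linear_steps = [b - a for a, b in zip(linear_ticks, linear_ticks[1:])]
log_ratios = [b / a for a, b in zip(log_ticks, log_ticks[1:])]

print("linear ticks:", linear_ticks)
print("differences: ", linear_steps)
print("log ticks:   ", log_ticks)
print("ratios:      ", log_ratios)
```

The linear ticks are all the same distance apart, while each logarithmic tick is the same multiple (here one order of magnitude) of the previous one.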

As a general rule of thumb, consider a logarithmic scale in the following cases:

  • the values span several orders of magnitude
  • the data is skewed toward large values (some points are much larger than the rest of the data)
  • you want to show the rate of change (growth rate) rather than the absolute change

Don't follow these rules blindly; they are hints more than rules. Always use your own judgment about the data at hand and the requirements set by the project or customer.

Depending on the data range, different log bases should be used. The standard base for the log is 10, but if the range of the data is smaller, a base of 2 can prove to be more useful as it will show more "resolution" within the smaller range.
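As a rough way to see what "resolution" means here, the sketch below counts how many whole powers of the base fall inside a data range. The helper major_ticks and the range 1 to 100 are made up for this illustration and are not part of Matplotlib.

```python
import math

def major_ticks(lo, hi, base):
    """Return the integer powers of `base` that fall inside [lo, hi].

    Hypothetical helper for illustration; the small epsilon guards
    against floating-point rounding at the range boundaries.
    """
    first = math.ceil(math.log(lo, base) - 1e-9)
    last = math.floor(math.log(hi, base) + 1e-9)
    return [base ** k for k in range(first, last + 1)]

# Assumed data range: values between 1 and 100.
print(major_ticks(1, 100, 10))  # [1, 10, 100]
print(major_ticks(1, 100, 2))   # [1, 2, 4, 8, 16, 32, 64]
```

With base 10 the range holds only three reference values, while base 2 gives seven. In recent Matplotlib releases the base can, as far as we know, be selected with the base keyword of set_yscale('log'), while older releases used basey; check the documentation for the version you have installed.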

If the range of the data is suited to a logarithmic scale, values that were previously too close together to tell apart are now well separated. This makes the chart much easier to read than if we presented the data on a linear scale.

Growth rate charts, built from long-range time series data, are where we want to see not the absolute value measured at a point in time but the growth over time. We still get the absolute value information, but it is of lower priority.
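The following sketch shows why a logarithmic axis emphasizes growth rate over absolute change. It uses two made-up series that both grow by 10 percent per period but start at very different levels; the starting values and the growth factor are assumptions for the example only.

```python
import math

periods = 6
growth = 1.10  # assumed 10% growth per period, for illustration only

small = [10 * growth ** t for t in range(periods)]
large = [10_000 * growth ** t for t in range(periods)]

def steps(values):
    """Absolute change between consecutive values (what a linear axis shows)."""
    return [b - a for a, b in zip(values, values[1:])]

def log_steps(values):
    """Change in log10 between consecutive values (what a log axis shows)."""
    return [math.log10(b) - math.log10(a) for a, b in zip(values, values[1:])]

print("absolute steps, small:", [round(s, 2) for s in steps(small)])
print("absolute steps, large:", [round(s, 2) for s in steps(large)])
print("log10 steps, small:   ", [round(s, 4) for s in log_steps(small)])
print("log10 steps, large:   ", [round(s, 4) for s in log_steps(large)])
```

The absolute steps differ by a factor of a thousand between the two series, yet the log10 steps are identical and constant, so on a logarithmic axis both series appear as parallel straight lines whose slope is the growth rate.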

Also, if the data distribution has a positive skew (for example, salaries), taking the logarithm of the value (salary) will help us fit the data to a model, as the logarithmic transformation gives a more nearly normal distribution.
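A small experiment can illustrate the effect of the log transform on skew. The sketch below draws synthetic "salaries" from a log-normal distribution with made-up parameters (it is not real salary data) and compares the skewness before and after taking the logarithm.

```python
import math
import random

def skewness(values):
    """Skewness as the third standardized moment (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

random.seed(0)  # fixed seed so the run is repeatable

# Synthetic, positively skewed data; the parameters are arbitrary.
salaries = [random.lognormvariate(10.5, 0.5) for _ in range(10_000)]
log_salaries = [math.log(s) for s in salaries]

raw_skew = skewness(salaries)
log_skew = skewness(log_salaries)

print(f"skewness of raw values: {raw_skew:.2f}")
print(f"skewness of log values: {log_skew:.2f}")
```

The raw values show a clearly positive skew, while the skewness of the logged values is close to zero. Real data will rarely be exactly log-normal, so the improvement is usually less clean than in this synthetic case.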

How to do it...

We will illustrate this with sample code that shows the same two datasets (one exponential and one linear in nature), each on two different plots in the same figure, using different scales (linear and logarithmic).

We will perform the following steps, using the code that follows them:

  1. Generate two simple datasets: y, which is exponential in nature, and z, which is linear.
  2. Create a figure containing a grid of four subplots.
  3. Create two subplots containing the y dataset, one in logarithmic scale and one in linear scale.
  4. Create another two subplots containing the z dataset, again one logarithmic and the other linear.

Here is the code:

from matplotlib import pyplot as plt
import numpy as np

# Sample data: y grows exponentially with x, z grows linearly.
x = np.linspace(1, 10)
y = [10 ** el for el in x]
z = [2 * el for el in x]

fig = plt.figure(figsize=(10, 8))

# Top left: exponential data on a logarithmic y-axis.
ax1 = fig.add_subplot(2, 2, 1)
ax1.plot(x, y, color='blue')
ax1.set_yscale('log')
ax1.set_title(r'Logarithmic plot of $ {10}^{x} $ ')
ax1.set_ylabel(r'$ {y} = {10}^{x} $')
# The flag is passed positionally: the b= keyword was renamed to
# visible= in newer Matplotlib releases.
plt.grid(True, which='both', axis='both')

# Top right: exponential data on a linear y-axis.
ax2 = fig.add_subplot(2, 2, 2)
ax2.plot(x, y, color='red')
ax2.set_yscale('linear')
ax2.set_title(r'Linear plot of $ {10}^{x} $ ')
ax2.set_ylabel(r'$ {y} = {10}^{x} $')
plt.grid(True, which='both', axis='both')

# Bottom left: linear data on a logarithmic y-axis.
ax3 = fig.add_subplot(2, 2, 3)
ax3.plot(x, z, color='green')
ax3.set_yscale('log')
ax3.set_title(r'Logarithmic plot of $ {2}*{x} $ ')
ax3.set_ylabel(r'$ {y} = {2}*{x} $')
plt.grid(True, which='both', axis='both')

# Bottom right: linear data on a linear y-axis.
ax4 = fig.add_subplot(2, 2, 4)
ax4.plot(x, z, color='magenta')
ax4.set_yscale('linear')
ax4.set_title(r'Linear plot of $ {2}*{x} $ ')
ax4.set_ylabel(r'$ {y} = {2}*{x} $')
plt.grid(True, which='both', axis='both')

plt.show()

This code will produce the following figure:

How it works...

We generate some sample data and two dependent variables, y and z. Variable y is an exponential function of the data (x), and variable z is a simple linear function of x. This helps us illustrate how differently linear and exponential charts look.

We then create a grid of four subplots, where the top row shows the (x, y) data and the bottom row shows the (x, z) data.

The charts in the left-hand column have a logarithmic scale on the y-axis, while those in the right-hand column use a linear scale. We set this with set_yscale('log') or set_yscale('linear') for each axes separately.

For every subplot, we set a title and a y-axis label; the label also describes the function plotted.

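With the plt.grid() call, using which='both' and axis='both', we turn the grid on for both axes and for both the major and minor ticks. Note that plt.grid() acts on the current axes, which is the subplot added most recently.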
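As an alternative to the recipe's approach, the same layout can be sketched with Matplotlib's object-oriented interface, where the scale and the grid are set on each axes object explicitly instead of relying on the current axes. This is a variation for illustration, not the recipe's own code; it omits titles and labels, and it selects the non-interactive Agg backend so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
from matplotlib import pyplot as plt

# Same data as the recipe, built with plain lists: 50 points from 1 to 10.
x = [1 + 9 * i / 49 for i in range(50)]
y = [10 ** el for el in x]   # exponential in x
z = [2 * el for el in x]     # linear in x

# plt.subplots returns the figure and a 2x2 array of axes in one call.
fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Top row plots y, bottom row plots z; left column log, right column linear.
for row, data in zip(axes, (y, z)):
    for ax, scale in zip(row, ("log", "linear")):
        ax.plot(x, data)
        ax.set_yscale(scale)
        # Grid on this specific axes, for major and minor ticks on both axes.
        ax.grid(True, which="both", axis="both")
```

Calling ax.grid() on each axes makes it explicit which subplot is being changed, which can help when the subplots are created in one place and styled in another.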

We observe that linear functions are straight lines on linear plots, while exponential functions are straight lines on logarithmic plots.
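This observation can also be checked numerically, without drawing anything. The sketch below measures how far each curve strays from the straight line joining its end points, first using the raw values (a linear y-axis) and then using log10 of the values (a logarithmic y-axis). The helper max_deviation_from_chord is a made-up name for this example.

```python
import math

# Same data as the recipe, built with plain lists: 50 points from 1 to 10.
x = [1 + 9 * i / 49 for i in range(50)]
y = [10 ** el for el in x]   # exponential in x
z = [2 * el for el in x]     # linear in x

def max_deviation_from_chord(xs, ys):
    """Largest gap between the curve and the straight line through its
    end points, as a fraction of the total vertical range."""
    slope = (ys[-1] - ys[0]) / (xs[-1] - xs[0])
    span = abs(ys[-1] - ys[0])
    gaps = (abs(yv - (ys[0] + slope * (xv - xs[0]))) for xv, yv in zip(xs, ys))
    return max(gaps) / span

def log10_all(values):
    """Apply log10 to every value, mimicking a logarithmic y-axis."""
    return [math.log10(v) for v in values]

results = {
    "10**x on linear axis": max_deviation_from_chord(x, y),
    "10**x on log axis":    max_deviation_from_chord(x, log10_all(y)),
    "2*x on linear axis":   max_deviation_from_chord(x, z),
    "2*x on log axis":      max_deviation_from_chord(x, log10_all(z)),
}

for name, deviation in results.items():
    print(f"{name}: {deviation:.3f}")
```

A deviation near zero means the curve is a straight line on that axis. The exponential data is straight only on the logarithmic axis, and the linear data is straight only on the linear axis.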
