Distribution fitting

In Timothy Sturm's example, we claimed that the histogram of the data appeared to fit a normal distribution. SciPy has a few routines that help us find the distribution that best approximates a random variable, together with the parameters of that fit. For example, for the data in that problem, the mean and standard deviation of the best-fitting normal distribution can be found in the following way:

>>> from scipy.stats import norm     # Gaussian distribution
>>> mean,std=norm.fit(dataDiff)
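
Since dataDiff comes from an earlier example, the following minimal sketch uses a synthetic sample in its place (the names rng and sample, and the chosen parameters, are assumptions made only for illustration). It suggests what norm.fit computes: for a normal distribution, the maximum likelihood estimates are the sample mean and the standard deviation taken with ddof=0.

```python
import numpy as np
from scipy.stats import norm

# Synthetic stand-in for dataDiff (assumption: any 1-D numeric sample works)
rng = np.random.default_rng(0)
sample = rng.normal(loc=2.0, scale=0.5, size=1000)

# Maximum likelihood fit of a normal distribution to the sample
mean, std = norm.fit(sample)

# For the normal distribution the estimates have a closed form:
# the sample mean and the standard deviation computed with ddof=0
print(mean, sample.mean())
print(std, sample.std(ddof=0))
```

Each printed pair should agree up to floating-point rounding, which is a quick sanity check when fitting your own data.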

We can now plot the normalized histogram of the data, together with the computed probability density function, as follows:

>>> import numpy
>>> import matplotlib.pyplot as plt
>>> plt.hist(dataDiff, density=True)   # the older normed= argument is no longer accepted by current Matplotlib
>>> x=numpy.linspace(dataDiff.min(),dataDiff.max(),1000)
>>> pdf=norm.pdf(x,mean,std)
>>> plt.plot(x,pdf)
>>> plt.show()

We obtain the following graph, showing the normal distribution whose parameters are the maximum likelihood estimates for dataDiff:

[Figure: normalized histogram of dataDiff with the fitted normal density]

We may even estimate a probability density function without assuming any particular distribution, thanks to a non-parametric technique, kernel density estimation. SciPy performs Gaussian kernel density estimation with the gaussian_kde class, which can be imported from scipy.stats. Let us illustrate with the same data as before:

>>> from scipy.stats import gaussian_kde
>>> pdf=gaussian_kde(dataDiff)
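
The object returned by gaussian_kde behaves like a function that can be evaluated on any set of points. The sketch below, again on a synthetic sample (rng, sample, grid, and the padding of 3.0 are illustrative assumptions), evaluates the estimate on a grid and checks numerically that it integrates to roughly 1, as a density should.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic stand-in for dataDiff (assumption, for illustration only)
rng = np.random.default_rng(1)
sample = rng.normal(size=500)

# Build the estimator; the default bandwidth rule is Scott's
kde = gaussian_kde(sample)

# Evaluate on a grid padded beyond the data range so the tails are covered
grid = np.linspace(sample.min() - 3.0, sample.max() + 3.0, 2001)
density = kde(grid)          # calling the object is the same as kde.evaluate(grid)

# Simple Riemann sum as a rough check that the estimate integrates to about 1
step = grid[1] - grid[0]
area = density.sum() * step
print(area)
```

Padding the grid matters here: the kernel estimate spreads some probability mass slightly beyond the smallest and largest observations.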

A plotting session slightly different from the previous one gives us the following graph, showing the probability density function obtained by kernel density estimation on dataDiff:

[Figure: normalized histogram of dataDiff with the kernel density estimate]

The full piece of code is as follows:

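>>> from scipy.stats import gaussian_kde
>>> kde = gaussian_kde(dataDiff)       # the fitted estimator
>>> pdf = kde.evaluate(x)              # density values on the grid x
>>> plt.hist(dataDiff, density=True)   # the older normed= argument is no longer accepted by current Matplotlib
>>> plt.plot(x,pdf,'k')
>>> plt.savefig("hist2.png")
>>> plt.show()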

For comparative purposes, the last two plots can be combined into one:

>>> plt.hist(dataDiff, density=True)   # the older normed= argument is no longer accepted by current Matplotlib
>>> plt.plot(x,pdf,'k.-',label='Kernel fit')
>>> plt.plot(x,norm.pdf(x,mean,std),'r',label='Normal fit')
>>> plt.legend()
>>> plt.savefig("hist3.png")
>>> plt.show()

The output is the following combined plot:

[Figure: normalized histogram of dataDiff with both the kernel fit and the normal fit]
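
For readers who want to reproduce a plot of this kind without the original data, here is a self-contained sketch of the whole session. The synthetic sample, the Agg backend, and the output filename hist3_demo.png are all assumptions chosen for illustration and are not part of the original example.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")                 # non-interactive backend, writes to file only
import matplotlib.pyplot as plt
from scipy.stats import norm, gaussian_kde

# Synthetic stand-in for dataDiff (assumption, for illustration only)
rng = np.random.default_rng(2)
sample = rng.normal(loc=0.0, scale=1.0, size=400)

# Parametric fit (normal) and non-parametric fit (Gaussian KDE)
mean, std = norm.fit(sample)
kde = gaussian_kde(sample)

# Evaluate both densities on a common grid
x = np.linspace(sample.min(), sample.max(), 1000)
normal_pdf = norm.pdf(x, mean, std)
kernel_pdf = kde.evaluate(x)

# Overlay both curves on the normalized histogram
fig, ax = plt.subplots()
ax.hist(sample, density=True, alpha=0.5)
ax.plot(x, kernel_pdf, 'k', label='Kernel fit')
ax.plot(x, normal_pdf, 'r', label='Normal fit')
ax.legend()
fig.savefig("hist3_demo.png")         # hypothetical output filename
plt.close(fig)
```

With a sample that really is normal, the two curves should lie close to each other; larger gaps between them hint that the normal model may be a poor description of the data.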