In Timothy Sturm's example, we claimed that the histogram of some data seemed to fit a normal distribution. SciPy has a few routines to help us find the distribution that best fits a random variable, together with the parameters of that fit. For example, for the data in that problem, the mean and standard deviation of the best-fitting normal distribution can be found in the following way:
>>> from scipy.stats import norm     # Gaussian distribution
>>> mean, std = norm.fit(dataDiff)
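Since dataDiff comes from the earlier example and is not defined in this excerpt, the following self-contained sketch uses synthetic normally distributed data as a stand-in for dataDiff, to illustrate that norm.fit returns maximum likelihood estimates of the location (mean) and scale (standard deviation):

```python
import numpy as np
from scipy.stats import norm

# Synthetic stand-in for dataDiff: 1000 samples drawn from N(0.5, 2.0)
rng = np.random.default_rng(0)
dataDiff = rng.normal(loc=0.5, scale=2.0, size=1000)

# norm.fit returns the MLE of (loc, scale), i.e. the sample mean and std
mean, std = norm.fit(dataDiff)
print(mean, std)  # close to 0.5 and 2.0
```

With 1000 samples, the estimates typically land within a few percent of the true parameters.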
We can now plot the normalized (density) histogram of the data, together with the computed probability density function, as follows:
>>> plt.hist(dataDiff, density=True)
>>> x = numpy.linspace(dataDiff.min(), dataDiff.max(), 1000)
>>> pdf = norm.pdf(x, mean, std)
>>> plt.plot(x, pdf)
>>> plt.show()
We will obtain the following graph, showing the maximum likelihood estimate of the normal distribution that best fits dataDiff:
We may even fit the best probability density function without specifying any particular distribution, thanks to a non-parametric technique, kernel density estimation. An algorithm to perform Gaussian kernel density estimation is available in the scipy.stats submodule as gaussian_kde. Let us show it by example, with the same data as before:
>>> from scipy.stats import gaussian_kde
>>> pdf = gaussian_kde(dataDiff)
A slightly different plotting session from the one given before offers us the following graph, showing the probability density function obtained by kernel density estimation on dataDiff:
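The smoothness of a kernel density estimate is governed by its bandwidth. gaussian_kde accepts a bw_method argument: a rule name such as 'scott' (the default) or 'silverman', or a scalar that is used directly as the bandwidth factor. A small sketch, again with synthetic data standing in for dataDiff:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic stand-in for dataDiff
rng = np.random.default_rng(1)
sample = rng.normal(size=500)
x = np.linspace(sample.min(), sample.max(), 1000)

# Default bandwidth uses Scott's rule; a smaller scalar gives a wigglier estimate
pdf_scott = gaussian_kde(sample)
pdf_narrow = gaussian_kde(sample, bw_method=0.2)

print(pdf_scott.factor, pdf_narrow.factor)
print(pdf_scott.evaluate(x)[:3])  # density values on the grid
```

Choosing the bandwidth is the key trade-off in kernel density estimation: too large oversmooths real features, too small chases noise in the sample.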
The full piece of code is as follows:
>>> from scipy.stats import gaussian_kde
>>> pdf = gaussian_kde(dataDiff)
>>> pdf = pdf.evaluate(x)
>>> plt.hist(dataDiff, density=True)
>>> plt.plot(x, pdf, 'k')
>>> plt.savefig("hist2.png")
>>> plt.show()
For comparative purposes, the last two plots can be combined into one:
>>> plt.hist(dataDiff, density=True)
>>> plt.plot(x, pdf, 'k.-', label='Kernel fit')
>>> plt.plot(x, norm.pdf(x, mean, std), 'r', label='Normal fit')
>>> plt.legend()
>>> plt.savefig("hist3.png")
>>> plt.show()
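Beyond visual comparison, the quality of the normal fit can be assessed quantitatively, for instance with a Kolmogorov-Smirnov test via scipy.stats.kstest. A sketch with synthetic data standing in for dataDiff (note that testing against parameters fitted from the same sample makes the reported p-value optimistic):

```python
import numpy as np
from scipy.stats import norm, kstest

# Synthetic stand-in for dataDiff
rng = np.random.default_rng(2)
sample = rng.normal(loc=1.0, scale=0.5, size=400)

mean, std = norm.fit(sample)

# Compare the empirical distribution against the fitted normal
stat, pvalue = kstest(sample, 'norm', args=(mean, std))
print(stat, pvalue)
```

A small KS statistic (and a large p-value) indicates no evidence against the normal model for this sample.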