The empirical cumulative distribution function (CDF) is the non-parametric maximum-likelihood estimator of the CDF. In this recipe, we will see how to plot the empirical CDF.
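As a quick illustration of what the empirical CDF computes, the following minimal sketch uses only base R. The helper name ecdf_manual and the sample x are hypothetical and chosen just for this example; ecdf() is the standard function from the stats package. At any point t, the empirical CDF is the proportion of observations less than or equal to t.

```r
# Simulated sample (illustrative values only)
set.seed(12345)
x <- rnorm(n = 100, mean = 20, sd = 3)

# Hypothetical helper: proportion of observations <= t
ecdf_manual <- function(t, sample) mean(sample <= t)

# Base R equivalent: ecdf() returns a step function
Fn <- ecdf(x)

# Both give the same estimate of P(X <= 20)
manual_value <- ecdf_manual(20, x)
builtin_value <- Fn(20)
print(c(manual = manual_value, builtin = builtin_value))
```

The plotting function used in this recipe draws this same step function, so the sketch is only meant to make the definition concrete.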
To produce this plot, we need the latticeExtra library. We will use a simulated dataset, created with the following code:
# Set a seed value to make the data reproducible
set.seed(12345)
qqdata <- data.frame(disA=rnorm(n=100,mean=20,sd=3),
                     disB=rnorm(n=100,mean=25,sd=4),
                     disC=rnorm(n=100,mean=15,sd=1.5),
                     age=sample(c(1,2,3,4),size=100,replace=TRUE),
                     sex=sample(c("Male","Female"),size=100,replace=TRUE),
                     econ_status=sample(c("Poor","Middle","Rich"),
                                        size=100,replace=TRUE))
To plot an empirical CDF, we first need to load the latticeExtra library (note that this library has a dependency on RColorBrewer). We can then plot the empirical CDF with the following simple code:
library(latticeExtra)
ecdfplot(~disA|sex,data=qqdata)
The basic call to the ecdfplot() function takes a formula that specifies the variable to be plotted and a data argument. If we want to repeat the plot for each level of another variable, we specify that variable's name after the vertical bar (|). To plot more than one variable, we can add variables with a plus sign, for example, ~disA+disB.
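The formula variants described above can be sketched as follows. The data frame demo_data is a hypothetical, smaller stand-in for the recipe's dataset so that the block runs on its own, and the object names p_single, p_by_sex, and p_two_vars are illustrative. The sketch assumes that latticeExtra is installed.

```r
library(latticeExtra)

# Hypothetical stand-in data with the same column names as the recipe
set.seed(12345)
demo_data <- data.frame(
  disA = rnorm(n = 100, mean = 20, sd = 3),
  disB = rnorm(n = 100, mean = 25, sd = 4),
  sex  = sample(c("Male", "Female"), size = 100, replace = TRUE)
)

# One variable, one panel
p_single <- ecdfplot(~ disA, data = demo_data)

# One variable, one panel per level of sex
p_by_sex <- ecdfplot(~ disA | sex, data = demo_data)

# Two variables overlaid in the same panel; auto.key adds a legend
p_two_vars <- ecdfplot(~ disA + disB, data = demo_data, auto.key = TRUE)

# Lattice plots are objects; print() draws them
print(p_two_vars)
```

Storing each plot in a variable is a design choice for this sketch: inside scripts, loops, or functions a lattice plot is only drawn when it is printed explicitly.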
The ecdfplot() function also accepts other arguments that are common to most functions in the lattice library. One useful argument is subset. If we want to produce the plot with only part of the data, we can pass a logical condition to the subset argument:
ecdfplot(~disA,data=qqdata,subset=disA>15)
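To make the effect of subset concrete, here is a hedged sketch that compares it with filtering the data frame before plotting. The data frame demo_data and the object names p_subset and p_filtered are hypothetical and exist only so that the block runs on its own. It assumes that latticeExtra is installed.

```r
library(latticeExtra)

# Hypothetical stand-in data with the same column names as the recipe
set.seed(12345)
demo_data <- data.frame(
  disA = rnorm(n = 100, mean = 20, sd = 3),
  sex  = sample(c("Male", "Female"), size = 100, replace = TRUE)
)

# Conditions are evaluated inside the data frame and can be combined
p_subset <- ecdfplot(~ disA, data = demo_data,
                     subset = disA > 15 & sex == "Female")

# Filtering the rows first should plot the same observations
filtered_rows <- demo_data[demo_data$disA > 15 & demo_data$sex == "Female", ]
p_filtered <- ecdfplot(~ disA, data = filtered_rows)

print(p_subset)
```

Using subset keeps the original data frame untouched, which is convenient when several plots with different conditions are drawn from the same data.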