Elaborating a little longer

However, we could elaborate a little longer to get a more detailed summary that also reports the alternative hypothesis and the confidence interval. The following code block shows how such a function can be created. Afterward, we will see how to keep this function close at hand so that it is easy to call later:

z.test <- function(sample, mu, sigma, conf.lvl = .95,
                   alternative = 'two.sided'){
  # Module 1: sample size, sample mean, and test statistic
  sample <- sample[!is.na(sample)]   # drop NAs so n matches the mean
  n <- length(sample)
  xbar <- mean(sample)
  z <- (xbar - mu)/(sigma/sqrt(n))

  # Module 2: p-value and confidence interval for the chosen alternative
  if(alternative == 'two.sided'){
    p.value <- 2*pnorm(-abs(z))
    alt <- 'not equal '
    err <- -qnorm((1-conf.lvl)/2)*sigma/sqrt(n)
    a <- xbar - err
    b <- xbar + err
  }
  else if(alternative == 'greater'){
    p.value <- pnorm(z, lower.tail = FALSE)
    alt <- 'greater than '
    err <- qnorm(conf.lvl)*sigma/sqrt(n)
    a <- xbar - err
    b <- Inf
  }
  else if(alternative == 'less'){
    p.value <- pnorm(z)
    alt <- 'less than '
    err <- qnorm(conf.lvl)*sigma/sqrt(n)
    a <- -Inf
    b <- xbar + err
  }
  else{stop("alternative is misspecified. Accepted values are ",
            "'two.sided', 'greater' or 'less'")}

  # Module 3: print a summary similar to the one given by t.test()
  cat('z = ', z, ' obs. = ', n, ' p-value = ', p.value, '\n')
  cat('alternative hypothesis: true mean is ', alt, mu, '\n')
  cat(conf.lvl*100, ' percent confidence interval: \n')
  cat(a, ' ', b, '\n')
  cat('mean of x\n', xbar, '\n')
}

Like the earlier version of z.test(), this function works in a very modular way, although it is lengthier. The first module stores the number of observations, estimates the arithmetic mean from the sample, and calculates the test statistic, z. The second module calculates the p-value and confidence interval with respect to the assigned alternative hypothesis, while the third module simply prints the summary, which looks a lot like the summary given by t.test().

The concept of the confidence interval was introduced by Jerzy Neyman in 1937. It's very useful in the decision-making process (and remarkably often misinterpreted).
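A common misreading is that a particular 95% interval has a 95% chance of containing the true mean. The confidence level instead describes the procedure: across many repeated samples, about 95% of the intervals built this way capture the true mean. The following is a minimal simulation sketch of that idea. The population (mean 10, standard deviation 5) matches the one used for the chapter's samples, while the sample size, the number of repetitions, and the seed are arbitrary assumptions.

```r
set.seed(1)                      # arbitrary seed, for reproducibility
mu <- 10; sigma <- 5; n <- 30    # assumed population and sample size
reps <- 10000                    # number of simulated samples

# Half-width of a two-sided 95% z-interval with known sigma
err <- qnorm(0.975) * sigma / sqrt(n)

# For each simulated sample, check whether its interval contains mu
covered <- replicate(reps, {
  xbar <- mean(rnorm(n, mean = mu, sd = sigma))
  (xbar - err <= mu) && (mu <= xbar + err)
})

mean(covered)   # proportion of intervals containing mu; should be near 0.95
```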

Once we have designed our function, we can make it easily available for later use in a few steps. First, save this code into a separate .R file, preferably in your working directory or somewhere with a very short path. I named mine z_test.r and stored it in the C:/libr directory, a personal library of R functions. So, the path on my computer that leads to this file is 'C:/libr/z_test.r'. Now, I can use the source() function to run this file, thus making the z.test() function available at any time in the future:

source('C:/libr/z_test.r')
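The same save-then-source workflow can be tried without creating C:/libr. The sketch below is only an illustration: a temporary directory stands in for the personal library, and the file name my_stats.R and the two helper functions are hypothetical. It also shows that a single .R file can carry more than one function.

```r
# Hypothetical location: a temporary directory stands in for C:/libr
lib_dir <- file.path(tempdir(), 'libr')
dir.create(lib_dir, showWarnings = FALSE)
lib_file <- file.path(lib_dir, 'my_stats.R')

# Write two small illustrative functions into a single .R file
writeLines(c(
  'std.error <- function(x) sd(x)/sqrt(length(x))',
  'z.score <- function(xbar, mu, sigma, n) (xbar - mu)/(sigma/sqrt(n))'
), lib_file)

# Run the file, which defines both functions in the current session
source(lib_file)

z.score(xbar = 11, mu = 10, sigma = 5, n = 25)   # (11 - 10)/(5/5) = 1
```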

I would advise the reader to keep these files in a directory that takes little typing to reach. Intuitive names and logs are also advisable, to keep your personal library organized. It's also possible to group several functions into a single .R file (and to run any kind of R code from it). There is at least one other way to set up your personal library, and it requires combining the source() and url() functions.

I find this second way pretty cool (I like to imagine that I'm constructing my own Arcana library in the clouds). It's mostly useful for keeping a backup of your library or for making it remotely accessible (either for you or for other people). It basically consists of uploading your file to the cloud, sharing it, and accessing it through a link; extra steps can make things better.

Another piece of advice: keep both cloud and flash-drive backups of your code and other kinds of important stuff. Whether to share it or not is only a matter of how sensitive the content is. If you happen to create great stuff that can't be found elsewhere, kindly consider publishing it as a package through CRAN or GitHub.

If you seek to do something like that, the following steps may work as a sort of guide. If you want to see the magic happening at the end, make sure you have internet access and remove z.test() from your environment with rm(z.test) before running the final command.

1. Upload the file to a cloud storage service. I prefer to use Dropbox, but only because I know how to ask for the raw file using nothing but the URL (this might also be possible with other services, I just don't know how to do it).
2. Select the file in the cloud and share it through a link. For example, I have this link: https://www.dropbox.com/s/53l79kwklr9xnh3/z_test.R?dl=0. Add &raw=1 to its end in order to ask for the raw file (https://www.dropbox.com/s/53l79kwklr9xnh3/z_test.R?dl=0&raw=1).
3. To improve things a little bit, you can shorten the URL using a URL shortener. Using bit.do I got: http://bit.do/z_test.
4. Combine source() and url() to read the file from the cloud, like this:

source(url('http://bit.do/z_test'))
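Since a remote copy can be unreachable (no connection, expired link), it may help to fall back on a local backup. The following is only a sketch of that idea: source_with_fallback() and its argument names are hypothetical, not part of base R, and the paths in the usage comment are the ones mentioned in the text.

```r
# Hypothetical helper: try a remote copy first, fall back to a local one
source_with_fallback <- function(remote_url, local_path){
  con <- url(remote_url)
  on.exit(close(con))              # release the connection when done
  ok <- tryCatch({
    suppressWarnings(source(con))  # run the remote file, if reachable
    TRUE
  }, error = function(e) FALSE)
  if(!ok) source(local_path)       # remote copy unreachable: use the backup
  invisible(ok)                    # TRUE if the remote copy was used
}

# Usage with the locations from the text (both must exist to be useful):
# source_with_fallback('http://bit.do/z_test', 'C:/libr/z_test.r')
```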

Now you can compare the summary given by z.test() and t.test():

t.test(big_sample, mu = 10)
z.test(big_sample, mu = 10, sigma = 5)

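Notice that the p-values and confidence intervals are pretty close. The sample size explains this: with many observations, the sample standard deviation lands close to the true sigma, and the t distribution is almost indistinguishable from the normal. Do not forget that z.test() requires a sigma argument. The results are not so similar using the small sample: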

t.test(small_sample, mu = 10)
z.test(small_sample, mu = 10, sigma = 5)
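One reason the two tests drift apart for small samples is the critical value each one uses. As a quick illustration, the sketch below compares the two-sided 95% critical value of the normal distribution with those of the t distribution for a few sample sizes. The size 10 matches small_sample; the sizes 100 and 1000 are arbitrary assumptions chosen only to show the trend.

```r
n <- c(10, 100, 1000)                 # sample sizes to compare (100, 1000 assumed)
z_crit <- qnorm(0.975)                # two-sided 95% critical value, z-test
t_crit <- qt(0.975, df = n - 1)       # same level for a t-test with n - 1 df

# The t critical values shrink toward the z critical value as n grows
round(rbind(n = n, t_crit = t_crit, z_crit = z_crit), 3)
```

The t-test demands a larger test statistic before rejecting when n is small, which is the price paid for having to estimate the standard deviation.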

Given that these samples were generated from a normal distribution with a mean equal to 10 and a standard deviation of 5, z.test() gave the more accurate answer for small_sample in comparison to t.test(). The latter came pretty close to rejecting the null hypothesis at the 5% significance level. The z-test did better only because we knew exactly what the standard deviation was. But what if the standard deviation was unknown? What outcomes should we expect if we plug in the sample standard deviation rather than the true sigma? Let's have a look:

z.test(small_sample, mu = 10 , sigma = sd(small_sample))
# z = -2.216899 obs. = 10 p-value = 0.02663 
# alternative hypothesis: true mean is not equal 10 
# 95 percent confidence interval: 
# 5.377763 9.715668 
# mean of x
#  7.546716

Bad ones, of course. The null hypothesis, which is actually true, was rejected at the 5% significance level: a type I error. Bottom line: if you do know the population standard deviation, you should use a z-test; if you don't, you're better off with a t-test. Although this chapter has so far only briefly mentioned how these tests could be used in real-life applications, the truth is that they are very general tests that fit a great variety of real-life problems. Depending on what is to be tested, they could be working in the background of an A/B test, for example.
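A single sample cannot show how often this mistake happens, but a simulation can. The sketch below repeatedly draws samples of size 10 from the same population (mean 10, standard deviation 5), so the null hypothesis is always true, and records how often each approach rejects it at the 5% level. The helper z_pvalue(), the seed, and the number of repetitions are assumptions made for illustration.

```r
set.seed(42)                       # arbitrary seed, for reproducibility
mu <- 10; sigma <- 5; n <- 10      # population from the text; n as in small_sample
reps <- 10000                      # number of simulated samples

# Hypothetical helper: two-sided z-test p-value for whatever sigma is supplied
z_pvalue <- function(x, mu, sigma){
  z <- (mean(x) - mu)/(sigma/sqrt(length(x)))
  2*pnorm(-abs(z))
}

# Each column holds the three p-values computed from one simulated sample
p <- replicate(reps, {
  x <- rnorm(n, mean = mu, sd = sigma)
  c(z_known     = z_pvalue(x, mu, sigma),   # z-test with the true sigma
    z_estimated = z_pvalue(x, mu, sd(x)),   # z-test with sd(x) plugged in
    t           = t.test(x, mu = mu)$p.value)
})

# Share of false rejections (type I errors) at the 5% level
rates <- rowMeans(p < 0.05)
rates
```

Under these assumptions, the z-test with the true sigma and the t-test should both reject close to 5% of the time, while the z-test fed with sd(x) should reject noticeably more often.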
