How to do it...

In this example, a binary variable flags whether a client has bought a special gift pack. For each client, we have three attributes: the customer's age, how many products the customer has already bought, and how long they have been a client.

We first load rstan and the dataset as follows:

library(rstan) 
data   = read.csv("./clients.csv")
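
The model in this recipe hard-codes 99 observations, reads the 0/1 response from the first column, and reads the three predictors from columns 2 to 4. The following is a minimal sketch of a check for that layout. The check_clients helper is hypothetical, and the simulated data frame (including its column names) only stands in for clients.csv, whose actual column names are not shown here:

```r
# Hypothetical helper: verifies the layout this recipe relies on.
check_clients = function(df, n_rows = 99) {
  stopifnot(nrow(df) == n_rows)                   # the Stan model hard-codes 99 rows
  stopifnot(ncol(df) >= 4)                        # response plus three predictors
  stopifnot(all(df[, 1] %in% c(0, 1)))            # response must be binary
  stopifnot(all(sapply(df[, 2:4], is.numeric)))   # predictors must be numeric
  stopifnot(!anyNA(df[, 1:4]))                    # Stan does not accept NA values
  invisible(TRUE)
}

# Simulated stand-in for clients.csv (column names are assumptions)
set.seed(1)
fake = data.frame(bought = rbinom(99, 1, 0.5),
                  age    = round(runif(99, 18, 70)),
                  items  = rpois(99, 2),
                  months = rpois(99, 6))
ok = check_clients(fake)
```

With the real file loaded, the same call would be check_clients(data).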

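We formulate our model. In this case, we put loose normal priors on each slope coefficient in beta; the intercept alpha has no explicit prior, so Stan gives it an implicit flat one. The important part is the bernoulli_logit line, which models each zero-or-one response as a Bernoulli draw whose success probability is the inverse logit of the linear predictor: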

model ="
data {
int<lower = 0, upper = 1> y[99];
real x[99,3];
}
parameters {
real beta[3];
real alpha;
}
model {
beta ~ normal(5,10);
for (n in 1:99)
y[n] ~ bernoulli_logit(alpha + beta[1]*x[n,1] + beta[2]*x[n,2] + beta[3]*x[n,3]);
}"
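
To make the bernoulli_logit statement more concrete, the following base R sketch reproduces what it computes for a few made-up linear predictors. The names inv_logit and bernoulli_logit_lpmf are illustrative helpers written for this example; they are not part of rstan:

```r
# Inverse logit: maps a linear predictor to a probability in (0, 1)
inv_logit = function(eta) 1 / (1 + exp(-eta))

# Log-likelihood of 0/1 outcomes y given linear predictors eta.
# This is the quantity that y ~ bernoulli_logit(eta) adds to the log posterior.
bernoulli_logit_lpmf = function(y, eta) {
  sum(dbinom(y, size = 1, prob = inv_logit(eta), log = TRUE))
}

eta = c(-2, 0, 3)        # made-up linear predictors, for illustration only
p   = inv_logit(eta)     # same values as base R's plogis(eta)
ll  = bernoulli_logit_lpmf(c(0, 1, 1), eta)
```

A linear predictor of 0 corresponds to a probability of 0.5; large negative values push the probability toward 0 and large positive values push it toward 1.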

We pass two vectors for prediction: first, someone young (20 years old) with zero products bought before and one month as a customer; second, someone older (60 years old) with five products bought before and eight months as a customer. We should expect a low probability of buying in the first case and a high probability in the second:

topredict = rbind(c(20,0,1), c(60,5,8))   # rows: age, products bought, months as a client
xy        = list(y = data[,1], x = data[,2:4], ns = topredict)
fit       = stan(model_code = model, data = xy, warmup = 500, iter = 5000,
                 chains = 4, cores = 2, thin = 1, verbose = FALSE)
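
The data list includes ns, but the model as printed neither declares it nor computes any predictions. One possible way to obtain predicted probabilities is sketched below. This is an assumption about how the model could be extended, not necessarily the original listing; the names model_pred and pred are hypothetical:

```r
# Sketch: same model, plus a declaration for ns and a generated quantities
# block that turns each row of ns into a purchase probability.
model_pred = "
data {
  int<lower = 0, upper = 1> y[99];
  real x[99,3];
  real ns[2,3];   // rows to predict for (matches the ns entry in the data list)
}
parameters {
  real beta[3];
  real alpha;
}
model {
  beta ~ normal(5,10);
  for (n in 1:99)
    y[n] ~ bernoulli_logit(alpha + beta[1]*x[n,1] + beta[2]*x[n,2] + beta[3]*x[n,3]);
}
generated quantities {
  real pred[2];   // hypothetical name: predicted probability for each row of ns
  for (k in 1:2)
    pred[k] = inv_logit(alpha + beta[1]*ns[k,1] + beta[2]*ns[k,2] + beta[3]*ns[k,3]);
}"
```

Passing model_code = model_pred to stan() with the same data list should then report pred[1] and pred[2] alongside alpha and beta in the summary.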

We can do a quick check on the quality of the convergence (we pass the parameters that we want to get the traceplot for, using the pars= argument):

rstan::traceplot(fit,pars=c("beta[1]","beta[2]","beta[3]","alpha"))
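
A numeric check can complement the visual one. The matrix in summary(fit)$summary has an Rhat column, and values close to 1 suggest that the chains agree. The sketch below flags parameters above a threshold; flag_rhat is a hypothetical helper, the 1.01 cutoff is an assumption (a commonly used rule of thumb), and the matrix contains made-up numbers rather than output from this model:

```r
# Hypothetical helper: names of parameters whose Rhat exceeds a threshold.
# `s` is a summary matrix with an "Rhat" column, as in summary(fit)$summary.
flag_rhat = function(s, threshold = 1.01) {
  rhat = s[, "Rhat", drop = TRUE]
  names(rhat)[!is.na(rhat) & rhat > threshold]
}

# Made-up values for illustration only
s = cbind(mean = c(0.1, 0.2, -0.3, 1.5),
          Rhat = c(1.00, 1.00, 1.08, 1.00))
rownames(s) = c("beta[1]", "beta[2]", "beta[3]", "alpha")
bad = flag_rhat(s)
```

With a fitted model, the call would be flag_rhat(summary(fit)$summary); an empty result means no parameter was flagged.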

The following screenshot shows how the traceplot looks:

We can finally summarize our results and plot the posterior densities. Since we passed two vectors for making predictions, we can view the results here. We would expect the probability for the second vector (someone older who has already bought several products and has been a client for a while) to be close to 1. On the other hand, we should expect a probability close to 0 for the first vector (someone young who hasn't bought products before). We can confirm this by looking at the next summary:

summary(fit)

Take a look at the following screenshot:

We can also plot the posterior densities. The posterior density for the first prediction is concentrated near 0, and for the second case it is concentrated near 1:

stan_dens(fit)

Take a look at the following screenshot:
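
The predicted probabilities can also be computed by hand from posterior draws, which makes the link between the coefficients and the predictions explicit. The sketch below assumes draws shaped like those returned by rstan::extract(fit): a vector for alpha and a draws-by-3 matrix for beta. The predict_prob helper is hypothetical, and the draws are simulated purely for illustration; they are not the posterior from this recipe:

```r
# Hypothetical helper: posterior mean probability for each row of newx.
# alpha: vector of draws; beta: draws x 3 matrix of coefficient draws.
predict_prob = function(alpha, beta, newx) {
  alpha = as.vector(alpha)
  eta = sweep(beta %*% t(newx), 1, alpha, "+")   # draws x rows linear predictors
  colMeans(plogis(eta))                          # average probability over draws
}

# Simulated draws for illustration only
set.seed(42)
alpha = rnorm(1000, -6, 0.5)
beta  = cbind(rnorm(1000, 0.05, 0.01),
              rnorm(1000, 0.8, 0.1),
              rnorm(1000, 0.4, 0.05))
topredict = rbind(c(20, 0, 1), c(60, 5, 8))
probs = predict_prob(alpha, beta, topredict)
```

With a fitted model, the inputs would come from draws = rstan::extract(fit), using draws$alpha and draws$beta.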
