To use a neural network for prediction, the dependent/target/output variable must be numeric, while the input/independent/feature variables can be of any type. From the ArtPiece dataset, we are going to predict the current auction average price based on all the available parameters. Before applying a neural-network-based model, it is important to preprocess the data by excluding missing values and applying any required transformations; hence, let's preprocess the data:
library(neuralnet)
art <- read.csv("ArtPiece_1.csv")
str(art)

#data conversion for categorical features
art$Art.Auction.House <- as.factor(art$Art.Auction.House)
art$IsGood.Purchase <- as.factor(art$IsGood.Purchase)
art$Art.Category <- as.factor(art$Art.Category)
art$Prominent.Color <- as.factor(art$Prominent.Color)
art$Brush <- as.factor(art$Brush)
art$Brush.Size <- as.factor(art$Brush.Size)
art$Brush.Finesse <- as.factor(art$Brush.Finesse)
art$Art.Nationality <- as.factor(art$Art.Nationality)
art$Top.3.artists <- as.factor(art$Top.3.artists)
art$GoodArt.check <- as.factor(art$GoodArt.check)
art$AuctionHouseGuarantee <- as.factor(art$AuctionHouseGuarantee)
art$Is.It.Online.Sale <- as.factor(art$Is.It.Online.Sale)

#data conversion for numeric features
art$Critic.Ratings <- as.numeric(art$Critic.Ratings)
art$Acq.Cost <- as.numeric(art$Acq.Cost)
art$CurrentAuctionAveragePrice <- as.numeric(art$CurrentAuctionAveragePrice)
art$CollectorsAverageprice <- as.numeric(art$CollectorsAverageprice)
art$Min.Guarantee.Cost <- as.numeric(art$Min.Guarantee.Cost)

#removing NA, missing values from the data
fun1 <- function(x){
  ifelse(x == "#VALUE!", NA, x)
}
art <- as.data.frame(apply(art, 2, fun1))
art <- na.omit(art)

#keeping only relevant variables for prediction
art <- art[, c("Art.Auction.House", "IsGood.Purchase", "Art.Category",
               "Prominent.Color", "Brush", "Brush.Size", "Brush.Finesse",
               "Art.Nationality", "Top.3.artists", "GoodArt.check",
               "AuctionHouseGuarantee", "Is.It.Online.Sale", "Critic.Ratings",
               "Acq.Cost", "CurrentAuctionAveragePrice",
               "CollectorsAverageprice", "Min.Guarantee.Cost")]

#creating dummy variables for the categorical variables
library(dummy)
art_dummy <- dummy(art[, c("Art.Auction.House", "IsGood.Purchase",
                           "Art.Category", "Prominent.Color", "Brush",
                           "Brush.Size", "Brush.Finesse", "Art.Nationality",
                           "Top.3.artists", "GoodArt.check",
                           "AuctionHouseGuarantee", "Is.It.Online.Sale")],
                   int = F)
art_num <- art[, c("Critic.Ratings", "Acq.Cost", "CurrentAuctionAveragePrice",
                   "CollectorsAverageprice", "Min.Guarantee.Cost")]
art <- cbind(art_num, art_dummy)

## 70% of the sample size
smp_size <- floor(0.70 * nrow(art))

## set the seed to make your partition reproducible
set.seed(123)
train_ind <- sample(seq_len(nrow(art)), size = smp_size)
train <- art[train_ind, ]
test <- art[-train_ind, ]

fun2 <- function(x){
  as.numeric(x)
}
train <- as.data.frame(apply(train, 2, fun2))
test <- as.data.frame(apply(test, 2, fun2))
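One preprocessing step worth considering before training (not shown above, but commonly recommended for neuralnet) is rescaling the numeric columns to the [0, 1] range; raw inputs in the tens of thousands tend to make gradient-based training slow or unstable. A minimal sketch, where `minmax` is a hypothetical helper function and the values are illustrative, not taken from the dataset:

```r
# Min-max scaling sketch; `minmax` is a hypothetical helper, not part of
# the neuralnet package.
minmax <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Illustrative values standing in for a numeric ArtPiece column
prices <- c(14953, 35735, 34751, 31599, 10437, 13177)
scaled <- minmax(prices)
range(scaled)  # 0 and 1
```

The same transformation would be applied column-wise to the numeric features of both the training and test frames, using the training-set ranges for both.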
In the training dataset, there are 50,867 observations and 17 variables, and in the test dataset, there are 21,801 observations and 17 variables. The current auction average price is the dependent variable; for this first model, we use only the four other numeric variables as features:
> fit <- neuralnet(formula = CurrentAuctionAveragePrice ~ Critic.Ratings +
+   Acq.Cost + CollectorsAverageprice + Min.Guarantee.Cost,
+   data = train, hidden = 15, err.fct = "sse", linear.output = F)
> fit
Call: neuralnet(formula = CurrentAuctionAveragePrice ~ Critic.Ratings +
    Acq.Cost + CollectorsAverageprice + Min.Guarantee.Cost,
    data = train, hidden = 15, err.fct = "sse", linear.output = F)

1 repetition was calculated.

           Error Reached Threshold Steps
1 54179625353167    0.004727494957    23
A summary of the main results of the model is provided by result.matrix. A snapshot of the result.matrix is given as follows:
> fit$result.matrix
                                                          1
error                           54179625353167.000000000000
reached.threshold                            0.004727494957
steps                                       23.000000000000
Intercept.to.1layhid1                       -0.100084491816
Critic.Ratings.to.1layhid1                   0.686332945444
Acq.Cost.to.1layhid1                         0.196864454378
CollectorsAverageprice.to.1layhid1          -0.793174429352
Min.Guarantee.Cost.to.1layhid1               0.528046199494
Intercept.to.1layhid2                        0.973616842194
Critic.Ratings.to.1layhid2                   0.839826678316
Acq.Cost.to.1layhid2                         0.077798897157
CollectorsAverageprice.to.1layhid2           0.988149246218
Min.Guarantee.Cost.to.1layhid2              -0.385031389636
Intercept.to.1layhid3                       -0.008367359937
Critic.Ratings.to.1layhid3                  -1.409715725621
Acq.Cost.to.1layhid3                        -0.384200569485
CollectorsAverageprice.to.1layhid3          -1.019243809714
Min.Guarantee.Cost.to.1layhid3               0.699876747202
Intercept.to.1layhid4                        2.085203047278
Critic.Ratings.to.1layhid4                   0.406934874266
Acq.Cost.to.1layhid4                         1.121189503896
CollectorsAverageprice.to.1layhid4           1.405748076570
Min.Guarantee.Cost.to.1layhid4              -1.043884892202
Intercept.to.1layhid5                        0.862634752109
Critic.Ratings.to.1layhid5                   0.814364667751
Acq.Cost.to.1layhid5                         0.502879862694
If the error function is the negative log-likelihood, then the reported error corresponds to the likelihood and can be used to calculate the Akaike Information Criterion (AIC). We can store the covariate and response data in a matrix:
> output <- cbind(fit$covariate, fit$result.matrix[[1]])
> head(output)
      [,1]  [,2]  [,3]  [,4]           [,5]
[1,] 14953 49000 10727  5775 54179625353167
[2,] 35735 38850  9494 12418 54179625353167
[3,] 34751 43750  8738  9611 54179625353167
[4,] 31599 41615  5955  4158 54179625353167
[5,] 10437 34755  8390  4697 54179625353167
[6,] 13177 54670 13024 11921 54179625353167
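The AIC relationship mentioned above can be sketched with a small worked example; the error and weight-count values here are hypothetical, not taken from the fitted model:

```r
# If the network's error equals the negative log-likelihood, then
# AIC = 2k - 2*log(L) = 2k + 2*error, where k counts the free weights.
neg_log_lik <- 120.5  # hypothetical error value from the result matrix
k <- 91               # hypothetical number of weights in the network
aic <- 2 * k + 2 * neg_log_lik
aic  # 423
```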
To improve the results of a neural network model, we can vary tuning factors such as the algorithm, the hidden-layer configuration, and the learning rate. As an example, only four numeric features were used to generate the prediction; we could have used all 91 features to predict the current auction average price variable. We can also use a different algorithm from the nnet library, as follows:
> fit <- nnet(CurrentAuctionAveragePrice ~ Critic.Ratings + Acq.Cost +
+   CollectorsAverageprice + Min.Guarantee.Cost, data = train,
+   size = 100)
# weights:  601
initial  value 108359809492660.125000
final  value 108359250706334.000000
converged
> fit
a 4-100-1 network with 601 weights
inputs: Critic.Ratings Acq.Cost CollectorsAverageprice Min.Guarantee.Cost
output(s): CurrentAuctionAveragePrice
options were -
Both libraries arrive at essentially the same fit (note that nnet's final value is roughly twice neuralnet's error, because neuralnet's "sse" is defined as half the sum of squared errors). To tune the results further, it is important to look at the model tuning parameters such as the learning rate, the number of hidden neurons, and so on. The following graph shows the neural network architecture:
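As a sketch of such tuning, the call below switches to two hidden layers and classic backpropagation with an explicit learning rate; `hidden`, `algorithm`, and `learningrate` are standard neuralnet arguments. Synthetic stand-in data is generated here, since the ArtPiece frame is not reproduced:

```r
library(neuralnet)
set.seed(123)

# Synthetic stand-in for the (scaled) training data; hypothetical values
train <- data.frame(
  Critic.Ratings         = runif(200),
  Acq.Cost               = runif(200),
  CollectorsAverageprice = runif(200),
  Min.Guarantee.Cost     = runif(200)
)
train$CurrentAuctionAveragePrice <-
  0.5 * train$Acq.Cost + 0.3 * train$CollectorsAverageprice +
  rnorm(200, sd = 0.01)

fit2 <- neuralnet(
  CurrentAuctionAveragePrice ~ Critic.Ratings + Acq.Cost +
    CollectorsAverageprice + Min.Guarantee.Cost,
  data          = train,
  hidden        = c(10, 5),    # two hidden layers instead of one
  algorithm     = "backprop",  # instead of the default "rprop+"
  learningrate  = 0.01,        # required when algorithm = "backprop"
  err.fct       = "sse",
  linear.output = TRUE,        # regression target: no output activation
  stepmax       = 1e6
)
```

Scaling the inputs first (as sketched earlier) matters most with plain backpropagation, where a fixed learning rate must suit all weights at once.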
The model can then be used to predict unseen data points, using the compute function available in the neuralnet library or the predict function available in the nnet library.
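A hedged end-to-end sketch of both scoring paths, using a tiny synthetic dataset in place of the ArtPiece frames (all variable names here are illustrative):

```r
library(neuralnet)
library(nnet)
set.seed(123)

# Tiny synthetic dataset standing in for the scaled ArtPiece data
d <- data.frame(x1 = runif(150), x2 = runif(150))
d$y <- 0.6 * d$x1 + 0.4 * d$x2
train <- d[1:100, ]
test  <- d[101:150, ]

fit_nn   <- neuralnet(y ~ x1 + x2, data = train, hidden = 3,
                      linear.output = TRUE)
fit_nnet <- nnet(y ~ x1 + x2, data = train, size = 3, linout = TRUE,
                 trace = FALSE)

# neuralnet: compute() returns a list; predictions sit in $net.result
pred_nn   <- compute(fit_nn, test[, c("x1", "x2")])$net.result

# nnet: the standard S3 predict() method
pred_nnet <- predict(fit_nnet, newdata = test)
```

Note that compute expects only the covariate columns, in the same order as in the training formula, while predict takes the whole data frame and selects columns by name.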