Difference between training and inference

The difference between training and inference is similar to that of a student being taught something like algebra at school and then applying it in the real world. In school, the student is given numerous exercises; for each exercise, the student attempts the question and hands their answer over to the teacher, who provides feedback indicating whether it is correct or not. Initially, this feedback is likely to be skewed toward the student being wrong more often than right, but after many attempts, as the student builds their understanding of the concepts, the feedback shifts toward being mostly right. At this point, the student is considered to have sufficiently learned algebra and is able to apply it to unseen problems in the real world, confident in their answers thanks to the exposure gained from the exercises at school.

ML models are no different; the initial phase of building the model is the process of training, where the model is provided with many examples. For each example, a loss function is used in place of the teacher to provide feedback, which, in turn, is used to make adjustments to the model to reduce the loss (the degree to which the model's answer was incorrect). This process of training can take many iterations and is typically compute-intensive, but it offers opportunities for parallelization (especially for neural networks); that is, a lot of the calculations can run in parallel with one another. For this reason, it's common to perform training in the cloud or on dedicated machines with enough memory and compute power. This process of training is illustrated in the following diagram:
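The training loop described above can be sketched in a few lines of code. This is a minimal, illustrative example, not taken from the text: a model consisting of a single weight is shown many examples, a squared-error loss function plays the role of the teacher, and the weight is adjusted on each iteration to reduce the loss. The data, learning rate, and number of epochs are all assumptions chosen for the sketch.

```python
# Training examples: inputs x and targets y for the underlying rule y = 3x.
examples = [(x, 3.0 * x) for x in range(1, 6)]

w = 0.0              # the model's single learnable weight
learning_rate = 0.01

def loss(pred, target):
    """Squared error: the 'teacher' scoring how wrong the answer is."""
    return (pred - target) ** 2

for epoch in range(200):              # many iterations over the data
    for x, y in examples:
        pred = w * x                  # the model's answer
        grad = 2 * (pred - y) * x     # gradient of the loss w.r.t. w
        w -= learning_rate * grad     # adjust the model to reduce the loss

print(round(w, 2))  # w converges toward the true value of 3.0
```

Note how each inner-loop step mirrors the student analogy: produce an answer, receive feedback (the loss gradient), and adjust. Because each example's gradient can be computed independently, these calculations are exactly what training hardware parallelizes.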

To better illustrate the compute power required, in a post on Microsoft's Cortana Intelligence and Machine Learning Blog, Microsoft data scientist Miguel Fierro and others detail the infrastructure and time required for training on the ImageNet dataset (1,000 classes with over 1.2 million photos) using an 18-layer ResNet architecture. It took approximately three days to train over 30 epochs on an Azure N-series NC-24 virtual machine with 4 GPUs, 24 CPU cores, and 224 GB of memory. The full details are described here: https://blogs.technet.microsoft.com/machinelearning/2016/11/15/imagenet-deep-neural-network-training-using-microsoft-r-server-and-azure-gpu-vms/.

After training is complete, the model is ready for the real world; like our student, we can now deploy and use our model to solve unseen problems. This is known as inference. Unlike training, inference requires only a single pass through the model, using the weights and coefficients learned during training. Additionally, some parts of the model are no longer needed once training is done, so a degree of pruning (removing less important parts of the model that do not affect its accuracy) can be performed to further optimize it:
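The single-pass and pruning ideas can be sketched as follows. This is an illustrative example with made-up weights, not the book's model: inference is one forward pass through fixed, already-trained weights, and pruning zeroes out weights whose magnitude is too small to meaningfully affect the output.

```python
# Weights assumed to have been learned during training (illustrative values).
weights = [0.91, -0.002, 0.45, 0.0005, -1.2]

def prune(ws, threshold=0.01):
    """Zero out weights with negligible magnitude; they barely affect accuracy."""
    return [w if abs(w) > threshold else 0.0 for w in ws]

def infer(ws, inputs):
    """A single forward pass: here, just a weighted sum of the inputs."""
    return sum(w * x for w, x in zip(ws, inputs))

pruned = prune(weights)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(infer(weights, x), infer(pruned, x))  # the two outputs are nearly identical
```

Because the pruned weights are zero, the corresponding multiplications (and, in a real deployment, their storage) can be skipped entirely, which is part of why inference fits on far less powerful hardware than training.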

Because of these two factors, a single pass and pruning, we can afford to perform inference on less powerful machines, such as our smartphone. But why would you want to do this? What are the advantages of performing inference on the edge? This is the topic of the next section.
