To recap, our goal in this chapter is to create an application that will recognize what it sees. We will start by capturing video frames, then prepare these frames for our model, and finally feed them into a Core ML model to perform inference. Let's get started.