We have now come to the concluding chapter of this book. Over the previous chapters, you have learned how to create and customize static and animated plots using real-world data in different formats scraped from the web. To wrap up, we will work through a mini-project in this chapter that combines data analytics skills with the visualization techniques you've learned. We will demonstrate how to integrate these visualization techniques into your current workflow.
In the era of big data, machine learning has become fundamental to easing analytic work by replacing large amounts of manual curation with automatic prediction. Yet, before we start building models, Exploratory Data Analysis (EDA) is essential for getting a good grasp of what the data looks like. Regularly reviewing intermediate results during the optimization process also helps us improve our training strategy and outcomes.
High-dimensional data typically requires special processing before it can be visualized intuitively. Statistical methods such as Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE) are important tools for reducing the dimensionality of data for effective visualization.
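To make the idea of dimension reduction concrete, here is a minimal sketch of PCA written with NumPy alone. It is an illustration under stated assumptions rather than the implementation used later in the chapter: the helper name `pca_project` is hypothetical, and the input is synthetic random data standing in for flattened 8x8 images. t-SNE is left out because it relies on an iterative optimization that is normally taken from a library.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto its top principal components (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    # PCA operates on mean-centered data.
    X_centered = X - X.mean(axis=0)
    # The rows of Vt are the principal directions, ordered by decreasing variance.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Fraction of the total variance captured by each principal component.
    explained = (S ** 2) / np.sum(S ** 2)
    return X_centered @ components.T, explained[:n_components]

# Synthetic stand-in data (an assumption): 200 samples with 64 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))
X[:, 0] *= 10.0  # give one feature dominant variance so PCA has structure to find

embedding, ratio = pca_project(X, n_components=2)
print(embedding.shape)  # two coordinates per sample, ready for a scatter plot
print(ratio)            # share of variance explained by each retained component
```

The two columns of `embedding` can be passed directly to a scatter plot, and `ratio` gives a quick sense of how much information the two-dimensional view retains.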
As a showcase, we will demonstrate the use of various visualization techniques in a workflow involving recognizing handwritten digits using a Convolutional Neural Network (CNN).
One important note is that we do not intend to explain all the mathematics and machine learning approaches in detail in this chapter. Our goal is to visualize some of the intermediate steps of the workflow. Hopefully, readers will come to appreciate the value of visual checks, such as tracking the loss function while training a CNN or comparing dimension reduction results obtained with different parameters.