Summary

In this chapter, we covered the foundation of Python's data science stack: the NumPy, pandas, SciPy, scikit-learn, and Jupyter libraries. By doing so, we gained an understanding of this ecosystem, why and when we need each of these packages, and how they relate to each other. Understanding their relationships helps us navigate the stack and find a specific tool or piece of functionality when we need it.

We also touched upon the reasons why NumPy-based computations are so fast, and why this leads to a somewhat different philosophy of data-driven development. We further showcased how pandas complements NumPy arrays by supporting a wide range of data formats and types, and how SciPy and scikit-learn build upon those data structures, allowing us to quickly train and use machine learning models. Finally, we discussed why Jupyter plays such an important role in this process, and what the current developments and new use cases for Jupyter Notebooks are.
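The interplay described above can be sketched in a few lines: NumPy performs whole-array arithmetic at C speed, pandas wraps those arrays with labels, and scikit-learn consumes both directly. This is a minimal illustration only; the column names and the toy data are made up for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# NumPy vectorization: one array-level operation instead of a Python loop
x = np.arange(10_000, dtype=np.float64)
y = 2.0 * x + 1.0  # whole-array arithmetic, executed in compiled code

# pandas wraps the NumPy arrays with labeled columns (names are illustrative)
df = pd.DataFrame({"feature": x, "target": y})

# scikit-learn accepts pandas/NumPy structures directly
model = LinearRegression()
model.fit(df[["feature"]], df["target"])
print(round(float(model.coef_[0]), 2))  # recovers the slope of the toy data
```

The point is not the model itself but the hand-off: no conversion code is needed between the three libraries, because they all share NumPy's array representation underneath.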

In the following chapters, starting with the next one, we'll use all of the packages and tools we've mentioned, and more, to process data and build data-driven projects. In the next chapter specifically, we'll explore and process the data on WWII battles we collected in Chapter 7, Scraping Data from the Web with Beautiful Soup 4, so that it is ready for data analysis and visualization.
