Trying SciPy and scikit-learn

The SciPy package essentially kicked off the entire era of scientific Python. Created in 2001 by Travis Oliphant, Pearu Peterson, and Eric Jones, it began as a collection of basic, general-purpose scientific routines. Over time, the package grew and now offers generic tooling and popular techniques for scientific analysis. Its submodules cover linear algebra, integration, optimization, interpolation, statistics, and much more.
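
To make those submodules concrete, here is a minimal sketch that calls five of them on toy inputs. The equations, functions, and random data are illustrative choices, not taken from the text, and the code assumes a recent SciPy release is installed.

```python
import numpy as np
from scipy import integrate, interpolate, linalg, optimize, stats

# Linear algebra: solve the system 3x + y = 9, x + 2y = 8 (solution: x = 2, y = 3)
a = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
solution = linalg.solve(a, b)

# Integration: area under x**2 between 0 and 1 (exact value is 1/3)
area, abs_error = integrate.quad(lambda x: x ** 2, 0, 1)

# Optimization: find the minimum of (x - 2)**2 + 1, which lies at x = 2
result = optimize.minimize_scalar(lambda x: (x - 2) ** 2 + 1)

# Interpolation: estimate a value between known sample points of y = x**2
xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = xs ** 2
spline = interpolate.CubicSpline(xs, ys)
estimate = float(spline(1.5))

# Statistics: two-sample t-test on synthetic, normally distributed data
rng = np.random.default_rng(seed=0)
group_a = rng.normal(loc=0.0, scale=1.0, size=100)
group_b = rng.normal(loc=0.5, scale=1.0, size=100)
ttest = stats.ttest_ind(group_a, group_b)
p_value = ttest.pvalue

print(solution, area, result.x, estimate, p_value)
```

Each task is a single function call; the heavy numerical work happens in compiled routines underneath.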

With the rise of machine learning, the SciPy ecosystem needed dedicated tooling for it. That tooling started in 2007 as scikits.learn, a Google Summer of Code project and one of the "scikits" (short for SciPy Toolkits), add-on packages built on top of SciPy, and later matured into the separate, independent package scikit-learn. As a mark of its origins, the package kept the "scikit" prefix in its name. Thanks to its simple, unified interface and a large variety of models, scikit-learn quickly became the go-to tool for machine learning in Python, and its model interface is essentially an industry standard. Indeed, many other packages, such as xgboost and fbprophet, replicate scikit-learn's model interface, allowing us to quickly swap and stack different machine learning algorithms.
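
A minimal sketch of that unified interface, assuming scikit-learn is installed: three unrelated classifiers are trained and evaluated with identical fit and score calls on the iris dataset bundled with the library. The choice of models and their parameters is arbitrary and for illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Small built-in dataset: 150 flowers, 4 numeric features, 3 species
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Very different algorithms, one shared interface
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "k_nearest_neighbors": KNeighborsClassifier(n_neighbors=5),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                  # same training call for every model
    scores[name] = model.score(X_test, y_test)   # mean accuracy on held-out data
    print(f"{name}: {scores[name]:.3f}")
```

Because every model exposes the same methods, swapping algorithms means changing one dictionary entry rather than rewriting the training code.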

As a foundational package for machine learning, scikit-learn offers the following tooling:

Data preparation: scalers and transformers
Model selection: cross-validation, hyperparameter optimization, pipelines, and so on
Multiple metrics and scoring/loss functions
Dimensionality reduction
Clustering
Regression and classification with multiple models
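
The following sketch strings several of these tools together on the wine dataset bundled with scikit-learn: a scaler and PCA for data preparation and dimensionality reduction, a cross-validated grid search for model selection, an F1 metric, and k-means for clustering. The dataset, step names, and parameter grid are illustrative assumptions, not recommendations.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, stratify=y)

# Data preparation, dimensionality reduction, and classification in one pipeline
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("reduce", PCA()),
    ("classify", LogisticRegression(max_iter=1000)),
])

# Model selection: 5-fold cross-validated search over two hyperparameters;
# "step__parameter" names address parameters of individual pipeline steps
search = GridSearchCV(
    pipeline,
    param_grid={"reduce__n_components": [2, 5, 10], "classify__C": [0.1, 1.0, 10.0]},
    cv=5,
)
search.fit(X_train, y_train)

# Metrics: macro-averaged F1 score on the held-out test set
macro_f1 = f1_score(y_test, search.predict(X_test), average="macro")

# Clustering: group the scaled samples into three clusters without using labels
X_scaled = StandardScaler().fit_transform(X)
cluster_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)

print(search.best_params_, round(macro_f1, 3))
```

Wrapping the scaler inside the pipeline means it is refit on each cross-validation training fold, so no information from the validation folds leaks into preprocessing.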

scikit-learn expects data in two-dimensional structures similar to NumPy arrays (one row per sample, one column per feature), so both NumPy arrays and pandas dataframes will work. We are going to use scikit-learn to build a predictive model in Chapter 13, Training a Machine Learning Model, and Chapter 14, Improving Your Model – Pipelines and Experiments.

There are hundreds of scientific Python packages for nearly every domain: economics, social sciences, game theory, physics, metallurgy, genomics, psychology, neuroscience, history, and the list goes on. The vast majority of those packages share common origins, in that they use NumPy arrays as data structures and rely on functions from SciPy and scikit-learn at the core of their operations. But no list of packages essential to Python's popularity in data science is complete without mentioning a crucial environment for all of this code: Jupyter.
