Trying SciPy and scikit-learn

The SciPy package essentially kicked off the entire era of scientific Python. Created in 2001 by Travis Oliphant, Pearu Peterson, and Eric Jones, it began as a collection of fundamental, general-purpose scientific routines. Over time, the package grew, and it now offers generic tooling and popular techniques for scientific analysis. Its submodules cover linear algebra, integration, optimization, interpolation, statistics, and much more.
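To make the list of submodules concrete, here is a minimal sketch that touches five of them on toy problems whose answers are known in advance. The specific functions (`linalg.solve`, `integrate.quad`, `optimize.minimize_scalar`, `interpolate.interp1d`, `stats.norm.cdf`) are one illustrative choice among many each submodule provides.

```python
import numpy as np
from scipy import integrate, interpolate, linalg, optimize, stats

# Linear algebra: solve the system 3x + y = 9, x + 2y = 8 (solution x=2, y=3)
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = linalg.solve(A, b)

# Integration: integral of t**2 from 0 to 3 (exact value is 9)
area, abs_err = integrate.quad(lambda t: t**2, 0, 3)

# Optimization: (t - 2)**2 + 1 has its minimum at t = 2
res = optimize.minimize_scalar(lambda t: (t - 2) ** 2 + 1)

# Interpolation: linear interpolation between known points
f = interpolate.interp1d([0, 1, 2], [0, 10, 20])
mid = float(f(1.5))

# Statistics: the standard normal CDF at 0 is exactly 0.5
p = stats.norm.cdf(0.0)

print(x, area, res.x, mid, p)
```

Each submodule is imported from the single `scipy` namespace, which is part of what made the package a common foundation for later tools.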

With the rise of machine learning, the tooling for it grew more and more complex. Rather than becoming part of SciPy itself, it was developed as a separate, independent package built on top of SciPy: scikit-learn. As a mark of its origins, the package carries "scikit" in its name, short for SciPy Toolkit. Thanks to its simple, unified interface and a large variety of models, scikit-learn quickly became the go-to tool for machine learning in Python, and its model interface is essentially an industry standard. Indeed, many other packages, such as xgboost and fbprophet, replicate the scikit-learn model interface, allowing us to quickly swap and stack different machine learning algorithms.
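The unified interface is easiest to see in code. The sketch below, using a synthetic dataset purely for illustration, trains two unrelated models in the same loop: because every estimator exposes `fit`, `predict`, and `score`, swapping one algorithm for another requires no other changes. A third-party estimator that follows the same convention (for example, xgboost's scikit-learn wrapper) could, in principle, be added to the tuple the same way; that is not exercised here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (illustrative only)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

scores = {}
for model in (LogisticRegression(), DecisionTreeClassifier(random_state=0)):
    # The same three calls work for every scikit-learn estimator
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    scores[type(model).__name__] = model.score(X_test, y_test)  # accuracy

print(scores)
```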

As a foundational package for machine learning, scikit-learn offers this tooling:

  • Data preparation: scalers and transformers
  • Model selection: cross-validation, hyperparameter optimization, pipelines, and so on
  • Multiple metrics and score/loss functions
  • Dimensionality reduction
  • Clustering
  • Regression and classification with multiple models
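The sketch below touches each item in the list above on scikit-learn's bundled Iris dataset. The particular choices (standard scaling, two PCA components, logistic regression, a grid over `C`, three k-means clusters) are illustrative assumptions, not recommended settings.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Data preparation, dimensionality reduction, and classification in one pipeline
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=2)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Model selection: 5-fold cross-validated search over the regularization strength
search = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

# Metrics: accuracy of the best pipeline on held-out data
acc = accuracy_score(y_test, search.predict(X_test))

# Clustering: group the scaled samples into three clusters without using labels
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(StandardScaler().fit_transform(X))

print(search.best_params_, acc)
```

Wrapping the scaler and PCA inside the pipeline means they are refit on each cross-validation fold, so no information from the validation folds leaks into preprocessing.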

scikit-learn expects input data in two-dimensional structures, with one row per sample and one column per feature, similar to NumPy arrays; both NumPy arrays themselves and pandas dataframes will work. We are going to use scikit-learn to build a predictive model in Chapter 13, Training a Machine Learning Model, and Chapter 14, Improving Your Model – Pipelines and Experiments.
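A common stumbling block with the two-dimensional requirement is a single feature stored as a 1-D array. This minimal sketch, using made-up data that follows y = 2x + 1 exactly, shows that scikit-learn rejects the 1-D form, accepts it after reshaping, and accepts a one-column pandas dataframe directly.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# A 1-D feature array is rejected: scikit-learn expects (n_samples, n_features)
try:
    LinearRegression().fit(x, y)
    raised = False
except ValueError:
    raised = True

# Reshaping to a single column makes it two-dimensional: shape (4, 1)
X = x.reshape(-1, 1)
model_np = LinearRegression().fit(X, y)

# Double brackets select a DataFrame (2-D), not a Series (1-D)
df = pd.DataFrame({"x": x, "y": y})
model_pd = LinearRegression().fit(df[["x"]], df["y"])

print(model_np.coef_, model_np.intercept_, model_pd.coef_)
```

The target `y` may stay one-dimensional; only the feature matrix has to be 2-D.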

There are hundreds of scientific Python packages for every domain imaginable: economics, social sciences, game theory, physics, metallurgy, genomics, psychology, neuroscience, history, and the list goes on and on. The vast majority of these packages, though, share common roots: they use NumPy arrays as their data structures and rely on functions from SciPy and scikit-learn at the core of their operations. But no list of the packages essential to the popularity of data science in Python is complete without a crucial environment for all of this code: Jupyter.
