ML tools

We covered many libraries from the Python ecosystem in this book. Python has become the language of choice for data science and ML, and its set of open-source libraries continues to diversify and mature, built on the robust core of the scientific computing libraries NumPy and SciPy. The pandas library, which has contributed significantly to popularizing Python for data science, is planning its 1.0 release. The scikit-learn interface has become the standard for modern ML libraries like xgboost and lightgbm, which often interface with workflow automation tools like GridSearchCV and Pipeline that we used repeatedly throughout the book.
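As a reminder of how this shared interface works, here is a minimal sketch that combines Pipeline and GridSearchCV. The synthetic dataset, the step names (scaler, clf), and the parameter grid are assumptions chosen for illustration. LogisticRegression stands in for any estimator that follows the scikit-learn fit/predict convention.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for real features and labels (assumption).
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Chain preprocessing and the estimator so both are refit inside each CV fold.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Step parameters are addressed as <step name>__<parameter name>.
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}

# Five-fold cross-validated search over the grid, scored by accuracy.
search = GridSearchCV(pipe, param_grid=param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Because the search object only relies on the estimator interface, swapping in another compatible estimator should only require changing the clf step and the matching keys in param_grid.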

There are several providers that aim to facilitate the ML workflow:

  • H2O.ai (https://www.h2o.ai/) offers the H2O platform, which integrates cloud computing with ML automation. It allows users to fit thousands of candidate models to explore patterns in their data. It has interfaces in Python as well as R and Java.
  • DataRobot aims to automate the model development process by providing a platform to rapidly build and deploy predictive models in the cloud or on premises.
  • Dataiku is a collaborative data science platform designed to help analysts and engineers explore, prototype, build, and deliver their own data products.

There are also several open-source initiatives led by companies that build on and expand the Python ecosystem:

  • The quantitative hedge fund Two Sigma contributes quantitative analysis tools to the Jupyter Notebook environment under the beakerx project.
  • Bloomberg has integrated the Jupyter Notebook into its terminal to facilitate the interactive analysis of its financial data.