Chapter 19

Ten (Plus) Must-Have Python Packages

IN THIS CHAPTER

  • Improving the user interface with sights and sounds
  • Manipulating data better
  • Working with algorithms

This chapter reviews just a few of the more interesting Python packages available today. Unlike with Haskell, finding reviews of Python packages is incredibly easy, as is finding articles in which people list their favorite packages. However, if you want a more-or-less complete listing, the best place to look is the Python Package Index at https://pypi.org/. The index is so large that it isn’t presented as a single list; instead, you must browse by category or search for particular needs. Consequently, this chapter covers just a few interesting choices, and if you don’t see what you need, you really should search online.

Gensim

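Gensim (https://radimrehurek.com/gensim/) is a Python library that can perform natural language processing (NLP) and unsupervised learning on textual data. It offers a wide range of algorithms to choose from, including topic-modeling methods such as Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA), as well as the Word2vec word-embedding model.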

Technicalstuff Word2vec is based on neural networks (shallow networks, not deep learning ones), and it transforms words into vectors of coordinates that you can manipulate in a semantically meaningful way. For instance, if you start with the vector representing Paris, subtract the vector for France, and then add the vector for Italy, the result is a vector that lies closest to the one for Rome. This example demonstrates how you can use mathematics and the right Word2vec model to perform semantic operations on text. Fortunately, if this seems like Greek to you, Gensim offers excellent tutorials that make the product easier to use.
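The vector arithmetic just described can be sketched in plain Python. The six words and their four-dimensional vectors below are hypothetical and hand-built so that the analogy works out exactly; a real Word2vec model learns its vectors from a large body of text and typically uses hundreds of dimensions. The sketch adds and subtracts the vectors, then picks the remaining word whose vector points in the most similar direction (cosine similarity).

```python
import math

# Hypothetical, hand-made 4-D "embeddings": [capital, france, italy, germany].
# Real Word2vec vectors are learned from text and have many more dimensions.
VECTORS = {
    "paris":   [1.0, 1.0, 0.0, 0.0],
    "rome":    [1.0, 0.0, 1.0, 0.0],
    "berlin":  [1.0, 0.0, 0.0, 1.0],
    "france":  [0.0, 1.0, 0.0, 0.0],
    "italy":   [0.0, 0.0, 1.0, 0.0],
    "germany": [0.0, 0.0, 0.0, 1.0],
}


def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(positive, negative, vectors=VECTORS):
    """Add the positive vectors, subtract the negative ones, and return the
    closest remaining word (the input words themselves are skipped)."""
    dims = len(next(iter(vectors.values())))
    target = [0.0] * dims
    for word in positive:
        target = [t + v for t, v in zip(target, vectors[word])]
    for word in negative:
        target = [t - v for t, v in zip(target, vectors[word])]
    candidates = (w for w in vectors if w not in positive and w not in negative)
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(analogy(positive=["paris", "italy"], negative=["france"]))  # rome
```

Skipping the input words mirrors what Gensim’s `most_similar()` does. With a trained Gensim model, a comparable query is `model.wv.most_similar(positive=["Paris", "Italy"], negative=["France"])`, although the exact spelling of the words depends on the model’s vocabulary.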

PyAudio

One of the better platform-independent libraries for adding sound to your Python application is PyAudio (http://people.csail.mit.edu/hubert/pyaudio/). This library lets you record and play back sounds as needed. For example, a user can record an audio note of tasks to perform later and then play back the list of items as needed.
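Here is a minimal playback sketch, assuming PyAudio is installed and the computer has a working output device. The helper `write_tone()` and the file name `note.wav` are hypothetical stand-ins for a recorded audio note; the helper uses only the standard `wave` module so that there is something to play. PyAudio is imported inside `play_wav()` so that the rest of the code runs even where PyAudio isn’t available.

```python
import math
import struct
import wave


def write_tone(path, frequency=440.0, seconds=1.0, rate=44100):
    """Hypothetical helper: write a mono, 16-bit sine-wave WAV file."""
    n_frames = int(rate * seconds)
    samples = (
        int(0.5 * 32767 * math.sin(2 * math.pi * frequency * i / rate))
        for i in range(n_frames)
    )
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)      # mono
        wf.setsampwidth(2)      # 2 bytes = 16-bit samples
        wf.setframerate(rate)
        wf.writeframes(b"".join(struct.pack("<h", s) for s in samples))


def play_wav(path, chunk=1024):
    """Play a WAV file through the default output device with PyAudio."""
    import pyaudio  # third-party; imported here so write_tone() works without it

    pa = pyaudio.PyAudio()
    try:
        with wave.open(path, "rb") as wf:
            stream = pa.open(
                format=pa.get_format_from_width(wf.getsampwidth()),
                channels=wf.getnchannels(),
                rate=wf.getframerate(),
                output=True,
            )
            data = wf.readframes(chunk)
            while data:                 # feed the device one chunk at a time
                stream.write(data)
                data = wf.readframes(chunk)
            stream.stop_stream()
            stream.close()
    finally:
        pa.terminate()                  # release PortAudio resources
```

Calling `write_tone("note.wav")` followed by `play_wav("note.wav")` plays a one-second tone. Recording works the same way in reverse: open the stream with `input=True` instead of `output=True` and collect the bytes returned by `stream.read()` in a loop.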

Tip Working with sound on a computer always involves trade-offs. For example, a platform-independent library can't take advantage of special features that a particular platform might possess. In addition, it might not support all the file formats that a particular platform uses. The reason to use a platform-independent library is to ensure that your application provides basic sound support on all systems that it might interact with.

PyQtGraph

Humans are visually oriented. If you show someone a table of information and then show the same information as a graph, the graph usually does a better job of conveying the information. Graphs help people see trends and understand why the data has taken the course that it has. However, getting the pixels that represent the tabular information onscreen is difficult, which is why you need a library such as PyQtGraph (http://www.pyqtgraph.org/) to make things simpler.

Even though the library is designed around engineering, mathematical, and scientific requirements, you have no reason to avoid using it for other purposes. PyQtGraph supports both 2-D and 3-D displays, and you can use it to generate new graphics based on numeric input. The output is completely interactive, so a user can select image areas for enhancement or other sorts of manipulation. In addition, the library comes with a wealth of useful widgets (controls, such as buttons, that you can display onscreen) to make the coding process even easier.

Remember Unlike many of the offerings in this chapter, PyQtGraph isn’t a free-standing library, which means that you must have other products installed to use it. This isn’t unexpected because PyQtGraph is doing quite a lot of work. At a minimum, you need NumPy and a Qt binding (such as PyQt or PySide) installed on your system, plus PyOpenGL if you want 3-D graphics.

TkInter

Users respond well to a graphical user interface (GUI) because it’s friendlier and requires less thought than a command-line interface. Many products can give your Python application a GUI. However, the most commonly used product is TkInter (https://wiki.python.org/moin/TkInter). Developers like it because it keeps things simple. It’s actually an interface for the Tool Command Language (Tcl)/Toolkit (Tk) found at http://www.tcl.tk/. A number of languages use Tcl/Tk as the basis for creating a GUI.

Tip You might not relish the idea of adding a GUI to your application. Doing so tends to be time consuming and doesn’t make the application any more functional (it also slows down the application, in many cases). The point is that users like GUIs, and if you want your application to see strong use, you need to meet user requirements.

PrettyTable

Displaying tabular data in a manner the user can understand is important. Python stores this type of data in a form that works best for programming needs. However, users need something that is organized in a manner that humans understand and that is visually appealing. The PrettyTable library (https://pypi.python.org/pypi/PrettyTable) lets you easily add an appealing tabular presentation to your command-line application.

SQLAlchemy

A database is essentially an organized way of storing repetitive or structured data on disk. For example, customer records (individual entries in the database) are repetitive because each customer has the same sort of information requirements, such as name, address, and telephone number. The precise organization of the data determines the sort of database you’re using. Some database products specialize in text organization, others in tabular information, and still others in random bits of data (such as readings taken from a scientific instrument). Databases can use a tree-like structure or a flat-file configuration to store data. You’ll hear all sorts of odd terms when you start looking into DataBase Management System (DBMS) technology, most of which mean something only to a DataBase Administrator (DBA) and won’t matter to you.

Remember The most common type of database is called a Relational DataBase Management System (RDBMS), which uses tables that are organized into records and fields (just like a table you might draw on a sheet of paper). Each field is part of a column of the same kind of information, such as the customer’s name. Tables are related to each other in various ways, so creating complex relationships is possible. For example, each customer may have one or more entries in a purchase-order table, and the customer table and the purchase-order table are therefore related to each other.
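The customer and purchase-order relationship described above can be sketched with Python’s built-in `sqlite3` module. The table and column names are hypothetical. The `REFERENCES` clause ties each purchase order to a customer, and the `JOIN` follows that link to count the orders for each customer.

```python
import sqlite3


def orders_per_customer():
    """Build two related tables in memory and count orders per customer."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
    conn.executescript("""
        CREATE TABLE customer (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE purchase_order (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(id),
            item        TEXT NOT NULL
        );
    """)
    conn.executemany("INSERT INTO customer (id, name) VALUES (?, ?)",
                     [(1, "Ann"), (2, "Bob")])
    conn.executemany("INSERT INTO purchase_order (customer_id, item) VALUES (?, ?)",
                     [(1, "Stapler"), (1, "Paper"), (2, "Pens")])
    rows = conn.execute("""
        SELECT c.name, COUNT(p.id)
        FROM customer AS c
        JOIN purchase_order AS p ON p.customer_id = c.id
        GROUP BY c.id
        ORDER BY c.name
    """).fetchall()
    conn.close()
    return rows


print(orders_per_customer())  # [('Ann', 2), ('Bob', 1)]
```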

An RDBMS relies on a special language called the Structured Query Language (SQL) to access the individual records inside. Of course, you need some means of interacting with both the RDBMS and SQL, which is where SQLAlchemy (http://www.sqlalchemy.org/) comes into play. This product reduces the amount of work needed to ask the database to perform tasks such as returning a specific customer record, creating a new customer record, updating an existing customer record, and deleting an old customer record.
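A minimal sketch of those four tasks (create, read, update, and delete) with SQLAlchemy Core, assuming SQLAlchemy 1.4 or later and an in-memory SQLite database. The `customers` table and its data are hypothetical. Each statement is built from Python objects, and SQLAlchemy generates the matching SQL.

```python
from sqlalchemy import (Column, Integer, MetaData, String, Table,
                        create_engine, delete, insert, select, update)

metadata = MetaData()

# Hypothetical table; SQLAlchemy turns this into a CREATE TABLE statement.
customers = Table(
    "customers", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String, nullable=False),
    Column("phone", String),
)


def run_crud_demo():
    engine = create_engine("sqlite://")       # in-memory SQLite database
    metadata.create_all(engine)
    with engine.begin() as conn:              # commits on success, rolls back on error
        # Create two records.
        conn.execute(insert(customers), [
            {"name": "Ann", "phone": "555-0100"},
            {"name": "Bob", "phone": "555-0101"},
        ])
        # Update an existing record.
        conn.execute(update(customers)
                     .where(customers.c.name == "Bob")
                     .values(phone="555-0199"))
        # Delete an old record.
        conn.execute(delete(customers).where(customers.c.name == "Ann"))
        # Read what remains.
        rows = conn.execute(select(customers.c.name, customers.c.phone)).all()
    return [tuple(row) for row in rows]


print(run_crud_demo())
```

Because SQLAlchemy generates the SQL, moving to another RDBMS is mostly a matter of changing the URL passed to `create_engine()` (and installing the matching database driver).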

Toolz

The Toolz package (https://github.com/pytoolz/toolz) fills in some of the functional programming paradigm gaps in Python. You specifically use it for functional support of

  • Iterators
  • Functions
  • Dictionaries

Interestingly enough, this same package works fine for both Python 2.x and 3.x developers, so you can get a single package to meet many of your functional data-processing needs. This package is a pure Python implementation, which means that it works everywhere.
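A small sketch, assuming the `toolz` package is installed, that touches all three areas: `compose` builds a new function, the curried `map` and `filter` work lazily on iterators, and `valfilter` works on a dictionary. The `repeated_words()` function is hypothetical.

```python
from toolz import compose, frequencies, pipe
from toolz.curried import filter, map, valfilter

# Functions: compose() applies right to left, so punctuation is stripped first,
# then the word is lowercased.
normalize = compose(str.lower, lambda word: word.strip(".,;:!?"))


def repeated_words(text):
    """Return {word: count} for the words that occur more than once."""
    return pipe(
        text.split(),
        map(normalize),               # iterator: lazily normalize each word
        filter(str.isalpha),          # iterator: keep purely alphabetic words
        frequencies,                  # iterator -> dictionary of counts
        valfilter(lambda n: n > 1),   # dictionary: keep counts above 1
    )


print(repeated_words("The cat and the hat, and THE bat."))
```

Because the curried versions take their data argument last, each step in `pipe()` receives the previous step’s result, which is the functional style the package encourages.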

Tip If you need additional speed, don’t really care about interoperability with every third-party package out there, and don't need the ability to work on every platform, you can use a Cython (http://cython.org/) implementation of the same package called CyToolz (https://github.com/pytoolz/cytoolz/). Besides being two to five times faster, CyToolz offers a C API that other Cython-based projects can call directly.

Cloudera Oryx

Cloudera Oryx (http://www.cloudera.com/) is a machine learning project for Apache Hadoop (http://hadoop.apache.org/) that provides you with a basis for performing machine learning tasks. It emphasizes the use of live data streaming. This product helps you add security, governance, and management functionality that’s missing from Hadoop so that you can create enterprise-level applications with greater ease.

The functionality provided by Oryx builds on Apache Kafka (http://kafka.apache.org/) and Apache Spark (http://spark.apache.org/). Common tasks for this product are real-time spam filters and recommendation engines. You can download Oryx from https://github.com/cloudera/oryx.

funcy

The funcy package (https://github.com/suor/funcy/) is a mix of features inspired by Clojure (https://clojure.org/). It allows you to make your Python environment better oriented toward the functional programming paradigm, while also adding support for data processing and additional algorithms. That sounds like a lot of ground to cover, and it is, but you can break the functionality of this particular package into these areas:

  • Manipulation of collections
  • Manipulation of sequences
  • Additional support for functional programming constructs
  • Creation of decorators
  • Abstraction of flow control
  • Additional debugging support

Tip Some people might skip the bottom part of GitHub project pages (and for good reason; it normally doesn’t contain a lot of information). However, the author of funcy places links at the bottom of the GitHub page to essays that explain why funcy implements certain features in a particular manner. For example, you can read "Abstracting Control Flow" (http://hackflow.com/blog/2013/10/08/abstracting-control-flow/), which helps you understand the need for this feature, especially in a functional environment. In fact, you might find that other GitHub pages (not many, but a few) also contain these sorts of helpful links.

SciPy

The SciPy (http://www.scipy.org/) stack contains a host of other libraries that you can also download separately. These libraries provide support for mathematics, science, and engineering. When you obtain SciPy, you get a set of libraries designed to work together to create applications of various sorts. These libraries are:

  • NumPy
  • SciPy
  • matplotlib
  • IPython
  • SymPy
  • pandas

The SciPy library itself focuses on numerical routines, such as routines for numerical integration and optimization. SciPy is a general-purpose library that provides functionality for multiple problem domains. It also provides support for domain-specific libraries, such as Scikit-learn, Scikit-image, and statsmodels. To make your SciPy experience even better, try the resources at http://www.scipy-lectures.org/. The site contains many lectures and tutorials on SciPy’s functions.
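The two kinds of numerical routines mentioned above look like this in practice, assuming NumPy and SciPy are installed. The functions being integrated and minimized are arbitrary examples, chosen because their exact answers are known (2 and 3, respectively).

```python
import numpy as np
from scipy import integrate, optimize

# Numerical integration: area under sin(x) from 0 to pi (exact value 2).
area, abs_error = integrate.quad(np.sin, 0.0, np.pi)

# Optimization: minimize f(x) = (x - 3)**2 + 1 (exact minimum at x = 3, f = 1).
result = optimize.minimize_scalar(lambda x: (x - 3.0) ** 2 + 1.0)

print(f"integral ~ {area:.6f} (estimated error {abs_error:.1e})")
print(f"minimum at x ~ {result.x:.6f}, f(x) ~ {result.fun:.6f}")
```

`quad()` returns both the estimate and an estimate of its absolute error, which is worth checking whenever the function being integrated is less well behaved.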

XGBoost

The XGBoost package (https://github.com/dmlc/xgboost) enables you to apply a Gradient Boosting Machine (GBM) (https://towardsdatascience.com/boosting-algorithm-gbm-97737c63daa3?gi=df155908abce) to any problem, thanks to its wide choice of objective functions and evaluation metrics. It operates with a variety of languages, including

  • Python
  • R
  • Java
  • C++

Tip Although GBM is a sequential algorithm (and thus slower than algorithms that can take advantage of modern multicore computers), XGBoost uses multithreaded processing to search in parallel for the best splits among the features. Multithreading helps XGBoost deliver excellent performance compared to other GBM implementations, both in R and in Python. The full package name is eXtreme Gradient Boosting (XGBoost for short). You can find complete documentation for this package at https://xgboost.readthedocs.org/en/latest/.
