Chapter 19

Ten (Plus) Must-Have Python Packages

IN THIS CHAPTER

Improving the user interface with sights and sounds

Manipulating data better

Working with algorithms

This chapter reviews just a few of the more interesting Python packages available today. Unlike with Haskell, reviews of Python packages are easy to find, as are articles in which people list their favorite packages. However, if you want to look at a more-or-less complete listing, the best place is the Python Package Index at https://pypi.org/. The index is so large that you can’t browse it as a single list; instead, you search by category or for a particular need. Consequently, this chapter reflects just a few interesting choices, and if you don’t see what you need, you really should search online.

MODULES, PACKAGES, AND LIBRARIES

There is general confusion over some terms (module, package, and library) used in Python and, unfortunately, this book won’t help you untie this Gordian knot. When possible, this chapter uses the vendor term for whatever product you’re reading about. However, the terms do have different meanings, which you can read about at https://knowpapa.com/modpaclib-py/. For example, sites such as PyPI (https://pypi.org/) use the term package because they offer collections of modules (which are individual .py files), while some vendors use the term library, presumably because the product uses compiled code created in another language, such as C.

Of course, you might ask why Python's core code is called the core library. That’s because the core library is written in C and compiled, but then you have access to all the packages (collections of modules) that add to that core library. If you find that one or more of the descriptions in this chapter contain the wrong term, it’s really not a matter of wanting to use the wrong term; it’s more a matter of dealing with the confusion caused by multiple terms that aren’t necessarily well defined or appropriately used.
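If you want to see the module-versus-package distinction for yourself, the following minimal sketch uses only the standard library. It relies on one fact: an imported package carries a __path__ attribute, while a plain module doesn't. The describe() helper is a name made up for this example.

```python
import importlib
import json      # a package: a directory that contains several modules
import textwrap  # a module: a single .py file

def describe(mod):
    """Return 'package' if the imported object can contain submodules."""
    # Packages expose __path__, the list of places their submodules live.
    return "package" if hasattr(mod, "__path__") else "module"

print(json.__name__, "is a", describe(json))
print(textwrap.__name__, "is a", describe(textwrap))

# json.decoder is one of the modules stored inside the json package.
decoder = importlib.import_module("json.decoder")
print(decoder.__name__, "is a", describe(decoder))
```

The word library doesn't show up anywhere in this check, which fits the sidebar's point: Python itself distinguishes modules from packages, but library is mostly a vendor's choice of wording.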

Gensim

Gensim (https://radimrehurek.com/gensim/) is a Python library that can perform natural language processing (NLP) and unsupervised learning on textual data. It offers a wide range of algorithms to choose from:

TF-IDF
Random projections
Latent Dirichlet allocation
Latent semantic analysis
Semantic algorithms:
- word2vec
- document2vec (https://code.google.com/archive/p/word2vec/)

Word2vec is based on neural networks (shallow networks, not deep learning networks), and it allows meaningful transformations of words into vectors of coordinates that you can operate on in a semantic way. For instance, taking the vector representing Paris, subtracting the vector France, and then adding the vector Italy results in a vector close to Rome, demonstrating how you can use mathematics and the right Word2vec model to perform semantic operations on text. Fortunately, if this seems like Greek to you, Gensim offers excellent tutorials to make using this product easier.
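To make the Paris, France, Italy arithmetic less abstract, here is a toy sketch in plain Python. It doesn't use Gensim, and the three-dimensional vectors are invented by hand for illustration (a real Word2vec model learns hundreds of dimensions from text). The analogy() helper is a hypothetical name for this example.

```python
import math

# Hand-made "embeddings" (hypothetical values, not learned from any data).
# Axes, loosely: French-ness, Italian-ness, capital-city-ness.
vectors = {
    "France": (1.0, 0.0, 0.0),
    "Italy":  (0.0, 1.0, 0.0),
    "Paris":  (1.0, 0.0, 1.0),
    "Rome":   (0.0, 1.0, 1.0),
    "Milan":  (0.0, 1.0, 0.3),
}

def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

def analogy(positive, negative):
    """Add the positive word vectors, subtract the negative ones,
    and return the remaining word closest to the result."""
    target = [0.0, 0.0, 0.0]
    for word in positive:
        target = [t + v for t, v in zip(target, vectors[word])]
    for word in negative:
        target = [t - v for t, v in zip(target, vectors[word])]
    used = set(positive) | set(negative)
    candidates = [word for word in vectors if word not in used]
    return max(candidates, key=lambda word: cosine(vectors[word], target))

print(analogy(positive=["Paris", "Italy"], negative=["France"]))  # Rome
```

With a trained model, Gensim answers the same kind of positive/negative question for you; check the Gensim tutorials for the exact calls that your version supports.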

PyAudio

One of the better platform-independent libraries to make sound work with your Python application is PyAudio (http://people.csail.mit.edu/hubert/pyaudio/). This library lets you record and play back sounds as needed. For example, a user can record an audio note of tasks to perform later and then play back the list of items as needed.

Working with sound on a computer always involves trade-offs. For example, a platform-independent library can't take advantage of special features that a particular platform might possess. In addition, it might not support all the file formats that a particular platform uses. The reason to use a platform-independent library is to ensure that your application provides basic sound support on all systems that it might interact with.

CLASSIFYING PYTHON SOUND TECHNOLOGIES

Realize that sound comes in many forms in computers. The basic multimedia services provided by Python (see the documentation at https://docs.python.org/3/library/mm.html) provide essential playback functionality. You can also write certain types of audio files, but the selection of file formats is limited. In addition, some packages, such as winsound (https://docs.python.org/3/library/winsound.html), are platform dependent, so you can’t use them in an application designed to work everywhere. The standard Python offerings are designed to provide basic multimedia support for playing back system sounds.
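As a small taste of what the standard multimedia services can write, the following sketch builds a half-second test tone with the standard library's wave module. It keeps the result in memory rather than on disk, and the sample rate, pitch, and the make_tone() name are all choices made for this example. Playing the sound back is a separate job that depends on your platform.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 8000   # samples per second (assumed value for the sketch)
DURATION = 0.5       # seconds
FREQUENCY = 440.0    # hertz

def make_tone():
    """Return the bytes of a mono, 16-bit WAV file containing a sine tone."""
    n_frames = int(SAMPLE_RATE * DURATION)
    frames = bytearray()
    for i in range(n_frames):
        # Half of full volume, packed as a little-endian signed 16-bit sample.
        value = 0.5 * math.sin(2 * math.pi * FREQUENCY * i / SAMPLE_RATE)
        frames += struct.pack("<h", int(32767 * value))

    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)          # mono
        wav.setsampwidth(2)          # 2 bytes = 16 bits per sample
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(bytes(frames))
    return buffer.getvalue()

wav_bytes = make_tone()

# Read the header back to confirm what was written.
with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
    print(wav.getnchannels(), wav.getframerate(), wav.getnframes())
```

To keep the file, write wav_bytes to a path of your choice opened in binary mode.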

The middle ground, augmented audio functionality designed to improve application usability, is covered by libraries such as PyAudio. You can see a list of these libraries at https://wiki.python.org/moin/Audio. However, these libraries usually focus on business needs, such as recording notes and playing them back later. High-fidelity output isn’t part of the plan for these libraries.

Gamers need special audio support to ensure that they can hear special effects, such as a monster walking behind them. These needs are addressed by libraries such as PyGame (http://www.pygame.org/news.html). When using these libraries, you need higher-end equipment and have to plan to spend considerable time working on just the audio features of your application. You can see a list of these libraries at https://wiki.python.org/moin/PythonGameLibraries.

PyQtGraph

Humans are visually oriented. If you show someone a table of information and then show the same information as a graph, the graph usually wins when it comes to conveying information. Graphs help people see trends and understand why the data has taken the course that it has. However, getting those pixels that represent the tabular information onscreen is difficult, which is why you need a library such as PyQtGraph (http://www.pyqtgraph.org/) to make things simpler.

Even though the library is designed around engineering, mathematical, and scientific requirements, you have no reason to avoid using it for other purposes. PyQtGraph supports both 2-D and 3-D displays, and you can use it to generate new graphics based on numeric input. The output is completely interactive, so a user can select image areas for enhancement or other sorts of manipulation. In addition, the library comes with a wealth of useful widgets (controls, such as buttons, that you can display onscreen) to make the coding process even easier.

Unlike many of the offerings in this chapter, PyQtGraph isn’t a free-standing library, which means that you must have other products installed to use it. This isn’t unexpected because PyQtGraph is doing quite a lot of work. You need these items installed on your system to use it:

Python version 2.7 or higher
PyQt version 4.8 or higher (https://wiki.python.org/moin/PyQt) or PySide (https://wiki.python.org/moin/PySide)
numpy (http://www.numpy.org/)
scipy (http://www.scipy.org/)
PyOpenGL (http://pyopengl.sourceforge.net/)

TkInter

Users respond to the Graphical User Interface (GUI) because it’s friendlier and requires less thought than using a command-line interface. Many products out there can give your Python application a GUI. However, the most commonly used product is TkInter (https://wiki.python.org/moin/TkInter). Developers like it so much because TkInter keeps things simple. It’s actually an interface for the Tool Command Language (Tcl)/Toolkit (Tk) found at http://www.tcl.tk/. A number of languages use Tcl/Tk as the basis for creating a GUI.

You might not relish the idea of adding a GUI to your application. Doing so tends to be time consuming and doesn’t make the application any more functional (it also slows down the application, in many cases). The point is that users like GUIs, and if you want your application to see strong use, you need to meet user requirements.

PrettyTable

Displaying tabular data in a manner the user can understand is important. Python stores this type of data in a form that works best for programming needs. However, users need something that is organized in a manner that humans understand and that is visually appealing. The PrettyTable library (https://pypi.python.org/pypi/PrettyTable) lets you easily add an appealing tabular presentation to your command-line application.
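To appreciate what a table library saves you, consider this bare-bones sketch of the same idea in plain Python. It isn't PrettyTable and makes no PrettyTable calls; format_table() is a hypothetical helper, and the customer data is made up. A dedicated library adds alignment options, sorting, and other niceties on top of this basic approach.

```python
def format_table(headers, rows):
    """Return the rows as an aligned, bordered text table."""
    # Convert every cell to text, then find the widest entry per column.
    text_rows = [[str(cell) for cell in row] for row in rows]
    widths = [
        max([len(str(header))] + [len(row[i]) for row in text_rows])
        for i, header in enumerate(headers)
    ]

    border = "+" + "+".join("-" * (width + 2) for width in widths) + "+"

    def line(cells):
        # Pad each cell to its column width, with one space on either side.
        padded = (f" {str(cell).ljust(width)} " for cell, width in zip(cells, widths))
        return "|" + "|".join(padded) + "|"

    output = [border, line(headers), border]
    output += [line(row) for row in text_rows]
    output.append(border)
    return "\n".join(output)

print(format_table(["Name", "Orders"], [["Ann", 3], ["Roberto", 12]]))
```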

SQLAlchemy

A database is essentially an organized manner of storing repetitive or structured data on disk. For example, customer records (individual entries in the database) are repetitive because each customer has the same sort of information requirements, such as name, address, and telephone number. The precise organization of the data determines the sort of database you’re using. Some database products specialize in text organization, others in tabular information, and still others in random bits of data (such as readings taken from a scientific instrument). Databases can use a tree-like structure or a flat-file configuration to store data. You’ll hear all sorts of odd terms when you start looking into DataBase Management System (DBMS) technology — most of which will mean something only to a DataBase Administrator (DBA) and won’t matter to you.

The most common type of database is called a Relational DataBase Management System (RDBMS), which uses tables that are organized into records and fields (just like a table you might draw on a sheet of paper). Each field is part of a column of the same kind of information, such as the customer’s name. Tables are related to each other in various ways, so creating complex relationships is possible. For example, each customer may have one or more entries in a purchase-order table, and the customer table and the purchase-order table are therefore related to each other.
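The customer and purchase-order relationship is easier to picture with a working example. This sketch uses sqlite3 from the standard library and an in-memory database, so nothing touches your disk. The table names, column names, and sample rows are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two tables; purchase_order.customer_id points back at customer.id.
conn.executescript("""
    CREATE TABLE customer (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE purchase_order (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        item        TEXT NOT NULL
    );
""")

conn.executemany(
    "INSERT INTO customer (id, name) VALUES (?, ?)",
    [(1, "Ann"), (2, "Ben")],
)
conn.executemany(
    "INSERT INTO purchase_order (customer_id, item) VALUES (?, ?)",
    [(1, "lamp"), (1, "desk"), (2, "chair")],
)

# Follow the relationship: count each customer's purchase orders.
orders_per_customer = conn.execute("""
    SELECT customer.name, COUNT(purchase_order.id)
    FROM customer
    JOIN purchase_order ON purchase_order.customer_id = customer.id
    GROUP BY customer.id
    ORDER BY customer.name
""").fetchall()

print(orders_per_customer)  # [('Ann', 2), ('Ben', 1)]
```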

An RDBMS relies on a special language called the Structured Query Language (SQL) to access the individual records inside. Of course, you need some means of interacting with both the RDBMS and SQL, which is where SQLAlchemy (http://www.sqlalchemy.org/) comes into play. This product reduces the amount of work needed to ask the database to perform tasks such as returning a specific customer record, creating a new customer record, updating an existing customer record, and deleting an old customer record.
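For comparison, here are the four tasks (create, return, update, and delete a customer record) written as raw SQL through the standard library's sqlite3 module. This is a sketch of the hand-written work that a product such as SQLAlchemy is meant to reduce, not SQLAlchemy code; the table layout and the sample customer are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)"
)

# Create a new customer record.
cursor = conn.execute(
    "INSERT INTO customer (name, phone) VALUES (?, ?)", ("Ann", "555-0100")
)
ann_id = cursor.lastrowid

# Return a specific customer record.
original = conn.execute(
    "SELECT name, phone FROM customer WHERE id = ?", (ann_id,)
).fetchone()

# Update an existing customer record.
conn.execute("UPDATE customer SET phone = ? WHERE id = ?", ("555-0199", ann_id))
updated = conn.execute(
    "SELECT name, phone FROM customer WHERE id = ?", (ann_id,)
).fetchone()

# Delete an old customer record.
conn.execute("DELETE FROM customer WHERE id = ?", (ann_id,))
remaining = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]

print(original, updated, remaining)
```

Notice how much of the code is SQL text and question-mark placeholders. Managing that plumbing for you is where a database toolkit earns its keep.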

Toolz

The Toolz package (https://github.com/pytoolz/toolz) fills in some of the functional programming paradigm gaps in Python. You specifically use it for functional support of

Iterators
Functions
Dictionaries

Interestingly enough, this same package works fine for both Python 2.x and 3.x developers, so you can get a single package to meet many of your functional data-processing needs. This package is a pure Python implementation, which means that it works everywhere.
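To get a feel for the three areas (iterators, functions, and dictionaries), here is a sketch built only from the standard library. The helpers pipe(), take(), and valmap() are local re-implementations written for this example; Toolz ships polished helpers with these names, but treat any difference in behavior as a limitation of the sketch, and see the Toolz documentation for the real thing.

```python
from functools import reduce
from itertools import islice

def pipe(value, *funcs):
    """Functions: pass value through each function, left to right."""
    return reduce(lambda acc, func: func(acc), funcs, value)

def take(n, iterable):
    """Iterators: lazily yield at most the first n items."""
    return islice(iterable, n)

def valmap(func, mapping):
    """Dictionaries: apply func to every value, keeping the keys."""
    return {key: func(value) for key, value in mapping.items()}

def naturals():
    """An endless iterator: 0, 1, 2, ..."""
    n = 0
    while True:
        yield n
        n += 1

# Nothing is computed until list() asks for the three results.
first_even_squares = pipe(
    naturals(),
    lambda xs: (x * x for x in xs),
    lambda xs: (x for x in xs if x % 2 == 0),
    lambda xs: take(3, xs),
    list,
)
print(first_even_squares)  # [0, 4, 16]

sizes = valmap(len, {"fruits": ["apple", "pear"], "nuts": []})
print(sizes)  # {'fruits': 2, 'nuts': 0}
```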

If you need additional speed, don’t really care about interoperability with every third-party package out there, and don't need the ability to work on every platform, you can use a Cython (http://cython.org/) implementation of the same package called CyToolz (https://github.com/pytoolz/cytoolz/). Besides being two to five times faster, CyToolz offers access to a C API, so there are some advantages to using it.

Cloudera Oryx

Cloudera Oryx (http://www.cloudera.com/) is a machine learning project for Apache Hadoop (http://hadoop.apache.org/) that provides you with a basis for performing machine learning tasks. It emphasizes the use of live data streaming. This product helps you add security, governance, and management functionality that’s missing from Hadoop so that you can create enterprise-level applications with greater ease.

The functionality provided by Oryx builds on Apache Kafka (http://kafka.apache.org/) and Apache Spark (http://spark.apache.org/). Common tasks for this product are real-time spam filters and recommendation engines. You can download Oryx from https://github.com/cloudera/oryx.

funcy

The funcy package (https://github.com/suor/funcy/) is a mix of features inspired by Clojure (https://clojure.org/). It allows you to make your Python environment better oriented toward the functional programming paradigm, while also adding support for data processing and additional algorithms. That sounds like a lot of ground to cover, and it is, but you can break the functionality of this particular package into these areas:

Manipulation of collections
Manipulation of sequences
Additional support for functional programming constructs
Creation of decorators
Abstraction of flow control
Additional debugging support
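Two of the areas in this list, creating decorators and abstracting flow control, go hand in hand. The following sketch shows the general idea with a hand-written retry decorator that uses only the standard library. It isn't funcy's implementation, and the parameter names are assumptions made for this example; funcy's own documentation describes what the package actually provides.

```python
import functools

def retry(tries, errors=Exception):
    """Decorator factory: call the function up to `tries` times,
    re-raising the error only if the last attempt also fails."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except errors:
                    if attempt == tries:
                        raise
        return wrapper
    return decorator

calls = []

@retry(3, errors=ValueError)
def flaky():
    """Fail on the first two calls, then succeed."""
    calls.append(1)
    if len(calls) < 3:
        raise ValueError("not yet")
    return "ok"

result = flaky()
print(result, len(calls))  # ok 3
```

The loop-and-try logic lives in one place, so every function that needs retrying gets it from a single line above its definition.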

Some people might skip the bottom part of the GitHub download pages (and for good reason; they normally don’t contain a lot of information). However, the author of funcy provides access to essays about why funcy implements certain features in a particular manner, and those essay links appear at the bottom of the GitHub page. For example, you can read "Abstracting Control Flow" (http://hackflow.com/blog/2013/10/08/abstracting-control-flow/), which helps you understand the need for this feature, especially in a functional environment. In fact, you might find that other GitHub pages (not many, but a few) also contain these sorts of helpful links.

SciPy

The SciPy (http://www.scipy.org/) stack contains a host of other libraries that you can also download separately. These libraries provide support for mathematics, science, and engineering. When you obtain SciPy, you get a set of libraries designed to work together to create applications of various sorts. These libraries are:

NumPy
SciPy
matplotlib
IPython
SymPy
Pandas

The SciPy library itself focuses on numerical routines, such as routines for numerical integration and optimization. SciPy is a general-purpose library that provides functionality for multiple problem domains. It also provides support for domain-specific libraries, such as Scikit-learn, Scikit-image, and statsmodels. To make your SciPy experience even better, try the resources at http://www.scipy-lectures.org/. The site contains many lectures and tutorials on SciPy’s functions.
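If numerical integration is a new term for you, the following plain Python sketch shows the idea using the trapezoid rule: slice the area under a curve into thin trapezoids and add them up. It doesn't call SciPy, and the trapezoid() name and default slice count are choices made for this example. SciPy's own routines are faster and offer more accurate methods.

```python
def trapezoid(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] with n equal-width trapezoids."""
    if n < 1:
        raise ValueError("n must be at least 1")
    h = (b - a) / n
    # The two endpoints count half; every interior point counts in full.
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

# The exact area under x squared between 0 and 1 is 1/3.
approx = trapezoid(lambda x: x * x, 0.0, 1.0)
print(approx)
```

Raising n makes the slices thinner and the answer closer to the exact value, at the cost of more function calls.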

XGBoost

The XGBoost package (https://github.com/dmlc/xgboost) enables you to apply a Gradient Boosting Machine (GBM) (https://towardsdatascience.com/boosting-algorithm-gbm-97737c63daa3?gi=df155908abce) to any problem, thanks to its wide choice of objective functions and evaluation metrics. It operates with a variety of languages, including

Python
R
Java
C++

In spite of the fact that GBM is a sequential algorithm (and thus slower than others that can take advantage of modern multicore computers), XGBoost leverages multithread processing in order to search in parallel for the best splits among the features. The use of multithreading helps XGBoost outperform many other GBM implementations, both in R and Python. Because of all that it contains, the full package name is eXtreme Gradient Boosting (or XGBoost for short). You can find complete documentation for this package at https://xgboost.readthedocs.org/en/latest/.
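To see why gradient boosting is sequential, and where the search for the best split fits in, consider this deliberately tiny sketch in plain Python. It boosts one-split trees (stumps) against a squared-error objective. It isn't XGBoost's algorithm (there's no regularization and no parallelism), and every function name and data value here is made up for illustration.

```python
def fit_stump(rows, residuals):
    """Try every feature and threshold; keep the split with the least squared error."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [r for row, r in zip(rows, residuals) if row[feature] <= threshold]
            right = [r for row, r in zip(rows, residuals) if row[feature] > threshold]
            if not left or not right:
                continue
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            error = (sum((r - left_mean) ** 2 for r in left)
                     + sum((r - right_mean) ** 2 for r in right))
            if best is None or error < best[0]:
                best = (error, feature, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("no usable split found")
    return best[1:]

def predict_stump(stump, row):
    feature, threshold, left_value, right_value = stump
    return left_value if row[feature] <= threshold else right_value

def fit_gbm(rows, targets, n_rounds=50, learning_rate=0.3):
    """Each round fits a stump to the errors left over by the previous rounds."""
    base = sum(targets) / len(targets)
    predictions = [base] * len(rows)
    stumps = []
    for _ in range(n_rounds):  # sequential: round k needs the output of round k - 1
        residuals = [t - p for t, p in zip(targets, predictions)]
        stump = fit_stump(rows, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, row)
                       for p, row in zip(predictions, rows)]
    return base, stumps, learning_rate

def predict_gbm(model, row):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)

def mean_squared_error(predicted, targets):
    return sum((p - t) ** 2 for p, t in zip(predicted, targets)) / len(targets)

# Hypothetical data: the target is the first feature plus half the second.
rows = [(1.0, 0.0), (2.0, 1.0), (3.0, 0.0), (4.0, 1.0), (5.0, 0.0), (6.0, 1.0)]
targets = [1.0, 2.5, 3.0, 4.5, 5.0, 6.5]

model = fit_gbm(rows, targets)
baseline_mse = mean_squared_error([model[0]] * len(rows), targets)
boosted_mse = mean_squared_error([predict_gbm(model, row) for row in rows], targets)
print(baseline_mse, boosted_mse)
```

The rounds in fit_gbm() must run one after another, but the nested loops in fit_stump() are independent of each other. That split search is the part the text says XGBoost runs in parallel.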
