Chapter 19
IN THIS CHAPTER
Improving the user interface with sights and sounds
Manipulating data better
Working with algorithms
This chapter reviews just a few of the more interesting Python packages available today. Unlike with Haskell, finding reviews of Python packages is incredibly easy, as is finding articles in which people list their favorite packages. However, if you want a more-or-less complete listing, the best place to look is the Python Package Index at https://pypi.org/. The index is so large that you won’t find a single list; instead, you must browse through categories or search for particular needs. Consequently, this chapter reflects just a few interesting choices, and if you don’t see what you need, you really should search online.
Gensim (https://radimrehurek.com/gensim/) is a Python library that can perform natural language processing (NLP) and unsupervised learning on textual data. It offers a wide range of algorithms to choose from, including word2vec (https://code.google.com/archive/p/word2vec/).
One of the better platform-independent libraries for adding sound to your Python application is PyAudio (http://people.csail.mit.edu/hubert/pyaudio/). This library lets you record and play back sounds as needed. For example, a user can record an audio note listing tasks to perform later and then play back the list of items as needed.
Humans are visually oriented. If you show someone a table of information and then show the same information as a graph, the graph nearly always wins when it comes to conveying information. Graphs help people see trends and understand why the data has taken the course that it has. However, getting the pixels that represent the tabular information onscreen is difficult, which is why you need a library such as PyQtGraph (http://www.pyqtgraph.org/) to make things simpler.
Even though the library is designed around engineering, mathematical, and scientific requirements, you have no reason to avoid using it for other purposes. PyQtGraph supports both 2-D and 3-D displays, and you can use it to generate new graphics based on numeric input. The output is completely interactive, so a user can select image areas for enhancement or other sorts of manipulation. In addition, the library comes with a wealth of useful widgets (controls, such as buttons, that you can display onscreen) to make the coding process even easier.
PyQtGraph works with several supporting packages: PyQt (https://wiki.python.org/moin/PyQt) or PySide (https://wiki.python.org/moin/PySide), NumPy (http://www.numpy.org/), SciPy (http://www.scipy.org/), and PyOpenGL (http://pyopengl.sourceforge.net/).
Users respond to a graphical user interface (GUI) because it’s friendlier and requires less thought than a command-line interface. Many products can give your Python application a GUI. However, the most commonly used product is TkInter (https://wiki.python.org/moin/TkInter). Developers like it because TkInter keeps things simple. It’s actually an interface for the Tool Command Language (Tcl)/Toolkit (Tk) found at http://www.tcl.tk/. A number of languages use Tcl/Tk as the basis for creating a GUI.
Displaying tabular data in a manner the user can understand is important. Python stores this type of data in a form that works best for programming needs. However, users need something organized in a way that humans understand and that is visually appealing. The PrettyTable library (https://pypi.python.org/pypi/PrettyTable) lets you easily add an appealing tabular presentation to your command-line application.
A database is essentially an organized way of storing repetitive or structured data on disk. For example, customer records (individual entries in the database) are repetitive because each customer has the same sort of information requirements, such as name, address, and telephone number. The precise organization of the data determines the sort of database you’re using. Some database products specialize in text organization, others in tabular information, and still others in random bits of data (such as readings taken from a scientific instrument). Databases can use a tree-like structure or a flat-file configuration to store data. You’ll hear all sorts of odd terms when you start looking into database management system (DBMS) technology, most of which mean something only to a database administrator (DBA) and won’t matter to you.
A relational DBMS (RDBMS), which organizes data into tables of rows and columns, relies on a special language called the Structured Query Language (SQL) to access the individual records inside. Of course, you need some means of interacting with both the RDBMS and SQL, which is where SQLAlchemy (http://www.sqlalchemy.org/) comes into play. This product reduces the amount of work needed to ask the database to perform tasks such as returning a specific customer record, creating a new customer record, updating an existing customer record, and deleting an old customer record.
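The following minimal sketch walks through the four tasks just listed using SQLAlchemy Core against an in-memory SQLite database, so no database server is needed. It assumes SQLAlchemy 1.4 or later; the customers table and its columns are hypothetical. SQLAlchemy builds the INSERT, UPDATE, DELETE, and SELECT statements from Python expressions, so you never write the SQL strings yourself.

```python
# Assumes SQLAlchemy 1.4+ (pip install sqlalchemy); table and data are hypothetical.
from sqlalchemy import (Column, Integer, MetaData, String, Table,
                        create_engine, delete, insert, select, update)

# In-memory SQLite database: nothing to install or clean up.
engine = create_engine("sqlite:///:memory:")
metadata = MetaData()

# A hypothetical customer table holding the fields mentioned in the text.
customers = Table(
    "customers", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String, nullable=False),
    Column("address", String),
    Column("phone", String),
)
metadata.create_all(engine)  # issues CREATE TABLE

with engine.begin() as conn:  # commits automatically if the block succeeds
    # Create new customer records.
    conn.execute(insert(customers), [
        {"name": "Ann", "address": "1 Main St", "phone": "555-0100"},
        {"name": "Bob", "address": "2 Oak Ave", "phone": "555-0101"},
    ])
    # Update an existing customer record.
    conn.execute(
        update(customers).where(customers.c.name == "Ann").values(phone="555-0199")
    )
    # Delete an old customer record.
    conn.execute(delete(customers).where(customers.c.name == "Bob"))
    # Return a specific customer record.
    ann = conn.execute(select(customers).where(customers.c.name == "Ann")).one()
    remaining = conn.execute(
        select(customers.c.name).order_by(customers.c.name)
    ).scalars().all()

print(ann.name, ann.phone)
print(remaining)
```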
The Toolz package (https://github.com/pytoolz/toolz) fills in some of the functional programming paradigm gaps in Python. You specifically use it for functional support of iterators, functions, and dictionaries.
Interestingly enough, this same package works for both Python 2.x and 3.x developers, so a single package can meet many of your functional data-processing needs. Because Toolz is a pure Python implementation, it runs anywhere Python does.
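Here is a small sketch touching each of those three areas, assuming toolz is installed from PyPI; the data is made up. The functions used (groupby, valmap, sliding_window, curry, compose, and pipe) are all part of the toolz API.

```python
# Assumes the toolz package is installed (pip install toolz); data is made up.
from toolz import compose, curry, groupby, pipe, sliding_window, valmap

words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# Dictionaries: group items by a key function, then transform the values.
by_letter = groupby(lambda word: word[0], words)  # {'a': [...], 'b': [...], 'c': [...]}
counts = valmap(len, by_letter)                   # {'a': 2, 'b': 2, 'c': 1}

# Iterators: sliding_window lazily yields overlapping tuples.
pairs = list(sliding_window(2, [1, 2, 3, 4]))     # [(1, 2), (2, 3), (3, 4)]

# Functions: curry lets you supply arguments one at a time.
@curry
def add(x, y):
    return x + y

add_ten = add(10)  # partially applied; waits for y

# pipe threads a value through functions left to right.
shifted = pipe([1, 2, 3], lambda xs: map(add_ten, xs), list)  # [11, 12, 13]

# compose(f, g)(x) means f(g(x)), so strip runs first, then upper.
shout = compose(str.upper, str.strip)

print(counts, pairs, shifted, shout("  hello "))
```

Note that none of these calls mutates its input; each returns a new value, which is the functional style Toolz is meant to encourage.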
Cloudera Oryx (http://www.cloudera.com/) is a machine learning project for Apache Hadoop (http://hadoop.apache.org/) that provides you with a basis for performing machine learning tasks. It emphasizes the use of live data streaming. This product helps you add security, governance, and management functionality that’s missing from Hadoop so that you can create enterprise-level applications with greater ease.
The functionality provided by Oryx builds on Apache Kafka (http://kafka.apache.org/) and Apache Spark (http://spark.apache.org/). Common uses for this product include real-time spam filters and recommendation engines. You can download Oryx from https://github.com/cloudera/oryx.
The funcy package (https://github.com/suor/funcy/) is a mix of features inspired by Clojure (https://clojure.org/). It allows you to make your Python environment better oriented toward the functional programming paradigm, while also adding support for data processing and additional algorithms. That sounds like a lot of ground to cover, and it is, but you can break the functionality of this particular package into these areas:
The SciPy (http://www.scipy.org/) stack contains a host of other libraries that you can also download separately. These libraries provide support for mathematics, science, and engineering. When you obtain SciPy, you get a set of libraries designed to work together to create applications of various sorts. These libraries are:
The SciPy library itself focuses on numerical routines, such as routines for numerical integration and optimization. SciPy is a general-purpose library that provides functionality for multiple problem domains. It also provides support for domain-specific libraries, such as scikit-learn, scikit-image, and statsmodels. To make your SciPy experience even better, try the resources at http://www.scipy-lectures.org/. The site contains many lectures and tutorials on SciPy’s functions.
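A minimal sketch of the two kinds of routines just mentioned, numerical integration and optimization, assuming SciPy is installed. The function being integrated and the one being minimized are toy examples chosen because their exact answers are known, so you can check SciPy’s numerical results against them.

```python
# Assumes SciPy is installed (pip install scipy); the functions are toy examples.
import math

from scipy import integrate, optimize

# Numerical integration: area under sin(x) from 0 to pi (exact answer is 2).
# quad returns the estimate together with an estimate of its absolute error.
area, error_estimate = integrate.quad(math.sin, 0.0, math.pi)

# Optimization: find the x that minimizes (x - 3)^2 + 1 (exact answer is x = 3).
result = optimize.minimize_scalar(lambda x: (x - 3.0) ** 2 + 1.0)

print(f"area = {area:.6f}, estimated error = {error_estimate:.1e}")
print(f"minimum at x = {result.x:.4f}, f(x) = {result.fun:.4f}")
```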
The XGBoost package (https://github.com/dmlc/xgboost) enables you to apply a Gradient Boosting Machine (GBM) (https://towardsdatascience.com/boosting-algorithm-gbm-97737c63daa3?gi=df155908abce) to a wide range of problems, thanks to its wide choice of objective functions and evaluation metrics. It operates with a variety of languages, including