Chapter 9. Other Features of the Language

In this chapter you are introduced to some other aspects of Python that are less frequently used, as well as modules that are very commonly used. Each section describes at least one way that the feature is typically used and then offers example code.

In previous chapters you looked at some common functions, and even learned to create your own. Part of the charm of Python is its breadth of built-in functions and modules that cater to both broad and obscure problems. Before learning to build your own module, you look at some of the ones Python offers to get a better understanding of their usage.

In this chapter you learn:

  • To work with lambda and the filter function.

  • To use map to avoid loops.

  • To perform string substitutions.

  • To use the getopt module.

Lambda and Filter: Short Anonymous Functions

Sometimes you need a very simple function invocation — something that is not generally useful or that is so specific that its use needs to be completely different if it is invoked in another location in your code. For these occasions, there is a special operation: lambda. Lambda is not a function itself but a special word that tells Python to create a function and use it in place, rather than reference it from a name.

To demonstrate lambda being used, the following example uses filter, which is a function that enables you to take a list and remove elements based on criteria you define within a function you write. Normal functions can be used, but in simple cases, such as where you want only odd numbers (or odd-numbered elements, or strings beginning with something, and so on), a fully defined function could be overkill.

# use lambda with filter
filter_me = [1, 2, 3, 4, 6, 7, 8, 11, 12, 14, 15, 19, 22]
# The lambda returns True only for even numbers (x % 2 is 0 for even
# numbers and 1 for odd numbers), so filter keeps just the even ones
result = filter(lambda x: x%2 == 0, filter_me)
print(*result)

The functions that lambda creates are called anonymous functions because of their lack of a name. However, you can use the result of the lambda statement to bind the name to a function yourself. That name will be available only in the scope in which the name was created, like any other name:

# use lambda with filter, but bind it to a name
filter_me = [1, 2, 3, 4, 6, 7, 8, 11, 12, 14, 15, 19, 22]
# The lambda returns True only for even numbers (x % 2 is 0 for even
# numbers and 1 for odd numbers), so filter keeps just the even ones
func = lambda x: x%2 == 0
result = filter(func, filter_me)
print(*result)

Lambda can only be a simple function, and it can't contain statements, such as creating a name for a variable. Inside a lambda, you can only perform a limited set of operations, such as testing for equality, multiplying numbers, or using other already existing functions in a specific manner. You can't do things like use if ... : elif ... : else: constructs or even create new names for variables! You can only use the parameters passed into the lambda function. You can, however, do slightly more than perform simple declarative statements by using the and and or operations. However, you should still keep in mind that lambda is for very limited uses.
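
As a brief illustration of combining tests with and and or inside a lambda, the following sketch keeps numbers that are both even and greater than 10, or that are exactly 1. The names numbers and keep are made up for this example:

```python
# Hypothetical example: combine tests inside a lambda with "and" / "or"
numbers = [1, 2, 3, 4, 6, 7, 8, 11, 12, 14, 15, 19, 22]

# Still a single expression, so it is allowed in a lambda
keep = lambda x: (x % 2 == 0 and x > 10) or x == 1

result = list(filter(keep, numbers))
print(result)
```

Anything more involved than this is usually easier to read as a normal, named function.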

The main use for lambda is with the built-in functions map and filter. Used with lambda, these functions provide compact ways to perform some great operations while avoiding the need for loops. You've already seen filter in action, which could be a difficult loop to write.

Map: Short-Circuiting Loops

One common place to use anonymous functions is when the map function is called. Map is a special function for cases when you need to do a specific action on every element of a list. It enables you to accomplish this without having to write the loop.
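
A minimal sketch of map used with lambda might look like the following. The list contents and the names squares and totals are invented for illustration:

```python
# Hypothetical example: apply a function to every element without a loop
numbers = [1, 2, 3, 4, 5]

# map returns an iterator, so list() is used to collect the results
squares = list(map(lambda x: x * x, numbers))
print(*squares)

# map can also walk several lists in step, passing one element from each
totals = list(map(lambda a, b: a + b, [1, 2, 3], [10, 20, 30]))
print(totals)
```

The second call shows that when more than one list is given, the function must accept one parameter per list.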

Decisions within Lists — List Comprehension

The oddly named list comprehension feature entered the language in Python 2.0. It enables you to write miniature loops and decisions within the list dereferencing operators (the square brackets) to define parameters that will be used to restrict the range of elements being accessed.

For instance, to create a list that contains just the even numbers from another list, you can use list comprehension:

# Print just the even numbers
everything = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 ]
print([ x for x in everything if x%2 == 0])

This can be a nice and compact way of providing a portion of a list to a loop — but with only the pertinent parts of the list, based on what you want in your program at the moment, being presented to your loop.

List comprehension provides you with the same functionality as filter or map combined with lambda, but it is a form that gives you more decision-making power because it can include loops and conditionals, whereas lambda only enables you to perform one simple expression.

In most cases, list comprehension will also run faster than the alternative.
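
To make the comparison concrete, here is a small sketch that selects the even numbers and doubles them, first with filter and map combined with lambda, and then with one list comprehension. The variable names are assumptions made for this example:

```python
everything = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

# filter + map with lambda: keep the even numbers, then double each one
with_lambda = list(map(lambda x: x * 2,
                       filter(lambda x: x % 2 == 0, everything)))

# The same selection and transformation as a single list comprehension
with_comprehension = [x * 2 for x in everything if x % 2 == 0]

print(with_lambda == with_comprehension)
print(with_comprehension)
```

Both forms produce the same list; the comprehension simply states the condition and the transformation in one place.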

Generating Iterators for Loops

Python has a special feature that enables you to create iterators — the range function:

f = range(10, 20)
print(*f)

This code produces an obvious-looking result:

>>> print(*f)
10 11 12 13 14 15 16 17 18 19

By itself, this doesn't seem profound, but it is essential when you need a for loop that runs for a specific number of iterations and isn't based on an existing list. That number need not be known when the program is written; it may become known only while the program is running.

If range is given only a single number, it counts from zero up to, but not including, that number. (A zero or negative number produces an empty range, so the loop body never runs.) With a positive number, the loop runs that many times:

for number in range(10):
    print("Number is now %d" % number)

This produces the obvious output, which is what you want:

Number is now 0
Number is now 1
Number is now 2
Number is now 3
Number is now 4
Number is now 5
Number is now 6
Number is now 7
Number is now 8
Number is now 9

In addition, if you only want, for example, every other number or every third number, you can use an even more optional third parameter, called the step, that describes what the interval will be between each number that range creates:

for number in range(5, 55, 4):
    print("Number from 5 to 55, by fours: %d" % number)

This results in the selective list of numbers that you specified:

Number from 5 to 55, by fours: 5
Number from 5 to 55, by fours: 9
Number from 5 to 55, by fours: 13
Number from 5 to 55, by fours: 17
Number from 5 to 55, by fours: 21
Number from 5 to 55, by fours: 25
Number from 5 to 55, by fours: 29
Number from 5 to 55, by fours: 33
Number from 5 to 55, by fours: 37
Number from 5 to 55, by fours: 41
Number from 5 to 55, by fours: 45
Number from 5 to 55, by fours: 49
Number from 5 to 55, by fours: 53

In previous versions of Python, range created a complete list containing every element that you asked for. A program could be handling huge numbers of elements, perhaps hundreds of thousands or even millions (for example, from zero to the number of all the possible systems on the Internet). Each element uses a bit of computer memory, so a list that large can eventually take up all of the memory on a system. To avoid this problem, Python 2 had a special built-in class called xrange that generated its numbers as they were needed instead of creating them all in memory at once. In Python 3.0, range was changed so that it no longer creates a list but instead produces its numbers on demand, essentially behaving the way xrange did. Xrange has since been removed from the language.
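
The following sketch shows what this on-demand behavior means in practice in Python 3. The name big and the particular numbers are chosen only for illustration:

```python
# A range this large would be impractical as a real list,
# but no list is ever built here
big = range(0, 10**9, 3)

print(len(big))     # the length is calculated, not counted
print(big[5])       # individual elements can be looked up directly
print(9 in big)     # membership is tested without looping

# Counting down requires a negative step
countdown = list(range(5, 0, -1))
print(countdown)
```

If you really do need a list, passing the range to list() creates one, as the last lines show.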

Special String Substitution Using Dictionaries

One syntax you haven't been shown yet is a special syntax for using dictionaries to populate string substitutions. This can come up when you want a configurable way to print out strings — such as a formatted report or something similar.
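
A minimal sketch of this syntax follows: each format specifier names a dictionary key in parentheses, and the usual width and precision options still apply. The dictionary contents are invented for this example. The summary at the end of this chapter also mentions string.Template, so a short use of it is included for comparison:

```python
from string import Template

# Hypothetical data for a one-line report
person = {"name": "Alice", "age": 30, "balance": 1234.5}

# %(key)s looks up "key" in the dictionary on the right of the % operator
line = "%(name)s is %(age)d years old; balance: %(balance)10.2f" % person
print(line)

# string.Template uses the simpler $key form, without format options
letter = Template("Dear $name, your balance is $balance")
greeting = letter.substitute(person)
print(greeting)
```

Because the template string and the data are separate, the string could just as easily come from a configuration file as from the program itself.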

Featured Modules

Starting in Chapter 7, you saw modules used to add functionality to Python. In Chapter 8, you learned how interaction with the operating system and its files is achieved through modules, such as the os module, that provide interfaces to how the system works.

In this section, you see examples of some other common modules that will help you to start building your own programs.

Getopt — Getting Options from the Command Line

On UNIX systems, the most common way to specify the behavior of a program when it runs is to add parameters to the command line of a program. Even when a program is not run from the command line but is instead run using fork and exec (more on this later in this chapter), a command line is constructed when it is invoked. This makes it a universal way of controlling the behavior of your programs.

You may have seen, for instance, that many programs can be run so that they provide you with some basic information about how they should be run. Python enables you to do this with -h:

$ python -h
usage: python30 [option] ... [-c cmd | -m mod | file | -] [arg] ...
Options and arguments (and corresponding environment variables):
-c cmd : program passed in as string (terminates option list)
-d     : debug output from parser (also PYTHONDEBUG=x)
-E     : ignore environment variables (such as PYTHONPATH)
[ etc. ]

In the past, different conventions were available on different UNIX platforms to specify these options, but this has largely resulted in two forms of options being used by most projects: the short form, such as the help-message producing option to Python, and a long form, such as --help for help.

To accept these sorts of options makes sense. Ideally, you'd like to offer a short and a long form of commands that are common, and allow each one to optionally take a specification. So if you wanted to write a program that had a configuration file that the user could specify, you may want a short option like -c for experienced users, but provide a longer option too, like --config-file. In either case, you'd want them to be the same function in your program to save you time, but you'd like to give users the freedom to use these options however they want to use them.

The getopt module provides two functions to make this standard convention easy to use: getopt.getopt and getopt.gnu_getopt. They are nearly identical. The difference is that the basic getopt stops parsing at the first non-option it encounters, and nothing after that point is checked.

For getopt to be useful, you have to know what options you want to support. Normally, it's considered the least you can do for your users to write programs that provide them with information about how to run the program, such as how Python prints information with the -h option.

In addition, it's often very useful to have a configuration file. Using these ideas as a starting point, you could start your new programs so that -h and --help both produce a minimal message about how your program is used, and using -c or --config=file would enable you to specify a configuration file that is different from the default configuration:

import sys
import getopt
# Remember, the first thing in the sys.argv list is the name of the command
# You don't need that.
cmdline_params = sys.argv[1:]

opts, args = getopt.getopt(cmdline_params, 'hc:', ['help', 'config='])
# Show the parsed options and whatever was left over
print(opts, args)

for option, parameter in opts:

    if option == '-h' or option == '--help':
        print("This program can be run with either -h or --help for this message,")
        print("or with -c or --config=<file> to specify a different configuration file")

    if option in ('-c', '--config'): # same kind of test as the "or" form above
        print("Using configuration file %s" % parameter)

When long options are used and require a parameter (like --config in the preceding example), the usual convention is to connect the option and its value with an equal sign, although getopt also accepts the value as the next separate argument. When short options are used, one or more space or tab characters can separate the option from its corresponding value. These conventions duplicate the behavior of the options on older UNIX machines that persist to the modern day. They persist because so many people expect that behavior. What can you do?

The preceding code snippet, if run in a program with the parameters -c test -h --config=secondtest, produces the following output:

[('-c', 'test'), ('-h', ''), ('--config', 'secondtest')] []
Using configuration file test
This program can be run with either -h or --help for this message,
or with -c or --config=<file> to specify a different configuration file
Using configuration file secondtest

Note how the second instance of the configuration file is accepted silently; and when it is reached, the same code that sets the config file is revisited so that the second instance is used.
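
In a real program you would normally store the option values rather than print them. The following is a minimal sketch of that idea; the function name read_options and the default file name default.cfg are hypothetical and are not part of getopt:

```python
import getopt

def read_options(argv, default_config="default.cfg"):
    """Return (show_help, config_file) parsed from a list of arguments.

    read_options and default.cfg are made-up names for this sketch.
    """
    opts, args = getopt.getopt(argv, 'hc:', ['help', 'config='])
    show_help = False
    config_file = default_config
    for option, parameter in opts:
        if option in ('-h', '--help'):
            show_help = True
        elif option in ('-c', '--config'):
            # A later occurrence silently overwrites an earlier one
            config_file = parameter
    return show_help, config_file

print(read_options(['-c', 'test', '-h', '--config=secondtest']))
```

If an option that was not declared is passed in, getopt raises getopt.GetoptError, which a complete program would catch in order to print its usage message.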

The second list, the args data, is an empty list because all of the options provided to the program on the command line were valid options, or valid parameters to options. If you inserted other strings in the middle of your options, the normal getopt would behave differently. If the parameters used were instead -c test useless_information_here -h --config=secondtest, the output would say a lot less, and the args array would have a lot more in it.

[('-c', 'test')] ['useless_information_here', '-h', '--config=secondtest']
Using configuration file test

The gnu_getopt lets you mix and match on the command line so that non-options can appear anywhere in the midst of the options, with more options parsed afterward instead of stopping there:

opts, args = getopt.gnu_getopt(cmdline_params, 'hc:', ['help', 'config='])
# Show the parsed options and whatever was left over
print(opts, args)

for option, parameter in opts:

    if option == '-h' or option == '--help':
        print("This program can be run with either -h or --help for this message,")
        print("or with -c or --config=<file> to specify a different configuration file")

    if option in ('-c', '--config'): # same kind of test as the "or" form above
        print("Using configuration file %s" % parameter)

The important point to note is that if you use something that doesn't meet the criteria for an option (by beginning with a - or a +, or following an option that takes a parameter), the two behave differently. Using the options -c test useless_information_here -h --config=secondtest, the gnu_getopt function provides the following output, with the odd duck being the only part of the command line left in the args array:

[('-c', 'test'), ('-h', ''), ('--config', 'secondtest')] ['useless_information_here']
Using configuration file test
This program can be run with either -h or --help for this message,
or with -c or --config=<file> to specify a different configuration file
Using configuration file secondtest

Using More Than One Process

In UNIX and UNIX-like operating systems, the main way of performing certain kinds of subtasks is to create a new process running a new program. On UNIX systems, this is done using a system call that is available in Python by using os.fork. This actually tells the computer to copy everything about the currently running program into a newly created program that is separate, but almost entirely identical. The only difference is that the return value for os.fork is zero in the newly created process (the child), and is the process ID (PID) of the newly created process in the original process (the parent). This can be difficult to understand, and the only way to really get it is to use it a few times and to read some other material on fork and exec that's available online. (Or talk to your nearest UNIX guru.)

Based on the one critical difference, a parent and child can perform different functions. The parent can wait for an event while the child processes, or vice versa. The code to do this is simple and common, but it works only on UNIX and UNIX-like systems:

import os
pid = os.fork()
if pid == 0: # This is the child
    print("this is the child")
else:
    print("the child is pid %d" % pid)

One of the most common things to do after an os.fork call is to call os.execl immediately afterward to run another program. os.execl is an instruction to replace the running program with a new program, so the calling program goes away, and a new program appears in its place (in case you didn't already know this, UNIX systems use the fork and exec method to run all programs):

import os
pid = os.fork()
# fork and exec together
print("second test")
if pid == 0: # This is the child
    print("this is the child")
    print("I'm going to exec another program now")
    os.execl('/bin/cat', 'cat', '/etc/motd')
else:
    print("the child is pid %d" % pid)
    os.wait()

The os.wait function tells Python that you want the parent to do nothing until a child process finishes. Keep in mind that os.fork and os.wait are available only under UNIX and UNIX-like platforms such as Linux. Windows has its own, different mechanism for starting up new processes.
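
The parent can also learn how the child ended. The sketch below, which runs only on UNIX-like systems, uses os.waitpid (a relative of os.wait that waits for one specific child) and os.WEXITSTATUS to recover the child's exit code. The function name run_child is made up for this example:

```python
import os

def run_child(exit_code):
    """Fork a child that exits with exit_code; return what the parent saw.

    run_child is a hypothetical helper for this sketch (UNIX-like systems only).
    """
    pid = os.fork()
    if pid == 0:
        # Child: leave immediately with the requested exit code
        os._exit(exit_code)
    # Parent: block until that specific child finishes
    finished_pid, status = os.waitpid(pid, 0)
    # WEXITSTATUS extracts the exit code from the raw status value
    return finished_pid == pid, os.WEXITSTATUS(status)

print(run_child(7))
```

The child calls os._exit rather than returning so that it never continues into code meant only for the parent.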

To make the common task of starting a new program easier, Python offers a single family of functions that combines os.fork and os.exec on UNIX-like systems, and enables you to do something similar on Windows platforms. When you want to just start up a new program, you can use the os.spawn family of functions. They are a family because they are named similarly, but each one has slightly different behaviors.

On UNIX-like systems, the os.spawn family contains spawnl, spawnle, spawnlp, spawnlpe, spawnv, spawnve, spawnvp, and spawnvpe. On Windows systems, the spawn family contains only spawnl, spawnle, spawnv, and spawnve.

In each case, the letters after the word spawn mean something specific. The v means that the parameters are passed in as a single list (the v stands for vector). Because that list can be built while the program runs, the same call can run a command with very different parameters from one instance to the next without needing to alter the program at all. The l variation instead takes the parameters one by one, as separate arguments to the function call.

The e occurrences require that a dictionary containing names and values that will be used as the environment for the newly created program will be passed in instead of using the current environment.
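
The following sketch contrasts the l, v, and e forms. To stay independent of any particular system command, it runs the Python interpreter itself (located through sys.executable) with two tiny one-line programs; those programs and the DEMO_FLAG variable are assumptions made for this example:

```python
import os
import sys

python = sys.executable  # full path to the running interpreter

# Child program 1: its exit code is the number of extra parameters it received
count_args = 'import sys; sys.exit(len(sys.argv) - 1)'
# Child program 2: exits with 0 only if DEMO_FLAG is "1" in its environment
check_env = 'import os, sys; sys.exit(0 if os.environ.get("DEMO_FLAG") == "1" else 3)'

# l variant: each parameter is a separate argument of the call
from_l = os.spawnl(os.P_WAIT, python, python, '-c', count_args, 'a', 'b')

# v variant: the parameters are collected in one list
params = [python, '-c', count_args, 'a', 'b', 'c']
from_v = os.spawnv(os.P_WAIT, python, params)

# e variant: the last argument is the environment for the new program
env = dict(os.environ, DEMO_FLAG='1')
from_ve = os.spawnve(os.P_WAIT, python, [python, '-c', check_env], env)

print(from_l, from_v, from_ve)
```

In every form, the first parameter is by convention the name of the program being run, which is why the interpreter path appears twice in each call.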

The p occurrence uses the value of the PATH key in the environment dictionary to find the program. The p variants are available only on UNIX-like platforms. The least of what this means is that on Windows your programs must have a completely qualified path to be usable by the os.spawn calls, or you have to search the path yourself:

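import os, sys
if sys.platform == 'win32':
    print("Running on a windows platform")
    # A raw string keeps the backslashes from being read as escape sequences
    command = r"C:\winnt\system32\cmd.exe"
    # By convention the first parameter is the program name; the list can't be empty
    params = ['cmd.exe']
elif sys.platform.startswith('linux'):
    # Python 3.3 and later report 'linux'; older versions reported 'linux2'
    print("Running on a Linux system, identified by %s" % sys.platform)
    command = '/bin/uname'
    params = ['uname', '-a']
else:
    sys.exit("This example has no command for platform %s" % sys.platform)

print("Running %s" % command)
os.spawnv(os.P_WAIT, command, params)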

Of course, this example works only on a limited range of systems. Check the value of sys.platform on your own computer, and test for that value instead of the Linux one if you are on another UNIX system such as Solaris, Mac OS X, AIX, or others.

When you do this, you can either wait for the process to return (that is, until it finishes and exits) or you can tell Python that you'd prefer to allow the program to run on its own, and that you will confirm that it completed successfully later. This is done with the os.P_ family of values. Depending on which one you set, you will be given a different behavior when an os.spawn function returns.
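
As a rough sketch of the two most common values, os.P_WAIT and os.P_NOWAIT, the following code starts the same short-lived child both ways. The child is simply the Python interpreter sleeping briefly, chosen for this example so that the code does not depend on any particular system command:

```python
import os
import sys

python = sys.executable
child = [python, '-c', 'import time; time.sleep(0.2)']

# P_WAIT: the call blocks and returns the child's exit code
exit_code = os.spawnv(os.P_WAIT, python, child)

# P_NOWAIT: the call returns at once with an identifier for the child
# (the process ID on UNIX-like systems)
pid = os.spawnv(os.P_NOWAIT, python, child)

# ... the parent is free to do other work here ...

# Later, collect the child so you know that it has finished
finished_pid, status = os.waitpid(pid, 0)

print(exit_code, finished_pid == pid)
```

With os.P_NOWAIT, checking on the child afterward is your responsibility; the spawn call itself tells you only that the program was started.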

If you need only the most basic invocation of a new command, sometimes the easiest way to do this is to use the os.system function. If you are running a program and just want to wait for it to finish, you can use this function very simply:

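# Now system
import os, sys
if sys.platform == 'win32':
    print("Running on a windows platform")
    command = "cmd.exe"
elif sys.platform.startswith('linux'):
    # Python 3.3 and later report 'linux'; older versions reported 'linux2'
    print("Running Linux")
    command = "uname -a"
else:
    sys.exit("This example has no command for platform %s" % sys.platform)

os.system(command)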

This can be much simpler because it uses the facilities that the operating system provides, and that users expect normally, to search for the program you want to run, and it defaults to waiting for the child process to finish.

Threads — Doing Many Things in the Same Process

Creating a new process using fork or spawn can sometimes take too much effort and provide too little benefit. The effort comes from the fact that when a program grows large, fork has to copy everything in the program to the new process, and the system must have enough resources to handle that. Another downside of fork is that when your program needs to do many things at the same time, some things may need to wait while others proceed. When this happens, you want all of the different components to communicate their needs to the other parts of the program.

Using multiple processes, this becomes very difficult. These processes share many things because the child was originally created using the data in the parent. However, they are separate entities — completely separate. Because of this, it can be very tricky to make two processes work together cooperatively.

So, for complex situations in which subprocesses are not appropriate, the concept of threads is available.

Many cooperative threads of program execution are able to exist at the same time in the same program. Each one has potentially different objects, with different state, but they can all communicate, while also being able to run semi-independently of one another.

This means that in many situations, using threads is much more convenient than using a separate process. Note that the following example uses subclassing, which is covered in Chapter 10. To see how this works, try running it with a fairly large parameter, say two million (2000000):

import math
from threading import Thread
import time

class SquareRootCalculator:
    """This class spawns a separate thread to calculate a bunch of square
    roots, and checks in on it once a second until it finishes."""

    def __init__(self, target):
        """Turn on the calculator thread and, while waiting for it to
        finish, periodically monitor its progress."""
        self.results = []
        counter = self.CalculatorThread(self, target)
        print("Turning on the calculator thread...")
        counter.start()
        while len(self.results) < target:
            print("%d square roots calculated so far." % len(self.results))
            time.sleep(1)
        print("Calculated %s square root(s); the last one is sqrt(%d)=%f" %
              (target, len(self.results), self.results[-1]))

    class CalculatorThread(Thread):
        """A separate thread which actually does the calculations."""

        def __init__(self, controller, target):
            """Set up this thread, including making it a daemon thread
            so that the script can end without waiting for this thread to
            finish."""
            Thread.__init__(self)
            self.controller = controller
            self.target = target
            self.daemon = True  # same effect as the older setDaemon(True)

        def run(self):
            """Calculate square roots for all numbers between 1 and the target,
            inclusive."""
            for i in range(1, self.target+1):
                self.controller.results.append(math.sqrt(i))

if __name__ == '__main__':
    import sys
    try:
        # A missing parameter raises IndexError; a non-number raises ValueError
        limit = int(sys.argv[1])
    except (IndexError, ValueError):
        print("Usage: %s <number of square roots to calculate>"
              % sys.argv[0])
        sys.exit(1)
    SquareRootCalculator(limit)

For many situations, such as network servers (see Chapter 16) or graphical user interfaces (see Chapter 13), threads make much more sense because they require less work from you as the programmer, and fewer resources from the system.

Note how separate threads can access each other's names and data easily. This makes it very easy to keep track of what different threads are doing, which is an important convenience.
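
The example above checks on its thread once a second. When all you need is to wait until the work is done, a thread's join method is a simpler alternative. The following sketch, with the invented names worker and results, starts several threads that append to one shared list and then waits for all of them:

```python
from threading import Thread

results = []  # shared by every thread in this process

def worker(n):
    """Hypothetical task: store the square of n in the shared list."""
    results.append(n * n)

threads = [Thread(target=worker, args=(n,)) for n in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()   # block until this thread has finished

# The threads may finish in any order, so sort before printing
print(sorted(results))
```

Appending to a list is safe here, but threads that read and then modify the same data generally need to coordinate, for example with threading.Lock.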

Summary

In this chapter, you were introduced to some of the many available functions and modules that Python offers. These features build on the material you've already learned and most of them are expanded on in the remaining chapters in the book.

You learned how to use some basic features that enable what is usually called a functional style of programming, which in Python is offered through lambda and functions such as map and filter. Lambda enables you to write a simple function without having to declare it elsewhere. These functions are called anonymous because they can be written and run without ever having to be bound to a name. Map operates on lists, and when used on a simple list will run a function on each element from beginning to end. It has some more complex behaviors, too, which occur when lists within lists, or more than one list, are provided to map.

The key things to take away from this chapter are:

  • List comprehension is the capability to run a limited amount of code — a simple loop, for instance — within the square brackets that dereference a sequence, so that only those elements that meet the criteria within the brackets will be returned. This enables you to easily and quickly access specific members of a sequence.

  • The range operation enables you to generate iterators that are commonly used in for loops because they can provide you with numeric lists starting at any number, and ending at any number.

  • In addition to simple string substitution, you can provide a string with format specifiers that reference the names of keys in dictionaries by using a special syntax. This form enables you to continue to use the format specifier options, such as how many spaces you want reserved for the substitution or how many decimal places should be used.

  • An alternative form for simple key-name-based string formatting is provided by the string.Template class that was added in Python 2.4. It provides a slightly simpler format that is more appropriate (or at least easier to explain) when you allow your users to specify templates. Generating form letters is one example of how this could be used.

  • Getopt enables you to accept options on the command line, so that your users can determine the behavior of your programs when they're run.

  • You now know how to create more processes when needed, and how to create threads for use in more complex programs that need to do many things in parallel. You learn more about using threads in Chapters 13 and 16.

  • The features and modules presented here give you an idea of the different directions in which Python can be extended and used, and how easy it is to use these extensions. In Chapter 10, you see most of the concepts you've used already tied into an example working program.

Exercises

Chapter 9 is a grab-bag of different features. At this point, the best exercise is to test all of the sample code, looking at the output produced and trying to picture how the various ideas introduced here could be used to solve problems that you'd like to solve or would have liked to solve in the past.
