© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2022
A. Danial, Python for MATLAB Development, https://doi.org/10.1007/978-1-4842-7223-7_9

9. Interacting with the Operating System and External Executables

Albert Danial (Redondo Beach, CA, USA)
 

Complex workflows rarely involve just MATLAB or Python in isolation. Other applications, whether custom in-house tools, commercial simulation and modeling packages, or open source tools, play important roles in the grander scheme of things. In this chapter, we’ll see how Python programs can interact with the underlying operating system and other executables. We’ll read and change environment variables, call other programs and capture their output, monitor the host computer’s memory and CPU use, and kill processes that exceed given resource thresholds.

9.1 Reading and Setting Environment Variables

Environment variables can be accessed through the os.environ object in Python and the getenv() function in MATLAB:

MATLAB:
>> getenv('HOME')
ans = /home/al

Python:
In : import os
In : os.environ['HOME']
Out: '/home/al'

os.environ has an advantage over getenv() because it behaves like a dictionary. This allows you to iterate over all known environment variables (recall from Section 4.2.3 that the format option <10s means “a string, 10 characters wide, left justified”):

Python:
In : for V in sorted(os.environ):
...:     print(f'{V:<10s} = {os.environ[V]}')
AUTOJUMP_ERROR_PATH = /home/al/.local/share/autojump/errors.log
AUTOJUMP_SOURCED = 1
CLUTTER_IM_MODULE = xim
COMPIZ_BIN_PATH = /usr/bin/
COMPIZ_CONFIG_PROFILE = ubuntu
CONDA_DEFAULT_ENV = base
CONDA_EXE  = /usr/local/anaconda3/bin/conda
CONDA_PREFIX = /usr/local/anaconda3
         :

MATLAB makes this much harder. The most common approach suggested on Stack Overflow is to parse the output of a system call to env on Linux and macOS, or to set on Windows. This is done in Section 9.2, where system calls in both languages are described.
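Other dictionary idioms also carry over to os.environ. The .get() method returns a fallback value instead of raising KeyError when a variable is undefined. The in operator tests whether a variable exists. The sketch below is minimal; SCRATCH_DIR and the fallback path are hypothetical names chosen only for illustration:

```python
import os

def scratch_dir(fallback='/tmp'):
    # SCRATCH_DIR is a hypothetical variable name; .get() returns the
    # fallback instead of raising KeyError when it is not defined.
    return os.environ.get('SCRATCH_DIR', fallback)

if 'SCRATCH_DIR' in os.environ:          # membership test, as with a dict
    print('SCRATCH_DIR =', os.environ['SCRATCH_DIR'])
else:
    print('SCRATCH_DIR is not set; using', scratch_dir())
```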

Environment variables can be set programmatically as well, although changes made by a program persist only for the life of the program’s process. When the program ends, the environment variables in the session where the program was run remain unchanged.

In Python, one can assign an environment variable simply by setting a key of os.environ, just as one would with a dictionary. MATLAB sets variables with its setenv() function.

MATLAB:
>> setenv('N_CASES', '42')
>> getenv('N_CASES')
ans = 42

Python:
In : import os
In : os.environ['N_CASES'] = '42'
In : os.environ['N_CASES']
Out: '42'
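A change made through os.environ is visible to any child process the program launches afterward. That is usually the reason for setting a variable in the first place. When only a single call should see a different value, you can instead pass a modified copy through the env argument of subprocess.run(). subprocess.run() itself is covered in Section 9.2.

The sketch below uses the running Python interpreter (sys.executable) as the child, so it should behave the same on any platform:

```python
import os
import subprocess
import sys

# A tiny child program that prints the N_CASES value it inherits.
child = [sys.executable, '-c', 'import os; print(os.environ.get("N_CASES"))']

os.environ['N_CASES'] = '42'              # visible to every later child
out = subprocess.run(child, capture_output=True, text=True)
print(out.stdout.strip())                 # 42

env = dict(os.environ, N_CASES='7')       # modified copy for a single call
out = subprocess.run(child, capture_output=True, text=True, env=env)
print(out.stdout.strip())                 # 7
print(os.environ['N_CASES'])              # 42: the parent is unchanged
```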

One quirk shared by both languages is that only string values are allowed. MATLAB will accept an integer and implicitly convert it to a string, but the result is likely unintended: the integer 42 becomes its ASCII character equivalent, “*”. Python raises a TypeError:

MATLAB:
>> setenv('A', 42)
warning: implicit conversion from scalar to sq_string
>> getenv('A')
ans = *

Python:
In : import os
In : os.environ['N_CASES'] = 42
TypeError: str expected, not int
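The remedy in both languages is an explicit conversion:

- On the way in: str() in Python, num2str() in MATLAB.
- On the way out: int() or float() in Python, str2double() in MATLAB.

A minimal Python sketch, reusing the N_CASES name from above:

```python
import os

n_cases = 42
os.environ['N_CASES'] = str(n_cases)   # store the text '42', not the int

# Values always come back as strings; convert before doing arithmetic.
n = int(os.environ['N_CASES'])
print(n + 1)                           # 43
```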

9.2 Calling External Executables

External system calls are made in MATLAB with the system() function and in Python with functions in the subprocess module: subprocess.run() or subprocess.Popen(). The .run() function most closely resembles MATLAB’s system(), while .Popen() allows one to orchestrate the execution of an entire chain of applications, piping output from one program to the input of the next.

Although subprocess.run() resembles system(), their methods of operation differ starkly. MATLAB’s system() has only one optional argument ('-echo', which additionally shows the command’s output in the Command Window and is useful for interacting with the external command), while Python’s subprocess.run() has 14 optional arguments. The most commonly used of these are check, capture_output, shell, and timeout.
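Of these four options, shell and timeout are sketched below; capture_output and check appear in the examples that follow.

- timeout makes subprocess.run() stop waiting after the given number of seconds, kill the child, and raise subprocess.TimeoutExpired.
- shell=True hands a single command string to the system shell (/bin/sh on Linux and macOS, cmd.exe on Windows), so shell syntax such as pipes works. Never assemble that string from untrusted input.

In this sketch the Python interpreter stands in for a slow external program, and the pipeline assumes a POSIX shell:

```python
import subprocess
import sys

# Stand-in for a slow external program: the Python interpreter sleeping 10 s.
slow = [sys.executable, '-c', 'import time; time.sleep(10)']
try:
    subprocess.run(slow, timeout=1)      # stop waiting after 1 second
    timed_out = False
except subprocess.TimeoutExpired as err:
    timed_out = True                     # run() has already killed the child
    print(f'gave up after {err.timeout} seconds')

# shell=True takes one command string and hands it to the shell, so shell
# syntax such as the pipe below works (assumes a POSIX shell like /bin/sh).
piped = subprocess.run('echo alpha beta | wc -w', shell=True,
                       capture_output=True, text=True)
print(piped.stdout.strip())              # 2
```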

In Section 9.1, we saw that MATLAB would need to call the operating system’s env (Linux, macOS) or set (Windows) command to get a list of all environment variables and their values. Although Python has direct access to these through the os.environ object, we’ll perform the same task with a system call.

MATLAB:
>> [Status, Result] = system('env');
Python:
In : import subprocess
In : Result = subprocess.run(['env'], capture_output=True)

MATLAB’s return variables, Status and Result, are a double (the value 0 means the command ran successfully; a non-zero value indicates an error) and a character array containing the command’s entire STDOUT stream. Individual lines of output can be iterated over by splitting Result on newlines via strsplit(Result, '\n').

Python’s Result is a subprocess.CompletedProcess object. Its attributes include:

- .returncode, equivalent to Status in MATLAB;
- .stdout, equivalent to Result in MATLAB;
- .stderr, which contains any error messages the command generated.

The last two are filled in only when capture_output=True is given. Curiously, MATLAB’s system() provides no mechanism to capture STDERR separately from STDOUT.

We’ll continue the example by iterating over lines of output from the env system call and extracting the environment variable name and value.

MATLAB:
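[Status, Result] = system('env');
lines = strsplit(Result, '\n');   % separate lines from Result
for i = 1:numel(lines)
    if isempty(lines{i})
        continue                  % skip the empty piece after the final newline
    end
    [var, val] = strtok(lines{i}, '=');   % split at the first '='
    val = val(2:end);                     % drop the '=' itself
    fprintf('%-10s : %s\n', var, val);
end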
Python:
import subprocess
Result = subprocess.run(['env'], capture_output=True)
lines = Result.stdout.decode()      # decode the bytes object into a str
for L in lines.splitlines():
    var, _, val = L.partition('=')  # split at the first '=' only
    print(f'{var:<10s} : {val}')
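Because .stdout and .stderr are separate attributes, a Python program can keep a command’s normal output apart from its diagnostics. In the sketch below, a one-line Python program stands in for an external command that writes to both streams and exits with status 3. The text=True argument asks subprocess.run() to return str rather than bytes (Section 9.2.2 explains why that matters).

```python
import subprocess
import sys

# Stand-in external command: writes one line to each stream, exits with 3.
code = ('import sys; print("result line"); '
        'print("warning line", file=sys.stderr); sys.exit(3)')
Result = subprocess.run([sys.executable, '-c', code],
                        capture_output=True, text=True)

print('status:', Result.returncode)      # status: 3
print('stdout:', Result.stdout.strip())  # stdout: result line
print('stderr:', Result.stderr.strip())  # stderr: warning line
```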

9.2.1 Checking for Failures

External executables may fail for many reasons: command not found, illegal arguments, missing or malformed inputs, processing errors, insufficient privilege to write to the output location, and so on. With one exception (if the executable itself cannot be found, subprocess.run() raises FileNotFoundError), these failures are not Python errors. So, by default, if the command given to subprocess.run() fails, the function simply returns and the Python program continues to run. Result.returncode will be non-zero, so we’ll know that the command failed, but we won’t know the reason for the failure.

This behavior can be changed by setting the optional keyword argument check to True. A failure of the external executable then propagates to the Python program as a subprocess.CalledProcessError exception.

To explore failure behavior, we’ll use ffmpeg , a powerful audio and video manipulation program, to convert an MPEG4 video file into the more highly compressed WebM format. As an input file, we can use the MPEG4 file created by Jake VanderPlas showing the chaotic motion of a triple pendulum;1 the file can be downloaded from http://jakevdp.github.io/videos/triple-pendulum.mp4.

The nominal ffmpeg command to convert the file is
  ffmpeg -loglevel quiet -i triple-pendulum.mp4 triple-pendulum.webm

If the ffmpeg executable is in the environment’s search path, and if the input file can be read, both Python and MATLAB should have no issue invoking the command and producing the WebM file.

What we want to study is not the successful case but the failure. To trigger the failure, we’ll misspell the log level setting quiet as quiett.

In the following Python code, we’ll add the optional check=True argument to subprocess.run().

MATLAB:
>> command = "ffmpeg -loglevel quiett -i triple-pendulum.mp4 triple-pendulum.webm";
>> [Status, Result] = system(command);
Python:
import subprocess
command = ['ffmpeg', '-loglevel', 'quiett', '-i',
           'triple-pendulum.mp4', 'triple-pendulum.webm']
Result = subprocess.run(command, check=True)

The errors from MATLAB and Python look like this:

MATLAB:
>> [Status, Result] = system(command);
>> Status
     1
>> Result
    '[4;31mInvalid loglevel "quiett". Possible levels are numbers or:
     [0m[4;31m"quiet"
     [0m[4;31m"panic"
     [0m[4;31m"fatal"
     [0m[4;31m"error"
     [0m[4;31m"warning"
     [0m[4;31m"info"
     [0m[4;31m"verbose"
     [0m[4;31m"debug"
     [0m[4;31m"trace"
     [0m'
Python:
In : Result = subprocess.run(command, check=True)
Invalid loglevel "quiett". Possible levels are numbers or:
"quiet"
"panic"
"fatal"
"error"
"warning"
"info"
"verbose"
"debug"
"trace"
Traceback (most recent call last):
  [...]
CalledProcessError: Command '['ffmpeg', '-loglevel', 'quiett',
    '-i', 'triple-pendulum.mp4', 'triple-pendulum.webm']'
    returned non-zero exit status 1.

Handling errors from system calls in MATLAB means checking for a non-zero error status and in Python calling subprocess.run() with check=True and catching subprocess.CalledProcessError errors:

MATLAB:
[Status, Result] = system(command);
if Status
    fprintf('Command failure with %s\n', command)
end
Python:
try:
    Result = subprocess.run(command, check=True)
except subprocess.CalledProcessError as err:
    print(f'Command failure with {command}: {err}')
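When check=True is combined with capture_output=True, the exception object carries the failed command’s exit status and captured streams as err.returncode, err.stdout, and err.stderr. The handler can then report why the command failed, not only that it failed. The sketch wraps this in a hypothetical helper, run_checked(). A Python one-liner stands in for the misspelled ffmpeg call, so the sketch runs without ffmpeg installed:

```python
import subprocess
import sys

def run_checked(command):
    """Hypothetical helper: return None on success, else the CalledProcessError."""
    try:
        subprocess.run(command, check=True, capture_output=True, text=True)
    except subprocess.CalledProcessError as err:
        # err carries the exit status and, because capture_output=True,
        # the captured STDOUT and STDERR text
        print(f'Command failure with {command}: exit status {err.returncode}')
        print(f'STDERR: {err.stderr.strip()}')
        return err
    return None

# Stand-in for the misspelled ffmpeg call: complains on STDERR, exits with 1.
bad = [sys.executable, '-c',
       'import sys; print("Invalid loglevel", file=sys.stderr); sys.exit(1)']
err = run_checked(bad)
```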

9.2.2 A Bytes-Like Object Is Required

The stdout attribute of the return value Result is a bytes object rather than a string. Bytes objects have their own .split() method, but it expects a bytes separator, so passing it a str separator makes Python raise a TypeError:

Python:
In : import subprocess
In : x = subprocess.run('dir', capture_output=True)
In : x.stdout.split('\n')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-24-f1228a95ce2f> in <module>
----> 1 x.stdout.split('\n')
TypeError: a bytes-like object is required, not 'str'

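There are two solutions: decode the bytes into a string with .decode(), or have subprocess.run() return strings in the first place by passing text=True. A plain str() cast is not a substitute. str(x.stdout) returns the printable representation of the bytes object, b'...'. In that representation every newline appears as the two characters \ and n, so splitting it on '\n' leaves the output in one piece.

Python:
In : import subprocess
In : x = subprocess.run('dir', capture_output=True)
In : x.stdout.decode().split('\n')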
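Below is a runnable sketch of the two reliable conversions: .decode() on the captured bytes, or text=True in the call. A Python one-liner stands in for dir so the example behaves the same on every platform.

A few details differ between the approaches:

- .decode() assumes UTF-8 unless told otherwise, while text=True decodes with the locale’s preferred encoding.
- .splitlines() also handles Windows \r\n line endings.
- str() on a bytes object returns its printable representation, b'...', rather than decoding it.

```python
import subprocess
import sys

# Stand-in for 'dir': a Python one-liner that prints two file names.
cmd = [sys.executable, '-c', 'print("data.h5"); print("file_1.txt")']

raw = subprocess.run(cmd, capture_output=True)
print(type(raw.stdout))                   # <class 'bytes'>
names = raw.stdout.decode().splitlines()  # bytes -> str -> list of lines

txt = subprocess.run(cmd, capture_output=True, text=True)
print(type(txt.stdout))                   # <class 'str'>
print(txt.stdout.splitlines())            # ['data.h5', 'file_1.txt']

# str() does not decode: newlines survive only as the two characters \ and n
print(str(raw.stdout).split('\n'))        # still a single-element list
```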

9.3 Inspecting the Process Table and Process Resources

Numerical analyses and simulations tend to be resource-hungry. It helps to keep an eye on CPU, memory, and file system use to characterize a program’s needs before launching a multiday run. Some of this characterization can be done by profiling code (Section 14.5). However, profiling does not answer questions such as “how much memory/CPU/network am I using right now?”, “are other users putting a significant load on my machine?”, or “how much disk space is available in the test data directory?”

MATLAB falls short when it comes to querying a computer’s processes and the resources they consume. Making such queries involves calls to external utilities provided by the underlying operating system—the Process Explorer on Windows, Activity Monitor on macOS, or ps and top on Linux or macOS.

Python, on the other hand, has a module, psutil,2 which provides an operating system–independent method for examining processes and the resources they use, as well as the computer’s hardware including CPU load, CPU temperature, memory, network interfaces, storage, battery level, fan speed, and so on. In addition to inspecting processes, psutil can suspend, resume, and terminate them (provided the process belongs to the user issuing these commands).
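The sketch below shows how psutil can answer the questions posed at the start of this section. psutil is a third-party package, installable with pip or conda. The directory passed to disk_usage() here is simply the current one; in practice you would substitute the test data directory.

```python
import os
import psutil

# Whole-machine measurements
cpu = psutil.cpu_percent(interval=0.5)        # % CPU use averaged over 0.5 s
mem = psutil.virtual_memory()                 # .total, .available, .percent, ...
disk = psutil.disk_usage(os.getcwd())         # file system holding the cwd
print(f'CPU {cpu:.0f}%  memory {mem.percent:.0f}% used  '
      f'disk {disk.free / 2**30:.1f} GiB free')

# This process's own resident memory
me = psutil.Process(os.getpid())
print(f'this process uses {me.memory_info().rss / 2**20:.1f} MiB')

# Processes owned by other users (username is None if access is denied)
my_name = me.username()
others = [p.info for p in psutil.process_iter(['pid', 'name', 'username'])
          if p.info['username'] not in (my_name, None)]
print(f'{len(others)} processes belong to other users')
```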

Occasionally, I underestimate the amount of memory a computation needs (easy to do when creating, then computing eigensolutions of large sparse matrices as with the finite element benchmarks in Section 14.2.2), and my computer grinds to a standstill as the operating system spends all its time swapping memory to disk. The power button is then my only recourse, and a very annoying one. This problem can happen with any memory-hungry application in any language; MATLAB and Python are not special in this regard.

psutil makes it easy to write a job shepherd that kills any process that stresses the computer excessively. I run the following job shepherd in a separate terminal when I’m working on computations that push the limits of my laptop’s hardware. If a run goes off the rails by consuming excessive memory and CPU cycles, the shepherd kills it, sparing me from a time-wasting and possibly file system–damaging hard reboot.3 Some notes on how it works:
  • Line 7: max_L1 is the one-minute load average below which the program does nothing. A load of 1.0 means one core on the machine is fully loaded.

  • Line 8: Similarly, if the machine has less than max_mem_fraction of its memory in use, the program does nothing.

  • Line 17: The program runs in a continuous loop, sleeping refresh_sec seconds after every iteration.

  • Line 28: If the one-minute load average and memory fraction limits are exceeded, the program iterates over all processes:
    • Line 29: If the process name is in the ignore set, it goes on to the next one.

    • Line 31: If the process does not belong to the person running the shepherd, it is skipped. psutil cannot kill processes owned by other users.

    • Line 36: The process’s CPU load is measured over a 0.2-second interval. If the value is less than min_cpu_pct, the loop proceeds to the next process.

      Processes may end before the measurement interval elapses, so this step may fail; an exception handler prevents the program from ending with an error.

    • Line 44: The process’s memory is measured. This may fail even if we own the process, so the measurement is wrapped in another exception catcher.

    • Line 50: Finally, if the process uses more than half of the memory on the machine or its memory is being swapped to disk, the process is killed.

Python:
 1   # file: code/os/job_shepherd.py
 2   import os
 3   import psutil
 4   import time
 5
 6   refresh_sec = 1.0
 7   max_L1  = 1.5  # 1 minute load average
 8   max_mem_fraction = 0.5
 9   min_cpu_pct = 50.0
10
11   ignore = {
12       'chrome', 'dbus-daemon', 'dconf-service', 'firefox',
13       'gnome-terminal-server', 'ssh-agent', 'systemd', 'vim', 'Xorg', 'top'}
14
15   uid_me  = psutil.Process( os.getpid() ).uids().real  # my user ID #
16
17   while True:
18       L1, L5, L15 = psutil.getloadavg()
19       mem = psutil.virtual_memory()
20       if L1 < max_L1:
21           time.sleep(refresh_sec)
22           continue
23       # under heavy system load
24       if mem.used/mem.total < max_mem_fraction:
25           time.sleep(refresh_sec)
26           continue
27       # under heavy system and memory load
28       for proc in psutil.process_iter(['pid', 'name', 'uids']):
29           if proc.name() in ignore:
30               continue
31           if uid_me != proc.uids().real:
32               # process doesn't belong to me, can't do anything about it
33               continue
34           info = psutil.Process(proc.pid)
35           try:
36               cpu_pct = info.cpu_percent(interval=0.2)
37           except (psutil.NoSuchProcess, psutil.AccessDenied):
38               # process ended before cpu measurement finished
39               continue
40           if cpu_pct < min_cpu_pct:
41               # not doing anything, ignore
42               continue
43           try:
44               pmem = info.memory_full_info()
45           except (psutil.NoSuchProcess, psutil.AccessDenied):
46               # process ended, or we may not read its memory; skip it
47               continue
48           print(f'pid={proc.pid} name={proc.name()} CPU={cpu_pct} '
49                 f'pmem={pmem.rss} swap={pmem.swap}')
50           if pmem.rss/mem.total > 0.5 or pmem.swap > 0:
51               print(f'-> kill {proc.pid} name={proc.name()}')
52               proc.kill()
53       time.sleep(refresh_sec)

If you’re curious to try it out, here’s a small program that will eventually bring your computer to its knees. It makes an increasingly large square matrix of random numbers and then multiplies it by a random vector. Raise the increment on N if the memory consumption rate is too slow. Also, modify the job shepherd’s constants (especially the program names in the ignore set) to find the right balance of measurements to kill runaway processes on your computer.

Python:
#!/usr/bin/env python3
# code/os/machine_buster.py
import numpy as np
N = 200
while True:
    print(f'N = {N}')
    a = np.random.rand(N,N)
    b = np.random.rand(N)
    a.dot(b)
    N += 200
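job_shepherd.py stops a runaway process with proc.kill(). On Linux and macOS this sends SIGKILL, which gives the process no chance to clean up, for example to flush output files. A gentler alternative is sketched below as a hypothetical helper, stop_gently(), that is not part of job_shepherd.py. It works in three steps:

1. Ask each process to exit with proc.terminate(), which sends SIGTERM.
2. Wait up to a grace period with psutil.wait_procs().
3. Kill only the processes that are still running.

On Windows, psutil makes terminate() an alias for kill(), so there is no gentler first step there.

```python
import psutil

def stop_gently(procs, grace_sec=3.0):
    """Hypothetical helper: terminate procs, then kill any still alive.

    Returns the list of processes that had to be killed outright.
    """
    for p in procs:
        try:
            p.terminate()                 # SIGTERM on Linux and macOS
        except psutil.NoSuchProcess:
            pass                          # already gone
    gone, alive = psutil.wait_procs(procs, timeout=grace_sec)
    for p in alive:                       # ignored SIGTERM within the grace period
        try:
            p.kill()                      # SIGKILL on Linux and macOS
        except psutil.NoSuchProcess:
            pass
    return alive
```

In job_shepherd.py, a call such as stop_gently([proc]) could take the place of proc.kill().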