
A. Danial, Python for MATLAB Development, https://doi.org/10.1007/978-1-4842-7223-7_4

4. Data Containers

Albert Danial¹

(1)

Redondo Beach, CA, USA

MATLAB and Python have many powerful data containers. MATLAB’s primary containers are the matrix, cell array, and struct, while Python’s mainstays are lists, dictionaries, and, for the intended audience of this book, NumPy arrays. A fourth key container in the Python landscape is the Pandas dataframe, which corresponds most closely to a MATLAB table. Tables and dataframes are covered in Chapter 13. Approximate equivalences of MATLAB and Python containers appear in Table 4-1.

Similarities and differences between MATLAB and Python data containers are worth studying because knowledge of how the containers work is essential for the design and implementation of efficient algorithms. The next sections cover the most important containers in both languages.

Table 4-1

Data containers in MATLAB and Python

MATLAB	Python Equivalent	Section
Matrices	NumPy ndarrays (or just “arrays”)	4.1, 11
Cell arrays	Lists	4.3
Structs	Dictionaries; attributes; data classes	4.6, 4.7
Tables	Pandas dataframes	13

NumPy arrays are covered thoroughly in Chapter 11 and appear frequently in most other chapters. Here, we merely give a cursory introduction to the NumPy module and its primary container, the n-dimensional array, also known as the ndarray.

4.1 NumPy Arrays

Python, accompanied by the core modules in the standard distribution, has little to offer for numeric computation. For this, we need NumPy, the numerics module at the heart of nearly every Python-powered scientific and numeric capability. NumPy arrays are covered thoroughly in Section 11.1; here, we just introduce them with a sneak preview.

The standard way to import NumPy is with

import numpy as np

Subsequently, all NumPy-related functions can be invoked with an np. prefix, as in np.array() or np.ones(). One may be tempted to use the “import everything” option with

from numpy import *

but, as mentioned in Section 3.14.1.4, this could break your code in subtle ways.

Here’s a basic example to give an indication of how a simple linear algebra problem looks in NumPy compared to MATLAB:

MATLAB:
>> A = [1 2; 3 4]
A =
   1   2
   3   4
>> b = ones(2,1)
b =
   1
   1
>> x = A\b
x =
  -1
   1

Python:
In : A = np.array([[1,2],[3,4]])
In : A
Out:
array([[1, 2],
       [3, 4]])
In : b = np.ones((2,))
In : b
Out: array([1., 1.])
In : x = np.linalg.solve(A,b)
In : x
Out: array([-1., 1.])
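A quick sanity check on a linear solve is to multiply the solution back and compare it with the right-hand side. The sketch below uses NumPy’s @ matrix-multiplication operator and np.allclose(), which compares arrays within a small floating-point tolerance. The variable names simply mirror the example above.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
b = np.ones((2,))
x = np.linalg.solve(A, b)    # plays the role of MATLAB's A\b here

# Multiply back; allclose() tolerates floating-point round-off
print(x)
print(np.allclose(A @ x, b))  # True
```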

Both MATLAB and IPython have a whos command that lists the names, types, and sizes of variables:

MATLAB:
>> whos
Variables in the current scope:

  Attr Name        Size                     Bytes  Class
  ==== ====        ====                     =====  =====
       A           2x2                         32  double
       b           2x1                         16  double
       x           2x1                         16  double

Total is 8 elements using 64 bytes

Python:
In : whos
Variable   Type       Data/Info
-------------------------------
A          ndarray    2x2: 4 elems, type `int64`, 32 bytes
b          ndarray    2: 2 elems, type `float64`, 16 bytes
x          ndarray    2: 2 elems, type `float64`, 16 bytes

Note that A in Python is an integer array. The np.array() function infers a data type from the values passed in, all of which are integers here. The np.ones() function, on the other hand, creates double-precision floats by default. If we want floating-point values instead, at least one of the input values must look like a floating-point number:

In : a = np.array([[1.,2],[3,4]])
In : whos
Variable   Type       Data/Info
--------------------------------
a          ndarray    2x2: 4 elems, type `float64`, 32 bytes
In : a
Out:
array([[1., 2.],
       [3., 4.]])
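The inferred type can be inspected through an array’s dtype attribute, and .astype() returns a converted copy. Here is a minimal sketch. The default integer type is platform dependent: it is int64 on Linux and macOS, and int32 on Windows with NumPy versions before 2.0. For that reason the comments avoid naming it.

```python
import numpy as np

a_int = np.array([[1, 2], [3, 4]])    # all integers -> an integer dtype
a_flt = np.array([[1., 2], [3, 4]])   # one float literal -> float64
b = np.ones((2,))                     # np.ones() defaults to float64

print(a_int.dtype, a_flt.dtype, b.dtype)

# .astype() returns a converted copy; a_int itself is unchanged
a_conv = a_int.astype(np.float64)
print(a_conv.dtype, a_int.dtype)
```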

Alternatively, we can explicitly tell array() what the data type should be with the optional dtype argument. This example creates a quadruple-precision complex¹ array:

In : a = np.array([[1,2],[3,4]], dtype=np.complex256)
In : a
Out:
array([[1.+0.j, 2.+0.j],
       [3.+0.j, 4.+0.j]], dtype=complex256)

The bulk of this book will be about NumPy arrays and operations on them. For now, though, we’ll turn our attention to other data containers.

4.2 Strings

Both Python and MATLAB have extensive support for creating, modifying, and testing strings. Both also have a different data type for a byte array which is often considered the same as a string. They differ though: a byte array (or character vector in MATLAB) is a sequence of bytes, while a string is a sequence of text characters—and a text character may require several bytes to represent it. An ASCII string maps one to one with a byte array, but a Unicode string storing, say, Vietnamese characters, will not.

4.2.1 Strings, Character Arrays, and Byte Arrays

While both languages have similar concepts of strings, they differ in what their simpler forms, character arrays (MATLAB) and byte arrays (Python), contain. MATLAB char arrays are essentially primitive strings in which each character is stored in two bytes. Python byte arrays are collections of uint8 values that look like strings when the byte values fall in the range of printable ASCII characters. A MATLAB string is a 1x1 string array that wraps a 1xN char array. You’ll occasionally see MATLAB strings indexed with {1}, since this construct returns the underlying char array, which is handy for numeric indexing of substrings.

The following examples show how strings, char arrays, and byte arrays are created. MATLAB uses single quotes to denote a character array and double quotes to denote a string. In Python, single and double quotes are interchangeable (but, of course, must be paired properly). A b prefix on the literal, as in b'...', denotes a byte array.

MATLAB:
>> x = 'byte array';
>> y = "a string";
>> class(x)
'char'
>> class(y)
'string'

Python:
In : x = b'byte array'
In : y = 'a string'
In : type(x)
Out: bytes
In : type(y)
Out: str

Both languages have functions or methods to convert between the types:

MATLAB:
>> x = 'byte array';
>> y = "a string";
>> x_str = string(x);
>> b_arr = char(y);

Python:
In : x = b'byte array'
In : y = 'a string'
In : x_str = x.decode()
In : b_arr = y.encode()
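The distinction between characters and bytes from the start of this section becomes visible as soon as a string contains non-ASCII text. In this sketch, Python’s .encode() and .decode() use UTF-8, which is their default. The sample text is chosen only for illustration.

```python
# 'Vi\u1ec7t Nam' spells "Việt Nam"; \u1ec7 is the single precomposed
# character U+1EC7, written as an escape so the example does not depend
# on how an editor normalizes accented letters.
text = 'Vi\u1ec7t Nam'
raw = text.encode('utf-8')      # str -> bytes

print(len(text))                # 8 characters
print(len(raw))                 # 10 bytes: U+1EC7 needs 3 bytes in UTF-8
print(raw.decode('utf-8') == text)   # True: decoding restores the string
```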

Numeric values of the individual characters can be found by casting to a numeric type, say uint16 or double in MATLAB, or iterating through the bytes in Python:

MATLAB:
>> x = 'byte array';
>> uint16(x)
   98  121  116  101   32
   97  114  114   97  121

Python:
In : x = b'byte array'
In : [_ for _ in x]
Out: [98, 121, 116, 101, 32,
      97, 114, 114, 97, 121]

A commonly seen error in Python is TypeError: a bytes-like object is required, not 'str'. This happens when attempting to perform a string operation, covered in the next section, using a byte array. The simple fix is to apply the .decode() method on the byte array to turn it into a string.
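The sketch below reproduces that error with a hypothetical byte string, such as one read from a file opened in binary mode, and then applies the .decode() fix. Passing a bytes separator (b',') to .split() would also work, but the resulting pieces would still be bytes.

```python
data = b'36.67,-67.3'    # hypothetical bytes, e.g., read in binary mode

try:
    data.split(',')      # a str separator applied to a bytes object
except TypeError as err:
    print(err)           # a bytes-like object is required, not 'str'

fields = data.decode().split(',')   # decode to str first
print(fields)                        # ['36.67', '-67.3']
```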

4.2.2 String Operations

Python strings, like all data containers and functions in Python, are objects. Their attributes and methods can be listed interactively in IPython by typing a period after a string variable and then pressing the <TAB> key:

Python:
In : a = 'abc'
In : a.<TAB>
In : a.
capitalize()   isalpha()        ljust()        split()
casefold()     isascii()        lower()        splitlines()
center()       isdecimal()      lstrip()       startswith()
count()        isdigit()        maketrans()    strip()
encode()       isidentifier()   partition()    swapcase()
endswith()     islower()        replace()      title()
expandtabs()   isnumeric()      rfind()        translate()
find()         isprintable()    rindex()       upper()
format()       isspace()        rjust()        zfill()
format_map()   istitle()        rpartition()
index()        isupper()        rsplit()
isalnum()      join()           rstrip()

MATLAB has a similar capability with its methods and methodsview commands, but these are applied to the data type (or “class” in MATLAB terminology) rather than to a variable:

MATLAB:
>> a = "abc";
>> class(a)
'string'
>> methods string
Methods for class string:
append     endsWith        extractBetween  join            pad             splitlines
cellstr    eq              ge              le              plus            startsWith
char       erase           gt              lower           replace         strip
compose    eraseBetween    insertAfter     lt              replaceBetween  strlength
contains   extract         insertBefore    matches         reverse         upper
count      extractAfter    ismissing       ne              sort
double     extractBefore   issorted        or              split

Common string operations are described in greater detail in the following sections.

4.2.2.1 String Length

As mentioned above, a MATLAB string is a 1x1 string array wrapping a 1xN char array. Applying length() to a string therefore returns the number of elements in the string array, which is always 1. To get the number of characters, use the string function strlength() or call length() on the underlying char array obtained by indexing the string with {1}:

MATLAB:
>> str = "string length";
>> length(str{1})
ans = 13
>> strlength(str)
ans = 13

Python:
In : str = "string length"
In : len(str)
Out: 13

4.2.2.2 Append to a String

Use the addition operator, +, to append strings together in MATLAB and Python:

MATLAB:
>> A = "cats";
>> B = "dogs";
>> C = A + " " + B;
>> C
"cats dogs"

Python:
In : A = "cats"
In : B = "dogs"
In : C = A + " " + B
In : C
Out: 'cats dogs'
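When many strings need to be combined, Python’s str.join() method is usually tidier than a chain of + operators. It is roughly analogous to MATLAB’s join(), which appears among the string methods listed earlier. A short sketch:

```python
words = ["cats", "dogs", "birds"]

# str.join() inserts the separator string between consecutive items
sentence = " ".join(words)
print(sentence)          # cats dogs birds

csv_line = ",".join(words)
print(csv_line)          # cats,dogs,birds
```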

Remember to use double quotes for string literals in MATLAB. Character arrays (single-quoted strings) will not give the expected result, because + adds their character codes numerically ('a' is 97 and 'b' is 98):

MATLAB:
>> 'a'+'b'
195

Python:
In : 'a'+'b'
Out: 'ab'

4.2.2.3 Repeat a String

The multiplication operator, *, makes multiple copies of a string in Python. The multiplier must be an integer; zero or a negative value produces an empty string. In MATLAB, repmat() applied to a character array gives the same result. Applied to a string, repmat() would instead return a string array with five elements.

MATLAB:
>> A = '#.';
>> repmat(A,1,5)
ans = '#.#.#.#.#.'

Python:
In : A = "#."
In : A * 5
Out: '#.#.#.#.#.'

4.2.2.4 Convert to Upper- or Lowercase

Case conversion uses the identically named functions in both languages. Both also return new strings without modifying the original.

MATLAB:
>> A = "The String";
>> upper(A)
ans = 'THE STRING'
>> lower(A)
ans = 'the string'

Python:
In : A = "The String"
In : A.upper()
Out: 'THE STRING'
In : A.lower()
Out: 'the string'

4.2.2.5 Replace Characters

As with .upper() and .lower(), the .replace() method in Python and the replace() function in MATLAB return a new string; they do not alter the contents of the variable being worked on. To modify A, the variable must be explicitly assigned the return value of the replace call:

MATLAB:
>> A = "the fox box";
>> replace(A,'ox','it')
"the fit bit"
>> A
"the fox box"
>> A = replace(A,'ox','it')
"the fit bit"

Python:
In : A = "the fox box"
In : A.replace('ox', 'it')
Out: 'the fit bit'
In : A
Out: 'the fox box'
In : A = A.replace('ox', 'it')
In : A
Out: 'the fit bit'
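Python strings are immutable, which is why .replace() must return a new string: assigning to an individual character raises an error. The sketch below also shows the optional third argument of .replace(), which limits the number of replacements.

```python
A = "the fox box"

try:
    A[4] = "b"            # strings cannot be modified in place
except TypeError as err:
    print(err)            # 'str' object does not support item assignment

# Build a new string instead; the third argument caps the replacements
print(A.replace("ox", "it", 1))   # the fit box
print(A)                          # the fox box (unchanged)
```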

4.2.2.6 Method Chaining

Parsing text often requires multiple clean-up operations: remove commas, replace unwanted text with spaces, convert everything to lowercase, and so on. Multiple operations can be chained in Python where one method call is immediately followed by another.

Say we want to extract time and x,y coordinates from a log file containing other text we’re not interested in. A line of input might look like this:

1588350589.176445772 x: 36.67, y: -67.3

There are several ways to extract the numbers (notably with regular expressions which will be covered in Section 4.2.6); one way is to simply replace “x:”, “y:”, and “,” with spaces or empty strings. The equivalent of Python’s method chaining is a cumbersome collection of nested function calls in MATLAB:

MATLAB:
>> line = "1588350589.176445772 x: 36.67, y: -67.3";
>> line = split(replace(replace(replace(line,'x:',''),'y:',''),',',' '))
  3×1 string array
    "1588350589.176445772"
    "36.67"
    "-67.3"

Python:
In : line = '1588350589.176445772 x: 36.67, y: -67.3'
In : line = line.replace('x:','').replace('y:','').replace(',',' ').split()
In : line
Out: ['1588350589.176445772', '36.67', '-67.3']

The .split() method is described in Section 4.2.4.
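The split produces strings, not numbers. The sketch below is a minimal follow-up that converts the three fields with float(). A 64-bit float carries only about 15 to 17 significant digits, so the nanosecond digits of the time stamp are not preserved.

```python
line = '1588350589.176445772 x: 36.67, y: -67.3'
fields = line.replace('x:', '').replace('y:', '').replace(',', ' ').split()

# float() parses each text field; tuple unpacking names the results
t, x, y = map(float, fields)
print(t, x, y)
```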

4.2.3 Formatting

Both MATLAB and Python can use C language–style formatting for strings:

MATLAB:
>> str = sprintf('[%-5s] [%6.3f] [%02d]','hi',pi,4);
>> str
'[hi   ] [ 3.142] [04]'

Python:
In : str = '[%-5s] [%6.3f] [%02d]' % ('hi', np.pi, 4)
In : str
Out: '[hi   ] [ 3.142] [04]'

Python additionally supports a convenient feature known as f-strings, which allow variables and expressions to be embedded within the format string instead of appearing afterward as arguments. Note the f prefix on the format string:

Python:
In : h, q = 'hi', 4
In : str = f'[{h:<5s}] [{np.pi:6.3f}] [{q:02d}]'
In : str
Out: '[hi   ] [ 3.142] [04]'

The < symbol in the format specification <5s means “left justify” within a field five characters wide. The formatting designations are optional. Without them, the output is

Python:
In : h, q = 'hi', 4
In : str = f'[{h}] [{np.pi}] [{q}]'
In : str
Out: '[hi] [3.141592653589793] [4]'
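Python also has an older str.format() method. It uses the same {:spec} format mini-language as f-strings but takes the values as arguments, much like sprintf(). The sketch below reproduces the formatted string above. It uses math.pi in place of np.pi so that it needs no NumPy; the two hold the same value.

```python
import math

h, q = 'hi', 4

# Same field specifications as the f-string, values passed as arguments
s = '[{:<5s}] [{:6.3f}] [{:02d}]'.format(h, math.pi, q)
print(s)            # [hi   ] [ 3.142] [04]
```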

4.2.4 Separate a String into Words

A frequently performed operation for reading input data is splitting a string into an array of words delimited by whitespace, commas, or other characters. Python’s .split() method, like MATLAB’s strsplit() function, will default to splitting the string on whitespace; passing an argument will split on that character or substring.

Outputs from split operations are a string array in MATLAB (a cell array of character vectors when the input is a char array) and a list in Python. Cell arrays and lists are described in detail in Section 4.3.

MATLAB:
>> str = "Nature's first green is gold";
>> strsplit(str)
  1×5 string array
    "Nature's"    "first"    "green"    "is"    "gold"
>> strsplit(str, 'e')
  1×3 string array
    "Natur"    "'s first gr"    "n is gold"

Python:
In : str = "Nature's first green is gold"
In : str.split()
Out: ["Nature's", 'first', 'green', 'is', 'gold']
In : str.split('e')
Out: ['Natur', "'s first gr", '', 'n is gold']
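The two outputs differ in length. MATLAB’s strsplit() collapses consecutive delimiters into one by default, because its CollapseDelimiters option is true. Python’s .split() with an explicit separator does not collapse them, so it leaves an empty string between the two adjacent e characters. Two ways to get the collapsed behavior in Python are sketched below.

```python
import re

s = "Nature's first green is gold"

parts = s.split('e')
print(parts)                     # ['Natur', "'s first gr", '', 'n is gold']

# Option 1: drop the empty strings after splitting
print([p for p in parts if p])   # ['Natur', "'s first gr", 'n is gold']

# Option 2: treat a run of one or more 'e' characters as one delimiter
print(re.split('e+', s))         # ['Natur', "'s first gr", 'n is gold']
```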

Splitting comma-separated value (.csv) files is such a ubiquitous task that both MATLAB and Python have special methods for this. Chapter 7 has an extensive section on working with .csv files.

4.2.5 Tests on Strings

When working with numeric data, MATLAB’s notation is generally terser than Python’s. The reverse is true for strings.

4.2.5.1 Testing for Equality

MATLAB:
>> str = "string equality";
>> strcmp(str{1}(1:6), "string")
ans = 1
>> str{1}(end-1:end)
ans = ty
>> strcmp(str{1}(end-1:end),"tY")
ans = 0

Python:
In : str = "string equality"
In : str[:6] == "string"
Out: True
In : str[-2:]
Out: 'ty'
In : str[-2:] == "tY"
Out: False

4.2.5.2 Check Trailing Characters

Say you want to grab the names of the .csv files in a directory (covered in Section 8.1). Simple, all you need to do is check that the last four characters in each file name are “.csv”, right? That’s true—so long as the string has four characters to check. Conveniently, both languages support methods to check if strings start and end with a given string:

MATLAB:
>> fname = "a.csv";
>> endsWith(fname,".csv")
ans = 1

Python:
In : fname = "a.csv"
In : fname.endswith('.csv')
Out: True
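Because Python slicing never raises an error on short strings, fname[-4:] == '.csv' would also be safe, but .endswith() states the intent directly. It also accepts a tuple of alternatives, as this sketch with made-up file names shows:

```python
fnames = ["a.csv", "b.CSV", "notes.txt", "csv", ".c"]

# Plain suffix test: case-sensitive, and safe for very short names
csv_only = [f for f in fnames if f.endswith('.csv')]
print(csv_only)       # ['a.csv']

# A tuple of suffixes tests several alternatives at once;
# lowercasing first makes the test case-insensitive
data_files = [f for f in fnames if f.lower().endswith(('.csv', '.txt'))]
print(data_files)     # ['a.csv', 'b.CSV', 'notes.txt']
```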

4.2.5.3 Check Starting Characters

MATLAB:
>> fname = "a.csv";
>> startsWith(fname,"a")
ans = 1

Python:
In : fname = "a.csv"
In : fname.startswith('a.')
Out: True

4.2.5.4 Do Given Characters Appear in a String?

Python’s in operator lets us test for the presence of a substring within a string, similar to contains() in MATLAB:

MATLAB:
>> str = 'cat or dog or bird';
>> contains(str,'or')
ans = 1
>> contains(str,'and')
ans = 0

Python:
In : str = 'cat or dog or bird'
In : 'or' in str
Out: True
In : 'and' in str
Out: False
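When the position or number of occurrences matters, and not just their presence, the .find() and .count() string methods (both in the method listing earlier) can be used. A brief sketch:

```python
s = 'cat or dog or bird'

first = s.find('or')     # index of the first occurrence
missing = s.find('and')  # -1 when absent; s.index('and') would raise ValueError
n = s.count('or')        # number of non-overlapping occurrences
print(first, missing, n) # 4 -1 2
```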

4.2.6 String Searching, Replacing with Regular Expressions

A regular expression, or regex, works with string patterns and is used for three purposes:

1. Check whether or not text has the desired pattern
2. Extract text patterns from a string (if they exist) for subsequent use
3. Replace text that matches a pattern with new text

Python’s re module supports a rich, largely Perl-compatible regex syntax, while MATLAB implements a subset of the Perl regex metacharacters. The underlying mechanisms for invoking the pattern search and extracting results also differ.

4.2.6.1 Does a String Match a Regex?

In the first example, we use a regex to see whether or not a string contains two integers separated by spaces, then either “dog” or “cat”:

MATLAB:
>> Y = "7x U 12 14 cat?";
>> out = regexp(Y,"\s(\d+\s+){2}(cat|dog)");
>> if out, fprintf('matched!\n'); end
matched!
>> N = "7x U 12 14 mouse!";
>> out = regexp(N,"\s(\d+\s+){2}(cat|dog)");
>> if out, fprintf('matched!\n'); end

Python:
In : import re
In : Y = "7x U 12 14 cat?"
In : out = re.search(r"\s(\d+\s+){2}(cat|dog)", Y)
In : if out: print("matched!")
matched!
In : N = "7x U 12 14 mouse!"
In : out = re.search(r"\s(\d+\s+){2}(cat|dog)", N)
In : if out: print("matched!")

4.2.6.2 Match a Regex and Capture Substrings

Portions of a regular expression can be captured for subsequent use by wrapping them in parentheses. With the 'tokens' option, MATLAB’s regexp() function returns the captured portions as a cell array of strings. In Python, the match object returned by re.search() has a .group() method, which returns an individual captured portion, and a .groups() method, which returns a tuple of all of them.

A significant difference between MATLAB and Python is that Python can return results from nested captures—that is, from nested parenthetical expressions—but MATLAB can’t. In the following example, we’ll look for numeric year-month-day patterns in an input:

MATLAB:
>> str = "1door 1. 2019-03-04 14.55 L22-";
>> m = regexp(str,'(\d{4}(-\d\d){2})','tokens')
  1×1 cell array
    {["2019-03-04"]}

Python:
In : import re
In : str = "1door 1. 2019-03-04 14.55 L22-"
In : m = re.search(r'(\d{4}(-\d\d){2})', str)
In : m.group(1)
Out: '2019-03-04'
In : m.group(2)
Out: '-04'

The inner group, repeated twice by the {2} quantifier, keeps only the text from its last repetition, so m.group(2) returns -04 rather than -03-04.
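The match object offers more than .group(n). This sketch shows .groups(), the whole match via .group(0), named groups written with the (?P<name>...) syntax, and re.findall(), which returns every non-overlapping match. The variable is called s rather than str here to avoid shadowing Python’s built-in str type.

```python
import re

s = "1door 1. 2019-03-04 14.55 L22-"
m = re.search(r'(\d{4}(-\d\d){2})', s)

print(m.groups())    # ('2019-03-04', '-04'): every captured group
print(m.group(0))    # 2019-03-04: the entire match

# Named groups make the captures self-documenting
d = re.search(r'(?P<year>\d{4})-(?P<month>\d\d)-(?P<day>\d\d)', s)
print(d.group('year'), d.group('month'), d.group('day'))   # 2019 03 04

# findall() returns all non-overlapping matches, here every digit run
print(re.findall(r'\d+', s))  # ['1', '1', '2019', '03', '04', '14', '55', '22']
```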

4.2.6.3 Replace Text Matching a Regex with Different Text

In this example, we’ll replace either “cat” or “dog” with “fish” followed by a copy of the integer preceding it. The notation \g<1> in the Python replacement string is a backreference to the first group, that is, the text matched by the first pair of parentheses in the regular expression.

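The MATLAB version below takes an additional step to first capture the integer before “cat” or “dog” and then builds the replacement text with sprintf(). (regexprep() can also insert captured tokens into its replacement text directly with $N, as in regexprep(Y,"(\d+)\s+(cat|dog)","$1 fish $1").)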

MATLAB:
>> Y = "7x U 12 14 cat?";
>> m = regexp(Y,"(\d+)\s+(cat|dog)",'tokens'); % m{1}{1} = '14'
>> regexprep(Y,"\d+\s+(cat|dog)",sprintf('%s fish %s', m{1}{1}, m{1}{1}))
"7x U 12 14 fish 14?"

Python:
In : import re
In : Y = "7x U 12 14 cat?"
In : re.sub(r"(\d+)\s+(cat|dog)", r'\g<1> fish \g<1>', Y)
Out: '7x U 12 14 fish 14?'

4.2.7 String Templates

String templates are useful for stamping out copies of text that is mostly boilerplate. Examples include simple HTML documents (we’ll see an example of this in Section 7.17.2) and input files for other programs when performing a parameter sweep where just one value changes in each file.

The string module from the Python standard library has a class, Template, whose instances hold the template text. Placeholders are filled in by calling the object’s .substitute() method. Here’s an example:

Python:
In : import string
In : T = string.Template("""alpha = ${Alpha}
...: thickness = ${layer_mm}
...: E = 10.0e6
...: nu = 0.33
...: """)
In : t_new = T.substitute(Alpha=.125, layer_mm=0.11)
In : print(t_new)
alpha = 0.125
thickness = 0.11
E = 10.0e6
nu = 0.33
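For the parameter sweep use case mentioned above, the template can simply be rendered in a loop. The sketch below prints each rendering instead of writing files, since any file naming scheme would be an assumption. It also contrasts .substitute(), which raises KeyError when a placeholder has no value, with .safe_substitute(), which leaves the placeholder untouched.

```python
import string

T = string.Template("alpha = ${Alpha}\nthickness = ${layer_mm}\n")

# A parameter sweep: one rendered text per thickness value
for t in (0.10, 0.11, 0.12):
    print(T.substitute(Alpha=0.125, layer_mm=t))

# substitute() raises KeyError for a placeholder without a value;
# safe_substitute() leaves the placeholder text in place instead
try:
    T.substitute(Alpha=0.125)
except KeyError as err:
    print('missing:', err)          # missing: 'layer_mm'
print(T.safe_substitute(Alpha=0.125))
```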

I’m unaware of a text templating mechanism for MATLAB.

The Jinja2² template engine offers much more power than the Python standard library’s string.Template(). It offers text generation with loops, conditional expressions, template hierarchies with inheritance, macros, filters, Python code execution, and supports include files.

4.3 Python Lists and MATLAB Cell Arrays

A Python list contains a sequence of arbitrary scalar values and/or containers. A list is created with open and close brackets, [ item₁, item₂, … ], so it superficially resembles a MATLAB array. However, unlike a MATLAB array, lists may contain different data types. Therefore, Python lists most closely resemble MATLAB cell arrays.

All Python variables (as well as functions and classes) are objects that have functions, or methods, associated with them. We can see the methods available for lists by using IPython’s tab completion:

Python:
In : a = []
In : a.<TAB>
a.append   a.copy    a.extend   a.insert   a.remove   a.sort
a.clear    a.count   a.index    a.pop      a.reverse

Further help on any of these methods can be found by adding a question mark after the method’s name:

Python:
In : a.append?
Signature: a.append(object, /)
Docstring: Append object to the end of the list.
Type:      builtin_function_or_method

For completeness, here are the methods that work with MATLAB cell arrays:

MATLAB:
>> methods cell
Methods for class cell:
cellismemberlegacy   ismatrix   issortedrows   reshape   transpose
ctranspose           ismember   isvector       setdiff   union
display              isrow      maxk           setxor    unique
intersect            isscalar   mink           sort
iscolumn             issorted   permute        strcat

The following sections show how to manipulate Python lists and the MATLAB equivalent with cell arrays.

4.3.1 Initialize an Empty List

An empty cell array of a given size can be allocated in MATLAB with the cell() function. If given only one numeric argument, N, it will return an N x N collection of empty cells—not always the desired outcome. To simply make N empty cells, we’ll need to supply a second dimension of 1.

In Python, we can preallocate a list of None values by multiplying a single-item list by the desired count:

MATLAB:
>> a = cell(1,3)
{0×0 double}    {0×0 double}    {0×0 double}

Python:
In : a = [None] * 3
In : a
Out: [None, None, None]
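Multiplying a list replicates references to the same item rather than copies of it. With an immutable placeholder like None this does not matter, but it is a common trap when preallocating a list of lists, as this sketch shows:

```python
# Every slot of `shared` refers to the SAME inner list object
shared = [[]] * 3
shared[0].append(1)
print(shared)          # [[1], [1], [1]]

# A list comprehension builds a separate inner list for each slot
separate = [[] for _ in range(3)]
separate[0].append(1)
print(separate)        # [[1], [], []]
```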

4.3.2 Create a List with Given Values

MATLAB:
>> a = {1,2.2,'a string'}
a =
  1×3 cell array
    {[1]}    {[2.2000]}    {'a string'}

Python:
In : a = [1,2.2,'a string']
In : a
Out: [1, 2.2, 'a string']

Additional methods exist in both languages to convert other containers into lists. MATLAB has cell(), mat2cell(), and num2cell(), while in Python one can use the list() function or write a list comprehension (described in Section 4.3.14).
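A few of these conversions are sketched below. The NumPy .tolist() method, which turns an array into a plain Python list, is included as a rough counterpart to num2cell().

```python
import numpy as np

chars = list("abc")                   # one item per character
from_tuple = list((1, 2.2, 'x'))      # tuple -> list
from_range = list(range(3))           # [0, 1, 2]
squares = [n * n for n in range(4)]   # a list comprehension: [0, 1, 4, 9]
from_array = np.arange(3).tolist()    # [0, 1, 2] as plain Python ints

print(chars, from_tuple, from_range, squares, from_array)
```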

4.3.3 Get the Length of a List

MATLAB:
>> size(a)
   1   3
>> n_items = size(a,2)
n_items = 3

Python:
In : len(a)
Out: 3
In : n_items = len(a)
In : n_items
Out: 3

4.3.4 Index a List Item

Python list indexing (as with NumPy arrays as we’ll see later) uses brackets, while MATLAB allows parentheses and braces. A MATLAB cell array indexed with parentheses returns the indexed container (recall that MATLAB insists that even scalar variables are matrices), while braces give the item within the indexed container. The difference between indexing with () and {} is best illustrated with an example:

MATLAB:
>> a(3)
  1×1 cell array
    {'a string'}
>> a{3}
'a string'

Python:
In : a[2]
Out: 'a string'

As with string indexing demonstrated in Section 4.2.5.1, Python makes it easy to reference list items from the end of the list by using negative indices; index -1 means “the last item in the list,” index -2 means “the second to last item,” and so on. MATLAB does not allow negative indices, but its end keyword refers to the last item:

MATLAB:
>> a{end}
'a string'

Python:
In : a[-1]
Out: 'a string'

Negative indexing has a drawback in that it can mask latent bugs. Say you write code that only accesses list items with zero or positive indices. If the code has a logic error which permits an index to become negative, instead of crashing with an index error like MATLAB, your code will continue to run—and yield bad results.

Attempting to access a positive or negative index that exceeds the size of the list will raise an error:

MATLAB:
>> a = {1, 2.2, 'a string'};
>> a{4}
Index exceeds the number of array elements (3).
>> a{-4}
Array indices must be positive integers or logical values.

Python:
In : a = [1, 2.2, 'a string']
In : a[3]
IndexError                Traceback
----> 1 a[3]
IndexError: list index out of range
In : a[-3]
Out: 1
In : a[-4]
IndexError                Traceback
----> 1 a[-4]
IndexError: list index out of range

4.3.5 Extract a Range of Items

Both MATLAB and Python allow one to extract a range of items, either continuously or by steps, using a colon, :, to denote a continuous range of indices or two colons to denote a range with a stride. We’ll need a longer list to demonstrate this:

MATLAB:
>> a = num2cell(100:106)
a =
  1×7 cell array
  Columns 1 through 7
    {[100]}    {[101]}    {[102]}    {[103]}    {[104]}    {[105]}    {[106]}

Python:
In : a = list(range(100,107))
In : a
Out: [100, 101, 102, 103, 104, 105, 106]

Note that in MATLAB, a = {100:106} produces a cell array with a single item (a matrix of seven values), which is not what we’re after:

MATLAB:
>> a = {100:106}
a =
  1×1 cell array
    {1×7 double}
>> a{1}
   100   101   102   103   104   105   106

To emphasize that MATLAB cell arrays and Python lists can hold disparate data types, we’ll change one item to a float and one item to a string:

MATLAB:
>> a{2} = -0.1;
>> a{7} = 'cell'
a =
  1×7 cell array
  Columns 1 through 7
    {[100]}    {[-0.1000]}    {[102]}    {[103]}    {[104]}    {[105]}    {'cell'}

Python:
In : a[1] = -0.1
In : a[6] = 'list'
In : a
Out: [100, -0.1, 102, 103, 104, 105, 'list']

Slices of cell arrays and lists are extracted similarly, as the following examples illustrate.

Extract the first three items:

Python’s slice notation differs a bit from MATLAB’s range notation. In Python, the start index may be omitted if it is 0, and the item at the end index is not included in the result. In other words, [:3] returns list items 0, 1, and 2.

Here, we show the MATLAB cell array subscripted with both braces and parentheses.

MATLAB:
>> a{1:3}
   100
   -0.1000
   102
>> a(1:3)
  1×3 cell array
    {[100]}    {[-0.1000]}    {[102]}

Python:
In : a[:3]
Out: [100, -0.1, 102]

Extract the last four items:

MATLAB:
>> a{end-3:end}
   103
   104
   105
   'cell'

Python:
In : a[-4:]
Out: [103, 104, 105, 'list']

Extract every third item, beginning with the second one:

MATLAB:
>> a(2:3:end)
  1×2 cell array
    {[-0.1000]}    {[104]}

Python:
In : a[1::3]
Out: [-0.1, 104]
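A negative stride walks a list backward, much as MATLAB’s a(end:-1:1) or a(6:-2:3) do. Slices can also be stored as slice objects and reused. Here is a brief sketch with the same list as above:

```python
a = [100, -0.1, 102, 103, 104, 105, 'list']

rev = a[::-1]        # reversed copy of the whole list
back = a[5:1:-2]     # start at index 5, step back by 2, stop before index 1
print(rev)           # ['list', 105, 104, 103, 102, -0.1, 100]
print(back)          # [105, 103]

# A slice can be kept in a variable with slice(start, stop, step)
every_third = slice(1, None, 3)
print(a[every_third])   # [-0.1, 104], same as a[1::3]
```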

4.3.6 Warning—Python Index Ranges Are Not Checked!

Although a single list index raises an IndexError if it exceeds the bounds of the list, index ranges have no such checks.

In Python, out-of-bounds index ranges merely return an empty list; they will not raise an error.

Consider this example:

In : a = [1, 2.2, 'a string']
In : a[27636]
IndexError                Traceback
----> 1 a[27636]
IndexError: list index out of range

Not surprising: indexing item 27,636 in a list having only three items gives an error. Here’s an unpleasant surprise though:

In : a[27636:-524385732]

Out: []

There’s no error! What is going on? Python will raise an error when a list is indexed by a single value outside the index bounds, but silently accepts index ranges which are out of bounds. Unchecked index ranges offer rich opportunities for code errors to pass undetected. They place the responsibility for checking start and end index values on the developer.

Allowing range bound violations does have convenient applications though. A simple example is truncating strings to a given length. In MATLAB, one must make sure the truncation length is less than or equal to the string length. Python doesn’t care if the string is shorter than the truncation length:

MATLAB:
>> S = "abcdefghijklm";
>> n_chop = 6;
>> extractBetween(S,1,n_chop)
"abcdef"
>> S = "abc";
>> extractBetween(S,1,n_chop)
Error using extractBetween
Numeric value exceeds the number of characters in element 1.

Python:
In : S = "abcdefghijklm"
In : n_chop = 6
In : S[:n_chop]
Out: 'abcdef'
In : S = "abc"
In : S[:n_chop]
Out: 'abc'

Another example is splitting a collection into sets of nearly equal size without having to handle the leftovers of an uneven split. Here, we group the numbers 1 through 20 into three sets:

Python:

import numpy as np

n_items = 20
n_groups = 3
set_size = int(np.ceil(n_items/n_groups))   # 7
L = list(range(1, n_items+1))
for i in range(n_groups):
    print(L[i*set_size:(i+1)*set_size])

Obviously, 20 is not evenly divisible by 3, but we don’t have to bother with that detail; the output sets have the desired counts of 7, 7, and 6 members:

Python:

[1, 2, 3, 4, 5, 6, 7]
[8, 9, 10, 11, 12, 13, 14]
[15, 16, 17, 18, 19, 20]

Had range bounds been checked, the last iteration would have raised an error, because the slice L[14:21] extends to index 20, a nonexistent 21st element. Instead, Python simply stops at the end of the list.

The equivalent output can be produced with MATLAB code that caps the ending index at each iteration:

MATLAB:

n_items = 20;
n_groups = 3;
set_size = ceil(n_items/n_groups);
L = 1:n_items;
for i = 1:n_groups
    end_index = min(n_items, i*set_size); % prevent array bounds violation
    L((i-1)*set_size+1:end_index)
end

4.3.7 Append an Item

Items can be added to MATLAB cell arrays simply by assigning to a new index. The new index can be any positive integer; it need not be one greater than the last index in the cell array. If the new index leaves a gap, the skipped positions are created and populated with empty matrices.

In Python, items are added to a list via the list’s .append() or .extend() methods, where .append() adds on a single item while .extend() can be used to join a second list to the first. New list entries cannot be added with subscripts.

MATLAB:

>> a{8} = 3.14   % equivalent to a(8) = {3.14}
a =
1×8 cell array
Columns 1 through 8
{[100]} {[101]} {[102]} {[103]} {[104]} {[105]} {[106]} {[3.1400]}

Python:

In : a.append(3.14)
In : a
Out: [100, -0.1, 102, 103, 104, 105, 'list', 3.14]

The following example assigns an entry directly to the MATLAB cell array’s tenth position, even though the array currently has only eight items; the ninth position is automatically populated with an empty matrix. The same manipulation cannot be done in Python. To put an entry into the tenth position, we first have to fill the ninth position with something. Here, we use None, Python’s null object:

MATLAB:

>> a{10} = 2.71
a =
1×10 cell array
Columns 1 through 10
{[100]} {[101]} {[102]} {[103]} {[104]} {[105]} {[106]} {[3.1400]} {0x0 double} {[2.7100]}

Python:

In : a.append(None)
In : a
Out: [100, -0.1, 102, 103, 104, 105, 'list', 3.14, None]
In : a.append(2.71)
In : a
Out: [100, -0.1, 102, 103, 104, 105, 'list', 3.14, None, 2.71]
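Assigning past the end of a list raises IndexError rather than growing the list. If you really want MATLAB-style automatic growth, a small helper can pad the list first. set_at() below is a hypothetical function written for illustration, not part of Python. It pads with None by default, where MATLAB would insert empty matrices.

```python
def set_at(lst, index, value, fill=None):
    """Assign lst[index] = value, first padding lst with `fill` when
    index lies past the end (mimics MATLAB's automatic growth).
    Hypothetical helper for illustration; not part of Python."""
    if index >= len(lst):
        lst.extend([fill] * (index + 1 - len(lst)))
    lst[index] = value

a = [100, -0.1, 102]
try:
    a[5] = 2.71            # plain subscript assignment past the end fails
except IndexError as err:
    print(err)             # list assignment index out of range

set_at(a, 5, 2.71)
print(a)                   # [100, -0.1, 102, None, None, 2.71]
```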

4.3.8 Append Another List

The Python .extend() method for lists appends the contents of one list to another, modifying the first list in place. MATLAB’s horzcat() function produces the same combined cell array, although it returns a new array rather than modifying a:

MATLAB:

>> a = {1, 'two'};
>> b = {3, 4.4, 5};
>> c = horzcat(a,b)
c =
1x5 cell array
Columns 1 through 5
{[1]} {'two'} {[3]} {[4.4000]} {[5]}

Python:

In : a = [1, 'two']
In : b = [3, 4.4, 5]
In : a.extend(b)
In : a
Out: [1, 'two', 3, 4.4, 5]

Alternatively, Python lists can be concatenated with the + operator, which returns a new list and leaves both operands unchanged:

In : a = [1, 'two']

In : b = [3, 4.4, 5]

In : a + b

Out: [1, 'two', 3, 4.4, 5]

and list entries can be replicated with the * operator:

In : a = [1, 'two']

In : a*3

Out: [1, 'two', 1, 'two', 1, 'two']
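The ways of combining lists differ in two respects: whether they modify the original list, and how they treat the second list. The sketch below contrasts them. Note in particular that .append() adds the second list as a single nested item.

```python
a = [1, 'two']
b = [3, 4.4, 5]

c = a + b                # + builds a new list; a and b are unchanged
print(c)                 # [1, 'two', 3, 4.4, 5]
print(a)                 # [1, 'two']

nested = [1, 'two']
nested.append(b)         # .append() adds b as ONE item, a nested list
print(nested)            # [1, 'two', [3, 4.4, 5]]

extended = [1, 'two']
extended.extend(b)       # .extend() adds each item of b, in place
print(extended)          # [1, 'two', 3, 4.4, 5]

augmented = [1, 'two']
augmented += b           # += also extends the existing list in place
print(augmented)         # [1, 'two', 3, 4.4, 5]
```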

4.3.9 Preallocate an Empty List

Occasionally, it is convenient to build a list not by appending items but by assigning items, in any order, into a preallocated list of known size. In MATLAB, one can achieve this by calling cell() with the desired dimensions. In Python, the desired initial value is put in a single-item list, and then this list is multiplied by the desired size. The Python example shows two such initializations, once with Nones and once with empty lists:

MATLAB:

>> a = cell(4,1)
a =
{
[1,1] = [](0x0)
[2,1] = [](0x0)
[3,1] = [](0x0)
[4,1] = [](0x0)
}
>> a{3} = -7.2
a =
{
[1,1] = [](0x0)
[2,1] = [](0x0)
[3,1] = -7.2000
[4,1] = [](0x0)
}

Python:

In : a = 4*[None]
In : a
Out: [None, None, None, None]
In : a[2] = -7.2
In : a
Out: [None, None, -7.2, None]
In : a = 4*[[]]
In : a
Out: [[], [], [], []]
In : a[2] = -7.2
In : a
Out: [[], [], -7.2, []]

Note that two dimensions were passed to MATLAB’s cell(). If we were to call cell(4) instead of cell(4,1), the result would be a 4 × 4 cell array.
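One caution applies to the 4*[[]] form. Multiplying a list copies references, not objects, so all four slots refer to the same empty list. Assigning a new value to a slot, as with a[2] = -7.2, is harmless. Mutating one inner list, however, changes every slot. A list comprehension creates independent inner lists instead:

```python
shared = 4*[[]]                    # four references to ONE empty list
shared[0].append(1)                # the mutation shows up in every slot
print(shared)                      # [[1], [1], [1], [1]]

separate = [[] for _ in range(4)]  # [] is evaluated once per slot
separate[0].append(1)
print(separate)                    # [[1], [], [], []]
```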

4.3.10 Insert to the Beginning (or Any Other Position) of a List

Python lists have an .insert() method that inserts a new item at any desired index. The method takes two arguments: the index the new item should occupy after the insertion, and the item itself. Existing items at and after that index are shifted one position to the right. Inserting an item at the beginning of a list is then done with .insert(0, item). Note, however, that inserting anywhere other than at the end takes time proportional to the number of items that must be shifted, so it becomes expensive as the list grows.

MATLAB:

>> a
{[100]} {[-0.100]} {[102]} {[103]} {[104]} {[105]} {'list'}
>> a = horzcat('new',a)
{'new'} {[100]} {[-0.100]} {[102]} {[103]} {[104]} {[105]} {'list'}

Python:

In : a
Out: [100, -0.1, 102, 103, 104, 105, 'list']
In : a.insert(0, 'new')
In : a
Out: ['new', 100, -0.1, 102, 103, 104, 105, 'list']
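A few further .insert() behaviors are worth knowing. Negative indices count from the end. An index past the end simply appends, consistent with the unchecked ranges of Section 4.3.6. When many items must be added at the front, collections.deque from the standard library offers O(1) insertion at both ends. The sketch below shows one way to use it.

```python
from collections import deque

a = [100, -0.1, 102]
a.insert(1, 'mid')     # 'mid' lands at index 1; later items shift right
a.insert(-1, 'x')      # negative index: insert before the last item
a.insert(99, 'end')    # index past the end: appended, no error
print(a)               # [100, 'mid', -0.1, 'x', 102, 'end']

d = deque([100, -0.1, 102])
d.appendleft('new')    # O(1) insertion at the front
print(list(d))         # ['new', 100, -0.1, 102]
```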

4.3.11 Indexing Nested Containers

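In MATLAB, entries within nested cell arrays are indexed with braces and parentheses. Braces extract the contents of a cell. A parenthesized index applied to that content then selects cells from it and returns a cell array. (Chaining braces, as in a{2}{1}, returns the contents themselves.)

Python’s indexing is more straightforward, as brackets are used at every level. In the following example, a[1][0] returns the string itself, like MATLAB’s a{2}{1}: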

MATLAB:

>> a = {1, {'inner', 'cell'}, -3.3}
a =
1x3 cell array
{[1]} {1×2 cell} {[-3.3000]}
>> a{2}
1x2 cell array
{'inner'} {'cell'}
>> a{2}(1)
1x1 cell array
{'inner'}

Python:

In : a = [1, ['inner', 'list'], -3.3]
In : a
Out: [1, ['inner', 'list'], -3.3]
In : a[1]
Out: ['inner', 'list']
In : a[1][0]
Out: 'inner'

4.3.12 Membership Test: Does an Item Exist in a List?

We saw at the beginning of this section that the ismember() function works with MATLAB cell arrays. With mixed-type data, however, I have not been able to make ismember() work in MATLAB 2020b. (If all entries are numeric, the cell array can be converted to a matrix, after which the find() function can be used.) Instead, I use this small function to check if an item exists in a cell array:

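MATLAB:

function [found_it] = cell_has(C, value)
% Return 1 if value appears anywhere in cell array C, else 0.
% isequal() is used instead of == so that comparing, for example,
% character vectors of different lengths does not raise an error.
found_it = 0;
for i = 1:numel(C)
    if isequal(C{i}, value)
        found_it = 1;
        break
    end
end
end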

Python, in contrast, makes list, set, and dictionary key membership tests easy with the in operator:

MATLAB:

>> a = {'hi', 102, 3.3};
>> cell_has(a, 102)
1
>> cell_has(a, 27)
0

Python:

In : a = ['hi', 102, 3.3]
In : 102 in a
Out: True
In : 27 in a
Out: False

Returning briefly to ismember() in MATLAB, the six attempts at checking if 102 is in a all yield the same error:

MATLAB:

>> a = {'hi', 102, 3.3};
>> ismember(a, 102)
>> ismember(a, '102')
>> ismember(a, {102})
>> ismember(a, {'102'})
>> ismember(a, {[102]})
>> ismember(a, {['102']})
Error using cell/ismember
Input A of class cell and input B of class cell must be cell arrays of character vectors, unless one is a character vector.
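Python’s in operator compares items with ==. As a result, numerically equal values match even when their types differ, and values of unrelated types simply compare unequal rather than raising an error. For conditions other than equality, any() with a generator expression plays the role of a custom function like cell_has(). A short sketch:

```python
a = ['hi', 102, 3.3]

print(102.0 in a)    # True: 102 == 102.0
print('102' in a)    # False: the string '102' never equals the int 102

# any() stops at the first item for which the condition is True.
has_big_float = any(isinstance(x, float) and x > 3 for x in a)
print(has_big_float) # True (3.3 qualifies)
```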

4.3.13 Find the Index of an Item

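To locate a value in a cell array of numbers, MATLAB code typically concatenates the cell contents into a numeric vector with [a{:}] and passes a comparison on that vector to find(). Python’s equivalent is the list .index() method, which has the additional benefit of working with mixed data types, not just numeric values. (Unlike find(), .index() returns only the position of the first match.) Here, we identify the index of the value 102: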

MATLAB:

>> a = num2cell(100:106)
a =
1×7 cell array
Columns 1 through 7
{[100]} {[101]} {[102]} {[103]} {[104]} {[105]} {[106]}
>> find([a{:}] == 102)
3

Python:

In : a = list(range(100,107))
In : a
Out: [100, 101, 102, 103, 104, 105, 106]
In : a.index(102)
Out: 2

If we look for an item that doesn’t exist, Python raises a ValueError exception, whereas MATLAB’s find() simply returns an empty array:

MATLAB:

>> find([a{:}] == 27)

Python:

In : a.index(27)
ValueError: 27 is not in list

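Curiously, MATLAB’s find() approach fails if the cell array has disparate data types and one attempts to find a nonnumeric term. The reason is that [a{:}] concatenates the numbers and the character vector into a single character array (the numbers become characters). That array cannot be compared element-wise with the shorter vector 'string'. Python has no such issue: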

MATLAB:

>> a{7} = 'string';
>> a
a =
1×7 cell array
Columns 1 through 7
{[100]} {[101]} {[102]} {[103]} {[104]} {[105]} {'string'}
>> find([a{:}] == 102)
3
>> find([a{:}] == 'string')
Matrix dimensions must agree.

Python:

In : a[6] = 'string'
In : a
Out: [100, 101, 102, 103, 104, 105, 'string']
In : a.index(102)
Out: 2
In : a.index('string')
Out: 6
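MATLAB’s find() returns every matching position. Python’s .index(), by contrast, returns only the first and raises ValueError when there is none. When all positions are needed, or a missing value should not be an error, a list comprehension over enumerate() is a common idiom:

```python
a = [100, 101, 102, 103, 102, 'string']

first = a.index(102)
print(first)                                        # 2

# A comprehension over enumerate() collects every matching position,
# like MATLAB's find() (but 0-based).
all_hits = [i for i, v in enumerate(a) if v == 102]
print(all_hits)                                     # [2, 4]

# An absent value gives an empty list instead of a ValueError.
no_hits = [i for i, v in enumerate(a) if v == 27]
print(no_hits)                                      # []
```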

4.3.14 Apply an Operation to All Items (List Comprehension)

Python list comprehensions serve the same purpose as MATLAB’s cellfun() and arrayfun() functions: they apply an operation to each element of a collection. This example returns the cube of each entry:

MATLAB:

>> a = {0.3, 0.4, 0.5};
>> cellfun(@(x)(x.^3),a)
0.0270 0.0640 0.1250

Python:

In : a = [0.3, 0.4, 0.5]
In : [x**3 for x in a]
Out: [0.027, 0.064, 0.125]

The generic notation is

LHS = [ operator(x) for x in List ]

where x is an arbitrary variable name and List is the list you want to operate on. The left-hand side variable LHS receives a new list holding the result of operator applied to each item of the original list. Expanded to a for loop, the preceding comprehension would look like this:

LHS = []
for x in List:
    LHS.append( operator(x) )

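A list comprehension typically runs faster than the equivalent for loop, in part because it avoids looking up and calling LHS.append on every iteration. The next example creates a list of strings in which each element of the original list is prefixed by “0x”: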

MATLAB:

>> a = {100, 101, 102,...
103, 'string'};
>> cellfun(@(y) "0x" + ...
string(y), a)
1×5 string array
"0x100" "0x101" "0x102" "0x103" "0xstring"

Python:

In : a = [100, 101, 102, 103, 'string']
In : [f'0x{str(y)}' for y in a]
Out: ['0x100', '0x101', '0x102', '0x103', '0xstring']
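You can check the speed claim on your own machine with the standard library’s timeit module. The sketch below times a loop and a comprehension that build the same list of cubes. The absolute numbers, and even the size of the gap, depend on hardware and Python version, so no results are quoted here.

```python
import timeit

data = list(range(100_000))

def with_loop():
    out = []
    for x in data:
        out.append(x**3)
    return out

def with_comprehension():
    return [x**3 for x in data]

# Both approaches produce identical lists ...
assert with_loop() == with_comprehension()

# ... so only their run times differ. timeit.timeit() accepts a callable.
for func in (with_loop, with_comprehension):
    seconds = timeit.timeit(func, number=10)
    print(f'{func.__name__:20s} {seconds:.3f} s')
```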

4.3.15 Select a Subset of Items Based on a Condition

List comprehensions can be paired with Boolean expressions to create filters. The notation is

[ operator(x) for x in List if condition(x) ]

Here, we extract entries whose first letter is uppercase. Python strings, like MATLAB strings, may be indexed numerically to access individual characters or substrings. We can therefore get the first character of a string x with x[0] and test whether it is uppercase by applying the string method .isupper() to it: x[0].isupper() returns True if the first character of the loop variable x is uppercase and False otherwise. There are several possible solutions in MATLAB, but each requires multiple steps. The following solution creates a logical index array that identifies the terms which satisfy the condition:

MATLAB:

>> a = {'Select','a','Subset',...
'of','Items','Based',...
'on','a','Condition'}
>> i = isstrprop(cellfun(@(x)...
x(1), a), 'upper');
>> a(i)
{'Select'} {'Subset'} {'Items'} {'Based'} {'Condition'}

Python:

In : a = ['Select', 'a', 'Subset', 'of', 'Items', 'Based', 'on', 'a', 'Condition']
In : [x for x in a if x[0].isupper()]
Out: ['Select', 'Subset', 'Items', 'Based', 'Condition']
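One caveat: x[0] raises an IndexError if the list contains an empty string. There are two ways around this. You can test x first, relying on short-circuit evaluation of and. Or you can use the slice x[:1], which is an unchecked range (Section 4.3.6) and so returns '' for an empty string. A condition can also be combined with a transformation in the same comprehension:

```python
a = ['Select', 'a', '', 'Subset', 'of', 'Items']

# `x and ...` short-circuits on '' before x[0] is evaluated
guarded = [x for x in a if x and x[0].isupper()]
print(guarded)          # ['Select', 'Subset', 'Items']

# ''[:1] is '', and ''.isupper() is False, so no guard is needed
sliced = [x for x in a if x[:1].isupper()]
print(sliced)           # ['Select', 'Subset', 'Items']

# Filter and transform in one step
lowered = [x.lower() for x in a if x[:1].isupper()]
print(lowered)          # ['select', 'subset', 'items']
```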

4.3.16 How Many Times Does an Item Occur?

Python lists have a .count() method which returns the number of times the given argument appears.

MATLAB:

>> a = {'To',2,'To','To','u'};
>> sum(cellfun(@(x) string(x)...
== 'To', a))
3
>> sum(cellfun(@(x) string(x)...
== 'From', a))
0

Python:

In : a = ['To',2,'To','To','u']
In : a.count('To')
Out: 3
In : a.count('From')
Out: 0
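To tally every distinct item at once, use the standard library’s collections.Counter. This avoids calling .count() repeatedly, since each call scans the whole list:

```python
from collections import Counter

a = ['To', 2, 'To', 'To', 'u']
counts = Counter(a)            # tallies every distinct item in one pass
print(counts['To'])            # 3
print(counts['From'])          # 0 (missing items count as zero)
print(counts.most_common(1))   # [('To', 3)]
```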

4.3.17 Remove the First or Last (or Any Intermediate) List Item

Python lists support a .pop() method which removes the last item from the list and returns it. If .pop() is given an index n, it instead removes and returns the item at index n; .pop(0) therefore removes the first item in the list. The same result can be achieved in MATLAB with index slices.

MATLAB:

>> a = {21, 22, 23, 24, 25};
>> b = a(end);
>> a = a(1:end-1)
{[21]} {[22]} {[23]} {[24]}
>> b
{[25]}
>> a = {21, 22, 23, 24, 25};
>> a(1)
{[21]}
>> a = a(2:end)
{[22]} {[23]} {[24]} {[25]}

Python:

In : a = [21, 22, 23, 24, 25]
In : b = a.pop()
In : a
Out: [21, 22, 23, 24]
In : b
Out: 25
In : a = [21, 22, 23, 24, 25]
In : a.pop(0)
Out: 21
In : a
Out: [22, 23, 24, 25]

The slicing method for MATLAB gets clumsy when removing an item from the middle though:

MATLAB:

>> a = {21,22,23,24,25};
>> i = 3;
>> a(i)
{[23]}
>> a = horzcat(a(1:i-1),a(i+1:end))
{[21]} {[22]} {[24]} {[25]}

Python:

In : a = [21,22,23,24,25]
In : i = 2
In : a.pop(i)
Out: 23
In : a
Out: [21, 22, 24, 25]
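When you don’t need the removed item, the del statement deletes by index. It also accepts a slice, removing a whole range at once. Note too that .pop() on an empty list raises IndexError:

```python
a = [21, 22, 23, 24, 25]

del a[2]          # delete by index without returning the item
print(a)          # [21, 22, 24, 25]

del a[1:3]        # delete a whole slice (here 22 and 24)
print(a)          # [21, 25]

empty = []
try:
    empty.pop()
except IndexError as err:
    print(err)    # pop from empty list
```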

4.3.18 Remove an Item by Value

If one knows the index of an item to remove from a list, the .pop(index) method described earlier works nicely. But what if you only know the value of the item to remove? In this case, the .remove() method is useful. Note that only the first occurrence of the matched value is removed. A ValueError exception is raised if the requested value doesn’t appear.

The MATLAB solution uses a logical index array, i, that marks the locations of matching values. Its negation, ~i, marks the nonmatching values, which are the ones kept. Note that, unlike .remove(), this approach deletes every occurrence of the value. MATLAB also doesn’t care if the value is not found; in that case, the index array simply has no hits.

MATLAB:

>> a = {22,21,'a',22,21};
>> i = cellfun(@(x) x == 22, a);
>> a = a(~i)
{[21]} {'a'} {[21]}
>> i = cellfun(@(x) x == -4, a);
>> a = a(~i)
{[21]} {'a'} {[21]}

Python:

In : a = [22,21,'a',22,21]
In : a.remove(22)
In : a
Out: [21, 'a', 22, 21]
In : a.remove(22)
In : a
Out: [21, 'a', 21]
In : a.remove(-4)
-----------------------------
ValueError
----> 1 a.remove(-4)
ValueError: list.remove(x): x not in list
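A list comprehension that keeps the nonmatching items mirrors the MATLAB behavior: it removes every occurrence and silently ignores absent values. It builds a new list rather than modifying the original in place:

```python
a = [22, 21, 'a', 22, 21]

a = [x for x in a if x != 22]   # drop every 22, not just the first
print(a)                        # [21, 'a', 21]

a = [x for x in a if x != -4]   # absent value: no error, list unchanged
print(a)                        # [21, 'a', 21]
```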

4.3.19 Merging Multiple Lists

Related data items in separate lists must sometimes be grouped into individual pairwise (or, more generally, n-wise) items. For example, say you have a list of letters, 'A', 'B', 'C', …, and a corresponding list of those letters’ ASCII values, 65, 66, 67, …. You want to merge these two lists into a single new list of letter and ASCII value pairs, [('A', 65), ('B', 66), …]. MATLAB allows one to create a new cell array by stacking existing cells, while Python has a function, zip(), which combines the lists. (Imagine a zipper joining two sections of fabric.)

zip() returns a zip object, an iterator over the combined items, rather than the combined list itself. Like generators (Section 3.9), iterators are great for looping over. They do not permit random indexing, however, as the following TypeError demonstrates. They can also be consumed only once: after list(both) has walked through the zip object, it is exhausted. If you need the fully populated list rather than an iterator, call list() directly on the result of zip():

MATLAB:

>> Letter = {'A','B','C'}
Letter =
{
[1,1] = A
[1,2] = B
[1,3] = C
}
>> ASCII = {65,66,67}
ASCII =
{
[1,1] = 65
[1,2] = 66
[1,3] = 67
}
>> both={Letter;ASCII}
both =
{
[1,1] =
{
[1,1] = A
[1,2] = B
[1,3] = C
}
[2,1] =
{
[1,1] = 65
[1,2] = 66
[1,3] = 67
}
}
>> both{2}{3}
ans = 67

Python:

In : Letter = ['A','B','C']
In : Letter
Out: ['A', 'B', 'C']
In : ASCII = [65, 66, 67]
In : ASCII
Out: [65, 66, 67]
In : both = zip(Letter,ASCII)
In : both
Out: <zip at 0x7fc4500c4050>
In : list(both)
Out: [('A', 65), ('B', 66), ('C', 67)]
In : both[2][1]
------------------------------
TypeError Traceback
----> 1 both[2][1]
TypeError: 'zip' object is not subscriptable
In : both = list(zip(Letter,ASCII))
In : both[2][1]
Out: 67

zip() is frequently seen in for loops that need to step through multiple lists in lockstep:

Python:

In : X = [0.21, 0.96, 0.26, 0.34, 0.90, 0.82]
In : Y = [-1.36, -1.88, -1.20, -1.10, -1.16, -1.27]
In : Z = [4.89, 4.08, 4.82, 4.62, 4.43, 4.93]
In : for x,y,z in zip(X,Y,Z):
...:     print(f'{x:6.3f} {y:6.3f} {z:6.3f}')
 0.210 -1.360  4.890
 0.960 -1.880  4.080
 0.260 -1.200  4.820
 0.340 -1.100  4.620
 0.900 -1.160  4.430
 0.820 -1.270  4.930
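One behavior to watch for: when the lists differ in length, zip() silently stops at the shortest one. This is convenient in some cases and a hidden bug in others. Since Python 3.10, passing strict=True turns a length mismatch into a ValueError. The sketch below guards that call with a version check so it also runs on older interpreters.

```python
import sys

X = [0.21, 0.96, 0.26]
Y = [-1.36, -1.88]             # one item short

# zip() stops at the shortest input, silently dropping X[2]
pairs = list(zip(X, Y))
print(pairs)                   # [(0.21, -1.36), (0.96, -1.88)]

# strict=True (Python 3.10+) raises ValueError on a length mismatch
if sys.version_info >= (3, 10):
    try:
        list(zip(X, Y, strict=True))
    except ValueError as err:
        print(err)
```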

4.3.20 Unmerging Combined Lists

The previous section showed how to combine like-sized lists. Surprisingly, the opposite operation, unmerging a combined list into multiple individual lists, is also done with zip(). The difference is that to unmerge, the argument to zip() is prefixed by an asterisk. In MATLAB, one must use indexing operations to extract the subcell arrays one at a time. Using the same variable both from the previous section:

MATLAB:

>> a = both{1,:}
a =
{
[1,1] = A
[1,2] = B
[1,3] = C
}
>> b = both{2,:}
b =
{
[1,1] = 65
[1,2] = 66
[1,3] = 67
}

Python:

In : a, b = zip(*both)
In : a
Out: ('A', 'B', 'C')
In : b
Out: (65, 66, 67)

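Prefixing a list or other iterable with an asterisk, as with *both earlier, means “expand the terms.” In other words, if x = [9, 'b'], then x is a single item, a list, whereas *x means two separate terms, 9 and 'b'. The asterisk can be thought of as removing the outer container. Thus zip(*both) is the same call as zip(('A', 65), ('B', 66), ('C', 67)). That call gathers the first element of each pair into one tuple and the second elements into another.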
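The sketch below repeats the unzip step. It then converts the resulting tuples to lists with map(), for code that needs to modify them, and shows the same * expansion in an ordinary function call:

```python
both = [('A', 65), ('B', 66), ('C', 67)]

letters, codes = zip(*both)              # unzip into two tuples
print(letters, codes)                    # ('A', 'B', 'C') (65, 66, 67)

# map(list, ...) turns each resulting tuple into a list
letter_list, code_list = map(list, zip(*both))
print(letter_list, code_list)            # ['A', 'B', 'C'] [65, 66, 67]

x = [9, 'b']
print(x)                                 # [9, 'b']  (one argument: the list)
print(*x)                                # 9 b       (two arguments: 9 and 'b')
```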

4.3.21 Sort a List

One can obtain a sorted copy of a list with the sorted() function, or sort a list in place with the list’s .sort() method. Sorting only makes sense for items of comparable types; Python 3 raises a TypeError when asked to order, for example, an int and a str. The MATLAB equivalent is therefore straightforward: convert the cell array to a matrix and sort the matrix:

MATLAB:

>> a = {31, -127, 28, 45}
{[31]} {[-127]} {[28]} {[45]}
>> sort(cell2mat(a))
-127 28 31 45

Python:

In : a = [31, -127, 28, 45]
In : sorted(a)
Out: [-127, 28, 31, 45]
In : a
Out: [31, -127, 28, 45]
In : a.sort()
In : a
Out: [-127, 28, 31, 45]

Python’s sorted() and .sort() both take two optional keyword arguments. The first, key, is a function applied to each item to produce the value that is actually compared. The second, reverse, is a Boolean that reverses the sense of the sort. Here, we sort on the absolute value of each item, then reverse the sort, then do both:

In : sorted(a, key=lambda x : abs(x))

Out: [28, 31, 45, -127]

In : sorted(a, reverse=True)

Out: [45, 31, 28, -127]

In : sorted(a, key=lambda x : abs(x), reverse=True)

Out: [-127, 45, 31, 28]
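Two further details about sorting are sketched below. First, Python 3 refuses to order mixed types such as int and str unless a key function maps them to comparable values. Second, a key that returns a tuple sorts on several criteria in turn.

```python
mixed = [31, 'cat', -127]
try:
    sorted(mixed)                    # int vs str cannot be ordered
except TypeError as err:
    print(err)

# A key function can impose an order anyway, e.g. by text form
by_text = sorted(mixed, key=str)
print(by_text)                       # [-127, 31, 'cat']

# A tuple-valued key sorts by length first, then alphabetically
# among words of equal length
words = ['pear', 'fig', 'apple', 'kiwi']
by_len = sorted(words, key=lambda w: (len(w), w))
print(by_len)                        # ['fig', 'kiwi', 'pear', 'apple']
```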

4.3.22 Reverse a List

Finally, the order of a Python list can be flipped with either the reversed() function or the list’s .reverse() method; the .reverse() method alters the list in place. MATLAB can reverse the terms of a vector or cell array x with fliplr(x) or flip(x,2).

Unlike sorted(), reversed() returns an iterator, an object that produces one value at a time. To see the reversed items in the REPL, we therefore also need to invoke list():

MATLAB:

>> a
{[31]} {[-127]} {[28]} {[45]}
>> fliplr(a)
{[45]} {[28]} {[-127]} {[31]}

Python:

In : a
Out: [31, -127, 28, 45]
In : reversed(a)
Out: <list_reverseiterator>
In : list(reversed(a))
Out: [45, 28, -127, 31]
In : a
Out: [31, -127, 28, 45]
In : a.reverse()
In : a
Out: [45, 28, -127, 31]
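A third option will feel familiar to MATLAB users who write x(end:-1:1): a slice with a step of -1. It returns a reversed copy and leaves the original list untouched:

```python
a = [31, -127, 28, 45]

b = a[::-1]       # step -1 walks the list backward, producing a copy
print(b)          # [45, 28, -127, 31]
print(a)          # [31, -127, 28, 45]  (original unchanged)
```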

4.4 Python Tuples

Python tuples closely resemble Python lists—both can contain a collection of items and can be indexed numerically. The primary difference is that a tuple of scalar variables is unchangeable³ after it has been created; think of a tuple as a constant with multiple values. This immutable property gives tuples a critical advantage over lists and sets as it lets tuples act as keys to dictionaries. The use of tuples as dictionary keys is explored in Section 4.6.5.

Not all tuples are “hashable” (meaning they can be dictionary keys), though. A tuple that contains lists, dictionaries, or NumPy arrays can effectively change. The tuple stores only references to those objects, and the contents of the underlying list, dict, or array remain mutable. A tuple is hashable only if all its members are hashable, which rules out tuples that contain references to containers whose contents may change.
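The sketch below illustrates the distinction. A tuple of numbers and strings works as a dictionary key. A tuple holding a list, by contrast, can have that list’s contents changed and cannot be hashed:

```python
t_ok = (3, -8.5, 'cat')
t_bad = (3, [1, 2])            # second member is a mutable list

d = {t_ok: 'stored under a tuple key'}
print(d[(3, -8.5, 'cat')])     # stored under a tuple key

t_bad[1].append(99)            # the tuple is fixed, but its list is not
print(t_bad)                   # (3, [1, 2, 99])

try:
    d[t_bad] = 'never stored'  # hashing reaches the list and fails
except TypeError as err:
    print(err)                 # unhashable type: 'list'
```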

MATLAB has no comparable “frozen collection” data container.

Tuples are created by assigning comma-separated items to a variable or by calling the tuple() function with an iterable. Even a single item followed by a comma qualifies as a tuple:

Python:

In : S = 3,

In : type(S)

Out: tuple

In : len(S)

Out: 1

In : T = 3, -8.5, 'cat'

In : type(T)

Out: tuple

In : len(T)

Out: 3

In : T[1]

Out: -8.5

In : U = tuple(range(10,14))

In : U

Out: (10, 11, 12, 13)

Note

Stray commas create tuples! This can lead to mysterious errors far downstream from where the tuple was mistakenly created. As an example, say you write a function that computes a numeric value, but the function ends with return X, (note the trailing comma) instead of the intended return X. Later, another function scales this returned value by 4. If X were 1.1, the first function returns the tuple (1.1,). The second function multiplies this by 4, producing (1.1, 1.1, 1.1, 1.1) instead of 4.4. The bad value continues to propagate until an operation the tuple does not support, such as division, is attempted.
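Here is a minimal reproduction of the scenario in this note. compute() and scale() are hypothetical names chosen for illustration:

```python
def compute():
    X = 1.1
    return X,                  # stray trailing comma: returns (1.1,)

def scale(value):
    return 4 * value           # for a tuple, * means repetition

result = scale(compute())
print(result)                  # (1.1, 1.1, 1.1, 1.1)

try:
    result / 2                 # the mistake finally surfaces here
except TypeError as err:
    print('TypeError:', err)
```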

Tuples are often seen wrapped in parentheses. This is in fact mandatory when passing a tuple as an argument to a function or assigning multiple tuples on one line:

Python:

In : T, V = (3, -8.5, 'cat'), ('grey', 'dog')

In : T

Out: (3, -8.5, 'cat')

In : V

Out: ('grey', 'dog')

Many NumPy array creation functions expect the first argument to be a tuple defining the array’s dimensions. One example is np.ones(), shown earlier in Section 4.1, where it is invoked as np.ones((2,)). The double set of parentheses often puzzles new Python programmers. The inner parentheses group the dimensions into a single argument, the shape, which keeps it separate from any subsequent arguments such as the data type:

Python:

In : import numpy as np

In : np.ones((3,5), np.uint16)

Out:
array([[1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]], dtype=uint16)

4.5 Python Sets and MATLAB Set Operations

MATLAB and Python can both perform set operations—unions, intersections, and so on—but only Python has a data container specifically for storing sets. Sets are created with braces or by calling the set() function on an iterable:

Python:

In : x = { 'CA', 'IL' }

In : type(x)

Out: set

In : y = set([44, 55, 'sixty'])

In : type(y)

Out: set

Set members are unique. Calling set() on a list with duplicate elements and then converting the set back to a list is a common way to remove the duplicates. MATLAB’s unique() function behaves similarly, although it also returns the values in sorted order:

MATLAB:

>> Fib = [ 0 1 1 2 3 5];
>> unique(Fib)
0 1 2 3 5

Python:

In : Fib = [ 0, 1, 1, 2, 3, 5]
In : set(Fib)
Out: {0, 1, 2, 3, 5}
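Two related details. First, converting through a set loses the original order of the items. Second, empty braces create a dictionary rather than a set. When first-appearance order matters, dict.fromkeys() gives an order-preserving alternative, because dictionary keys are unique and keep insertion order (guaranteed since Python 3.7):

```python
data = [5, 3, 5, 1, 3]

# set() removes duplicates but makes no promise about order;
# sorting the result reproduces MATLAB's unique()
print(sorted(set(data)))            # [1, 3, 5]

# dict keys are unique and keep insertion order, so this removes
# duplicates while preserving the order of first appearance
print(list(dict.fromkeys(data)))    # [5, 3, 1]

# Beware: {} is an empty dict, not an empty set
print(type({}), type(set()))        # <class 'dict'> <class 'set'>
```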

Set members can be iterated over but cannot be indexed numerically; convert the set to a list if you need to index its terms. Beware, though, that sets do not maintain sequence: iterating over a set, or converting it to a list, yields the items in an arbitrary order.

Python:

In : a = {54, 43, 32, 23}

In : a[1]

TypeError: 'set' object is not subscriptable

In : for x in a:

...:     print(x)

In : L = list(a)

In : L[2]

Out: 54

Membership tests look different in MATLAB and Python. MATLAB’s ismember() function takes two arrays⁴ as inputs and returns a logical array, the same size as the first array, indicating whether each of its terms appears in the second array. An additional call to all() is needed to check that every item of the first array exists in the second. Python uses the in operator to check for the presence of individual members. (Although in can also be used to test for membership in a list, a list search has complexity O(N), whereas a set test is O(1) on average.) For group membership checks, one can call the .issuperset() method:

MATLAB:

>> a = [54 43 32 23];
>> ismember(43, a)
1
>> ismember(44, a)
0
>> all(ismember([32, 43], a))
1

Python:

In : a = {54, 43, 32, 23}
In : 43 in a
Out: True
In : 44 in a
Out: False
In : a.issuperset({32, 43})
Out: True

Table 4-2 summarizes set operations in MATLAB and Python.

Table 4-2

Set operations

Operation	MATLAB	Python	Explanation
Union	union(A,B)	A | B	All members of A and B
Intersection	intersect(A,B)	A & B	Members that are in both A and B
Difference	setdiff(A,B)	A - B	Members of A after members of B have been removed from A
Symmetric difference	setxor(A,B)	A ^ B	Members which are only in A or only in B
Subset test	all(ismember(A,B))	A.issubset(B)	True if all members of A are in B
Superset test	all(ismember(B,A))	A.issuperset(B)	True if all members of B are in A
Disjoint test	~any(ismember(A,B))	A.isdisjoint(B)	True if A and B have no members in common
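The Python column of Table 4-2 can be exercised directly. The comments list the contents of each result, since the printed order of a set is not guaranteed:

```python
A = {1, 2, 3, 4}
B = {3, 4, 5}

union = A | B                # {1, 2, 3, 4, 5}
intersection = A & B         # {3, 4}
difference = A - B           # {1, 2}
sym_diff = A ^ B             # {1, 2, 5}
print(union, intersection, difference, sym_diff)

print({3, 4}.issubset(A))    # True
print(A.issuperset(B))       # False: 5 is not in A
print(A.isdisjoint({7, 8}))  # True: no members in common
```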

4.6 Python Dictionaries and MATLAB Maps

Dictionaries (also known as associative arrays or hashes in Perl; hashes or maps in JavaScript; and maps in C++, Java, and MATLAB) allow one to create a relationship between two datasets known as keys and values. Notationally, dictionaries look like lists that can be indexed by arbitrary hashable values (strings, numbers, or tuples, for example) instead of just integers. In addition to the convenience they provide developers, dictionaries are also performant: both inserting new key-value pairs and retrieving values are O(1) operations on average.

Oddly, despite its power, MATLAB’s equivalent container, containers.Map, is rarely used by MATLAB programmers.

Dictionaries are best explained by example. Say we need to look up a country’s capital city. We could store the country-to-capital city relationship in a dictionary like this:

Python:

capital = {} # define capital as an empty dictionary

capital['USA'] = 'Washington D.C.'

capital['Germany'] = 'Berlin'

capital['Japan'] = 'Tokyo'

capital['France'] = 'Paris'

Retrieving a country’s capital is then just a matter of using the country’s name as the subscript to the dictionary:

Python:

In : country = 'Japan'

In : print(f'The capital of {country} is {capital[country]}.')

The capital of Japan is Tokyo.

Dictionaries beat lists for storing relationships not only because of the key/value binding but because they permit much faster data lookup. Imagine storing the country/city data in a list and having to return a city given a country. Even if the list were sorted by country name, the fastest search would be O(log₂(N)), no match for the O(1) average speed of dictionary lookups.
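To make the comparison concrete, the sketch below stores the same data two ways. The first uses parallel lists sorted by country and searched with the standard library’s bisect module, which costs O(log N). The second uses a dictionary, which costs O(1) on average. lookup_sorted() is a hypothetical helper written for this comparison.

```python
import bisect

# Parallel lists, kept sorted by country name
countries = ['France', 'Germany', 'Japan', 'USA']
cities = ['Paris', 'Berlin', 'Tokyo', 'Washington D.C.']

def lookup_sorted(country):
    """Binary search over the sorted country list: O(log N)."""
    i = bisect.bisect_left(countries, country)
    if i < len(countries) and countries[i] == country:
        return cities[i]
    raise KeyError(country)

capital = dict(zip(countries, cities))   # hash lookup: O(1) on average

print(lookup_sorted('Japan'))   # Tokyo
print(capital['Japan'])         # Tokyo
```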

4.6.1 Iterating over Keys

Iterating over keys of a dictionary is done with the for key in dict: looping construct:

In : for country in capital:
...:     city = capital[country]
...:     print(f'{country:10s} -> {city}')
...:
USA        -> Washington D.C.
Germany    -> Berlin
Japan      -> Tokyo
France     -> Paris

Python iterates over dictionary keys in the order they were inserted, the same as a Map in JavaScript. The language guarantees this behavior as of version 3.7; CPython 3.6 already behaved this way as an implementation detail. In contrast, std::map in C++ iterates over keys in sorted order. (Before version 3.6, Python dictionaries behaved more like Perl hashes, which can return keys in any order.)
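The loop above looks up each value a second time with capital[country]. A common alternative is the dictionary's .items() method, which yields each key and its value together as a tuple. This short sketch also checks the insertion-order behavior just described.

```python
capital = {'USA': 'Washington D.C.', 'Germany': 'Berlin',
           'Japan': 'Tokyo', 'France': 'Paris'}

# .items() yields (key, value) pairs, so no second lookup is needed
for country, city in capital.items():
    print(f'{country:10s} -> {city}')

# Iteration follows insertion order (guaranteed since Python 3.7)
assert list(capital) == ['USA', 'Germany', 'Japan', 'France']
```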

4.6.2 Testing for Key Existence

The for ... in construct iterates over all keys in a dictionary in the order they were inserted. If you get a key from another source, you can test whether or not the key exists in the dictionary with the construct test_key in dict_name:

In : 'USA' in capital

Out: True

In : 'America' in capital

Out: False

Indexing a dictionary by a nonexistent key raises a KeyError:

In : print(capital['America'])
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
----> 1 print(capital['America'])

KeyError: 'America'

As dictionaries are heavily used in Python, KeyError tends to be among the more frequent error messages Python programmers see.

4.6.2.1 .get() and .setdefault()

Many algorithms that populate dictionaries need separate logic for the case where the key is absent and the case where it is present. To avoid KeyError, we therefore frequently end up with code that looks like this:

if key in Dict:
    perform_a_task(Dict[key])
else:
    Dict[key] = 'new data'

Python has two options to simplify dictionary lookups. The first is the .get() method, which returns a dict's value if the given key exists and None if it doesn't, without raising a KeyError. Returning to our capital cities dictionary:

In : capital['USA']

Out: 'Washington D.C.'

In : capital.get('USA')

Out: 'Washington D.C.'

In : capital.get('America')

In : capital.get('America') is None

Out: True

The four lines then reduce to just

perform_a_task(Dict.get(key))

but perform_a_task() now bears the additional burden of checking for a None input and acting accordingly.
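.get() also accepts an optional second argument, which it returns instead of None when the key is missing. This can spare the called function from checking for None. A short sketch:

```python
capital = {'USA': 'Washington D.C.', 'Germany': 'Berlin'}

# The optional second argument replaces None as the fallback value
city = capital.get('America', 'unknown')
print(city)  # unknown

# .get() only reads; it never inserts the missing key
assert 'America' not in capital
```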

.setdefault() is similar to .get() in that it returns a dictionary's value given a key. If the key is missing, though, .setdefault() creates that key with a given default value. This is handy for creating counters.⁵ For example, say you want to count the frequency of each character in the string 'Use setdefault to initialize a new dict keys.' We'll use the list() function to split the string into individual characters and the dict Counter to store the number of occurrences of each character:

In : sentence = 'Use setdefault to initialize a new dict keys.'

In : Counter = {}

In : for character in list(sentence):
...:     Counter.setdefault(character, 0)  # no-op if character exists
...:     Counter[character] += 1
...:

In : Counter

Out:

{'U': 1, 's': 3, 'e': 6, ' ': 7, 't': 5,

'd': 2, 'f': 1, 'a': 3, 'u': 1, 'l': 2,

'o': 1, 'i': 5, 'n': 2, 'z': 1, 'w': 1,

'c': 1, 'k': 1, 'y': 1, '.': 1}
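The standard library's collections module offers two tools aimed at this pattern. defaultdict supplies a default value for a missing key automatically, and Counter tallies hashable items in a single call. Iterating over a string directly already yields its characters, so list() is optional. Note also that the variable name Counter in the session above would shadow collections.Counter if both were used in the same namespace. A sketch:

```python
from collections import Counter, defaultdict

sentence = 'Use setdefault to initialize a new dict keys.'

# defaultdict(int) creates a missing key with int() == 0 on first access
counts = defaultdict(int)
for character in sentence:   # iterating over a str yields its characters
    counts[character] += 1

# Counter performs the whole tally in one call
tally = Counter(sentence)

print(tally.most_common(3))  # the three most frequent characters
```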

4.6.2.2 Key Collision

A key collision refers to the insertion of a key-value pair into a dictionary which already has an entry for that key. If this occurs, the second value replaces the original:

In : D = { 'a' : 10, 'b' : 11, 'c' : 12 }

In : D['a'] = -5 # collision with existing key 'a'

In : D

Out: {'a': -5, 'b': 11, 'c': 12}
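The same "last value wins" rule applies when one dictionary is merged into another, whether with dictionary unpacking or with the .update() method. Python 3.9 and later also provide the | operator for this. A sketch:

```python
D = {'a': 10, 'b': 11, 'c': 12}
E = {'a': -5, 'd': 13}

merged = {**D, **E}   # new dict; on a key collision the value from E wins
D.update(E)           # in-place merge, same rule

print(merged)   # {'a': -5, 'b': 11, 'c': 12, 'd': 13}
```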

Some applications, vote counting for example, strictly require that an entry appear only once. In this case, one must test for the absence of a key before allowing an insert:

In : voted = {}

In : for name in ['George W', 'John A', 'Tom J', 'James M', 'John A']:
...:     if name not in voted:  # then OK to insert
...:         voted[name] = True
...:         print(f'OK: {name}')
...:     else:
...:         print(f'Error: {name} already voted!')
...:
OK: George W
OK: John A
OK: Tom J
OK: James M
Error: John A already voted!

4.6.3 Iterating over Keys, Sorting by Key

Insert order is not always the desired order in which to iterate through a dictionary. Frequently, one may wish to traverse a dictionary in ascending or descending order of its keys or values. In this case, we employ the sorted() function, optionally passing it a key function and the reverse flag.

In this example, we sort our country-to-capital city dictionary by the alphabetical order of the keys (i.e., the country names):

In : for country in sorted(capital):
...:     city = capital[country]
...:     print(f'{country:10s} -> {city}')
...:
France     -> Paris
Germany    -> Berlin
Japan      -> Tokyo
USA        -> Washington D.C.

As explained in Section 4.3.21, sorted() takes two optional keyword arguments: reverse, a Boolean, and key, a function (often an in-line function known as a lambda) that takes one dictionary key as its argument and returns the value that determines its sort position. First, we'll use reverse to invert the alphabetical sort order:

In : for country in sorted(capital, reverse=True):
...:     city = capital[country]
...:     print(f'{country:10s} -> {city}')
...:
USA        -> Washington D.C.
Japan      -> Tokyo
Germany    -> Berlin
France     -> Paris

Next, we’ll use key to provide an in-line function that determines sort order by the length of the country name from shortest to longest:

In : for country in sorted(capital, key=lambda X: len(X)):
...:     city = capital[country]
...:     print(f'{country:10s} -> {city}')
...:
USA        -> Washington D.C.
Japan      -> Tokyo
France     -> Paris
Germany    -> Berlin

The expression key=lambda X: len(X) deserves further explanation. X is an arbitrary variable name for the function's argument, which will be a dictionary key, for example, 'France'. The lambda's return value, len(X), is the length of that key, which is 6 when X is 'France'. In our case, the lambda causes sorted() to return the dictionary keys from the shortest, 3 for 'USA', to the longest, 7 for 'Germany'.
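Because key accepts any one-argument function, the lambda is not strictly necessary here. The built-in len can be passed directly and gives the same order:

```python
capital = {'USA': 'Washington D.C.', 'Germany': 'Berlin',
           'Japan': 'Tokyo', 'France': 'Paris'}

# len is itself a one-argument function, equivalent to lambda X: len(X)
by_length = sorted(capital, key=len)
print(by_length)  # ['USA', 'Japan', 'France', 'Germany']
```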

4.6.4 Iterating over Keys, Sorting by Value

Lambda functions can just as easily control sorting based on the dictionary values, that is, the city names, instead of the keys. To iterate in alphabetical order of city names, we use a lambda function that returns the city name:

In : for country in sorted(capital, key=lambda X: capital[X]):
...:     city = capital[country]
...:     print(f'{country:10s} -> {city}')
...:
Germany    -> Berlin
France     -> Paris
Japan      -> Tokyo
USA        -> Washington D.C.

To sort in descending order of city-name length, we would write

In : for country in sorted(capital, key=lambda X: len(capital[X]), reverse=True):
...:     city = capital[country]
...:     print(f'{country:10s} -> {city}')
...:
USA        -> Washington D.C.
Germany    -> Berlin
Japan      -> Tokyo
France     -> Paris

4.6.4.1 Secondary Sorts

In the preceding result, both 'Tokyo' and 'Paris' have five characters. How should we handle tie breakers? If we want to perform additional sorts in cases where the primary sort has equal values, we’ll need a secondary sort.

Secondary sorts take advantage of the fact that Python's sorts are stable. This means that if two records R and S have the same key and R appears before S in the original list, R will also appear before S in the sorted list. For example, the city-length sort above will always show 'Tokyo' before 'Paris' as long as 'Japan' was inserted into the dictionary before 'France', no matter how many other entries are added or removed. To implement a secondary sort, we work backward: first sort the dictionary by the secondary condition, then sort that result by the primary condition. Wherever the primary condition has ties, the tied items remain in the order they entered the sort, that is, already sorted by the secondary condition.

Here's how it looks in practice. First, we'll make a list of keys (i.e., country names) sorted alphabetically by city name:

In : countries_sorted_by_capital = sorted(capital, key=lambda X: capital[X])

In : countries_sorted_by_capital

Out: ['Germany', 'France', 'Japan', 'USA']

The only property of countries_sorted_by_capital that matters here is that 'France' appears before 'Japan': the capitals of these two countries have the same number of characters, and 'Paris' sorts alphabetically before 'Tokyo'. The positions of the other countries are irrelevant.

Now we sort the secondary sort results, countries_sorted_by_capital, by our primary criterion—the length of the city name:

In : countries_sorted_by_capital = sorted(capital, key=lambda X: capital[X])

In : for country in sorted(countries_sorted_by_capital,
...:                       key=lambda X: len(capital[X]), reverse=True):
...:     city = capital[country]
...:     print(f'{country:10s} -> {city}')
...:
USA        -> Washington D.C.
Germany    -> Berlin
France     -> Paris
Japan      -> Tokyo

Finally, we have the result in the sequence we wanted: inverse order of city name length and alphabetical order where city names are equally long.

Tertiary and higher-level sorts work the same way: sort by the least significant factor, then work your way backward to the primary factor.
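A different technique, an alternative to the multi-pass approach above, is a single sort whose key function returns a tuple. Tuples compare element by element, so the second element breaks ties in the first. Negating the length reverses only the primary criterion. This trick works only when a criterion can be inverted arithmetically, as a length can; the stable multi-pass approach also handles descending string criteria.

```python
capital = {'USA': 'Washington D.C.', 'Germany': 'Berlin',
           'Japan': 'Tokyo', 'France': 'Paris'}

# Primary: city-name length, longest first (negated length).
# Secondary: city name, alphabetical. Tuples compare element by element.
order = sorted(capital, key=lambda X: (-len(capital[X]), capital[X]))
for country in order:
    print(f'{country:10s} -> {capital[country]}')
```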

4.6.5 Tuples As Keys

All dictionary examples so far have used scalar keys. Tuples are, in a sense, multivalued scalars and can also be used as dictionary keys. This enables simple solutions to data relationships that involve multiple inputs.

As an example, if you wanted to keep track of defective pixels on a sensor, you could make a tuple from the i, j coordinates and index your dictionary with the coordinate tuple:

bad_pixel_coords = [(432, 66), (553, 17), (846, 295)]
defective_pixel = {}
for ij in bad_pixel_coords:  # e.g., ij = (553, 17)
    defective_pixel[ij] = True

which is equivalent to

defective_pixel[432, 66] = True
defective_pixel[553, 17] = True
defective_pixel[846, 295] = True

You could store the same information in a double-level dictionary, but it would be messier to code and slower to traverse.
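Looking up a pixel is then a single membership test on the coordinate tuple, and the keys unpack into their components when iterated over. This sketch builds the same dictionary with a dict comprehension. Inside an "in" test the parentheses around the tuple are required.

```python
bad_pixel_coords = [(432, 66), (553, 17), (846, 295)]
defective_pixel = {ij: True for ij in bad_pixel_coords}  # dict comprehension

# Membership test with a coordinate tuple
print((553, 17) in defective_pixel)   # True
print((0, 0) in defective_pixel)      # False

# Tuple keys unpack naturally in a loop
for i, j in defective_pixel:
    print(f'defective pixel at i={i}, j={j}')
```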

4.6.6 List Values

Dictionary values are not limited to scalars; they may be any Python container, including other dictionaries. Dictionaries of lists are a popular combination. Among other things, these can store tree structures: the parent node is the dictionary key, and its list items are the child nodes. For example, the tree rooted at node M1 can be represented by this dictionary of lists:

tree = {}

tree['M1'] = ['K8', 'U3']

tree['U3'] = ['B9', 'B1', 'Z0']

tree['K8'] = ['R4', 'R5']

tree['B1'] = ['T7', 'S1', 'Y5']
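One benefit of this representation is that a traversal needs nothing beyond dictionary lookups. Below is a minimal sketch of a recursive depth-first walk. walk is a hypothetical helper name, and nodes without a key (the leaves) are treated as having no children.

```python
tree = {
    'M1': ['K8', 'U3'],
    'U3': ['B9', 'B1', 'Z0'],
    'K8': ['R4', 'R5'],
    'B1': ['T7', 'S1', 'Y5'],
}

def walk(tree, node, depth=0):
    """Yield (depth, node) pairs in depth-first, pre-order sequence."""
    yield depth, node
    for child in tree.get(node, []):   # leaves have no entry -> no children
        yield from walk(tree, child, depth + 1)

for depth, node in walk(tree, 'M1'):
    print('  ' * depth + node)         # indent each node by its depth
```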

4.7 Structured Data

MATLAB allows one to create structured variables on the fly. Python has several ways to do the same, albeit without MATLAB's simplicity. The more casual methods use the namedtuple factory function, imported from the collections module, and the SimpleNamespace class, imported from the types module. Conventional classes also work. The most powerful method uses data classes, which can contain custom methods that operate on the data and can hold relationships similar to joined tables in an SQL database.

4.7.1 Method 1: namedtuple

As its name implies, a namedtuple is a type of tuple, meaning its values cannot be changed after their initial assignment. Namedtuples are therefore ideal for read-only structured variables. The MATLAB and Python structures in the following example contain the same data, but only the MATLAB values may be changed:

MATLAB:

>> Pos.x = 354.8;
>> Pos.y = -28.7;
>> Pos.z = 1.457e+4;
>> Pos

Pos =

  struct with fields:

    x: 354.8000
    y: -28.7000
    z: 14570

Python:

In : from collections import namedtuple

In : Pt = namedtuple('Coord', ['x', 'y', 'z'])

In : a = Pt(354.8, -28.7, 14570.0)

In : a
Out: Coord(x=354.8, y=-28.7, z=14570.0)

In : a.y = 0
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
----> 1 a.y = 0

AttributeError: can't set attribute
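Because a namedtuple cannot be modified, the usual way to "change" a field is its _replace() method, which returns a new instance with the given fields replaced. The leading underscore belongs to the documented namedtuple API and avoids clashes with field names; it does not mark the method as private. A short sketch:

```python
from collections import namedtuple

Coord = namedtuple('Coord', ['x', 'y', 'z'])
a = Coord(354.8, -28.7, 14570.0)

# _replace() returns a NEW namedtuple; a itself is unchanged
b = a._replace(y=0)
print(a)  # Coord(x=354.8, y=-28.7, z=14570.0)
print(b)  # Coord(x=354.8, y=0, z=14570.0)

# Fields remain reachable by position, as in any tuple
x, y, z = b
```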

4.7.2 Method 2: SimpleNamespace

A SimpleNamespace is a more versatile structured data container than a namedtuple because its values can be changed. The MATLAB code fragment here is identical to the one earlier.

MATLAB:

>> Pos.x = 354.8;
>> Pos.y = -28.7;
>> Pos.z = 1.457e+4;

Python:

In : from types import SimpleNamespace

In : Pos = SimpleNamespace()

In : Pos.x = 354.8

In : Pos.y = -28.7

In : Pos.z = 1.457e+4

In : Pos
Out: namespace(x=354.8, y=-28.7, z=14570.0)

One can check whether or not a field name exists in the structured variable with the hasattr() function, directly analogous to MATLAB's isfield() function.

To iterate over the fields in MATLAB, one can use the fieldnames() function. The same is possible in Python, but with less obvious notation. There, one must access the structured variable’s underlying dictionary:

MATLAB:

>> isfield(Pos, 'x')
>> isfield(Pos, 'w')

fields = fieldnames(Pos);
for i = 1:length(fields)
    F = fields{i};
    val = Pos.(F);
    fprintf(' Pos.%s = %.1f\n', F, val);
end

 Pos.x = 354.8
 Pos.y = -28.7
 Pos.z = 14570.0

Python:

In : hasattr(Pos, 'x')
Out: True

In : hasattr(Pos, 'w')
Out: False

In : for F in Pos.__dict__:
...:     val = Pos.__dict__[F]
...:     print(f' Pos.{F} = {val}')
...:
 Pos.x = 354.8
 Pos.y = -28.7
 Pos.z = 14570.0
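Two built-ins make this less cryptic. vars() returns an object's attribute dictionary (the same object as __dict__). getattr() fetches an attribute whose name is held in a string, much like MATLAB's dynamic field reference Pos.(F). This sketch also shows that a SimpleNamespace can be initialized with keyword arguments:

```python
from types import SimpleNamespace

Pos = SimpleNamespace(x=354.8, y=-28.7, z=1.457e+4)  # fields as keyword args

# vars(obj) returns the same dictionary as obj.__dict__
for F, val in vars(Pos).items():
    print(f' Pos.{F} = {val}')

# getattr() is the counterpart of MATLAB's dynamic field access Pos.(F)
F = 'y'
print(getattr(Pos, F))  # -28.7
```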

4.7.3 Method 3: Classes

Conventional Python classes, to be covered in greater detail in Chapter 10, can also serve as data containers although they require a bit more code to set up. For completeness, here is how one would define a regular class to store structured data:

MATLAB:

classdef Position
    properties
        x double
        y double
        z double
    end
    methods
        function obj = Position(x,y,z)
            obj.x = x;
            obj.y = y;
            obj.z = z;
        end
    end
end

>> Pos = Position(354.8, -28.7, 1.457e+4)

Pos =

  Position with properties:

    x: 354.8000
    y: -28.7000
    z: 14570

Python:

In : class Position:
...:     def __init__(self, X, Y, Z):
...:         self.x = X
...:         self.y = Y
...:         self.z = Z
...:

In : Pos = Position(354.8, -28.7, 1.457e+4)

In : Pos.x, Pos.y, Pos.z
Out: (354.8, -28.7, 14570.0)

As with SimpleNamespace, and any Python object for that matter, the existence of attributes can be checked with hasattr(), and the attributes can be iterated over through the object's underlying .__dict__ dictionary. See Section 4.7.2 for an example.

The power of using classes as data containers is the ability to add methods that perform value-added computations with the data values. As we’ll see in the next section though, data classes give us a fusion of concise notation to define the data structures and the ability to define methods that operate on the values. Data classes are therefore better choices for storing structured data than conventional classes.

4.7.4 Method 4: Data Classes

Data classes, introduced in Python 3.7, allow one to create structured variables that can include custom methods. In essence, they are a convenience mechanism. The @dataclass decorator defines a class with automatically generated code for the __init__() constructor, __repr__() to produce a string representation of the data, __eq__() to compare instances, and, optionally, several other methods. Fields within data classes have associated types, but, as with type annotations (Section 3.8.5), Python does not enforce them by default.

Type enforcement can be added with the Pydantic module, though (Section 4.7.4.5), to achieve a capability similar to the optional property validation available in MATLAB classes (Section 10.1). Another difference between MATLAB classes and Python classes, including data classes, is that MATLAB can explicitly define private methods, while Python cannot (this is covered in greater detail in Section 10.1.1).

We’ll begin with the data class version of our previous example. The MATLAB Position class is the same as defined above.

MATLAB:

>> Pos = Position(354.8, -28.7, 1.457e+4)

Python:

In : from dataclasses import dataclass

In : @dataclass
...: class Position:
...:     x: float
...:     y: float
...:     z: float
...:

In : Pos = Position(354.8, -28.7, 1.457e+4)

In : Pos.x, Pos.y, Pos.z
Out: (354.8, -28.7, 14570.0)
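Even without custom methods, the decorator is doing work behind the scenes. This sketch contrasts the conventional class of Section 4.7.3 (renamed PlainPosition here so both versions can coexist) with the data class. The generated __eq__() compares field values, and the generated __repr__() prints them.

```python
from dataclasses import dataclass

class PlainPosition:            # conventional class, as in Section 4.7.3
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

@dataclass
class Position:                 # data class with the same fields
    x: float
    y: float
    z: float

p1, p2 = PlainPosition(1.0, 2.0, 3.0), PlainPosition(1.0, 2.0, 3.0)
d1, d2 = Position(1.0, 2.0, 3.0), Position(1.0, 2.0, 3.0)

print(p1 == p2)   # False: plain classes compare by identity
print(d1 == d2)   # True: generated __eq__ compares field values
print(d1)         # Position(x=1.0, y=2.0, z=3.0), from generated __repr__
```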

Nothing exciting here; the real fun begins when we add methods to the data class. We'll begin by adding a method that computes the distance of the point from the origin:

Python:

from dataclasses import dataclass
import numpy as np

@dataclass
class Position:
    x: float
    y: float
    z: float

    def mag(self):
        return np.sqrt(self.x**2 + self.y**2 + self.z**2)

Now, after we define a point, we can compute its distance by calling mag():

Python:

In : Pos = Position(354.8, -28.7, 1.457e+4)

In : Pos.mag()

Out: 14574.347557609568

4.7.4.1 Field Values

Alternatively, we can have the data class compute the magnitude when the point is first created and save it as another attribute. This is done by defining the method __post_init__(), which the generated __init__() calls automatically, together with an additional attribute whose value is not supplied when the object is created:

Python:

from dataclasses import dataclass, field
import numpy as np

@dataclass
class Position:
    x: float
    y: float
    z: float
    R: float = field(init=False)   # not a constructor argument

    def __post_init__(self):
        self.R = self.mag()

    def mag(self):
        return np.sqrt(self.x**2 + self.y**2 + self.z**2)

A point object's attribute R, called a field value here because it is derived from the other values rather than passed to the constructor, is then defined when we create the point:

Python:

In : Pos = Position(354.8, -28.7, 1.457e+4)

In : Pos.R

Out: 14574.347557609568

Field values are not automatically recomputed when the values they depend on change. For example, changing Pos.z does not update Pos.R unless one explicitly assigns Pos.R = Pos.mag(). To make that happen automatically, the class needs a setter method that updates the variables and then recomputes R with .mag():

Python:

from dataclasses import dataclass, field
import numpy as np

@dataclass
class Position:
    x: float
    y: float
    z: float
    R: float = field(init=False)

    def __post_init__(self):
        self.R = self.mag()

    def mag(self):
        return np.sqrt(self.x**2 + self.y**2 + self.z**2)

    def set(self, x=None, y=None, z=None):
        if x is not None:
            self.x = x
        if y is not None:
            self.y = y
        if z is not None:
            self.z = z
        self.R = self.mag()   # keep the derived value current

By calling .set() instead of modifying the variables directly, we’ll get the behavior we want:

Python:

In : Pos = Position(354.8, -28.7, 1.457e+4)

In : Pos.R

Out: 14574.347557609568

In : Pos.set(x=-1, z=0)

In : Pos.R

Out: 28.717416318325018
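A different design, not the one used in this section, avoids stale values altogether. It makes R a read-only property that is computed each time it is read. The trade-off is that the square root is recalculated on every access instead of once per update. This sketch uses math.sqrt in place of np.sqrt so that it needs only the standard library:

```python
import math
from dataclasses import dataclass

@dataclass
class Position:
    x: float
    y: float
    z: float

    @property
    def R(self):
        """Distance from the origin, recomputed on every access."""
        return math.sqrt(self.x**2 + self.y**2 + self.z**2)

Pos = Position(354.8, -28.7, 1.457e+4)
Pos.z = 0          # plain attribute assignment; no setter method needed
print(Pos.R)       # already reflects the new z
```

Because R is not annotated, the decorator does not treat it as a field, so it appears neither in the constructor nor in the generated __repr__().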

4.7.4.2 Relationships Between Dependent Data Classes

The utility of data classes becomes more apparent when data classes are nested, that is, they include variables whose types are also data classes. To explore nested data classes more fully, we’ll use the Python faker⁶ module, explained in more detail in Appendix B, to generate names of people, phone numbers, and names of companies. These will be stored in data classes Person, Phone, and Company, respectively.

Each person can have one or more phones and work at one company, and a company can have one or more employees. If the data were stored in an SQL database, the entity relationship would resemble this diagram:

Before we generate data, we need to explore two more important data class properties: (1) the ability to modify data class properties dynamically and (2) the fact that assigning a data class object, like assigning any mutable object,⁷ copies a reference rather than the data (see Section 4.8).

4.7.4.3 Dynamic Modification of Data Classes

The entity relationship diagram in Section 4.7.4.2 shows that a Company contains a list of Person objects as its employees, and a Person has a Company as their employer. Python has no forward declarations, and an annotation naming a class that has not yet been defined raises a NameError. This presents a chicken-and-egg problem: which do we define first, a Person or a Company? Either way, the first class refers to a name Python has not seen yet.

Fortunately, Python's dynamic nature offers an easy solution to this dilemma: we can define either data class first and simply use a placeholder for the forward reference. The placeholder can be replaced later.

Python:

from dataclasses import dataclass, field
from typing import List

@dataclass
class Phone:
    type: str
    number: str

@dataclass
class Person:
    name: str
    # employer: Company  # NameError: Company is not defined yet
    employer: None       # placeholder, will be overwritten below
    salary: float
    phones: List[Phone]
    city: str

@dataclass
class Company:
    name: str
    # employees: List[Person] = field(init=False)  # throws AttributeError
    employees: list = field(default_factory=list)
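Another option, not used in this section, is to write the not-yet-defined class name as a string. The @dataclass decorator does not evaluate string annotations, so 'Company' is accepted as a forward reference. Adding from __future__ import annotations at the top of a module has a similar effect for every annotation. Here is a trimmed-down sketch with hypothetical sample data:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Person:
    name: str
    employer: 'Company'   # string annotation: Company is defined below
    salary: float

@dataclass
class Company:
    name: str
    employees: List[Person] = field(default_factory=list)

acme = Company('Acme')                 # hypothetical sample data
alice = Person('Alice', acme, 50.0)
acme.employees.append(alice)           # link the two objects both ways
```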

We’ll use the faker module to populate names, cities, and companies with realistic strings. Here’s a brief demo:

Python:

In : from faker import Faker

In : fake = Faker()

In : fake.name()

Out: 'Tammy Bennett'

In : fake.city()

Out: 'East Jacobchester'

In : fake.company()

Out: 'Lopez-Baker'

We can then make a person entry, make a company, and add that person as an employee of the company:

Python:

from faker import Faker
from random import random

fake = Faker()

company = None  # forward reference; not known yet
salary = 10 + 100*random()
phones = [Phone('mobile', fake.phone_number()),
          Phone('office', fake.phone_number())]
p_1 = Person(fake.name(), company, salary, phones, fake.city())

employees = []
c_1 = Company(fake.company(), employees)

# make person p_1 an employee of company c_1
p_1.employer = c_1  # resolve earlier forward reference
c_1.employees.append(p_1)

Sample values look like this:

Python:

In : p_1

Out: Person(name='Chuck Brown',

employer=Company(name='Williams, Munoz and Green',

employees=[...]),

salary=103.28236642597916,

phones=[Phone(type='mobile', number='533.476.6020'),

Phone(type='office', number='629-596-5652x63457')],

city='Lauramouth')

In : c_1

Out: Company(name='Williams, Munoz and Green',

employees=[Person(name='Chuck Brown', employer=...,

salary=103.28236642597916,

phones=[Phone(type='mobile', number='533.476.6020'),

Phone(type='office', number='629-596-5652x63457')],

city='Lauramouth')])

Note the circular references of the person’s employer details and the company’s employees.

4.7.4.4 Traversing Linked Data Classes

The p_1 and c_1 objects created in Section 4.7.4.3 are linked together: the person is an employee of the company, and the company's employee list contains the person. As mentioned earlier, the objects store only references to each other, not copies of the data. This saves memory, and it also means that changes to either object are reflected immediately in all linked objects. For example, a change to an employee's office phone number is seen by the employer as well.

Interlinked data class objects can be viewed as in-memory relational databases, where each data class is a table, each object a row entry, and interlinked references are foreign keys. As with SQL, information across linked data classes can be correlated rapidly. A company phone book could be prepared easily:

Python:

for person in c_1.employees:
    for phone in person.phones:
        if phone.type == 'office':
            print(f'{person.name} {phone.number}')
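A self-contained sketch can check the claim that changes propagate through the links. It uses simplified versions of the data classes (no employer or city fields) and hand-written hypothetical names and numbers instead of faker. phone_book is a hypothetical helper that collects the same information as the loop above into a dictionary.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Phone:
    type: str
    number: str

@dataclass
class Person:
    name: str
    phones: List[Phone] = field(default_factory=list)

@dataclass
class Company:
    name: str
    employees: List[Person] = field(default_factory=list)

# Hypothetical sample data (no faker needed)
ann = Person('Ann', [Phone('office', '555-0100')])
bob = Person('Bob', [Phone('mobile', '555-0199'), Phone('office', '555-0101')])
c_1 = Company('Acme', [ann, bob])

def phone_book(company):
    """Map each employee name to a list of their office number(s)."""
    return {person.name: [p.number for p in person.phones if p.type == 'office']
            for person in company.employees}

ann.phones[0].number = '555-0111'   # change made through the Person object...
print(phone_book(c_1))              # ...is visible through the Company object
```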

4.7.4.5 Type Validation with Pydantic

The Position data class defined at the beginning of Section 4.7.4 says the three coordinates x, y, and z have type float (a 64-bit floating-point number). What happens if we create a Position with a string for one of these values?

Python:

In : a = Position("banana", 3.4, -5.1)

In :

It is accepted without complaint! This is bad news. Problems arise only when the bogus value appears in a computation:

Python:

In : a.mag()

----------------------------------------------

TypeError Traceback (most recent call last)

<ipython-input-7-e63ba7f2d72b> in <module>

----> 1 a.mag()

<ipython-input-3-0994e6029018> in mag(self)

7 z: float

8 def mag(self):

----> 9 return np.sqrt(self.x**2 + self.y**2 + self.z**2)

TypeError: unsupported operand type(s) for ** or pow(): 'str' and 'int'

Generally, you want to know there’s a problem as soon as bad data is entered, not at some unknowable time in the future when the data is used.

The Pydantic module provides a base class, BaseModel, whose subclasses enforce their declared types. Our Position class looks like this when created with Pydantic:

Python:

import numpy as np
from pydantic import BaseModel

class Position(BaseModel):
    x: float
    y: float
    z: float

    def mag(self):
        return np.sqrt(self.x**2 + self.y**2 + self.z**2)

Now the error is raised when the object is created rather than when the improperly typed data is used:

Python:

In : good = Position(x=22.1, y=3.4, z=-5.1)

In : bad = Position(x = "banana", y=3.4, z=-5.1)

---> bad = Position(x = "banana", y=3.4, z=-5.1)

in pydantic.main.BaseModel.__init__()

ValidationError: 1 validation error for Position

value is not a valid float (type=type_error.float)
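If adding a third-party dependency is not an option, a standard-library data class can perform a hand-written check in __post_init__(), which runs right after the generated __init__(). This sketch assumes every field is meant to be a float. Unlike Pydantic, it does not read the annotations to decide which check to apply.

```python
import math
from dataclasses import dataclass, fields

@dataclass
class Position:
    x: float
    y: float
    z: float

    def __post_init__(self):
        # Assumes all fields are floats. Reject anything that is not a real
        # number (bool is excluded because it is a subclass of int), and
        # convert ints to float.
        for f in fields(self):
            value = getattr(self, f.name)
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise TypeError(f'{f.name} must be a number, got {value!r}')
            setattr(self, f.name, float(value))

    def mag(self):
        return math.sqrt(self.x**2 + self.y**2 + self.z**2)

good = Position(22.1, 3.4, -5.1)
try:
    bad = Position('banana', 3.4, -5.1)
except TypeError as err:
    print(err)   # x must be a number, got 'banana'
```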

4.7.5 Enumerations

Enumerations, or enums, are collections of related constants with descriptive names meant to clarify code. For example, a program that solves various types of partial differential equations might classify them as elliptic, hyperbolic, or parabolic. Rather than assigning numeric or string constants to these equation types, we can create an enumeration with these exact names:

MATLAB:

classdef EqType
    enumeration
        Elliptic, ...
        Hyperbolic, ...
        Parabolic
    end
end

Python:

import enum

class EqType(enum.Enum):
    Elliptic = 1
    Hyperbolic = 2
    Parabolic = 3

and subsequently use the enumerated items like this:

MATLAB:

% a, b, c are the coefficients of a*u_xx + 2*b*u_xy + c*u_yy
if b*b == a*c
    eType = EqType.Parabolic;
elseif b*b > a*c
    eType = EqType.Hyperbolic;
else
    eType = EqType.Elliptic;
end

if eType == EqType.Parabolic
    parsolv(coeff, ...)
end

Python:

# a, b, c are the coefficients of a*u_xx + 2*b*u_xy + c*u_yy
if b*b == a*c:
    eType = EqType.Parabolic
elif b*b > a*c:
    eType = EqType.Hyperbolic
else:
    eType = EqType.Elliptic

if eType == EqType.Parabolic:
    parsolv(coeff, ...)

Python enumerations are iterable, meaning we can loop over them. Additionally, the name and value of each enumerated item are available through the item's .name and .value attributes.

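In MATLAB, an enumeration class cannot be looped over directly; instead, one first retrieves the members as an array with the enumeration() function and then loops over that array.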

Python:

In : for x in EqType:
...:     print(x, x.name, x.value)
...:
EqType.Elliptic Elliptic 1
EqType.Hyperbolic Hyperbolic 2
EqType.Parabolic Parabolic 3
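Members can also be retrieved from their value or from their name. This is handy when the equation type arrives as an integer code or as a string read from an input file. When the particular numbers do not matter, enum.auto() can assign them. EqType2 below is just a hypothetical second name so that both versions can coexist.

```python
import enum

class EqType(enum.Enum):
    Elliptic = 1
    Hyperbolic = 2
    Parabolic = 3

# Look up a member by its value or by its name
print(EqType(3))             # EqType.Parabolic
print(EqType['Hyperbolic'])  # EqType.Hyperbolic

# enum.auto() assigns values automatically (1, 2, 3, ... for Enum)
class EqType2(enum.Enum):
    Elliptic = enum.auto()
    Hyperbolic = enum.auto()
    Parabolic = enum.auto()
```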

4.8 Caveat: “=” Copies a Reference for Nonscalars!

A critical difference between MATLAB's data containers and Python's is that the assignment b = a in MATLAB makes a complete copy of a's contents and puts them in the new variable b. In Python, this appears to be true only for scalars. (Strictly speaking, Python assignment always copies a reference, but numbers and strings are immutable, so the difference never shows.) If a is a list, dictionary, NumPy array, or any other higher-level data container, Python copies only a reference to a into b. In other words, a and b point to the same memory address; they become two names that refer to the same underlying data. Another way of putting it is that b = a makes b an alias of a. Consequently, changes made to a appear as changes to b as well, and vice versa.

Needless to say, copies of references rather than the entire data structure cause immense frustration for the unaware. Ostensibly simple computations report erroneous results, data appears to have been corrupted, results are not repeatable, and so on.

To duplicate MATLAB's = behavior and create a new variable b that contains a duplicate copy of everything in a, one must import the copy module and explicitly call one of its functions, either copy.copy() or copy.deepcopy().

Here’s a brief demonstration of the issue. The id() function reports an object’s memory address:

In : a = [1, 2]

In : b = a

In : id(a), id(b)

Out: (139806965581896, 139806965581896)

In : b[0] = 999999

In : b

Out: [999999, 2]

In : a

Out: [999999, 2]

To create b as an independent duplicate of a, use the copy() function:

In : from copy import copy

In : a = [1, 2]

In : b = copy(a)

In : id(a), id(b)

Out: (139806060352840, 139806059562504)

In : b[0] = 999999

In : b

Out: [999999, 2]

In : a

Out: [1, 2]

The deepcopy() function from the copy module is needed for data containers that contain other data containers.
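The difference matters as soon as a container holds other mutable containers. copy() duplicates only the outer container, so the inner lists are still shared. deepcopy() duplicates everything recursively. A short demonstration:

```python
from copy import copy, deepcopy

a = [[1, 2], [3, 4]]     # a list that contains other lists
shallow = copy(a)        # new outer list, but the SAME inner lists
deep = deepcopy(a)       # new outer list AND new inner lists

a[0][0] = 999999
print(shallow)   # [[999999, 2], [3, 4]]  inner change shows through
print(deep)      # [[1, 2], [3, 4]]       fully independent
```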
