MATLAB and Python have many powerful data containers. MATLAB’s primary containers are the matrix, cell array, and struct, while Python mainstays are lists, dictionaries, and, for the intended audience of this book, NumPy arrays. A fourth key container in the Python landscape is the Pandas dataframe which corresponds most closely to a MATLAB table. Tables and dataframes are covered in Chapter 13. Approximate equivalences of MATLAB and Python containers appear in Table 4-1.
NumPy arrays are covered thoroughly in Chapter 11 and appear frequently in most other chapters. Here, we merely give a cursory introduction to the NumPy module and its primary container, the n-dimensional array, also known as the ndarray.
4.1 NumPy Arrays
Python, accompanied by the core modules in the standard distribution, has little to offer for numeric computation. For this, we need NumPy, the numerics module at the heart of nearly every Python-powered scientific and numeric capability. NumPy arrays are covered thoroughly in Section 11.1; here, we just introduce them with a sneak preview.
but, as mentioned in Section 3.14.1.4, this could break your code in subtle ways.
MATLAB: | Python: |
---|---|
>> A = [1 2; 3 4] A = 1 2 3 4 >> b = ones(2,1) b = 1 1 >> x = A\b x = -1 1 | In : A = np.array([[1,2],[3,4]]) In : A Out: array([[1,2], [3,4]]) In : b = np.ones((2,)) In : b Out: array([1.,1.]) In : x = np.linalg.solve(A,b) In : x Out: array([-1.,1.]) |
Both MATLAB and ipython have a whos command that lists the names, types, and sizes of variables:
The bulk of this book will be about NumPy arrays and operations on them. For now, though, we’ll turn our attention to other data containers.
4.2 Strings
Both Python and MATLAB have extensive support for creating, modifying, and testing strings. Each also has a separate data type for byte arrays, which are often mistaken for strings. The two differ, though: a byte array (or character vector in MATLAB) is a sequence of bytes, while a string is a sequence of text characters, and a single text character may require several bytes to represent it. An ASCII string maps one to one with a byte array, but a Unicode string storing, say, Vietnamese characters will not.
4.2.1 Strings, Character Arrays, and Byte Arrays
While both languages have similar concepts of strings, they differ in what their simpler forms, character arrays (MATLAB) and byte arrays (Python), contain and how they store it. MATLAB char arrays are essentially primitive strings where each character is stored in two bytes. Python byte arrays are a collection of uint8 values that can look like strings if the byte values fall in the range of printable ASCII characters. A MATLAB string behaves much like a 1x1 cell array containing a 1xN array of chars. You’ll occasionally see MATLAB strings indexed by {1} since this construct returns the underlying char array, which is handy for numeric indexing of substrings.
MATLAB: | Python: |
---|---|
>> x = 'byte array'; >> y = "a string"; >> class(x) 'char' >> class(y) 'string' | In : x = b'byte array' In : y = 'a string' In : type(x) Out: bytes In : type(y) Out: str |
MATLAB: | Python: |
---|---|
>> x = 'byte array'; >> y = "a string"; >> x_str = string(x); >> b_arr = char(y); | In : x = b'byte array' In : y = 'a string' In : x_str = x.decode() In : b_arr = y.encode() |
MATLAB: | Python: |
---|---|
>> x = 'byte array'; >> uint16(x) 98 121 116 101 32 97 114 114 97 121 | In : x = b'byte array' In : [_ for _ in x] Out: [98, 121, 116, 101, 32, 97, 114, 114, 97, 121] |
A commonly seen error in Python is TypeError: a bytes-like object is required, not 'str'. This happens when attempting to perform a string operation, covered in the next section, using a byte array. The simple fix is to apply the .decode() method on the byte array to turn it into a string.
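The error and its fix can be reproduced in a few lines. This is a minimal sketch; the variable names are illustrative and do not come from the original listings.

```python
# A bytes object rejects str arguments passed to its methods.
raw = b'byte array'
try:
    raw.replace('byte', 'text')       # str arguments on a bytes object
except TypeError as err:
    message = str(err)                # the error text quoted above

# Decoding first turns the bytes into a str, after which str methods work.
text = raw.decode()                   # UTF-8 is the default encoding
fixed = text.replace('byte', 'text')

print(message)
print(fixed)
```

An alternative fix is to keep everything as bytes and pass byte arguments instead, as in `raw.replace(b'byte', b'text')`.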
4.2.2 String Operations
Python strings, like all data containers and functions in Python, are objects. Their attributes and methods can be queried interactively in ipython by adding a period after a string variable, then hitting the <TAB> key:
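Outside of ipython, where tab completion is unavailable, roughly the same information can be obtained with the built-in dir() function. The sketch below filters out the names that start with an underscore; the variable names are illustrative.

```python
text = 'a string'

# dir() lists every attribute of an object; keep only the public names.
public = [name for name in dir(text) if not name.startswith('_')]

print(len(public), 'public attributes, starting with:', public[:5])

# Each name can be fetched with getattr() and called like any other method.
shout = getattr(text, 'upper')()
print(shout)
```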
MATLAB has a similar capability with its methods and methodsview commands, but these are applied to the data type (or “class” in MATLAB terminology) rather than to a variable:
Common string operations are described in greater detail in the following sections.
4.2.2.1 String Length
MATLAB: | Python: |
---|---|
>> str = "string length"; >> length(str{1}) ans = 13 | In : str = "string length" In : len(str) Out: 13 |
4.2.2.2 Append to a String
MATLAB: | Python: |
---|---|
>> A = "cats"; >> B = "dogs"; >> C = A +" "+ B; >> C "cats dogs" | In : A = "cats" In : B = "dogs" In : C = A +" "+ B In : C Out: 'cats dogs' |
MATLAB: | Python: |
---|---|
>> 'a'+'b' 195 | In : 'a'+'b' Out: 'ab' |
4.2.2.3 Repeat a String
MATLAB: | Python: |
---|---|
>> A = '#.'; >> repmat(A,1,5) ans = '#.#.#.#.#.' | In : A = "#." In : A * 5 Out: '#.#.#.#.#.' |
4.2.2.4 Convert to Upper- or Lowercase
MATLAB: | Python: |
---|---|
>> A = "The String"; >> upper(A) ans = 'THE STRING' >> lower(A) ans = 'the string' | In : A = "The String" In : A.upper() Out: 'THE STRING' In : A.lower() Out: 'the string' |
4.2.2.5 Replace Characters
MATLAB: | Python: |
---|---|
>> A = "the fox box"; >> replace(A,'ox','it') "the fit bit" >> A "the fox box" >> A = replace(A,'ox', 'it') "the fit bit" | In : A = "the fox box" In : A.replace('ox', 'it') Out: 'the fit bit' In : A Out: 'the fox box' In : A = A.replace('ox', 'it') In : A Out: 'the fit bit' |
4.2.2.6 Method Chaining
Parsing text often requires multiple clean-up operations: remove commas, replace unwanted text with spaces, convert everything to lowercase, and so on. Multiple operations can be chained in Python where one method call is immediately followed by another.
There are several ways to extract the numbers (notably with regular expressions which will be covered in Section 4.2.6); one way is to simply replace “x:”, “y:”, and “,” with spaces or empty strings. The equivalent of Python’s method chaining is a cumbersome collection of nested function calls in MATLAB:
The .split() method is described in Section 4.2.4.
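As a minimal sketch of method chaining in Python, the following assumes an input line of the form "x: 1.5, y: -2.25". That line is a made-up stand-in, since the original input does not appear here.

```python
# Hypothetical input line; the labels and the comma are the unwanted text.
line = 'x: 1.5, y: -2.25'

# Each .replace() returns a new string, so the next method call can be
# attached directly to the result of the previous one.
fields = line.replace('x:', '').replace('y:', '').replace(',', ' ').split()

x, y = [float(field) for field in fields]
print(fields)
print(x, y)
```

Because every string method returns a new string rather than modifying the original, `line` itself is left unchanged by the chain.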
4.2.3 Formatting
Both MATLAB and Python can use C language–style formatting for strings:
Python additionally supports a convenient feature known as f-strings that allow variables and expressions to be embedded within the format string instead of appearing afterward as arguments. Note the f" prefix on the format string:
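The sketch below puts the two formatting styles side by side. The variable names and values are assumptions chosen for illustration.

```python
name = 'pi'
value = 3.14159265

# C-style: format codes in the string, values supplied afterward.
c_style = '%-5s = %8.4f' % (name, value)

# f-string: the same layout with the variables embedded in the string.
f_style = f'{name:<5s} = {value:8.4f}'

# f-string without any formatting designations.
plain = f'{name} = {value}'

print(c_style)
print(f_style)
print(plain)
```

In the C-style form, left justification is requested with a minus sign (`%-5s`); the f-string form uses `<` for the same purpose.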
The < symbol on the string format <5s means “left justify.” The formatting designations are optional. Without them, the output is
4.2.4 Separate a String into Words
A frequently performed operation for reading input data is splitting a string into an array of words delimited by whitespace, commas, or other characters. Python’s .split() method, like MATLAB’s strsplit() function, will default to splitting the string on whitespace; passing an argument will split on that character or substring.
Outputs from split operations are a cell array in MATLAB and a list in Python. These containers will be described in detail in Section 4.3.
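A short sketch of the Python side follows; the input strings are invented for illustration.

```python
sentence = 'the quick   brown fox'
record = '2021-03-04,42,,ok'

# With no argument, runs of whitespace count as a single delimiter.
words = sentence.split()

# With an argument, every occurrence of the delimiter splits the string,
# so two adjacent commas produce an empty string in the result.
fields = record.split(',')

# The delimiter may be longer than one character.
parts = 'a--b--c'.split('--')

print(words)
print(fields)
print(parts)
```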
Splitting comma-separated value (.csv) files is such a ubiquitous task that both MATLAB and Python have special methods for this. Chapter 7 has an extensive section on working with .csv files.
4.2.5 Tests on Strings
When working with numeric data, MATLAB’s notation is generally terser than Python’s. The reverse is true for strings.
4.2.5.1 Testing for Equality
MATLAB: | Python: |
---|---|
>> str = "string equality"; >> strcmp(str{1}(1:6), "string") ans = 1 >> str{1}(end-1:end) ans = ty >> strcmp(str{1}(end-1:end),"tY") ans = 0 | In : str = "string equality" In : str[:6] == "string" Out: True In : str[-2:] Out: 'ty' In : str[-2:] == "tY" Out: False |
4.2.5.2 Check Trailing Characters
MATLAB: | Python: |
---|---|
>> fname = "a.csv"; >> endsWith(fname,".csv") 1 | In : fname = "a.csv"; In : fname.endswith('.csv') Out: True |
4.2.5.3 Check Starting Characters
MATLAB: | Python: |
---|---|
>> fname = "a.csv"; >> startsWith(fname,"a") 1 | In : fname = "a.csv" In : fname.startswith('a.') Out: True |
4.2.5.4 Do Given Characters Appear in a String?
MATLAB: | Python: |
---|---|
>> str = 'cat or dog or bird'; >> contains(str,'or') ans = 1 >> contains(str,'and') ans = 0 | In : str = 'cat or dog or bird' In : 'or' in str Out: True In : 'and' in str Out: False |
4.2.6 String Searching, Replacing with Regular Expressions
1. Check whether or not text has the desired pattern
2. Extract text patterns from a string (if they exist) for subsequent use
3. Replace text that matches a pattern with new text
Python has a complete Perl-compatible regex engine, while MATLAB implements only a subset of the Perl regex metacharacters. The underlying mechanisms of invoking the pattern search and extracting results also differ.
4.2.6.1 Does a String Match a Regex?
In the first example, we use a regex to see whether or not a string contains two integers separated by spaces, then either “dog” or “cat”:
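One way to write such a test in Python is sketched below. The pattern and the candidate strings are assumptions, not the original listing.

```python
import re

# Two integers separated by whitespace, then whitespace and "dog" or "cat".
pattern = re.compile(r'\d+\s+\d+\s+(dog|cat)')

candidates = ['12 7 cat', '3   45 dog', '12 cat', 'seven 7 dog']

# re.search() returns a match object on success and None on failure,
# so bool() turns the result into a simple True/False answer.
results = [bool(pattern.search(text)) for text in candidates]

for text, matched in zip(candidates, results):
    print(f'{text!r:15s} {matched}')
```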
4.2.6.2 Match a Regex and Capture Substrings
Portions of a regular expression can be captured for subsequent use by wrapping the portion of interest with parentheses. The 'tokens' argument to MATLAB’s regexp() function returns the matched portions as a cell array of strings. In Python, the object returned by re.search() has a .group() method which returns the matched portions and a .groups() method which returns a tuple of all of these matches.
A significant difference between MATLAB and Python is that Python can return results from nested captures—that is, from nested parenthetical expressions—but MATLAB can’t. In the following example, we’ll look for numeric year-month-day patterns in an input:
The inner parentheses, qualified by a “2x” multiplier, hold the last match of the multiplied pattern, therefore just -04 instead of -03-04.
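The behavior of the nested capture can be checked with a sketch like the following. The input text and the exact pattern are assumptions chosen to match the description above.

```python
import re

text = 'released 2022-03-04 at noon'

# Group 1: the year. Group 2: both "-NN" parts together.
# Group 3 (nested inside group 2): a single "-NN" part, repeated twice.
match = re.search(r'(\d{4})((-\d{2}){2})', text)

whole = match.group(0)      # the entire matched text
captured = match.groups()   # one entry per pair of parentheses

print(whole)
print(captured)
```

The third entry holds only the last repetition of the inner group, which is why it contains -04 rather than -03-04.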
4.2.6.3 Replace Text Matching a Regex with Different Text
In this example, we’ll replace either “cat” or “dog” with “fish” followed by a copy of the integer preceding it. The notation \g<1> in the Python regular expression is a backreference to the first grouped pattern, that is, the contents of the regular expression caught in the first pair of parentheses.
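A possible Python version is sketched below. The input text and the exact layout of the replacement are one interpretation of the description, not the original listing.

```python
import re

text = '3 cat and 12 dog'

# Group 1 captures the integer; group 2 captures the animal.
# In the replacement, \g<1> re-inserts whatever group 1 matched.
result = re.sub(r'(\d+) (cat|dog)', r'\g<1> fish\g<1>', text)

print(result)
```

The `\g<1>` form is used instead of the shorter `\1` because the latter becomes ambiguous when the backreference is followed directly by a digit.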
Backreferences are not supported in MATLAB 2020b, so an additional step is needed to first capture the integer before “cat” or “dog”:
4.2.7 String Templates
String templates are useful for stamping out copies of text that is mostly boilerplate. Examples include simple HTML documents (we’ll see an example of this in Section 7.17.2) and input files for other programs when performing a parameter sweep where just one value changes in each file.
The Python string module from the standard library has a class, Template, that returns a template object whose entries can be replaced by calling the object’s .substitute() method. Here’s an example:
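As a minimal sketch of a parameter sweep with string.Template, the following stamps out three small input files as strings. The file layout and the parameter names (run, nu) are hypothetical.

```python
from string import Template

# Placeholders are written as $name or ${name}.
template = Template('run_id = $run\nviscosity = ${nu}\n')

# One substitution per parameter value; only the changing terms differ.
inputs = [template.substitute(run=i, nu=nu)
          for i, nu in enumerate([0.1, 0.2, 0.4], start=1)]

print(inputs[0])
```

`.substitute()` raises a KeyError if a placeholder is left without a value; `.safe_substitute()` leaves such placeholders in the text instead.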
I’m unaware of a text templating mechanism for MATLAB.
The Jinja2 template engine offers much more power than the Python standard library’s string.Template. It offers text generation with loops, conditional expressions, template hierarchies with inheritance, macros, filters, Python code execution, and include files.
4.3 Python Lists and MATLAB Cell Arrays
A Python list contains a sequence of arbitrary scalar values and/or containers. A list is created with open and close brackets, [ item1, item2, … ], so it superficially resembles a MATLAB array. However, unlike a MATLAB array, lists may contain different data types. Therefore, Python lists most closely resemble MATLAB cell arrays.
All Python variables (as well as functions and classes) are objects that have functions, or methods, associated with them. We can see the methods available for lists by using ipython’s interactive help:
Further help on any of these methods can be found by adding a question mark after the method’s name:
For completeness, here are the methods that work with MATLAB cell arrays:
The following sections show how to manipulate Python lists and the MATLAB equivalent with cell arrays.
4.3.1 Initialize an Empty List
An empty cell array of a given size can be allocated in MATLAB with the cell() function. If given only one numeric argument, N, it will return an N x N collection of empty cells—not always the desired outcome. To simply make N empty cells, we’ll need to supply a second dimension of 1.
MATLAB: | Python: |
---|---|
>> a = cell(1,3) {0×0 double} {0×0 double} {0×0 double} | In : a = [ None] * 3 In : a Out: [None, None, None] |
4.3.2 Create a List with Given Values
MATLAB: | Python: |
---|---|
>> a = {1,2.2,'a string'} a = 1×3 cell array {[1]} {[2.2000]} {'a string'} | In : a= [1,2.2,'a string'] In : a Out: [1,2.2,'a string'] |
Additional methods exist in both languages to convert other containers into lists. MATLAB has cell(), mat2cell(), and num2cell(), while in Python one can use the list() function or write a list comprehension (described in Section 4.3.14).
4.3.3 Get the Length of a List
MATLAB: | Python: |
---|---|
>> size(a) 1 3 >> n_items = size(a,2) 3 | In : len(a) Out: 3 In : n_items = len(a) In : n_items Out: 3 |
4.3.4 Index a List Item
MATLAB: | Python: |
---|---|
>> a(3) 1×1 cell array {'a string'} >> a{3} 'a string' | In : a[2] Out: 'a string' |
MATLAB: | Python: |
---|---|
>> a{ end} 'a string' | In : a[-1] Out: 'a string' |
Negative indexing has a drawback in that it can mask latent bugs. Say you write code that only accesses list items with zero or positive indices. If the code has a logic error which permits an index to become negative, instead of crashing with an index error like MATLAB, your code will continue to run—and yield bad results.
MATLAB: | Python: |
---|---|
>> a = {1, 2.2, 'a string'}; >> a{4} Index exceeds the number of array elements (3). >> a{-4} Array indices must be positive integers or logical values. | In : a = [1, 2.2, 'a string'] In : a[3] IndexError Traceback ----> 1 a[3] IndexError: list index out of range In : a[-3] Out: 1 In : a[-4] IndexError Traceback ----> 1 a[-4] IndexError: list index out of range |
4.3.5 Extract a Range of Items
MATLAB: | Python: |
---|---|
>> a = num2cell(100:106) a = 1×7 cell array Columns 1 through 7 {[100]} {[101]} {[102]} {[103]} {[104]} {[105]} {[106]} | In : a = list(range(100,107)) In : a Out: [100, 101, 102, 103, 104, 105, 106] |
Note that in MATLAB, a = {100:106} produces a cell array with a single item (a matrix of seven values), which is not what we’re after:
MATLAB: | Python: |
---|---|
>> a{2} = -0.1; >> a{7} = 'cell' a = 1×7 cell array Columns 1 through 7 {[100]} {[-0.1000]} {[102]} {[103]} {[104]} {[105]} {'cell'} | In : a[1] = -0.1 In : a[6] = 'list' In : a Out: [100, -0.1, 102, 103, 104, 105, 'list'] |
Cell arrays and list slices are accessed similarly as the following examples illustrate.
Extract the first three items
Python’s slice notation differs a bit from MATLAB’s range indexing. In Python, the start index may be omitted if it is 0, and the value for the end index is not part of the returned list. In other words, [:3] returns list items 0, 1, and 2.
MATLAB: | Python: |
---|---|
>> a{1:3} 100 -0.1000 102 >> a(1:3) 1×3 cell array {[100]} {[-0.100]} {[102]} | In : a[:3] Out: [100, -0.1, 102] |
MATLAB: | Python: |
---|---|
>> a{end-3:end} 103 104 105 'cell' | In : a[-4:] Out: [103, 104, 105, 'list'] |
MATLAB: | Python: |
---|---|
>> a(2:3:end) 1×2 cell array {[-0.100]} {[104]} | In : a[1::3] Out: [-0.1, 104] |
4.3.6 Warning—Python Index Ranges Are Not Checked!
Although a single list index raises an IndexError if it exceeds the bounds of the list, index ranges have no such checks.
In Python, out-of-bounds index ranges merely return an empty list; they will not raise an error.
There’s no error! What is going on? Python will raise an error when a list is indexed by a single value outside the index bounds, but silently accepts index ranges which are out of bounds. Unchecked index ranges offer rich opportunities for code errors to pass undetected. They place the responsibility for checking start and end index values on the developer.
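The difference between a single index and an index range can be seen in a short sketch; the list contents are arbitrary.

```python
items = [100, 101, 102]

# A single out-of-bounds index raises IndexError.
try:
    items[10]
    single_index_failed = False
except IndexError:
    single_index_failed = True

# A range entirely out of bounds quietly returns an empty list.
far_slice = items[10:20]

# A range that is partly out of bounds returns whatever items do exist.
partial_slice = items[1:20]

print(single_index_failed, far_slice, partial_slice)
```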
MATLAB: | Python: |
---|---|
>> S = "abcdefghijklm"; >> n_chop = 6; >> extractBetween(S,1,n_chop) "abcdef" >> S = "abc"; >> extractBetween(S,1,n_chop) Error using extractBetween Numeric value exceeds the number of characters in element 1. | In : S = "abcdefghijklm" In : n_chop = 6 In : S[:n_chop] Out: 'abcdef' In : S = "abc" In : S[:n_chop] Out: 'abc' |
Another example is splitting a collection into evenly sized sets and not having to bother with leftovers on uneven splits. Here, we group the numbers 1 through 20 into three evenly sized sets:
Obviously, 20 is not evenly divisible by 3, but we don’t have to bother with that detail; the output sets have the desired counts of 7, 7, and 6 members:
Had array bounds been checked, the last iteration would have raised an error since the print statement attempts to access a nonexistent 21st element. Instead, Python just returns nothing for the missing item.
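One way to write this grouping in Python is sketched below; the variable names are assumptions, and the loop is not necessarily the one used in the original listing.

```python
import math

numbers = list(range(1, 21))                      # 1 through 20
n_sets = 3
set_size = math.ceil(len(numbers) / n_sets)       # 7 items per full set

groups = []
for start in range(0, len(numbers), set_size):
    # The final slice asks for items 14 through 20 (a 21st element);
    # the slice simply stops at the end of the list.
    groups.append(numbers[start:start + set_size])

for group in groups:
    print(len(group), group)
```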
The equivalent output can be produced with MATLAB code that caps the ending index at each iteration:
4.3.7 Append an Item
Items can be added to MATLAB cell arrays simply by introducing a new index. The new index can be any integer value; it need not be an increment of the last index in the cell array. If there is a gap of indices, the skipped terms are created and populated with an empty matrix.
MATLAB: | Python: |
---|---|
>> a{8} = 3.14 % equivalent to >> a(8) = {3.14} a = 1×8 cell array Columns 1 through 8 {[100]} {[101]} {[102]} {[103]} {[104]} {[105]} {[106]} {[3.1400]} | In : a.append(3.14) In : a Out: [100, -0.1, 102, 103, 104, 105, 'list', 3.14] |
MATLAB: | Python: |
---|---|
>> a{10} = 2.71 a = 1×10 cell array Columns 1 through 10 {[100]} {[101]} {[102]} {[103]} {[104]} {[105]} {[106]} {[3.1400]} {0x0 double} {[2.7100]} | In : a.append(None) In : a Out: [100, -0.1, 102, 103, 104, 105, 'list', 3.14, None] In : a.append(2.71) In : a Out: [100, -0.1, 102, 103, 104, 105, 'list', 3.14, None, 2.71] |
4.3.8 Append Another List
MATLAB: | Python: |
---|---|
>> a = {1, 'two'}; >> b = {3, 4.4, 5}; >> c = horzcat(a,b) c = 1x5 cell array Columns 1 through 5 {[1]} {'two'} {[3]} {[4.4000]} {[5]} | In : a = [1, 'two'] In : b = [3, 4.4, 5] In : a.extend(b) In : a Out: [1, 'two', 3, 4.4, 5] |
4.3.9 Preallocate an Empty List
MATLAB: | Python: |
---|---|
>> a = cell(4,1) a = { [1,1] = [](0x0) [2,1] = [](0x0) [3,1] = [](0x0) [4,1] = [](0x0) } >> a{3} = -7.2 a = { [1,1] = [](0x0) [2,1] = [](0x0) [3,1] = -7.2000 [4,1] = [](0x0) } | In : a = 4*[None] In : a Out: [None, None, None, None] In : a[2] = -7.2 In : a Out: [None, None, -7.2, None] In : a = 4*[[]] In : a Out: [[], [], [], []] In : a[2] = -7.2 In : a Out: [[], [], -7.2, []] |
Note that two dimensions were passed to MATLAB’s cell(). If we were to call cell(4) instead of cell(4,1), the result would be a 4 × 4 cell array.
4.3.10 Insert to the Beginning (or Any Other Position) of a List
MATLAB: | Python: |
---|---|
>> a {[100]} {[-0.100]} {[102]} {[103]} {[104]} {[105]} {'list'} >> a = horzcat('new',a) {'new'} {[100]} {[-0.100]} {[102]} {[103]} {[104]} {[105]} {'list'} | In : a Out: [100, -0.1, 102, 103, 104, 105, 'list'] In : a.insert(0, 'new') In : a Out: ['new', 100, -0.1, 102, 103, 104, 105, 'list'] |
4.3.11 Indexing Nested Containers
Entries within nested cell arrays are indexed in MATLAB with both braces and parentheses; the braces index into the cell array, and parentheses index into the item within the cell.
MATLAB: | Python: |
---|---|
>> a = {1, {'inner', 'cell'}, -3.3} a = 1x3 cell array {[1]} {1×2 cell} {[-3.3000]} >> a{2} 1x2 cell array {'inner'} {'cell'} >> a{2}(1) 1x1 cell array {'inner'} | In : a = [1, ['inner', 'list'], -3.3] In : a Out: [1, ['inner', 'list'], -3.3] In : a[1] Out: ['inner', 'list'] In : a[1][0] Out: 'inner' |
4.3.12 Membership Test: Does an Item Exist in a List?
We saw at the beginning of this section that the ismember() function works for MATLAB cell arrays. I’ve not had luck using ismember() with mixed-type data in MATLAB 2020b though. (If all entries are numeric, the cell array can be converted to a matrix after which the find() function can be used.) Instead, I use this small function to check if an item exists in a cell array:
MATLAB: | Python: |
---|---|
>> a = {'hi', 102, 3.3}; >> cell_has(a, 102) 1 >> cell_has(a, 27) 0 | In : a = ['hi', 102, 3.3] In : 102 in a Out: True In : 27 in a Out: False |
Returning briefly to ismember() in MATLAB, the six attempts at checking if 102 is in a all yield the same error:
4.3.13 Find the Index of an Item
MATLAB: | Python: |
---|---|
>> a = num2cell(100:106) a = 1×7 cell array Columns 1 through 7 {[100]} {[101]} {[102]} {[103]} {[104]} {[105]} {[106]} >> find([a{:}] == 102) 3 | In : a = list(range(100,107)) In : a Out: [100, 101, 102, 103, 104, 105, 106] In : a.index(102) Out: 2 |
MATLAB: | Python: |
---|---|
>> find([a{:}] == 27) | In : a.index(27) ValueError: 27 is not in list |
MATLAB: | Python: |
---|---|
>> a{7} = 'string'; >> a a = 1×7 cell array Columns 1 through 7 {[100]} {[101]} {[102]} {[103]} {[104]} {[105]} {'string'} >> find([a{:}] == 102) 3 >> find([a{:}] == 'string') Matrix dimensions must agree. | In : a[6] = 'string' In : a Out: [100, 101, 102, 103, 104, 105, 'string'] In : a.index(102) Out: 2 In : a.index('string') Out: 6 |
4.3.14 Apply an Operation to All Items (List Comprehension)
MATLAB: | Python: |
---|---|
>> a = {0.3, 0.4, 0.5}; >> cellfun(@(x)(x.^3),a) 0.0270 0.0640 0.1250 | In : a = [0.3, 0.4, 0.5] In : [x**3 for x in a] Out: [0.027, 0.064, 0.125] |
MATLAB: | Python: |
---|---|
>> a = {100, 101, 102,... 103, 'string'}; >> cellfun(@(y) "0x" + ... string(y), a) 1×5 string array "0x100" "0x101" "0x102" "0x103" "0xstring" | In : a = [100, 101, 102, 103, 'string'] In : [f'0x{str(y)}' for y in a] Out: ['0x100', '0x101', '0x102', '0x103', '0xstring'] |
4.3.15 Select a Subset of Items Based on a Condition
MATLAB: | Python: |
---|---|
>> a = {'Select','a','Subset',... 'of','Items','Based',... 'on','a','Condition'} >> i = isstrprop(cellfun(@(x)... x(1), a), 'upper'); >> a(i) {'Select'} {'Subset'} {'Items'} {'Based'} {'Condition'} | In : a = ['Select', 'a', 'Subset', 'of', 'Items', 'Based', 'on', 'a', 'Condition'] In : [x for x in a if x[0].isupper()] Out: ['Select', 'Subset', 'Items', 'Based', 'Condition'] |
4.3.16 How Many Times Does an Item Occur?
MATLAB: | Python: |
---|---|
>> a = {'To',2,'To','To','u'}; >> sum(cellfun(@(x) string(x)... == 'To', a)) 3 >> sum(cellfun(@(x) string(x)... == 'From', a)) 0 | In : a = ['To',2,'To','To','u'] In : a.count('To') Out: 3 In : a.count('From') Out: 0 |
4.3.17 Remove the First or Last (or Any Intermediate) List Item
MATLAB: | Python: |
---|---|
>> a = {21, 22, 23, 24, 25}; >> b = a(end); >> a = a(1:end-1) {[21]} {[22]} {[23]} {[24]} >> b {[25]} >> a = {21, 22, 23, 24, 25}; >> a(1) {[21]} >> a = a(2:end) {[22]} {[23]} {[24]} {[25]} | In : a = [21, 22, 23, 24, 25] In : b = a.pop() In : a Out: [21, 22, 23, 24] In : b Out: 25 In : a = [21, 22, 23, 24, 25] In : a.pop(0) Out: 21 In : a Out: [22, 23, 24, 25] |
MATLAB: | Python: |
---|---|
>> a = {21,22,23,24,25}; >> i = 3; >> a(i) {[23]} >> a = horzcat(a(1:i-1),a(i+1:end)) {[21]} {[22]} {[24]} {[25]} | In : a = [21,22,23,24,25] In : i = 2 In : a.pop(i) Out: 23 In : a Out: [21, 22, 24, 25] |
4.3.18 Remove an Item by Value
If one knows the index of an item to remove from a list, the .pop(index) method described earlier works nicely. But what if you only know the value of the item to remove? In this case, the .remove() method is useful. Note that only the first occurrence of the matched value is removed. A ValueError exception is raised if the requested value doesn’t appear.
MATLAB: | Python: |
---|---|
>> a = {22,21,'a',22,21}; >> i = cellfun(@(x) x == 22, a); >> a = a(~i) {[21]} {'a'} {[21]} >> i = cellfun(@(x) x == -4, a); >> a = a(~i) {[21]} {'a'} {[21]} | In : a = [22,21,'a',22,21] In : a.remove(22) In : a Out: [21, 'a', 22, 21] In : a.remove(22) In : a Out: [21, 'a', 21] In : a.remove(-4) ----------------------------- ValueError ----> 1 a.remove(-4) ValueError: list.remove(x): x not in list |
4.3.19 Merging Multiple Lists
Related data items in separate lists must sometimes be grouped into individual pairwise (or, more generally, n-wise) items. For example, say you have a list of letters, ‘A’, ‘B’, ‘C’, …, and a corresponding list of those letters’ ASCII values, 65, 66, 67, …, and you want to merge these two lists into a single new list of letter and ASCII value pairs, [ (‘A’, 65), (‘B’, 66), … ]. MATLAB allows one to create a new cell array by stacking existing cells, while Python has a function, zip(), which combines the lists. (Imagine a zipper joining two sections of fabric.)
MATLAB: | Python: |
---|---|
>> Letter = {'A','B','C'} Letter = { [1,1] = A [1,2] = B [1,3] = C } >> ASCII = {65,66,67} ASCII = { [1,1] = 65 [1,2] = 66 [1,3] = 67 } >> both={Letter;ASCII} both = { [1,1] = { [1,1] = A [1,2] = B [1,3] = C } [2,1] = { [1,1] = 65 [1,2] = 66 [1,3] = 67 } } >> both{2}{3} ans = 67 | In : Letter = ['A','B','C'] In : Letter Out: ['A', 'B', 'C'] In : ASCII = [65, 66, 67] In : ASCII Out: [65, 66, 67] In : both = zip(Letter,ASCII) In : both Out: <zip at 0x7fc4500c4050> In : list(both) Out: [('A', 65), ('B', 66), ('C', 67)] In : both[2][1] ------------------------------ TypeError Traceback ----> 1 both[2][1] TypeError: 'zip' object is not subscriptable In : both = list(zip(Letter,ASCII)) In : both[2][1] Out: 67 |
zip() is frequently seen in for loops that need to step through multiple lists in lockstep:
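A minimal sketch of such a loop, reusing the letter and ASCII value lists from the previous example (the variable names are illustrative):

```python
letters = ['A', 'B', 'C']
codes = [65, 66, 67]

lines = []
# Each pass through the loop receives one item from each list.
for letter, code in zip(letters, codes):
    lines.append(f'{letter} -> {code}')

print('\n'.join(lines))
```

Note that zip() stops at the end of the shortest input, so a length mismatch between the lists goes unreported.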
4.3.20 Unmerging Combined Lists
MATLAB: | Python: |
---|---|
>> a = both{1,:} a = { [1,1] = A [1,2] = B [1,3] = C } >> b = both{2,:} b = { [1,1] = 65 [1,2] = 66 [1,3] = 67 } | In : a, b = zip(*both) In : a Out: ('A', 'B', 'C') In : b Out: (65, 66, 67) |
Prefixing a list or numeric array with an asterisk, as with *both earlier, means “expand the terms.” In other words, if x = [9, 'b'], then x is a single item, a list. *x, however, means two separate terms, 9 and 'b'. The asterisk can be thought of as removing the outer container.
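The effect of the asterisk is easiest to see with a function that counts its arguments. The helper below, count_args(), is a hypothetical function written only for this illustration.

```python
def count_args(*args):
    """Return the number of positional arguments received."""
    return len(args)

x = [9, 'b']

as_one_item = count_args(x)      # the list is passed as a single argument
as_two_items = count_args(*x)    # the list is expanded into 9 and 'b'

print(as_one_item, as_two_items)
```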
4.3.21 Sort a List
MATLAB: | Python: |
---|---|
>> a = {31, -127, 28, 45} {[31]} {[-127]} {[28]} {[45]} >> sort(cell2mat(a)) -127 28 31 45 | In : a = [31, -127, 28, 45] In : sorted(a) Out: [-127, 28, 31, 45] In : a Out: [31, -127, 28, 45] In : a.sort() In : a Out: [-127, 28, 31, 45] |
4.3.22 Reverse a List
Finally, the sequence of a Python list can be flipped with either the reversed() function or with the list’s .reverse() method; using the .reverse() method alters the list in-place. MATLAB can reverse terms of vector or cell array x with fliplr(x) or flip(x,2).
MATLAB: | Python: |
---|---|
>> a {[31]} {[-127]} {[28]} {[45]} >> fliplr(a) {[45]} {[28]} {[-127]} {[31]} | In : a Out: [31, -127, 28, 45] In : reversed(a) Out: <list_reverseiterator> In : list(reversed(a)) Out: [45, 28, -127, 31] In : a Out: [31, -127, 28, 45] In : a.reverse() In : a Out: [45, 28, -127, 31] |
4.4 Python Tuples
Python tuples closely resemble Python lists: both can contain a collection of items and can be indexed numerically. The primary difference is that a tuple of scalar variables is unchangeable after it has been created; think of a tuple as a constant with multiple values. This immutable property gives tuples a critical advantage over lists and sets as it lets tuples act as keys to dictionaries. The use of tuples as dictionary keys is explored in Section 4.6.5.
Not all tuples are “hashable” (meaning they can be dictionary keys), though. Tuples made with variables that are lists, dictionaries, or NumPy arrays can change since the tuple only stores references to these variables; the values in the underlying list/dict/array can still change. A tuple is hashable only if all its member items are hashable—and that rules out tuples that contain references to containers whose contents may change.
MATLAB has no comparable “frozen collection” data container.
Tuples are created by assigning a variable to comma-separated items or by calling the tuple() function with an iterable. Even a single item followed by a comma qualifies as a tuple:
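The different ways of creating a tuple can be sketched as follows; the values are arbitrary.

```python
point = 3.5, -1.0, 'label'        # comma-separated items
from_list = tuple([4, 5, 6])      # tuple() applied to an iterable
single = 7,                       # one item plus a trailing comma
not_a_tuple = (7)                 # parentheses alone do not make a tuple

for value in (point, from_list, single, not_a_tuple):
    print(type(value).__name__, value)
```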
Stray commas create tuples! This can lead to mysterious errors far downstream from where the tuple was mistakenly created. As an example, say you write a function that computes a numeric value but the function ends with return X, instead of the intended return X. Later, another function scales this returned value by 4. If X were 1.1, the first function returns the tuple (1.1,). The second function multiplies this by 4, producing (1.1, 1.1, 1.1, 1.1) instead of 4.4. The bad value continues to propagate until an illegal operation, division, for instance, is attempted with the tuple.
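The scenario described above can be reproduced with a sketch like this one. The function name compute_value() is hypothetical.

```python
def compute_value():
    X = 1.1
    return X,            # stray comma: returns the tuple (1.1,), not 1.1

# Multiplying a tuple by an integer repeats it instead of scaling it.
scaled = compute_value() * 4
print(scaled)

# The mistake only surfaces when an operation tuples don't support is tried.
try:
    half = scaled / 2
except TypeError as err:
    failure = str(err)
    print(failure)
```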
Tuples are often seen wrapped in parentheses. This is in fact mandatory when passing a tuple as an argument to a function or assigning multiple tuples on one line:
Many NumPy array creation functions—for example, np.ones(), shown earlier in Section 4.1 where it is invoked as np.ones((2,))—expect the first argument to be a tuple defining the array’s dimensions. The double set of parentheses often puzzles new Python programmers. The parentheses are needed to separate the dimension, which is a single variable, from subsequent arguments:
4.5 Python Sets and MATLAB Set Operations
MATLAB and Python can both perform set operations—unions, intersections, and so on—but only Python has a data container specifically for storing sets. Sets are created with braces or by calling the set() function on an iterable:
MATLAB: | Python: |
---|---|
>> Fib = [ 0 1 1 2 3 5]; >> unique(Fib) 0 1 2 3 5 | In : Fib = [ 0, 1, 1, 2, 3, 5] In : set(Fib) Out: {0, 1, 2, 3, 5} |
Set members can be iterated over, but cannot be indexed numerically. Cast the set to a list if you need to index terms. Beware, though, that sets do not maintain sequence; iteration over a set and casting a set to a list puts the items in any order.
MATLAB: | Python: |
---|---|
>> a = [54 43 32 23]; >> ismember(43, a) 1 >> ismember(44, a) 0 >> all(ismember([32, 43], a)) 1 | In : a = {54, 43, 32, 23} In : 43 in a Out: True In : 44 in a Out: False In : a.issuperset({32, 43}) Out: True |
Set operations
Operation | MATLAB | Python | Explanation |
---|---|---|---|
Union | union(A,B) | A | B | All members of A and B |
Intersection | intersect(A,B) | A & B | Members that are in both A and B |
Difference | setdiff(A,B) | A - B | Members of A after members of B have been removed from A |
Symmetric difference | setxor(A,B) | A ^ B | Members which are only in A or only in B |
Subset test | all(ismember(A,B)) | A.issubset(B) | True if all members of A are in B |
Superset test | all(ismember(B,A)) | A.issuperset(B) | True if all members of B are in A |
Disjoint test | ~any(ismember(A,B)) | A.isdisjoint(B) | True if A and B have no members in common |
4.6 Python Dictionaries and MATLAB Maps
Dictionaries (also known as associative arrays or hashes in Perl; hashes or maps in JavaScript; and maps in C++, Java, and MATLAB) allow one to create a relationship between two datasets known as keys and values. Notationally, dictionaries look like lists that can be indexed by arbitrary scalars—strings, for example—instead of just integers. In addition to the convenience they provide developers, dictionaries are also performant. Both inserting new key-value pairs into and retrieving values from dictionaries are O(1) operations on average.
Oddly, despite its power, MATLAB programmers rarely use its Map data container.
Dictionaries are best explained by example. Say we need to look up a country’s capital city. We could store the country-to-capital city relationship in a dictionary like this:
Retrieving a country’s capital is then just a matter of using the country’s name as the subscript to the dictionary:
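A minimal Python sketch of such a dictionary and a lookup follows. The particular countries, and the order in which they are entered, are assumptions chosen to be consistent with the examples discussed later in this section.

```python
# Keys are country names; values are the corresponding capital cities.
capital = {
    'Japan': 'Tokyo',
    'France': 'Paris',
    'USA': 'Washington',
    'Germany': 'Berlin',
}

# The key goes in square brackets, just like a list index.
city = capital['France']
print(city)

# New pairs are added by assigning to a key that does not exist yet.
capital['Italy'] = 'Rome'
print(len(capital))
```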
Dictionaries beat lists for storing relationships not only because of the key/value binding but because they permit much faster data lookup. Imagine storing the country/city data in a list and having to return a city given a country. Even if the list were sorted by country name, the fastest search would be O(log2(N)), no match for the O(1) average speed of dictionary lookups.
4.6.1 Iterating over Keys
Python iterates over dictionary keys in the order they were inserted, the same as Map in JavaScript. This behavior first appeared as an implementation detail of CPython 3.6 and became a language guarantee in version 3.7. In contrast, std::map in C++ iterates over keys in sorted order. (Before v3.6, Python dictionaries behaved more like Perl and JavaScript hashes, which can return keys in any order.)
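The sketch below shows insertion-order iteration with a small, illustrative dictionary; the variable names are assumptions.

```python
# Keys are inserted in a deliberately non-alphabetical order
capital = {}
capital['Japan'] = 'Tokyo'
capital['USA'] = 'Washington'
capital['France'] = 'Paris'

# A for loop over a dictionary visits its keys
for country in capital:
    print(country, capital[country])

# .items() yields each key together with its value
for country, city in capital.items():
    print(f'{city} is the capital of {country}')

keys_in_order = list(capital)   # keys come back in insertion order
```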
4.6.2 Testing for Key Existence
As dictionaries are heavily used in Python, KeyError tends to be among the more frequent error messages Python programmers see.
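A minimal sketch of both situations, using an illustrative dictionary: the in operator tests for a key without raising an error, while subscripting with a missing key raises KeyError.

```python
capital = {'Japan': 'Tokyo', 'France': 'Paris'}

# Membership test: no exception, just a boolean
has_france = 'France' in capital   # True
has_peru = 'Peru' in capital       # False

# Subscripting with a missing key raises KeyError
try:
    city = capital['Peru']
except KeyError as err:
    missing_key = err.args[0]      # the key that was not found
    print(f'no entry for {missing_key}')
```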
4.6.2.1 .get() and .setdefault()
but the perform_a_task() function has the additional burden to check for a None input and act accordingly.
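The behavior of the two methods can be sketched as follows. The dictionary and the default values are illustrative assumptions; only .get() and .setdefault() themselves are standard dictionary methods.

```python
capital = {'Japan': 'Tokyo', 'France': 'Paris'}

# .get() returns None, or a caller-supplied default, for a missing key
a = capital.get('Peru')               # None
b = capital.get('Peru', 'unknown')    # 'unknown'
c = capital.get('Japan', 'unknown')   # 'Tokyo': the key exists

# .get() never modifies the dictionary
still_missing = 'Peru' not in capital   # True

# .setdefault() returns the existing value if the key is present;
# otherwise it stores the default under that key and returns it
d = capital.setdefault('France', 'unknown')   # 'Paris', nothing stored
e = capital.setdefault('Peru', 'Lima')        # 'Lima', 'Peru' is now a key
print(a, b, c, d, e)
```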
4.6.2.2 Key Collision
4.6.3 Iterating over Keys, Sorting by Key
Insertion order is not always the desired order in which to iterate through a dictionary. Frequently, one may wish to traverse a dictionary in ascending or descending order of its keys or values. In this case, we employ the sorted() function, optionally passing it a key function that defines the sort order and reverse=True for descending order.
The expression key=lambda X: len(X) bears additional clarification. The letter X is an arbitrary variable that represents the function's argument, which will be each dictionary key in turn, for example, 'France'. The lambda's return value, len(X), is the length of the dictionary key, or the number 6 when X is 'France'. In our case, the lambda function causes sorted() to return the dictionary keys from the shortest, 3 characters for 'USA', to the longest, 7 for 'Germany'.
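The following sketch puts these pieces together. The dictionary contents are assumptions chosen to include the countries mentioned above.

```python
# Illustrative dictionary (entries are assumptions)
capital = {'Japan': 'Tokyo', 'USA': 'Washington',
           'France': 'Paris', 'Germany': 'Berlin'}

# sorted() on a dictionary returns a sorted list of its keys
by_name = sorted(capital)                       # ascending, alphabetical
by_name_desc = sorted(capital, reverse=True)    # descending

# A key function changes the sort criterion to the length of each key
by_length = sorted(capital, key=lambda X: len(X))

for country in by_length:
    print(len(country), country, capital[country])
```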
4.6.4 Iterating over Keys, Sorting by Value
4.6.4.1 Secondary Sorts
In the preceding result, both 'Tokyo' and 'Paris' have five characters. How should we handle tie breakers? If we want to perform additional sorts in cases where the primary sort has equal values, we’ll need a secondary sort.
Secondary sorts take advantage of the fact that Python's sorts are stable. This means if there are two records R and S with the same key and R appears before S in the original list, R will appear before S in the sorted list. In other words, the city sort earlier will always show 'Tokyo' before 'Paris' regardless of how many new entries are added to, or removed from, the dictionary. To implement a secondary sort, we work backward by first sorting the dictionary by the secondary condition, then sorting that result by the primary condition. Wherever the primary condition has ties, the equally valued items remain in the order they entered the sort, that is, already sorted by the secondary condition.
The only significant property of countries_sorted_by_capital is that 'France' appears before 'Japan': these countries have capital cities with the same number of characters, and 'Paris', the capital of 'France', sorts alphabetically before 'Tokyo', the capital of 'Japan'. The positions of the other countries are irrelevant.
Finally, we have the result in the sequence we wanted: inverse order of city name length and alphabetical order where city names are equally long.
Tertiary and higher-level sorts work the same way: sort by the least significant factor, then work your way backward to the primary factor.
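The whole procedure can be sketched as below. The dictionary entries and the name result are assumptions; countries_sorted_by_capital follows the name used in the text. Note that 'Japan' is inserted before 'France' so that the primary-only sort shows 'Tokyo' ahead of 'Paris'.

```python
# Illustrative dictionary; 'Japan' is deliberately inserted before 'France'
capital = {'Japan': 'Tokyo', 'USA': 'Washington',
           'France': 'Paris', 'Germany': 'Berlin'}

# Primary sort only: longest city name first.
# Ties ('Tokyo' and 'Paris') stay in insertion order because the sort is stable.
by_city_length = sorted(capital, key=lambda X: len(capital[X]), reverse=True)

# Step 1: sort by the secondary condition, alphabetical order of city name
countries_sorted_by_capital = sorted(capital, key=lambda X: capital[X])

# Step 2: sort that result by the primary condition, city name length
result = sorted(countries_sorted_by_capital,
                key=lambda X: len(capital[X]), reverse=True)

for country in result:
    print(capital[country], country)
```

With these entries, by_city_length lists 'Japan' before 'France', while result lists 'France' before 'Japan' because 'Paris' precedes 'Tokyo' alphabetically.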
4.6.5 Tuples As Keys
All dictionary examples so far have used scalar keys. Tuples are, in a sense, multivalued scalars and can also be used as dictionary keys. This enables simple solutions to data relationships that involve multiple inputs.
You could store the same information in a double-level dictionary, but it would be messier to code and slower to traverse.
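As a hypothetical illustration, the sketch below stores the nonzero entries of a sparse matrix in a dictionary keyed by (row, column) tuples, then shows the double-level alternative. The names S and nested and all values are invented for this example.

```python
# Hypothetical sparse matrix: only nonzero entries are stored,
# keyed by (row, column) tuples
S = {(0, 0): 4.0, (1, 2): -1.5, (3, 1): 2.25}

value = S[(1, 2)]            # look up with a tuple key
same = S[1, 2]               # the parentheses are optional in a subscript
zero = S.get((2, 2), 0.0)    # entries that were never stored default to 0.0

# Tuple keys can be unpacked directly while iterating
for (row, col), v in S.items():
    print(f'S[{row},{col}] = {v}')

# The double-level equivalent needs one dictionary per row
# and two lookups per entry
nested = {0: {0: 4.0}, 1: {2: -1.5}, 3: {1: 2.25}}
nested_value = nested[1][2]
```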
4.6.6 List Values
Dictionary values are not limited to scalars; they may be any Python container, including other dictionaries. Dictionaries of lists are a popular combination. Among other things, these can store tree structures—the parent node is the dictionary key, and its list items are child nodes. This tree, for example
4.7 Structured Data
MATLAB allows one to create structured variables on the fly. Python has several ways to do the same, albeit without MATLAB’s simplicity. The more casual methods use the namedtuple class, imported from the collections module, and the SimpleNamespace class, imported from the types module. The most powerful method uses data classes. These can contain custom methods that operate on the data and contain relationships similar to joined tables in an SQL database.
4.7.1 Method 1: namedtuple
MATLAB: | Python: |
---|---|
>> Pos.x = 354.8; >> Pos.y = -28.7; >> Pos.z = 1.4572e+5; >> Pos Pos = struct with fields : x: 354.8000 y: -28.7000 z: 1.4572e+05 | In : from collections import namedtuple In : Pt = namedtuple('Coord', ['x', 'y', 'z']) In : a = Pt(354.8, -28.7, 14570.0) In : a Out: Coord(x=354.8, y=-28.7, z=14570.0) In : a.y = 0 AttributeError Traceback ----> 1 a.y = 0 AttributeError: can't set attribute |
4.7.2 Method 2: SimpleNamespace
MATLAB: | Python: |
---|---|
>> Pos.x = 354.8; >> Pos.y = -28.7; >> Pos.z = 1.4572e+5; | In : from types import SimpleNamespace In : Pos = SimpleNamespace() In : Pos.x = 354.8 In : Pos.y = -28.7 In : Pos.z = 1.457e+4 In : Pos Out: namespace(x=354.8, y=-28.7, z=14570.0) |
One can check whether or not a field name exists in the structured variable with the hasattr() function, directly analogous to MATLAB's isfield() function.
MATLAB: | Python: |
---|---|
>> isfield(Pos, 'x') 1 >> isfield(Pos, 'w') 0 fields = fieldnames(Pos); for i = 1:length(fields) F = fields{i}; val = Pos.(F); fprintf(' Pos.%s = %.1f ', F, val); end Pos.x = 354.8 Pos.y = -28.7 Pos.z = 145720.0 | In : hasattr(Pos, 'x') Out: True In : hasattr(Pos, 'w') Out: False In : for F in Pos.__dict__: ...: val = Pos.__dict__[F] ...: print(f' Pos.{F} = {val}') ...: Pos.x = 354.8 Pos.y = -28.7 Pos.z = 14570.0 |
4.7.3 Method 3: Classes
MATLAB: | Python: |
---|---|
classdef Position properties x {double} y {double} z {double} end methods function obj = Position(x,y,z) obj.x = x; obj.y = y; obj.z = z; end end end >> Pos = Position(354.8, -28.7, 1.457e+4) Pos = Position with properties: x = 354.8000 y = -28.7000 z = 14570 | In : class Position: ...: def __init__(self, X, Y, Z): ...: self.x = X ...: self.y = Y ...: self.z = Z In : Pos = Position(354.8, -28.7, 1.457e+4) In : Pos.x, Pos.y, Pos.z Out: (354.8, -28.7, 14570.0) |
As with SimpleNamespace, and any Python object for that matter, the existence of attributes can be checked with hasattr() and iterated over by accessing the object's underlying .__dict__ dictionary. See Section 4.7.2 for an example.
The power of using classes as data containers is the ability to add methods that perform value-added computations with the data values. As we’ll see in the next section though, data classes give us a fusion of concise notation to define the data structures and the ability to define methods that operate on the values. Data classes are therefore better choices for storing structured data than conventional classes.
4.7.4 Method 4: Data Classes
Data classes, introduced in Python 3.7, allow one to create structured variables that can include custom methods. In essence, they are a convenience mechanism that defines a class with automatically generated underlying code for the __init__() constructor, __repr__() to produce a string representation of the data, and several other methods. Items within data classes have associated types, but, as with type annotations (Section 3.8.5), by default Python will not enforce the declared types.
Type enforcement can, however, be added with the Pydantic module (Section 4.7.4.5) to achieve a capability similar to optional variable properties defined in MATLAB classes (Section 10.1). Another difference between MATLAB classes and Python classes, including data classes, is that MATLAB can explicitly define private methods, while Python cannot (this is covered in greater detail in Section 10.1.1).
MATLAB: | Python: |
---|---|
>> Pos = Position(354.8, -28.7, 1.457e+4) | In : from dataclasses import dataclass In : @dataclass ...: class Position: ...: x: float ...: y: float ...: z: float In : Pos = Position(354.8, -28.7, 1.457e+4) In : Pos.x, Pos.y, Pos.z Out: (354.8, -28.7, 14570.0) |
Nothing exciting here; the real fun begins when we add methods to the data class. We’ll begin by adding a function that computes the distance of the point from the origin:
Now after we define a point, we can compute its distance by calling mag():
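A minimal sketch of such a data class follows. The method name mag() comes from the text; its body and the sample coordinates are assumptions, with the coordinates chosen so the result is easy to verify by hand.

```python
from dataclasses import dataclass
import math

@dataclass
class Position:
    x: float
    y: float
    z: float

    def mag(self) -> float:
        """Return the distance of the point from the origin."""
        return math.sqrt(self.x**2 + self.y**2 + self.z**2)

# sqrt(3^2 + 4^2 + 12^2) = sqrt(169) = 13
Pos = Position(3.0, 4.0, 12.0)
distance = Pos.mag()
print(distance)
```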
4.7.4.1 Field Values
Alternatively, we can make the data class compute the magnitude and save it as another internal variable when the point is first created. This is done by defining the method __post_init__() and an additional attribute whose value is not supplied when the object is created:
A point object’s attribute R, formally referred to as a field value because it depends on other values, is then defined when we create the point:
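One way this could look is sketched below. The attribute name R and the methods mag() and __post_init__() follow the text; the use of field(init=False) to keep R out of the constructor is an assumption about how the attribute is declared.

```python
from dataclasses import dataclass, field
import math

@dataclass
class Position:
    x: float
    y: float
    z: float
    # init=False: R is not a constructor argument; it is computed instead
    R: float = field(init=False)

    def mag(self) -> float:
        """Compute the distance from the origin and store it in R."""
        self.R = math.sqrt(self.x**2 + self.y**2 + self.z**2)
        return self.R

    def __post_init__(self):
        # Called automatically at the end of the generated __init__()
        self.mag()

Pos = Position(3.0, 4.0, 12.0)   # only x, y, z are supplied
print(Pos.R)
```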
Field values are not automatically recomputed when the initial values change. For example, changing the value of Pos.z will not result in an updated Pos.R without explicitly calling Pos.mag(). To achieve such an update, the class would need to include a setter method which updates the variables and then calls .mag():
By calling .set() instead of modifying the variables directly, we’ll get the behavior we want:
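The sketch below contrasts direct modification with a setter. The method name set() comes from the text, but its keyword-argument signature is an assumption made for this example.

```python
from dataclasses import dataclass, field
import math

@dataclass
class Position:
    x: float
    y: float
    z: float
    R: float = field(init=False)

    def mag(self) -> float:
        self.R = math.sqrt(self.x**2 + self.y**2 + self.z**2)
        return self.R

    def __post_init__(self):
        self.mag()

    def set(self, x=None, y=None, z=None):
        """Update any subset of the coordinates, then refresh R."""
        if x is not None:
            self.x = x
        if y is not None:
            self.y = y
        if z is not None:
            self.z = z
        self.mag()

Pos = Position(3.0, 4.0, 12.0)

Pos.z = 0.0          # direct modification: R is not recomputed
stale_R = Pos.R      # still the value computed at creation

Pos.set(z=0.0)       # setter: R is recomputed from the current x, y, z
fresh_R = Pos.R
print(stale_R, fresh_R)
```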
4.7.4.2 Relationships Between Dependent Data Classes
The utility of data classes becomes more apparent when data classes are nested, that is, they include variables whose types are also data classes. To explore nested data classes more fully, we'll use the Python faker module, explained in more detail in Appendix B, to generate names of people, phone numbers, and names of companies. These will be stored in data classes Person, Phone, and Company, respectively.
Each person can have one or more phones and work at one company, and a company can have one or more employees. If the data were stored in an SQL database, the entity relationship would resemble this diagram:
Before we generate data, we'll need to explore two more important data class properties: (1) data class properties can be modified dynamically, and (2) data class assignments, like those for any mutable object, are by reference (see Section 4.8).
4.7.4.3 Dynamic Modification of Data Classes
The entity relationship diagram in Section 4.7.4.2 shows that a Company contains a list of Person as its employees, and a Person has a Company as their employer. Python does not permit forward declaration of data classes, so this presents a chicken-and-egg problem: which do we define first, a Person or a Company? Either way, we’ll get an undefined data class error.
Fortunately, Python’s mutability offers an easy solution to this dilemma: we can define either data class first and simply use a placeholder for the forward definition. The placeholder can be replaced later.
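The placeholder approach can be sketched as follows. The field names name, employer, and employees and the sample values are assumptions; the object names p_1 and c_1 follow those used later in the text.

```python
from dataclasses import dataclass, field
from typing import List

class Company:            # placeholder so that Person can refer to Company
    pass

@dataclass
class Person:
    name: str
    employer: Company = None    # filled in once a company exists

@dataclass
class Company:            # the real definition replaces the placeholder
    name: str
    employees: List[Person] = field(default_factory=list)

# Sample values are invented for illustration
p_1 = Person('Pat Example')
c_1 = Company('Example Corp')

# Link the two objects in both directions
p_1.employer = c_1
c_1.employees.append(p_1)

print(p_1.employer.name)
print(c_1.employees[0].name)
```

Because data class types are not enforced, it does not matter that the annotation on employer still refers to the placeholder class. Writing the annotation as the quoted string 'Company' is another way to defer the name lookup.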
We’ll use the faker module to populate names, cities, and companies with realistic strings. Here’s a brief demo:
We can then make a person entry, make a corporation, and add that person as an employee of the corporation:
Sample values look like this:
Note the circular references of the person’s employer details and the company’s employees.
4.7.4.4 Traversing Linked Data Classes
The p_1 and c_1 objects created in Section 4.7.4.3 are linked together: the person is an employee of the company, and the company’s employee list contains the person. As mentioned earlier, the objects only store memory references to each other, not copies of the data. In addition to being memory efficient, changes to either object are reflected immediately in all linked objects—an employee’s office phone number change is seen by the employer as well.
Interlinked data class objects can be viewed as in-memory relational databases, where each data class is a table, each object a row entry, and interlinked references are foreign keys. As with SQL, information across linked data classes can be correlated rapidly. A company phone book could be prepared easily:
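A hedged sketch of such a phone book follows. It hard-codes a few invented names and numbers rather than generating them with faker, and the field names, the hire() helper, and the phone_book() function are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Phone:
    number: str

class Company:            # placeholder, replaced below
    pass

@dataclass
class Person:
    name: str
    phones: List[Phone] = field(default_factory=list)
    employer: Company = None

@dataclass
class Company:
    name: str
    employees: List[Person] = field(default_factory=list)

def hire(company, person):
    """Link a person and a company in both directions."""
    person.employer = company
    company.employees.append(person)

def phone_book(company):
    """Return one 'name: numbers' line per employee, sorted by name."""
    lines = []
    for person in sorted(company.employees, key=lambda P: P.name):
        numbers = ', '.join(ph.number for ph in person.phones)
        lines.append(f'{person.name}: {numbers}')
    return lines

# Invented sample data
c_1 = Company('Example Corp')
hire(c_1, Person('B. Baker', [Phone('555-0102')]))
hire(c_1, Person('A. Able', [Phone('555-0101'), Phone('555-0103')]))

before = phone_book(c_1)
for line in before:
    print(line)

# Objects are shared by reference, so a change made through an
# employee is visible when traversing from the company
baker = c_1.employees[0]
baker.phones[0].number = '555-0199'
after = phone_book(c_1)
```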
4.7.4.5 Type Validation with Pydantic
The Position data class defined at the beginning of Section 4.7.4 says the three coordinates x, y, and z have type float (a 64-bit floating-point number). What happens if we create a Position with a string for one of these values?
It is accepted without complaint! This is bad news. Problems arise only when the bogus value appears in a computation:
Generally, you want to know there’s a problem as soon as bad data is entered, not at some unknowable time in the future when the data is used.
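The delayed failure can be reproduced with the standard dataclasses module alone, as in the sketch below. The mag() method and the sample values are assumptions carried over from the earlier illustrations.

```python
from dataclasses import dataclass
import math

@dataclass
class Position:
    x: float
    y: float
    z: float

    def mag(self) -> float:
        return math.sqrt(self.x**2 + self.y**2 + self.z**2)

# A string is accepted for y: the float annotation is not enforced
Pos = Position(3.0, 'four', 12.0)
stored_type = type(Pos.y).__name__
print(stored_type)

# The problem surfaces only when the value is used in arithmetic
try:
    Pos.mag()
except TypeError as err:
    failure = str(err)
    print('computation failed:', failure)
```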
The Pydantic module defines a data class that enforces types. Our Position class looks like this when created with Pydantic:
Now the error is raised when the object is created rather than when the improperly typed data is used:
4.7.5 Enumerations
MATLAB: | Python: |
---|---|
classdef EqType enumeration Elliptic, ... Hyperbolic, ... Parabolic end end | import enum class EqType(enum.Enum): Elliptic = 1 Hyperbolic = 2 Parabolic = 3 |
MATLAB: | Python: |
---|---|
if b*b == a*c eType = EqType.Parabolic; elseif b*b > a*c eType = EqType.Hyperbolic; else eType = EqType.Elliptic; end if (eType == EqType.Parabolic) parsolv(coeff, ...) | if b*b == a*c: eType = EqType.Parabolic elif b*b > a*c: eType = EqType.Hyperbolic else: eType = EqType.Elliptic if eType == EqType.Parabolic: parsolv(coeff, ...) |
Python enumerations are iterables, meaning we can loop over them. Additionally, the string and integer representations of each enumerated item can be found with the item’s .name and .value attributes.
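A short sketch of both features, reusing the EqType enumeration defined above; the list names are assumptions.

```python
import enum

class EqType(enum.Enum):
    Elliptic = 1
    Hyperbolic = 2
    Parabolic = 3

# Looping over the class visits each member in definition order
for item in EqType:
    print(item.name, item.value)

names = [item.name for item in EqType]     # string representations
values = [item.value for item in EqType]   # integer representations
```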
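MATLAB has no equally direct syntax for looping over an enumeration class, although the array of members returned by its enumeration() function can be iterated over.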
4.8 Caveat: “=” Copies a Reference for Nonscalars!
A critical difference between MATLAB's data containers and Python's is that the assignment b = a in MATLAB makes a complete copy of a's contents and puts them in the new variable b. In Python, this appears to be true only for scalars. If a is a list, dictionary, NumPy array, or any other higher-level data container, Python will only copy a reference to a into b. In other words, a and b will point to the same memory address; a and b become two names that refer to the same underlying data. Another way of putting it is that b = a makes b an alias of a. Changes made through b appear as changes to a and, conversely, changes made to a appear as changes to b as well.
Needless to say, copies of references rather than the entire data structure cause immense frustration for the unaware. Ostensibly simple computations report erroneous results, data appears to have been corrupted, results are not repeatable, and so on.
To duplicate MATLAB's = behavior and create a new variable b which contains a duplicate copy of everything in a, one must import the copy module and explicitly call one of its functions, either copy.copy() or copy.deepcopy().
The deepcopy() function from the copy module is needed for data containers that contain other data containers.
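The three behaviors can be compared side by side with a nested list, as in the sketch below. The variable names and values are made up for illustration.

```python
import copy

a = [1, 2, [3, 4]]       # a list that contains another list

b = a                    # alias: b and a are the same object
b[0] = 99                # the change is visible through a as well

c = copy.copy(a)         # shallow copy: new outer list, shared inner list
c[1] = -2                # affects only c
c[2][0] = 'shared'       # the inner list is shared, so a sees this change

d = copy.deepcopy(a)     # deep copy: inner containers are duplicated too
d[2][1] = 'private'      # affects only d

print(a)
print(c)
print(d)
```

After these statements, a holds [99, 2, ['shared', 4]]: the alias and the shallow copy's inner list both wrote through to a, while the deep copy did not.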