Chapter 4. List and Hash Handling

Computers, as we know, are very good at counting and organizing data. We can use code to count and organize data by using data structures, and the building blocks for those are arrays. Bash has had arrays for a long time and added “associative arrays” in version 4.0. You will run into hard-to-read code for bash arrays out there, partly because bash has a lot of history and backwards compatibility to maintain, as we mentioned earlier, but also because some developers tend to overcomplicate things. Arrays are actually not that hard to use, and you can write them clearly with a little thought.

As a refresher, in computer science and programming, arrays are variables containing multiple elements that are indexed or referred to by an integer. In other words, an array is a variable containing a list instead of a scalar or single value. An associative array is a type of list that is indexed by a string instead of an integer. So it’s a list of key-value pairs that basically forms a dictionary or lookup table, where internally the key is hashed to form a memory location. More or less.
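As a minimal sketch of the two kinds of variable (the names fruits and colors are ours, purely for illustration), a list is indexed by an integer and a hash by a string:

```bash
#!/usr/bin/env bash
# Illustration only: "fruits" and "colors" are made-up names.

# A list (indexed array): subscripts are integers starting at 0
fruits=(apple banana cherry)
echo "fruits[1] = ${fruits[1]}"

# A hash (associative array): subscripts are strings, needs bash 4.0+
declare -A colors
colors[apple]='red'
colors[banana]='yellow'
echo "colors[banana] = ${colors[banana]}"
```

The first echo shows banana and the second shows yellow.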

The bash documentation uses the terms “array” and “associative array,” but depending on your background, “lists” and “hashes,” or possibly “dictionaries” or “dicts,” may be more familiar and they are certainly easier to type and say. The bash documentation also uses “subscript” where other people might say “index.” We usually follow the bash docs for consistency, but here we’re going to use the more common and understandable “list” and “hash,” while keeping the docs’ “subscript.”

Even though lists (arrays) came first, hashes (associative arrays) are slightly simpler because there is never any question about supplying the index (subscript): it’s required. The integer index of a list may be implied, and while that’s not really complicated, there are operations that work for lists that make no sense for a hash.

As the bash reference says, “Bash provides one-dimensional indexed and associative array variables.” While it’s possible to create really ugly multidimensional structures, doing so in bash will probably end in tears. If you really need to do that, consider doing it in a different language.

Watch out for bash version with hashes

You really need to be careful about your bash version here. As we just said, bash didn’t get associative arrays until version 4.0 and it took a couple of versions to sort out some details, like allowing ${mylist[-1]} to refer to the last integer element (v4.3), instead of needing ${mylist[${#mylist[*]}-1]}. Ouch.
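To make the difference concrete, here is a small sketch of both ways to read the last element. The variable names last_old and last_new are ours, and the older form assumes the list has no gaps in its subscripts:

```bash
#!/usr/bin/env bash
# Illustration only: two ways to read the last element of a list.
mylist=(foo bar baz)

# Works on old and new bash: compute the last subscript from the count.
# Only correct when the list has no gaps (is not sparse).
last_old="${mylist[${#mylist[*]}-1]}"

# Newer bash only: a negative subscript counts back from the end.
last_new="${mylist[-1]}"

echo "old style: $last_old; new style: $last_new"
```

Both forms give baz here; the newer one is simply easier to read.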

As we said earlier, watch out for the stock bash on a Mac; it’s quite old. You’ll find newer versions in MacPorts, Homebrew, or Fink.

Also, arrays (lists and hashes) are not specified by POSIX, so if portability beyond bash is a concern you’ll need to be very careful using them because they might not work.

Commonalities

Lists and hashes are very similar in bash, so we’re going to start with what they have in common, then get to where they are different. In fact, you can treat lists as a subset of hashes that simply happen to have ordered integer indexes. We don’t think you should do that, but you can.

Lists are inherently ordered while hashes are inherently unordered, so there are operations like shift or push that only make sense for an ordered list. On the other hand you would never need to sort the keys in an ordered list, but you might for a hash.

Accidental Assignment

An accidental assignment without a subscript will affect element zero, so myarray=foo results in creating or overwriting myarray[0] with foo, even if it’s a hash!
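A short sketch of that accident, using throwaway values of our own choosing, shows it happening to both a list and a hash:

```bash
#!/usr/bin/env bash
# Illustration only: assigning without a subscript touches element zero.
mylist=(foo bar baz)
mylist='oops'                 # Same as mylist[0]='oops'
declare -p mylist             # [0] is now "oops", the rest is untouched

declare -A myhash
myhash[a]='foo'
myhash='oops'                 # Creates the key "0", even though it's a hash
declare -p myhash
```

Nothing warns you in either case, which is why it is worth reading every assignment to an array variable twice.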

From the bash docs:

If the subscript is @ or *, the word expands to all members of the array name. These subscripts differ only when the word appears within double quotes. If the word is double-quoted, ${name[*]} expands to a single word with the value of each array member separated by the first character of the IFS variable, and ${name[@]} expands each element of name to a separate word.

That’s a mouthful, isn’t it? But it’s important because, as we’ll show, if you get it wrong you can hurt yourself. We’ve used printf "%q" with a pipe (|) delimiter in the example code below to show “shell quoted” strings (technically “words”) in a way that is visible in the output on the screen or on this page. The quoting rules are actually the same as we cover in Chapter 2, just in the context of a list or hash.
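One way to see the rule from the bash docs in action is to count the words each expansion produces. This is only a sketch; count_words is a hypothetical helper of ours, not a bash builtin, and the default IFS is assumed:

```bash
#!/usr/bin/env bash
# Illustration only: count the words each expansion produces.
mylist=(foo "three four")

count_words() { echo "$#"; }   # Hypothetical helper: prints its argument count

at_quoted="$(count_words "${mylist[@]}")"    # One word per element
star_quoted="$(count_words "${mylist[*]}")"  # Everything joined into one word
at_bare="$(count_words ${mylist[@]})"        # Unquoted: "three four" gets split

echo "quoted @: $at_quoted; quoted *: $star_quoted; unquoted @: $at_bare"
```

With two elements in the list, only the quoted [@] form hands two words to the helper; quoted [*] hands over one, and the unquoted form hands over three.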

Beware of unset

We use unset below, and there’s a subtle catch. An unquoted argument like list[1] is also a glob pattern, so if you happen to have a file whose name matches it (list1, in this case), globbing (footnote 1) can hurt you and you can clobber unexpected things. To avoid that it’s best to quote your variable. It’s even safer to use -v to force unset to treat your argument as a shell variable, like unset -v 'list'.
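The catch is easier to believe once you have seen it. The sketch below works in a scratch directory and uses made-up names (list, list1) to show an unquoted unset removing the wrong variable:

```bash
#!/usr/bin/env bash
# Illustration only: an unquoted subscript is a glob pattern.
# Work in a scratch directory so we don't touch any real files.
scratch="$(mktemp -d)" || exit 1
cd "$scratch" || exit 1

list=(a b c)
list1='unrelated variable'
touch list1                  # A file whose name matches the pattern list[1]

unset -v list[1]             # Glob expands to "list1": the WRONG variable is unset
echo "After unquoted unset: ${#list[@]} elements, list1='${list1-}'"

unset -v 'list[1]'           # Quoted: removes element 1 as intended
echo "After quoted unset:   ${#list[@]} elements"

cd / && rm -rf "$scratch"
```

After the unquoted unset the list still has three elements and list1 is gone; only the quoted form removes the element we meant.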

Lists

As we said, arrays, also known as lists, are variables containing multiple elements that are indexed or referred to by an integer.

In bash they start from zero and may be declared using declare -a, local -a, readonly -a or by just assigning to a new variable like mylist[0]=foo or mylist=() (empty list). Once a variable is declared as a list, a simple assignment like mylist+=(bar) is the same as a push, but note the + and () both of which are critical.
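Here is a small sketch of why both the + and the () matter; the values are arbitrary and the variables first and count exist only to make the result easy to inspect:

```bash
#!/usr/bin/env bash
# Illustration only: why both the + and the () matter.
mylist=(foo)

mylist+=(bar)            # Push: the list is now (foo bar)
mylist+=baz              # No (): appends the string to element 0 instead
first="${mylist[0]}"
count="${#mylist[@]}"
echo "element 0: $first; count: $count"

mylist=(qux)             # No +: replaces the whole list
echo "count after plain assignment: ${#mylist[@]}"
```

Leaving out the () turns element 0 into foobaz without adding an element, and leaving out the + throws the old list away.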

Table 4-1. Sample bash List

Element      Value
mylist[0]    foo
mylist[1]    bar
mylist[2]    baz

The common operations on an array or list are:

  • Declare a variable as a list

  • Assign one or more values to it

  • If treating it as a stack (think plates in a cafeteria; LIFO: last on, first off, or last in, first out)

    • push

    • pop

  • Display (dump) all values for debugging or reuse

  • Reference one or all values (for, or for each)

  • Reference a sub-set (slice) of values

  • Delete one or more values

  • Delete the entire list

Rather than talk about all of that, we’re going to show you instead, and you can just pick out the idioms you need when you need them:

Example 4-1. Bash List Example: Code
#!/usr/bin/env bash
# lists.sh: bash list example code
# Original Author & date: _bash Idioms_ 2022
# bash Idioms filename: examples/ch07/lists.sh
#_________________________________________________________________________

# Declare a list                                                       <1>
# declare -a mylist   # Can do this, or `local -a` or `readonly -a` or:
mylist[0]='foo'       # This both declares and assigns to mylist[0]

# Push or assign, note the +=                                          <2>
###mylist=(bar)               # Would overwrite mylist[0]
mylist+=(bar)              # mylist[1]
mylist+=(baz)              # mylist[2]
mylist+=(one two)          # mylist[3] AND mylist[4]
mylist+=("three four")     # mylist[5]

# OR
# mylist+=(bar baz one two "three four")  # Note += as "foo" is already there

# Display or dump the values                                           <3>
echo -e "\nThe element count is: ${#mylist[@]} or ${#mylist[*]}"

echo -e "\nThe length of element [5] is: ${#mylist[5]}"

echo -e "\nDump or list:"
declare -p mylist
echo -n      "\${mylist[@]}   = " ; printf "%q|"  ${mylist[@]}
echo -en   "\n\${mylist[*]}   = " ; printf "%q|"  ${mylist[*]}
echo -en "\n\"\${mylist[@]}\" = " ; printf "%q|" "${mylist[@]}"
echo -en "\n\"\${mylist[*]}\" = " ; printf "%q|" "${mylist[*]}"
echo -e "\t# But this is broken!"  # Previous line is bad and no newline
# See `help printf` or chapter 6, we need this to show the correct words:
# %q	quote the argument in a way that can be reused as shell input

# Iterate over the values                                              <4>
echo -e "\nforeach \"\${!mylist[@]}\":"
for element in "${!mylist[@]}"; do
    echo -e "\tElement: $element; value: ${mylist[$element]}"
done

echo -e "\nBut don't do this: \${mylist[*]}"
for element in ${mylist[*]}; do
    echo -e "\tElement: $element; value: ${mylist[$element]}"
done

# Handle slices (sub-sets) of the list, shift and pop                  <5>
echo -e "\nStart from element 3 and show a slice of 2 elements:"
printf "%q|" "${mylist[@]:3:2}"
echo '' # No newline in above

echo -e "\nShift FIRST element [0] (dumped before and after):"
declare -p mylist                    # Display before
mylist=("${mylist[@]:1}")            # First element, needs quotes
#mylist=("${mylist[@]:$count}")      # First #count elements
declare -p mylist                    # Display after

echo -e "\nPop LAST element (dumped before and after):"
declare -p mylist
unset -v 'mylist[-1]'                # Bash v4.3+
#unset -v "mylist[${#mylist[*]}-1]"  # Older
declare -p mylist

# Delete slices or the entire list                                     <6>
echo -e "\nDelete element 2 using unset (dumped before and after):"
declare -p mylist
unset -v 'mylist[2]'
declare -p mylist

# Delete the entire list
unset -v mylist

<1> Declare a variable as an array (we say that instead of “list” here because the flag is -a)

<2> Assign one or more values to it

<3> Display (dump) all values for debugging or reuse

<4> Iterate over the values

<5> Handle slices (sub-sets) of the list, shift and pop

<6> Delete slices or the entire list

It looks like this when you run it:

Example 4-2. Bash List Example: Output
The element count is: 6 or 6

The length of element [5] is: 10

Dump or list:
declare -a mylist=([0]="foo" [1]="bar" [2]="baz" [3]="one" [4]="two" [5]="three four")
${mylist[@]}   = foo|bar|baz|one|two|three|four|
${mylist[*]}   = foo|bar|baz|one|two|three|four|
"${mylist[@]}" = foo|bar|baz|one|two|three\ four|
"${mylist[*]}" = foo\ bar\ baz\ one\ two\ three\ four|	# But this is broken!

foreach "${!mylist[@]}":
	Element: 0; value: foo
	Element: 1; value: bar
	Element: 2; value: baz
	Element: 3; value: one
	Element: 4; value: two
	Element: 5; value: three four

But don't do this: ${mylist[*]}
	Element: foo; value: foo
	Element: bar; value: foo
	Element: baz; value: foo
	Element: one; value: foo
	Element: two; value: foo
	Element: three; value: foo
	Element: four; value: foo

Start from element 3 and show a slice of 2 elements:
one|two|

Shift FIRST element [0] (dumped before and after):
declare -a mylist=([0]="foo" [1]="bar" [2]="baz" [3]="one" [4]="two" [5]="three four")
declare -a mylist=([0]="bar" [1]="baz" [2]="one" [3]="two" [4]="three four")

Pop LAST element (dumped before and after):
declare -a mylist=([0]="bar" [1]="baz" [2]="one" [3]="two" [4]="three four")
declare -a mylist=([0]="bar" [1]="baz" [2]="one" [3]="two")

Delete element 2 using unset (dumped before and after):
declare -a mylist=([0]="bar" [1]="baz" [2]="one" [3]="two")
declare -a mylist=([0]="bar" [1]="baz" [3]="two")
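Notice that the last dump has no element [2]: unset removes an element but does not renumber the rest, so the list is now sparse. A minimal sketch of closing the gap, assuming you don’t depend on the old subscripts (the variable gappy is ours, just to keep a copy of the subscripts before the rebuild):

```bash
#!/usr/bin/env bash
# Illustration only: unset leaves a gap; reassigning the list closes it.
mylist=(bar baz one two)
unset -v 'mylist[2]'
gappy="${!mylist[*]}"         # The subscripts that are left
echo "Subscripts after unset:   $gappy"

mylist=("${mylist[@]}")       # Rebuild the list, renumbering from 0
echo "Subscripts after rebuild: ${!mylist[*]}"
```

The rebuild uses the same quoted [@] idiom as the shift in the example, just without an offset.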

Hashes

Also known as hashes, dictionaries, or dicts, associative arrays are lists where the index is an arbitrary string instead of an integer. They are amazingly handy for counting or “uniqueing” (that is, ignoring or removing duplicate) strings, among other things.

Unlike lists, these must be declared using declare -A, local -A, or readonly -A and the subscript is always required.
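If you are wondering what happens when you forget the declaration, this sketch shows one likely surprise. The name nothash is made up, and the example assumes no variables named oof or rab are already set and that set -u is not in effect:

```bash
#!/usr/bin/env bash
# Illustration only: what goes wrong without declare -A.
# "nothash" is a made-up name; it is NOT declared as a hash.
nothash[oof]='foo'     # Treated as a list: "oof" is evaluated arithmetically,
nothash[rab]='bar'     # an unset name is 0, so both lines assign to [0]
echo "Without declare -A: ${#nothash[@]} element(s), [0]=${nothash[0]}"

declare -A myhash      # Declared properly, string keys are kept
myhash[oof]='foo'
myhash[rab]='bar'
echo "With declare -A:    ${#myhash[@]} element(s)"
```

Bash raises no error for the undeclared version; you just quietly end up with one element instead of two.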

Table 4-2. Sample bash Hash

Element        Value
myhash[oof]    foo
myhash[rab]    bar
myhash[zab]    baz

The common operations on an associative array or hash or dict are:

  • Declare a variable as an associative array (we say that instead of “hash” here because the flag is -A)

  • Assign one or more values to it

  • Display (dump) all values for debugging or reuse

  • Reference one or all values (for, or for each)

  • Reference a specific value (lookup)

  • Delete one or more values

  • Delete the entire hash

Again, rather than talk about all of that, we’re just going to show you, so pick out the idioms you need:

Example 4-3. Bash Hash Example: Code
#!/usr/bin/env bash
# hashes.sh: bash Hash example code
# Original Author & date: _bash Idioms_ 2022
# bash Idioms filename: examples/ch07/hashes.sh
#_________________________________________________________________________

# Declare a hash                                                       <1>
declare -A myhash   # MUST do this, or `local -A` or `readonly -A`

# Assign to it, note no "+"                                            <2>
###myhash=bar                 # Would create or overwrite myhash[0]
myhash[a]='foo'
myhash[b]='bar'
myhash[c]='baz'
myhash[d]='one two'        # Different than our list example
myhash[e]='three four'

# OR
#myhash=( [a]=foo [b]=bar [c]=baz [d]="one two" [e]="three four")

# Display or dump the details and values                               <3>
echo -e "\nThe key count is: ${#myhash[@]} or ${#myhash[*]}"

echo -e "\nThe length of the value of key [e] is: ${#myhash[e]}"

echo -e "\nDump or list:"
declare -p myhash
echo -n      "\${myhash[@]}   = " ; printf "%q|"  ${myhash[@]}
echo -en   "\n\${myhash[*]}   = " ; printf "%q|"  ${myhash[*]}
echo -en "\n\"\${myhash[@]}\" = " ; printf "%q|" "${myhash[@]}"
echo -en "\n\"\${myhash[*]}\" = " ; printf "%q|" "${myhash[*]}"
echo '' # No newline in above
# See `help printf` or chapter 6, we need this to show the correct words:
# %q	quote the argument in a way that can be reused as shell input

# Iterate over the keys and values                                     <4>
echo -e "\nforeach \"\${!myhash[@]}\":"
for key in "${!myhash[@]}"; do
    echo -e "\tKey: $key; value: ${myhash[$key]}"
done

echo -e "\nBut don't do this: \${myhash[*]}"
for key in ${myhash[*]}; do
    echo -e "\tKey: $key; value: ${myhash[$key]}"
done

# Handle slices (sub-sets) of the hash                                 <5>
echo -e "\nStart from element 3 and show a slice of 2 elements:"
printf "%q|" "${myhash[@]:3:2}"
echo '' # No newline in above

#echo -e "\nShift FIRST key [0]:" = makes no sense in a hash!
#echo -e "\nPop LAST key:"        = makes no sense in a hash!

# Delete keys or the entire hash                                       <6>
echo -e "\nDelete key c using unset (dumped before and after):"
declare -p myhash
unset -v 'myhash[c]'
declare -p myhash

# Delete the entire hash
unset -v myhash

<1> Declare a hash

<2> Assign to a hash

<3> Display or dump the details and values

<4> Iterate over the keys and values

<5> Handle slices (sub-sets) of the hash

<6> Delete elements or the entire hash

That looks like this when you run it:

Example 4-4. Bash Hash Example: Output
The key count is: 5 or 5

The length of the value of key [e] is: 10

Dump or list:
declare -A myhash=([a]="foo" [b]="bar" [c]="baz" [d]="one two" [e]="three four" )
${myhash[@]}   = foo|bar|baz|one|two|three|four|
${myhash[*]}   = foo|bar|baz|one|two|three|four|
"${myhash[@]}" = foo|bar|baz|one\ two|three\ four|
"${myhash[*]}" = foo\ bar\ baz\ one\ two\ three\ four|

foreach "${!myhash[@]}":
	Key: a; value: foo
	Key: b; value: bar
	Key: c; value: baz
	Key: d; value: one two
	Key: e; value: three four

But don't do this: ${myhash[*]}
	Key: foo; value:
	Key: bar; value:
	Key: baz; value:
	Key: one; value:
	Key: two; value:
	Key: three; value:
	Key: four; value:

Start from element 3 and show a slice of 2 elements:
baz|one\ two|

Delete key c using unset (dumped before and after):
declare -A myhash=([a]="foo" [b]="bar" [c]="baz" [d]="one two" [e]="three four" )
declare -A myhash=([a]="foo" [b]="bar" [d]="one two" [e]="three four" )
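Because hashes are inherently unordered, you should not rely on the keys coming back in any particular order. When the order matters, one approach is to sort the keys first. This is a sketch only; sorted_keys is our own name, and it assumes the keys contain no newlines:

```bash
#!/usr/bin/env bash
# Illustration only: iterate over a hash in sorted key order.
declare -A myhash=([b]=bar [a]=foo [c]=baz)

# Print one key per line, sort them, and read the result into a list
mapfile -t sorted_keys < <(printf '%s\n' "${!myhash[@]}" | sort)

for key in "${sorted_keys[@]}"; do
    echo "$key = ${myhash[$key]}"
done
```

The loop then visits a, b, and c in that order regardless of how bash stores the hash internally.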

A simple word count example

As we said, one of the most common uses of a hash is to count and/or “uniq” items, so here is a simple but concrete example.

Example 4-5. Bash word count example: Code
#!/usr/bin/env bash
# word-count-example.sh: More examples for bash lists and hashes, and $RANDOM
# Original Author & date: _bash Idioms_ 2022
# bash Idioms filename: examples/ch07/word-count-example.sh
#_________________________________________________________________________
# See also: `man uniq`

WORD_FILE='/tmp/words.txt'
> $WORD_FILE                                                         # <1>
trap "rm -f $WORD_FILE" ABRT EXIT HUP INT QUIT TERM

declare -A myhash                                                    # <2>

echo "Creating & reading random word list in: $WORD_FILE"

# Create a list of words to use for the hash example
mylist=(foo bar baz one two three four)

# Loop, and randomly pick elements out of the list
range="${#mylist[@]}"                                                # <3>
for ((i=0; i<35; i++)); do
    random_element="$(( $RANDOM % $range ))"                         # <4>
    echo "${mylist[$random_element]}" >> $WORD_FILE                  # <5>
done

# Read the word list into a hash
while read line; do                                                  # <6>
    (( myhash[$line]++ ))                                            # <7>
done < $WORD_FILE                                                    # <8>


echo -e "\nUnique words from: $WORD_FILE"                            # <9>
for key in "${!myhash[@]}"; do
    echo "$key"
done | sort

echo -e "\nWord counts, ordered by word, from: $WORD_FILE"           # <10>
for key in "${!myhash[@]}"; do
    printf "%s\t%d\n" "$key" "${myhash[$key]}"
done | sort

echo -e "\nWord counts, ordered by count, from: $WORD_FILE"          # <11>
for key in "${!myhash[@]}"; do
    printf "%s\t%d\n" "$key" "${myhash[$key]}"
done | sort -k2,2n

<1> We’ll create a temporary file and set a trap (Chapter 9) to clean it up.

<2> We have to declare -A the hash as a bash associative array (again, we say that instead of “hash” here because of the flag).

<3> Get the count of elements, or the range for the random number we want.

<4> Use the bash $RANDOM variable to pick a random list element (Chapter 4).

<5> Echo the random word into the temp file. We used three lines (<3>, <4>, and <5>) to do something we could do in one line like echo "${mylist[$RANDOM % ${#mylist[@]}]}" >> $WORD_FILE but we think that one line would be a lot harder to re-read six months from now.

<6> Read the file we just created; note the location of the file name after the done keyword <8>.

<7> Increment the “key” value in the hash for the word we saw in the line in the file.

<8> Note the location of the file name after the done keyword, from the loop in <6>.

<9> Iterate over the keys to display a list of the words, without duplicates, and without using the uniq external command. Note the sort command we did use after the done keyword.

<10> Iterate over the keys again, but this time display the “value” of the count as well.

<11> Iterate over the keys one last time to display the count again, but this time sorted by the second field as a number (sort -k2,2n).

Example 4-6. Bash word count example: Output
Creating & reading random word list in: /tmp/words.txt

Unique words from: /tmp/words.txt
bar
baz
foo
four
one
three
two

Word counts, ordered by word, from: /tmp/words.txt
bar	7
baz	6
foo	4
four	3
one	5
three	4
two	6

Word counts, ordered by count, from: /tmp/words.txt
four	3
foo	4
three	4
one	5
baz	6
two	6
bar	7
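The word count example sorts its output, so the order in which the words first appeared is lost. If you want to “uniq” a list but keep that order, a hash can act as a “have we seen this?” lookup. This is a sketch under our own naming (words, seen, unique), and it assumes no element is an empty string, since bash rejects an empty hash subscript:

```bash
#!/usr/bin/env bash
# Illustration only: de-duplicate a list while keeping first-seen order.
words=(foo bar foo baz bar foo)
declare -A seen
unique=()

for word in "${words[@]}"; do
    if [[ -z "${seen[$word]:-}" ]]; then   # First time we've seen this word
        seen[$word]=1
        unique+=("$word")
    fi
done

echo "Unique, in original order: ${unique[*]}"
```

The list keeps the order and the hash answers the “seen it?” question, so each structure does the job it is good at.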

Style and Readability

When you get the data structure correct, the rest of the code practically writes itself. When you get the data structure wrong, everything is a struggle. Bash has the building blocks for simple data structures, and once you get used to a little extra punctuation, they’re not that hard to use and read. Just remember you almost always want [@] and not [*], and refer back to your cheat sheet examples when you need to.
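The “almost always” is deliberate: about the only time you want [*] is when you really do mean to join the elements into one string. A minimal sketch, where join_by is a hypothetical helper of ours and not a bash builtin:

```bash
#!/usr/bin/env bash
# Illustration only: "$*" and "${list[*]}" join on the first character of IFS.
mylist=(foo bar "three four")

# Hypothetical helper: join the remaining arguments with the first argument
join_by() {
    local IFS="$1"    # local, so the caller's IFS is not changed
    shift
    echo "$*"
}

csv="$(join_by , "${mylist[@]}")"
echo "$csv"
```

Note that the caller still passes the list with quoted [@] so each element arrives intact; the joining happens only inside the helper.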

Summary

In this chapter we’ve demystified bash array (list and hash) handling and shown idiomatic solutions for the common use cases.

See also:

1 interpolating and expanding wildcard patterns, see https://en.wikipedia.org/wiki/Glob_(programming)
