Subshells and parallel processing

We already talked a bit about subshells in the opening chapters of this book; they can be defined as child processes of their main shell. So, a subshell is a command interpreter inside a command interpreter. When does this happen? Well, usually when we run a script, it spawns its own shell and executes all the listed commands from there. Notice this nice detail: an external command, unless invoked using exec, spawns a subprocess, but a builtin doesn't. This is why a builtin executes faster than the corresponding external command, as we saw in the previous pages of this book.
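A quick way to check which kind of command we are about to run is the type builtin. The following is only a minimal sketch: the names probed (cd, echo, ls) are illustrative picks, and ls is assumed to be installed as a regular executable, as it is on most systems:

```shell
#!/bin/bash
# type -t prints a single word describing how Bash would run a name:
# "builtin" for commands executed inside the shell itself,
# "file" for external programs found on disk through $PATH.
cd_kind=$(type -t cd)
echo_kind=$(type -t echo)
ls_kind=$(type -t ls)     # assumption: ls is an external program here

echo "cd   -> $cd_kind"
echo "echo -> $echo_kind"
echo "ls   -> $ls_kind"
```

Many builtins, echo included, also exist as external programs (for example /bin/echo); when the name is typed without a path, the builtin wins.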

Well, what can a subshell be useful for? Let's see a small example that will make everything clearer:

#!/bin/bash
echo "This is the main subshell"
(echo "And this is the second" ; for i in {1..10} ; do echo "$i" ; done)

Nothing special. We echo from the shell spawned by the script, then open a subshell from inside it and echo the $i variable over a range from 1 to 10:

zarrelli:~$ ./subshell.sh 
This is the main subshell
And this is the second
1
2
3
4
5
6
7
8
9
10

As I just said, there is nothing really special in this script other than the way we called a subshell using (command_1; command_2; command_n).

Whatever is inside the parentheses is executed in a new subshell isolated from the parent shell: whatever happens inside the subshell is local to that environment:

#!/bin/bash
a=10
echo "The value of a in the main subshell is $a"
(echo "The value of a in the child subshell is $a"; echo "...but now it changes..."; a=20; echo "and now a is $a")
echo "But coming back to the main subshell, the value of a has not been altered here since the subshell variables are local, a: $a"

Now, let's run this piece of code:

zarrelli:~$ ./local.sh 
The value of a in the main subshell is 10
The value of a in the child subshell is 10
...but now it changes...
and now a is 20
But coming back to the main subshell, the value of a has not been altered here since the subshell variables are local, a: 10
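The same isolation covers more than variables. As a small sketch (the scratch variable and the choice of / as target directory are only illustrative), a cd issued inside the parentheses does not move the parent shell either:

```shell
#!/bin/bash
start_dir=$PWD

(
  # Both the directory change and the new variable live only in here.
  cd / || exit 1
  scratch="only here"
  echo "Inside the subshell: $PWD (scratch=$scratch)"
)

# The parent is still where it started and never saw the variable.
echo "Back in the parent:  $PWD (scratch=${scratch:-unset})"
```

This makes a subshell a handy way to run a few commands in another directory without having to cd back afterwards.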

As we can see from the example, inheritance is one-way, from the parent to the child: nothing climbs back up the ladder. It is also possible to spawn a subshell from inside a subshell and so build a nested structure. This is nice, but we could lose track of where we are, so it's handy to have a variable such as $BASH_SUBSHELL available:

#!/bin/bash
(
    echo "Bash nesting level: $BASH_SUBSHELL. Shell PID: $BASHPID"
    (
        echo "Bash nesting level: $BASH_SUBSHELL. Shell PID: $BASHPID"
        (
            echo "Bash nesting level: $BASH_SUBSHELL. Shell PID: $BASHPID"
        )
    )
)

Firstly, we wrote the code in this fancy way just to highlight the nested structure of the shells; we can use a more compact notation on a production script. Notice the two variables:

  • $BASH_SUBSHELL: This internal variable is available from Bash version 3 and holds the subshell level
  • $BASHPID: This holds the process ID of the current shell instance; inside a subshell it differs from $$, which still reports the PID of the main shell

Let's run the script and have a look at the output:

zarrelli:~$ ./nesting.sh
Bash nesting level: 1. Shell PID: 19787
Bash nesting level: 2. Shell PID: 19788
Bash nesting level: 3. Shell PID: 19789

Well, we have the subshell levels nicely printed along with the PID of each shell instance, and this shows us that they are actually different processes, each spawned by its parent shell. We could be tempted to use the internal $SHLVL variable to keep track of the shell level, but unfortunately it is not affected by nested subshells, since it only counts new instances of the shell, as the following example highlights:

echo "Bash level: $BASH_SUBSHELL - $SHLVL" ; (echo "Bash level: $BASH_SUBSHELL - $SHLVL"; (echo "Bash level: $BASH_SUBSHELL - $SHLVL"))
Bash level: 0 - 1
Bash level: 1 - 1
Bash level: 2 - 1

Nice, but what happens when we exit from a nested shell? Time for another example:

#!/bin/bash
echo "This is the main subshell"
(
echo "This is the second level subshell";
for i in {1..10}; do if (( i==5 )); then exit; else echo $i; fi; done
)
echo "Out of the second level subshell but still kicking inside the first level!"
for i in {1..3}
do echo $i
done

In these lines of code, we spawn an inner subshell that counts from 1 to 10 and prints to stdout until it reaches 5; at that point we exit the subshell and jump back to the first level. Will the script continue and print the other three numbers? Running it will reveal the answer:

zarrelli:~$ ./exit.sh 
This is the main subshell
This is the second level subshell
1
2
3
4
Out of the second level subshell but still kicking inside the first level!
1
2
3

Yes, the exit call affected the inner subshell only and the rest of the script kept running on the upper level.
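Since variables never travel back to the parent and exit only ends the subshell, the usual channels for handing a result upward are the exit status and the standard output. A minimal sketch follows; the value 3 and the variable names are arbitrary choices for the demo:

```shell
#!/bin/bash
# The exit status of a subshell is available to the parent in $?.
status=0
( exit 3 ) || status=$?

# Command substitution also runs its commands in a subshell and
# hands back whatever they print on stdout.
answer=$( a=20; echo "$a" )

echo "Subshell exit status: $status"
echo "Value printed by the subshell: $answer"
```

The parent can then branch on the status or store the captured output, while its own variables stay untouched by whatever the subshell did.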

Well, we saw some fancy stuff about subshells, but we can also use them for parallel execution. How? As usual, let's start with a script:

#!/bin/bash
(while true
do
:
done)&
(for i in {1..3}
do
echo "$i"
done)

The first thing to notice is the & character, whose job is to put the command or subshell that precedes it in the background. In this example, the first subshell contains an infinite loop; if we did not send it to the background, it would prevent the second subshell from being spawned and its content from being executed. Let's see what happens when we send it to the background:

./parallel.sh 
1
2
3

So, the second subshell was correctly spawned and the for loop executed, but what happened to the first infinite while loop?

ps -fj
UID        PID   PPID  PGID  SID   C  STIME TTY   TIME     CMD
zarrelli   17311 1223  17311 17311 0  09:07 pts/0 00:00:01 /bin/bash
zarrelli   21843 1     21842 17311 99 10:46 pts/0 00:00:16 /bin/bash ./parallel.sh
zarrelli   21863 17311 21863 17311 0  10:47 pts/0 00:00:00 ps -fj

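Well, it is still there, running in the background and keeping the CPU busy. The script that launched it has already exited, which is why its parent PID is now 1. You can use & not just for subshells but also for any other command: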

zarrelli:~$ ls &
[1] 22064
zarrelli:~$ exit.sh local.sh nesting.sh parallel.sh subshell.sh
[1]+ Done ls --color=auto
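The infinite loop left behind by parallel.sh suggests keeping track of what we send to the background. As a hedged sketch of one way to do it, we can save the PID that Bash stores in $! and then stop and reap the job explicitly; the variable names are made up for this example:

```shell
#!/bin/bash
# Start a busy loop in a background subshell, as parallel.sh does.
( while true; do :; done ) &
loop_pid=$!            # $! holds the PID of the last background job

echo "Background loop running with PID $loop_pid"

# ...the foreground work would go here...

kill "$loop_pid"       # send SIGTERM to the background subshell
status=0
wait "$loop_pid" 2>/dev/null || status=$?   # reap it and collect its status

echo "Loop terminated, wait returned $status"
```

A status above 128 means the job was ended by a signal: it is 128 plus the signal number.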

Do you want the command you issued to keep running even after you log off the system? Just run the following command:

nohup command &

It will run in the background, and nohup will make it ignore the SIGHUP (hangup) signal that the shell's jobs receive when the login session ends. This way, the command is not affected by the hangup and continues its execution. If its output is not redirected, nohup appends it to a file called nohup.out.
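In practice, it is common to choose the log file ourselves instead of relying on nohup.out. The following sketch assumes a throwaway file created with mktemp and a tiny inline job in place of a real long-running command:

```shell
#!/bin/bash
# Assumption: a temporary file stands in for a real log destination.
log_file=$(mktemp)

# Redirect stdout and stderr ourselves, then send the job to the background.
nohup bash -c 'echo "job started"; sleep 1; echo "job finished"' > "$log_file" 2>&1 &
job_pid=$!

echo "Started PID $job_pid, logging to $log_file"

# Only for the demo: wait for the job so that we can show its log.
wait "$job_pid"
cat "$log_file"
```

On a real system we would log off instead of waiting, and read the log file later.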

Going back to subshells, why would you want to send an entire subshell to the background rather than single commands or compounds? Think of subshells as containers: break a problem down into less complex tasks, enclose each task in a subshell, and have the subshells execute in the background; you will save time by having them run in parallel.
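The container idea can be sketched as follows. The three task names and the one-second sleep are placeholders for real work, and since nothing set in a subshell reaches the parent, each task leaves its result in a file:

```shell
#!/bin/bash
# Hypothetical workload: three independent tasks of about one second each.
work_dir=$(mktemp -d)
SECONDS=0                       # Bash timer, counts whole seconds

for task in alpha beta gamma; do
  (
    sleep 1                     # stand-in for real work
    echo "$task done" > "$work_dir/$task.out"
  ) &
done

wait                            # block until every background subshell has finished
elapsed=$SECONDS

results=$(cat "$work_dir"/*.out)
echo "$results"
echo "Elapsed: about ${elapsed}s for three one-second tasks"
rm -r "$work_dir"
```

Run one after the other, the three tasks would need about three seconds; in the background they overlap, so the whole script takes little more than the longest of them.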

We just said parallel, but Bash itself does nothing to tune the execution of commands and scripts for a multicore architecture: it simply starts the background processes and leaves their scheduling to the operating system. If we want something more core-aware, we can install a nice program called GNU parallel. We will not talk much about this program since it is not really Bash related, but it is a nice, core-savvy tool for the reader to explore:

zarrelli:~$ parallel --number-of-cpus
1
zarrelli:~$ parallel --number-of-cores
4

The basic syntax of parallel is quite easy:

parallel command ::: argument_1 argument_2 argument_n

As in the following example:

zarrelli:~$ parallel echo ::: 1 2 3
1
2
3

Giving more argument lists, each introduced by :::, will cause parallel to pass them to the command in all the possible combinations:

zarrelli:~$ parallel echo ::: 1 2 3 ::: A B C
1 A
1 B
1 C
2 A
2 B
2 C
3 A
3 B
3 C

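The number of jobs executed at once here is equal to the number of cores available, but we can modify this value with -j+n, which adds n jobs to the number of cores. Fire parallel with -j0, and it will try to execute as many jobs as possible. In the following runs, note that --joblog takes the next word, sleep, as the name of its log file, so the command executed for each argument is simply echo: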

zarrelli:~$ parallel --eta --joblog sleep echo {} ::: 1 2 3 4 5 10
Computers / CPU cores / Max jobs to run
1:local / 4 / 4

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 0s Left: 6 AVG: 0.00s local:4/0/100%/0.0s 1
ETA: 0s Left: 5 AVG: 0.00s local:4/1/100%/0.0s 2
ETA: 0s Left: 4 AVG: 0.00s local:4/2/100%/0.0s 3
ETA: 0s Left: 3 AVG: 0.00s local:3/3/100%/0.0s 4
ETA: 0s Left: 2 AVG: 0.00s local:2/4/100%/0.0s 5
ETA: 0s Left: 1 AVG: 0.00s local:1/5/100%/0.0s 10
ETA: 0s Left: 0 AVG: 0.00s local:0/6/100%/0.0s
zarrelli:~$ parallel -j0 --eta --joblog sleep echo {} ::: 1 2 3 4 5 10
Computers / CPU cores / Max jobs to run
1:local / 4 / 6

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 0s Left: 6 AVG: 0.00s local:6/0/100%/0.0s 1
ETA: 0s Left: 5 AVG: 0.00s local:5/1/100%/0.0s 2
ETA: 0s Left: 4 AVG: 0.00s local:4/2/100%/0.0s 3
ETA: 0s Left: 3 AVG: 0.00s local:3/3/100%/0.0s 4
ETA: 0s Left: 2 AVG: 0.00s local:2/4/100%/0.0s 5
ETA: 0s Left: 1 AVG: 0.00s local:1/5/100%/0.0s 10
ETA: 0s Left: 0 AVG: 0.00s local:0/6/100%/0.0s
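The idea of capping the number of simultaneous jobs can be roughly imitated with background subshells in plain Bash. The following is a sketch only, not a description of how parallel works internally: it assumes Bash 4.3 or later for wait -n, and max_jobs, running, and out_dir are names invented for the example:

```shell
#!/bin/bash
# Sketch only: requires Bash 4.3 or later for `wait -n`.
max_jobs=2                      # hypothetical limit, like a small -j value
running=0
out_dir=$(mktemp -d)

for n in 1 2 3 4 5 10; do
  if (( running >= max_jobs )); then
    wait -n                     # wait for any one background job to end
    running=$((running - 1))
  fi
  ( echo "$n" > "$out_dir/$n" ) &   # each job runs in its own subshell
  running=$((running + 1))
done
wait                            # collect the remaining jobs

files=("$out_dir"/*)
finished=${#files[@]}
echo "Jobs completed: $finished"
rm -r "$out_dir"
```

For anything beyond a quick script, parallel remains the more convenient choice, since it also takes care of details such as progress reporting and job logs.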

What can we do with parallel? Well, a lot of tricky stuff, but it is left to the reader to try and experiment with this nice utility; I am confident that a lot of new ideas will arise while tinkering with it.
