Bash shell basics — lists of pipelines asynchronously

Paul Guerin
6 min read · Mar 10, 2023


The Bash shell is incredibly powerful. And as a shell, its capabilities are often under-appreciated.

Functions in the Bash shell are first-class citizens, just as in most other languages.

But in many ways, the fundamental building block of the Bash shell is the pipeline.

Commands, processes, and processors

Here are some examples of commands.

# psr = processor that the process last executed on
ps -fTo pid,comm,psr,etimes

# also show all light-weight processes (ie threads)
ps -efL
pstree -thp

# number of processors available
nproc

The command executed, and it also listed the processes in use. In this case, Bash started a new process for the command, and ran it on a different CPU processor from the shell itself (out of a possible 8 processors on this particular hardware).

We can also string the commands together like this on the same line.

# Note: nothing is passed from one command to the other
# psr = processor that process last executed on
ps -fTo pid,comm,psr,etimes; ps -fTo pid,comm,psr,etimes

In the example, Bash has executed the ps command in a new process. Then the command was executed a second time, and another process was created for it. Each command may also have been executed by a different CPU processor.

A new process means there is the opportunity for another CPU processor to execute it.

For each command executed, the previous process is terminated, and another is created for the next command.
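This is easy to observe directly. The sketch below uses bash -c so that each child process can print its own PID; because each command gets a fresh process, the two PIDs differ.

```shell
# each command in a sequence runs in its own process, so the PIDs differ
pid1=$(bash -c 'echo $$')
pid2=$(bash -c 'echo $$')
echo "first command ran as PID $pid1, second as PID $pid2"
```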

Pipelines

A pipeline could be just a single command, or a sequence of many commands.

A ‘pipeline’ is a sequence of one or more commands separated by one of
the control operators ‘|’ or ‘|&’.

A common pipeline pattern is to execute a command, then filter, then sort the output of that command.

An example of a pipeline is below:

# pipe the output of one command into the input of another
ps -eo pid,euser,start,etime,args | grep -e root | sort

The syntax of a pipeline can be adjusted too.

The output of pgrep can instead be fed to ps through command substitution, which retains the column headings in the output.

# select a user's processes, keeping the column headings
ps -o pid,euser,start,etime,args -p $(pgrep -u root)

It may also be better to do the sorting earlier, as part of the ps command, rather than at the end of the pipeline.
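For example, the --sort option (assuming the GNU/procps version of ps found on most Linux systems) lets ps order its own output before any filtering happens downstream.

```shell
# sort by start time inside ps itself, then filter for root's processes
ps -eo pid,euser,start,etime,args --sort=start | grep root
```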

Another example.

# still a pipeline, even though nothing is actually consumed from one command by the other
# psr = processor that process last executed on
ps -fTo pid,comm,psr,etimes | ps -fTo pid,comm,psr,etimes

So we have a pipeline: a new process is created for each command, and a different CPU processor can execute each process.

For a pipeline, each command is executed with a new process, but the previous ones stay alive for the life of the pipeline.
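Because every command in a pipeline starts at once and stays alive until the pipeline completes, the commands run concurrently. A quick sketch using sleep shows the total elapsed time is that of one command, not the sum.

```shell
# both sleeps start together, so the pipeline takes ~2 seconds, not ~4
SECONDS=0
sleep 2 | sleep 2
echo "elapsed: ${SECONDS}s"
```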

Lists of pipelines

Just as a pipeline is a sequence of one or more commands, a list is a sequence of one or more pipelines.

A list is a sequence of one or more pipelines separated by one of the operators ;, &, &&, or ||

A simple example of a list using the ‘;’ separator is below:

# execute pipeline sequentially regardless of error
mkdir test; cd test

Using the ‘;’ separator, the 2nd pipeline will execute regardless of the error status of the 1st pipeline.
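To see that the error status really is ignored, the 1st pipeline below always fails, yet the 2nd still runs.

```shell
# 'false' always fails, but the echo after ';' still executes
false; echo 'the 2nd pipeline ran anyway'
```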

You can also add a condition so that the 2nd pipeline runs only if the 1st pipeline didn’t return an error. In this case, we use ‘&&’ as the separator.

# if the 1st pipeline has no error, then execute the 2nd pipeline.
mkdir ~/test && cd ~/test

The first pipeline needs to be successful before the second pipeline can execute.
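The ‘||’ separator is the counterpart of ‘&&’: the 2nd pipeline executes only if the 1st one fails.

```shell
# 'false' always fails, so the command after '||' executes
false || echo 'the 1st pipeline failed, so this ran'

# a successful 1st pipeline short-circuits the 2nd
true || echo 'this is never printed'
```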

Compound commands

A list of pipelines can be executed in two ways.

  • execute in a subshell
  • execute in the current shell

Execution in a subshell is performed using the ( and ) characters and is as follows:

# execute a list of pipelines in a subshell.
(echo 'Hello wally'; echo 'Hello wally world')

# as above
(echo 'Hello wally'
echo 'Hello wally world')

Execution in the current shell is performed using the ‘{’ and ‘}’ characters and is perhaps more conventional. Note that ‘{’ must be followed by a space, and the list must be terminated with a ‘;’ (or a newline) before the closing ‘}’:

# execute list of pipelines in the current shell.
# Note: the space characters are important
{ echo 'Hello wally'; echo 'Hello wally world'; }

# as above
{ echo 'Hello wally'
echo 'Hello wally world'; }
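The practical difference between the two forms is where state changes land. A variable assigned in a subshell is lost when the subshell exits, while an assignment in a current-shell group persists.

```shell
x=outer
( x=inner )        # subshell: the assignment disappears with the subshell
echo "$x"          # prints: outer
{ x=inner; }       # current shell: the assignment persists
echo "$x"          # prints: inner
```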

Any errors in the pipeline are reported in the PIPESTATUS shell variable. Note that PIPESTATUS is an array, with one entry per command in the last pipeline (it is a shell variable, not an environment variable).

# query the execution status (0=no error, !0=error) of each command in the last pipeline
echo "${PIPESTATUS[@]}"

A status of zero means no error occurred for that command in the last pipeline.

A non-zero value will be returned if there was an error.
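Because PIPESTATUS is an array, it records one status per command in the pipeline. Here the first command fails and the second succeeds:

```shell
# the first command fails (status 1), the second succeeds (status 0)
false | true
echo "${PIPESTATUS[@]}"   # prints: 1 0
```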

Another example.

# current shell pipeline
{ ps -fTo pid,comm,psr,etimes | ps -fTo pid,comm,psr,etimes; }

# equivalent to above
ps -fTo pid,comm,psr,etimes | ps -fTo pid,comm,psr,etimes

Same behaviour as before. New processes are created, and they are executed on separate CPU processors.

# subshell pipeline
(ps -fTo pid,comm,psr,etimes | ps -fTo pid,comm,psr,etimes)

As we are using a subshell, Bash spawns a subshell first, then executes the pipeline. New processes are created, and they are executed on separate CPU processors.

For a pipeline, it doesn’t matter if the output of one command doesn’t flow into the input of the next.

For a current shell and subshell pipeline, each command is executed with a new process, but the previous ones stay alive for the life of the pipeline.

Lists of pipelines synchronously

You can also process many lists of pipelines synchronously, as in the Bash script below:

#!/bin/bash

cm="date; hostname; uname -a"
# execute many lists of pipelines
for server in 192.168.0.1 192.168.0.2 192.168.0.3
do
  # synchronous execution of each list of pipelines
  ssh root@"$server" "$cm"; echo "pipeline status: $PIPESTATUS"
done

exit

If a pipeline times out (for example, because a server is unreachable), its status will be non-zero.

It is also possible to process lists of pipelines asynchronously too.

Lists of pipelines asynchronously

Much quicker than running many lists of pipelines synchronously is to run the lists asynchronously instead.

When lists of pipelines are executed asynchronously, each list runs as a separate job.

Asynchronous lists of pipelines are created using a pair of ‘(‘ and ‘)’ followed by ‘&’. The ‘&’ is used to create a new job for each list.

Then follow up with a ‘wait’ command, to wait for all the jobs to finish before exiting the script.

#!/bin/bash

cm="date; hostname; uname -a"
# execute many lists of pipelines
for server in 192.168.0.1 192.168.0.2 192.168.0.3
do
  # asynchronous execution of each list of pipelines
  (ssh root@"$server" "$cm"; echo "pipeline status: $PIPESTATUS") &
done

echo
echo 'wait for completion of the jobs...'
echo

wait

exit

The Bash script completes earlier, as the jobs are running asynchronously.
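When the exit status of each job matters, ‘wait’ can also take a job’s process ID and report that job’s status through ‘$?’. A sketch using placeholder jobs (in place of the ssh commands above):

```shell
#!/bin/bash

# start three jobs, each exiting with a distinct status
pids=()
for n in 1 2 3
do
  ( exit $n ) &
  pids+=($!)
done

# wait for each job in turn and report its exit status
for pid in "${pids[@]}"
do
  wait "$pid"
  echo "job $pid exited with status $?"
done
```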

Reference

https://www.gnu.org/software/bash/manual/bash.html

https://tiswww.case.edu/php/chet/bash/bashref.html

https://google.github.io/styleguide/shellguide.html
