Shell scripting

Introduction

Instead of typing all the UNIX commands we need to perform one after the other, we can save them all in a file (a "script") and execute them all at once. Recall from the UNIX and Linux Chapter that the bash shell (or terminal) is a text command processor that interfaces with the Operating System. The bash shell provides a computer language that can be used to build scripts (AKA shell scripts) that can be run through the terminal.

What shell scripts are good for

It is possible to write reasonably sophisticated programs with shell scripting, but the bash language does not have enough features to replace a "proper" language like C, Python, or R. However, you will find that shell scripting is necessary. This is because, as you saw in the previous chapter, UNIX has an incredibly powerful set of tools that can be used through the bash terminal. Shell scripts allow you to automate the usage of these commands and create your own simple utility tools/scripts/programs for tasks such as backups, converting file formats, and handling and manipulating files and directories. This enables you to perform many everyday tasks on your computer without having to invoke another language that might require installation or updating.

Your first shell script

Let's write our first shell script.

Some conventions and syntax rules

  • By convention, Unix shell variables should be named in UPPERCASE
  • Also, to create more complex variable names, use snake case (for example "VAR_NAME")
  • There should be no spaces around the = when assigning these variables: MY_VAR=value would work, but MY_VAR = value wouldn't, because then the shell assumes that MY_VAR must be the name of a command and tries to execute it (with = and value as arguments).
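
The assignment rules above can be seen in a short sketch. The variable names and values here are hypothetical, chosen only for illustration.

```shell
# Hypothetical names and values, for illustration only
MY_VAR=value                 # correct: no spaces around =
LONG_VAR_NAME="two words"    # quote the value if it contains spaces

echo "$MY_VAR"               # use $ to read the value back
echo "$LONG_VAR_NAME"

# MY_VAR = value             # wrong: the shell would look for a command called MY_VAR
```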

$\star$ Write and save a file called boilerplate.sh in CMEECourseWork/week1/code, and add the following script to it (type it in your code editor):

#!/bin/sh
# Author: Your Name your.login@imperial.ac.uk
# Script: boilerplate.sh
# Desc: simple boilerplate for shell scripts
# Arguments: none
# Date: Oct 2019

echo -e "\nThis is a shell script! \n"

#exit

The .sh extension is not necessary, but it helps you and your programming IDE (e.g., Visual Studio Code, Emacs, etc.) to identify the file type.

  • The first line is a "shebang" (or sha-bang or hashbang or pound-bang or hash-exclam or hash-pling! – Wikipedia). It can also be written as #!/bin/bash (assuming you are using the bash shell). It tells the operating system which interpreter should run the script: here, the script will be executed by /bin/sh.
  • The hash marks in the following lines tell the interpreter to ignore the rest of each of those lines. That's how you put in script documentation (who wrote the script and when, what the script does, etc.) and comments on particular lines of the script.
  • The -e flag to echo enables interpretation of backslash escapes in bash, so that each \n is printed as a newline (the echo of other shells may treat -e differently).
  • Note that there is a commented out exit command at the end of the script. Uncommenting it will not change the behavior of the script, but will allow you to generate an error code, and if the command is inserted in the middle of the script, to stop the code at that point. To find out more, see this and this in particular.
{tip}
`/bin/sh` is the standard location of the Bourne shell (`sh`) on most Unix systems. On GNU/Linux, `/bin/sh` is normally a symbolic link to another shell: bash on some distributions, or [`dash`](https://blog.cloudware.bg/en/dash-vs-bash-shell/) on others (e.g., Ubuntu).
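
To see what the exit command does without closing your own terminal, the sketch below runs it inside a subshell (the parentheses), which stands in for a script here. The variable names STATUS and OUTPUT are illustrative assumptions and are not part of boilerplate.sh.

```shell
# (exit 3) ends the subshell with status 3; || runs only when the status is non-zero,
# and $? holds the exit status of the most recent command
STATUS=0
(exit 3) || STATUS=$?
echo "Exit status was $STATUS"

# Anything after exit is never reached: only "before" is captured
OUTPUT=$(echo "before"; exit 0; echo "after")
echo "$OUTPUT"
```

By convention, an exit status of 0 means success and any other value signals an error.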

Special characters

In shell scripts, there are certain "special" characters that must be properly "escaped" to avoid interpretation by the shell. Some of these you already saw in the UNIX Chapter; for example, in the bash challenge command find . -type f -exec ls -s {} \; | sort -n | head -10, the character ; had to be escaped with a \ to avoid being interpreted as a special character. There is a list of these in the UNIX chapter, and additional ones will be introduced here.
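
As a minimal sketch of escaping, the lines below show three ways of treating the special character $. The variable PRICE and its value are made up for illustration.

```shell
PRICE=5                        # hypothetical variable, for illustration only

EXPANDED="Cost: $PRICE"        # double quotes: $PRICE is replaced by its value
ESCAPED="Cost: \$PRICE"        # backslash: the $ is kept as a literal character
LITERAL='Cost: $PRICE'         # single quotes: nothing inside is interpreted

echo "$EXPANDED"
echo "$ESCAPED"
echo "$LITERAL"
```

The first echo prints the value 5, while the other two print the text $PRICE unchanged.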

Next, let's run your first shell script.

Running shell scripts

There are two ways to run a shell script:

  1. Call the bash interpreter to run the file:
bash myscript.sh

(You can also use sh myscript.sh, but it may give a slightly different output.)

This is the right way if the script does something specific in a given project.

{note}
Bash (bash) is one of many available Unix shells, and the most commonly used. Bash stands for "`B`ourne `A`gain `SH`ell", and is an improvement of the original Bourne shell (`sh`). Basically, `bash` is `sh` with more features and nicer (more intuitive, compact) syntax. Most inbuilt UNIX commands or your own scripts will work the same in both, but at times with subtle differences in output.
{tip}
**Mac Users**: your default shell might not be `bash`, but `zsh`. Usually, running a shell script or command with `bash` or `zsh` will give you identical processing and output. The commands you learned for `bash` will also work in `zsh`, although they may give somewhat different output.
  2. Make the script executable and execute it:
chmod +x myscript.sh
./myscript.sh # the ./ is needed

Use this second approach for a script that does something generic, and is likely to be reused again and again (Can you think of examples?)
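
The two approaches can be compared side by side in the following sketch. It assumes that bash and mktemp are available and that the temporary directory allows files to be executed; hello.sh is a throwaway, hypothetical script created only for the demonstration.

```shell
# Create a throwaway script in a temporary directory (hypothetical name hello.sh)
TMP_DIR=$(mktemp -d)
printf '#!/bin/sh\necho "hello from script"\n' > "$TMP_DIR/hello.sh"

# Way 1: hand the file to an interpreter explicitly
OUT1=$(bash "$TMP_DIR/hello.sh")

# Way 2: make it executable, then run it by its path
chmod +x "$TMP_DIR/hello.sh"
OUT2=$("$TMP_DIR/hello.sh")

echo "$OUT1"
echo "$OUT2"
rm -r "$TMP_DIR"    # clean up
```

In the first way the interpreter named on the command line is used, while in the second way the shebang line decides which interpreter runs the script.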

The generic scripts of type (2) can be saved in a bin directory under your home directory (e.g., /home/username/bin/), and made easily accessible by telling UNIX to look in that directory for scripts. To this end, you need to add this bin directory to the directory paths that Linux searches for executables. For this you need to set the $PATH environmental variable: a colon-separated list of directories that the shell searches for executable files (more on environmental variables below).

First, check which directories are already in $PATH:

echo $PATH

Then check if you already have a bin directory:

find /home/ -maxdepth 3 -name 'bin' -type d
{tip}
**Mac Users**: on Macs you may not need to search `/home/`, but just `/`

Note the -maxdepth 3 directive. You don't want to search in every possible directory in your UNIX tree (under home)! If you see no bin directory (you might, for example, find .local/bin), then create one:

mkdir -p ~/.local/bin # in ".local" to keep it to only current user; -p also creates .local if it is missing

Then, add it to the $PATH:

export "PATH=$PATH:$HOME/.local/bin"

This change applies only to the current terminal session; it will not persist once you close the terminal or reboot your computer. To make it persistent,

  • For Bash, you need to add export PATH=$PATH:$HOME/.local/bin to the appropriate file that will be read when your shell launches. There are a few different files where you can set the variable:
    • ~/.bashrc
    • ~/.profile
    • ~/.bash_profile

Check if these files exist, and then add the path specification command (export PATH=$PATH:$HOME/.local/bin in this case) to any of them, but usually ~/.bashrc is a good choice. Then log out and in again, or run source ~/.bashrc (if it was indeed .bashrc that you edited).

  • For other shells, you need to find the appropriate file by reading that shell's documentation. In particular, on current macOS versions, which now use the zsh shell, it will be ~/.zshrc.
{note}
If you have two executable files sharing the same name located in two different directories, the shell will run the file that is in the directory that comes first in the paths listed in `$PATH`.
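
To inspect the search order in practice, the sketch below lists the $PATH directories one per line and then asks the shell where it finds a command. The variable name LS_LOCATION is an illustrative assumption.

```shell
# Print each directory in $PATH on its own line, in the order they are searched
echo "$PATH" | tr ':' '\n'

# command -v reports what the shell would run for a given name
LS_LOCATION=$(command -v ls)
echo "ls resolves to: $LS_LOCATION"
```

Running command -v on the name of one of your own scripts is a quick way to check that your bin directory has been picked up.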

Now run your first shell script.

cd to your code directory, and run it:

In [1]:
cd ../code
bash boilerplate.sh
This is a shell script!

I have specified the relative path ../code assuming that you are in some other directory in your current week (sandbox, results or data).

(02-ShellScripting-Variables)=

Variables in shell scripts

You will need to handle and manipulate variables (AKA parameters) inside shell scripts to truly exploit the powerful features of the bash (shell) language.

{note}
At the most fundamental level, a "variable" in any programming language or environment is a *named* section (portion, chunk) of the computer's memory which can be assigned values, read and manipulated.

Shell scripts have two types of variables.

Special Variables

These are set by the shell, and typically cannot have values assigned to them (cannot be modified). They contain useful or necessary information needed for the script to run. These include:

  • Environmental variables: These contain information about the system (e.g., $PATH, which you saw above), are available system-wide (so you can invoke them directly in the commandline, outside a shell script), and are available to (or "inherited by") all new processes and shells generated ("spawned") by a bash script (AKA a "child" process or shell).
  • Special internal variables: These exist only in the environment of a particular execution of the shell script. These will not be available any more once the script has finished running, unless you explicitly export them.
{tip}
To see a list of all current environmental variables, you can use `env` in the commandline.

Here are some key special internal variables in shell scripts :

| Variable | Description |
|---|---|
| $0 | The name of the current script as it was called, including any path and extension |
| $n ($1...$9) | Here n is an integer corresponding to the position of an argument (the first argument is $1, the second is $2, etc.) |
| $# | The number of arguments (parameters) supplied to the script (that the script was "called" with) |
| $@ | All the arguments, each as a separate item. For example, if a script receives two arguments, $@ is equivalent to $1 $2 |

This is not an exhaustive list, but these are the important ones to remember in basic shell scripting.
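
These variables can be tried out without writing a separate script file, because inside a shell function $#, $1, $2 and $@ refer to the arguments of that function. The function name show_args and its arguments below are hypothetical.

```shell
# Inside a function, the positional variables describe the function's own arguments
show_args() {
    echo "Number of arguments: $#"
    echo "First: $1, second: $2"
    for ARG in "$@"; do      # quoted "$@" keeps each argument as one item
        echo "  argument: $ARG"
    done
}

show_args one "two words"
```

Note that the quoted second argument counts as a single argument even though it contains a space.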

Assigned Variables

These are assigned manually by the user. They are present within the current instance of the shell only, and are not available to any child processes started by the script unless they are explicitly exported.
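
The effect of exporting can be checked with a small sketch, in which sh -c starts a child shell. The variable names LOCAL_ONLY and SHARED are made up, and the ${VAR:-unset} form is a parameter expansion that prints the word unset when a variable has no value.

```shell
LOCAL_ONLY="not exported"      # visible in this shell only
export SHARED="exported"       # also passed on to child processes

# Single quotes stop the current shell from expanding the variables itself,
# so it is the child shell that looks them up
CHILD_SEES=$(sh -c 'echo "${LOCAL_ONLY:-unset} / ${SHARED:-unset}"')
echo "$CHILD_SEES"
```

The child shell reports the first variable as unset and sees the value of the second one.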

In general, assigned variables in the bash language are analogous to those in any other programming language (e.g., Python): they can be a number, a character, a string of characters, or boolean (true/false). There are three ways to assign values to such variables (note lack of spaces!):

  1. Explicit declaration: MY_VAR=myvalue
  2. Reading from the user input (the script will wait for the value to be provided): read MY_VAR
  3. Command substitution: MY_VAR=$(command) (the variable is the output of some command); e.g., MY_VAR=$(ls | wc -l)
{tip}
The command substitution of the type MY_VAR=$(command) is one type of "shell expansion". There are several types of shell expansions, which you can learn about [here](https://www.gnu.org/software/bash/manual/html_node/Shell-Expansions.html). Along with command substitution, [shell parameter expansion](https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html) is particularly important to learn about in shell scripting.
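
The three kinds of assignment can be sketched together as follows. So that the example runs unattended, read takes its input from a here-document rather than the keyboard; the values, the file path, and the names FIRST, SECOND and FILE_NAME are all made up for illustration.

```shell
# 1. Explicit declaration
MY_VAR=myvalue

# 2. read, fed from a here-document instead of the keyboard
read FIRST SECOND <<EOF
3 4
EOF

# 3. Command substitution: capture the output of a command (the path is made up)
FILE_NAME=$(basename /some/made/up/path/data.txt)

# Arithmetic expansion, a built-in way of doing integer sums
MY_SUM=$((FIRST + SECOND))

echo "$MY_VAR $FIRST $SECOND $FILE_NAME $MY_SUM"
```

The final line prints myvalue 3 4 data.txt 7.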

Some examples of variables

Here is an example illustrating the different types of shell variables (and assignments):

#!/bin/sh

## Illustrates the use of variables 

# Special variables

echo "This script was called with $# parameters"
echo "The script's name is $0"
echo "The arguments are $@"
echo "The first argument is $1"
echo "The second argument is $2"

# Assigned Variables; Explicit declaration:
MY_VAR='some string' 
echo 'the current value of the variable is:' $MY_VAR
echo
echo 'Please enter a new string'
read MY_VAR
echo
echo 'the current value of the variable is:' $MY_VAR
echo

## Assigned Variables; Reading (multiple values) from user input:
echo 'Enter two numbers separated by space(s)'
read a b
echo
echo 'you entered' $a 'and' $b '; Their sum is:'

## Assigned Variables; Command substitution
MY_SUM=$(expr $a + $b)
echo $MY_SUM

$\star$ Save this as a single variables.sh script.

$\star$ Now run this script without any arguments:

bash variables.sh

$\star$ And compare the output when run with two arguments:

bash variables.sh 1 two

Also type the following into another script file (save it as MyExampleScript.sh) and run it:

#!/bin/sh

MSG1="Hello"
MSG2=$USER
echo "$MSG1 $MSG2"
echo "Hello $USER"
echo

This introduces you to the $USER environmental variable (on some systems also available as $USERNAME).

A useful shell-scripting example

Let's write a shell script to transform comma-separated files (csv) to tab-separated files and vice-versa. This can be handy: for example, in certain computer languages (e.g., C), it is much easier to read tab or space separated files than csv.

To do this, in bash we can use tr (abbreviation of translate or transliterate), which deletes or substitutes characters. Here are some examples.

In [2]:
echo "Remove    excess      spaces." | tr -s " "
Remove excess spaces.
In [3]:
echo "remove all the a's" | tr -d "a"
remove ll the 's
In [4]:
echo "set to uppercase" | tr "[:lower:]" "[:upper:]"
SET TO UPPERCASE
In [5]:
echo "10.00 only numbers 1.33" | tr -d "[:alpha:]" | tr -s " " ","
10.00,1.33
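
The same tool also handles the conversion from commas to tabs. Here is a minimal sketch using made-up data; the names CSV_LINE and TSV_LINE are illustrative.

```shell
CSV_LINE="a,b,c"                               # made-up data
TSV_LINE=$(echo "$CSV_LINE" | tr "," "\t")     # replace every comma with a tab
echo "$TSV_LINE"
```

Here tr itself interprets the two characters \t as a tab.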

Now write a shell script called tabtocsv.sh to substitute all tabs with commas:

#!/bin/sh
# Author: Your Name your.login@imperial.ac.uk
# Script: tabtocsv.sh
# Description: substitute the tabs in the files with commas
#
# Saves the output into a .csv file
# Arguments: 1 -> tab delimited file
# Date: Oct 2019

echo "Creating a comma delimited version of $1 ..."
cat "$1" | tr -s "\t" "," >> "$1.csv" # quotes keep file names containing spaces intact
echo "Done!"
exit

Now test it (note where the output file gets saved and why). First create a text file with tab-separated text:

In [6]:
echo -e "test \t\t test" >> ../sandbox/test.txt # again, note the relative path!

Now run your script on it

In [7]:
bash tabtocsv.sh ../sandbox/test.txt
Creating a comma delimited version of ../sandbox/test.txt ...
Done!

Note that

  • $1 is the way a shell script defines a placeholder for a variable (in this case the filename). See the section on variables in shell scripts above for more on variable names.

  • The new file gets saved in the same location as the original (Why is that?)

  • The file got saved with a .txt.csv extension. That's not very nice. Later you will get an opportunity to fix this!
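
One consequence of the -s flag used in tabtocsv.sh is worth keeping in mind: it squeezes runs of repeated characters into one, so consecutive tabs (which may mark an empty field) become a single comma. The sketch below illustrates this with made-up data; ROW, SQUEEZED and PLAIN are illustrative names.

```shell
ROW=$(printf 'a\t\tc')                        # three fields, the middle one empty

SQUEEZED=$(echo "$ROW" | tr -s "\t" ",")      # as in tabtocsv.sh
PLAIN=$(echo "$ROW" | tr "\t" ",")            # the same substitution without -s

echo "$SQUEEZED"
echo "$PLAIN"
```

The first result is a,c and the second is a,,c, so whether to use -s depends on whether empty fields matter in your data.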

Some more examples

Here are a few more illustrative examples (test each one out, save in week1/code/ with the given name):

Count lines in a file

Save this as CountLines.sh:

#!/bin/bash

NumLines=$(wc -l < "$1")
echo "The file $1 has $NumLines lines"
echo

The < redirects the contents of the file to the stdin (standard input) of the command wc -l. It is needed here because without it, you would not be able to catch just the numerical output (number of lines). To see this, try deleting < from the script and see what the output looks like (it will also print the name of the file, which you do not want).
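
The difference can also be seen directly in a short sketch, which assumes mktemp is available and uses a temporary file with three made-up lines.

```shell
TMP_FILE=$(mktemp)
printf 'one\ntwo\nthree\n' > "$TMP_FILE"

WITH_NAME=$(wc -l "$TMP_FILE")       # the count followed by the file name
COUNT_ONLY=$(wc -l < "$TMP_FILE")    # the count only

echo "$WITH_NAME"
echo "$COUNT_ONLY"
rm "$TMP_FILE"    # clean up
```

When wc reads from standard input it has no file name to report, which is why only the number is printed.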

Concatenate the contents of two files

Save this as ConcatenateTwoFiles.sh:

#!/bin/bash

# $1 and $2 are the input files; $3 is the merged output file
cat "$1" > "$3"
cat "$2" >> "$3"
echo "Merged File is"
cat "$3"

Convert tiff to png

This assumes you have done apt install imagemagick (remember sudo!)

Save this as tiff2png.sh:

#!/bin/bash

for f in *.tif; do
    echo "Converting $f"
    convert "$f" "$(basename "$f" .tif).png"
done

:::{figure-md} XKCD-shell-script XKCD

This is not a good use of shell scripting!
(Source: XKCD)

:::


Practicals

Instructions

  • Along with the completeness of the practicals/exercises themselves, you will be marked on the basis of how complete and well-organized your directory structure and content is.

  • Review (especially if you got lost along the way) and make sure all the shell scripts you created in this chapter are functional.

  • Make sure you have your weekly directory organized with data, sandbox, code with the necessary files, under CMEECourseWork/week1.

  • All scripts should run on any other Unix/Linux machine — for example, always call data from the data directory using relative paths.

  • Make sure there is a readme file in every week's directory. This file should give an overview of the weekly directory contents, listing all the scripts and what they do. This is different from the readme for your overall git repository, of which Week 1 is a part. You will write a similar readme for each subsequent weekly submission.

  • Don't put any scripts that are part of the submission in your home/bin directory! You can put a copy there, but a working version should be in your repository.

Improving scripts

Note that some of the shell scripts that you have created in this chapter require input files. For example, tabtocsv.sh needs one input file, and ConcatenateTwoFiles.sh needs two. When you run any of these scripts without inputs (e.g., just bash tabtocsv.sh), you either get no result, or an error.

  • The goal of this exercise is to make each such script robust so that it gives feedback to the user and exits if the right inputs are not provided.

A new shell script

  • Write a csvtospace.sh shell script that takes a comma separated values file and converts it to a space separated values file. However, it must not change the input file; it should save the result as a differently named file.

  • This script should be able to handle wrong or missing inputs (similar to the previous exercise).

  • Save the script in CMEECourseWork/week1/code, and run it on the csv data files that are in Temperatures in the master repository's Data directory.

{hint}
In these shell scripting practicals, to strip out and/or change file extensions, you may need to use parameter expansions, and specifically, parameter substitutions, along with pattern matching. Read about these and try out the examples [here](https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html) and [here](https://www.gnu.org/software/bash/manual/html_node/Pattern-Matching.html).
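
As a starting point, here is a minimal sketch of the kind of parameter expansion the hint refers to. The path and the variable names are made up, and the file does not need to exist for the expansions to work.

```shell
INPUT_PATH="some/dir/results.txt"    # made-up path, for illustration only

NO_EXT="${INPUT_PATH%.txt}"          # % removes the shortest matching suffix
NEW_NAME="${NO_EXT}.csv"             # then append a new extension
FILE_ONLY="${INPUT_PATH##*/}"        # ## removes the longest matching prefix (the directories)

echo "$NEW_NAME"
echo "$FILE_ONLY"
```

This prints some/dir/results.csv and results.txt; adapting the patterns to your own scripts is left as part of the practical.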

Readings & Resources