Instead of typing all the UNIX commands we need to perform one after the other, we can save them all in a file (a "script") and execute them all at once. Recall from the UNIX and Linux Chapter that the bash shell (or terminal) is a text command processor that interfaces with the Operating System. The bash shell provides a computer language that can be used to build scripts (AKA shell scripts) that can be run through the terminal.
It is possible to write reasonably sophisticated programs with shell scripting, but the bash language is not featured to the extent that it can replace a "proper" language like C, Python, or R. However, you will find that shell scripting is necessary because, as you saw in the previous chapter, UNIX has an incredibly powerful set of tools that can be used through the bash terminal. Shell scripts allow you to automate the usage of these commands and create your own simple utility tools, scripts, and programs for tasks such as backups, converting file formats, and handling and manipulating files and directories. This enables you to perform many everyday tasks on your computer without having to invoke another language that might require installation or updating.
Let's write our first shell script.
Note that there must be no spaces around the `=` when assigning variables: `MY_VAR=value` would work, but `MY_VAR = value` wouldn't, because then the shell assumes that `MY_VAR` must be the name of a command and tries to execute it (with `= value` as arguments).
$\star$ Write and save a file called `boilerplate.sh` in `CMEECourseWork/week1/code`, and add the following script to it (type it in your code editor):
#!/bin/sh
# Author: Your Name your.login@imperial.ac.uk
# Script: boilerplate.sh
# Desc: simple boilerplate for shell scripts
# Arguments: none
# Date: Oct 2019
echo -e "\nThis is a shell script! \n"
#exit
A few things to note about this script:
- The `.sh` extension is not necessary, but it helps you and your programming IDE (e.g., Visual Studio Code, Emacs, etc.) to identify the file type.
- The first line, `#!/bin/sh`, names the interpreter that should execute the script (here `/bin/sh`). It can also be written as `#!/bin/bash` (assuming you are using the bash shell), which tells the system that this is a bash script and that it should be interpreted and run as such.
- The `-e` flag to `echo` enables the interpretation of backslash escapes such as `\n` (newline).
- There is a commented-out `exit` command at the end of the script. Uncommenting it will not change the behavior of the script, but will allow you to generate an error code and, if the command is inserted in the middle of the script, to stop the code at that point. To find out more, look up exit codes in the bash reference manual.
{tip}
`/bin/sh` is the standard location of the Bourne shell (`sh`) on most Unix systems. If you're using GNU/Linux, `/bin/sh` is normally a symbolic link to another shell: `bash` on some distributions, or [`dash`](https://blog.cloudware.bg/en/dash-vs-bash-shell/) on others (e.g., Debian and Ubuntu).
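As a minimal sketch of what exit codes look like in practice: every command leaves an exit status in the special variable `$?`, with 0 meaning success and anything else meaning failure. The function name `check_positive` and the status values 3 and 7 are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (status 0) for a positive number,
# fails with the arbitrary status 3 otherwise.
check_positive() {
    if [ "$1" -gt 0 ]; then
        return 0
    else
        return 3
    fi
}

check_positive 5
ok_status=$?        # exit status of the command that just ran
check_positive -2
bad_status=$?

echo "Status for 5: $ok_status"
echo "Status for -2: $bad_status"

# A subshell shows what `exit` does: execution stops at that point
# and the given status is handed back to the caller.
( echo "before exit"; exit 7; echo "never printed" )
sub_status=$?
echo "Subshell exited with status $sub_status"
```

Inside a script, a plain `exit 7` would end the whole script in the same way, and the caller could then inspect `$?`.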
In shell scripts, there are certain "special" characters that must be properly "escaped" to avoid interpretation by the shell. Some of these you already saw in the UNIX Chapter; for example, in the bash challenge command `find . -type f -exec ls -s {} \; | sort -n | head -10`, the character `;` had to be escaped with a `\` to avoid being interpreted as a special character. There is a list of these in the UNIX chapter, and additional ones will be introduced here.
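To see escaping at work on a smaller scale, here is a short sketch using the `$` character, which the shell normally treats as the start of a variable name. The variable `price` is an assumption made for this example only.

```shell
#!/bin/sh
# Assumed example variable, for illustration only
price=5

unescaped="Cost: $price"    # $ is interpreted: the variable is expanded
escaped="Cost: \$price"     # the backslash keeps the $ literal
single='Cost: $price'       # single quotes keep everything literal

echo "$unescaped"   # Cost: 5
echo "$escaped"     # Cost: $price
echo "$single"      # Cost: $price
```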
Next, let's run your first shell script.
There are two ways to run a shell script:
1. Call the interpreter explicitly and give it the script to run:
bash myscript.sh
(You can also use `sh myscript.sh`, but it may give a slightly different output.)
This is the right way if the script does something specific in a given project.
{note}
Bash (`bash`) is one of many available (yet the most commonly used) Unix shells. Bash stands for "Bourne Again SHell", and is an improvement of the original Bourne shell (`sh`). Basically `bash` is `sh`, with more features and nicer (more intuitive, compact) syntax. Most inbuilt UNIX commands or your own scripts will work the same, but at times with subtle differences in output.
{tip}
**Mac Users**: your default shell might not be `bash`, but `zsh`. Usually, running a shell script or command with `bash` and `zsh` will give you identical processing and output. The commands you learned for `bash` will also work in `zsh`, although they may give somewhat different output.
2. Make the script executable and then run it directly:
chmod +x myscript.sh
./myscript.sh # the ./ is needed
Use this second approach for a script that does something generic and is likely to be reused again and again (can you think of examples?).
The generic scripts of type (2) can be saved in a `bin` directory under your home directory, and made easily accessible by telling UNIX to look in that directory for scripts. To this end, you need to add this `bin` directory to the directory paths that Linux searches for executables. For this you need to set the `$PATH` environmental variable: a list of directories (separated by colons) that tells the shell where to search for executable files (more on environmental variables below).
First, check which directories are already in `$PATH`:
echo $PATH
Then check if you already have a `bin` directory:
find /home/ -maxdepth 3 -name 'bin' -type d
{tip}
**Mac Users**: on Macs you may not need to search `/home/`, but just `/`
Note the `-maxdepth 3` option. You don't want to search in every possible directory in your UNIX tree (under `home`)! If no `bin` directory turns up (a typical hit would be `.local/bin`), then create one:
mkdir -p ~/.local/bin # in ".local" to keep it to only the current user; -p also creates ~/.local if it is missing
Then, add it to the `$PATH`:
export PATH="$PATH:$HOME/.local/bin"
This change lasts only for the current shell session, so it will not persist after you have rebooted your computer. To make it persistent, add the line
export PATH=$PATH:$HOME/.local/bin
to the appropriate file that is read when your shell launches. There are a few different files where you can set the variable:
~/.bashrc
~/.profile
~/.bash_profile
Check if these files exist, and then add the path specification command (`export PATH=$PATH:$HOME/.local/bin` in this case) to one of them; `~/.bashrc` is usually a good choice. Then log out and in again, or run `source ~/.bashrc` (if it was indeed `.bashrc` that you edited).
If you are using the `zsh` shell, the file will be `~/.zshrc`.
{note}
If you have two executable files sharing the same name located in two different directories, the shell will run the file that is in the directory that comes first in the paths listed in `$PATH`.
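The following sketch illustrates this search order with two throwaway directories. The script name `hello.sh` and the directory variables are hypothetical, and the example assumes that `mktemp -d` is available and that scripts in the temporary directory are allowed to execute.

```shell
#!/bin/sh
# Two temporary directories, each holding a script with the same name
dir_a=$(mktemp -d)
dir_b=$(mktemp -d)

printf '#!/bin/sh\necho "I live in dir_a"\n' > "$dir_a/hello.sh"
printf '#!/bin/sh\necho "I live in dir_b"\n' > "$dir_b/hello.sh"
chmod +x "$dir_a/hello.sh" "$dir_b/hello.sh"

# dir_a is listed before dir_b, so its copy is found first.
# The change to PATH is made inside $( ), so it does not leak out.
first=$(PATH="$dir_a:$dir_b:$PATH"; hello.sh)
# Swapping the order changes which copy runs
second=$(PATH="$dir_b:$dir_a:$PATH"; hello.sh)

echo "$first"
echo "$second"

# Clean up
rm -r "$dir_a" "$dir_b"
```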
Now run your first shell script.
$\star$ `cd` to your `code` directory, and run it:
cd ../code
bash boilerplate.sh
I have specified the relative path `../code` assuming that you are in some other directory in your current week (`sandbox`, `results` or `data`).
(02-ShellScripting-Variables)=
You will need to handle and manipulate variables (AKA parameters) inside shell scripts to truly exploit the powerful features of the bash (shell) language.
{note}
At the most fundamental level, a "variable" in any programming language or environment is a *named* section (portion, chunk) of the computer's memory which can be assigned values, read and manipulated.
Shell scripts have two types of variables: special variables and assigned variables.
Special variables are set by the shell, and typically cannot have values assigned to them (cannot be modified). They contain useful or necessary information needed for the script to run. These include environmental variables and special internal variables.
Environmental variables (e.g., `$PATH`, which you saw above) are available system-wide (so you can invoke them directly in the commandline, outside a shell script), and are available to (or "inherited by") all new processes and shells generated ("spawned") by a bash script (AKA "child" processes or shells).
{tip}
To see a list of all current environmental variables, you can use `env` in the commandline.
Here are some key special internal variables in shell scripts:
| Variable | Description |
|---|---|
| `$0` | The name of the current script as it was called, including any path and extension |
| `$n` (`$1`...`$9`) | Here `n` is an integer corresponding to the position of an argument (the first argument is `$1`, the second is `$2`, etc.) |
| `$#` | The number of arguments (parameters) supplied to the script (that the script was "called" with) |
| `$@` | All the arguments, as separate items. For example, if a script receives two arguments, `$@` is equivalent to `$1 $2` |
This is not an exhaustive list, but these are the important ones to remember in basic shell scripting.
Assigned variables are set manually by the user. They are present within the current instance of the shell only, and are not available to any child processes started by the script unless they are explicitly exported.
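As a rough sketch of the difference that exporting makes, the script below starts a child shell with `sh -c` and asks it what it can see. The variable names are hypothetical, and `${VAR:-unset}` is a parameter expansion that prints `unset` when the variable has no value.

```shell
#!/bin/sh
# Hypothetical variable names, for illustration only
LOCAL_VAR="only in this shell"
export SHARED_VAR="visible to children"

# sh -c starts a child shell; the single quotes stop the parent shell
# from expanding the variables before the child gets to see them
child_local=$(sh -c 'echo "${LOCAL_VAR:-unset}"')
child_shared=$(sh -c 'echo "${SHARED_VAR:-unset}"')

echo "Child sees LOCAL_VAR as:  $child_local"
echo "Child sees SHARED_VAR as: $child_shared"
```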
In general, assigned variables in the bash language are analogous to those in any other programming language (e.g., Python): they can hold a number, a character, a string of characters, or a boolean (true/false). There are three ways to assign values to such variables (note the lack of spaces around `=`!):
1. Explicit declaration: `MY_VAR=myvalue`
2. Reading from the user: `read MY_VAR`
3. Command substitution: `MY_VAR=$(command)` (the variable holds the output of some command); e.g., `MY_VAR=$(ls | wc -l)`
{tip}
The command substitution of the type MY_VAR=$(command) is one type of "shell expansion". There are several types of shell expansions, which you can learn about [here](https://www.gnu.org/software/bash/manual/html_node/Shell-Expansions.html). Along with command substitution, [shell parameter expansion](https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html) is particularly important to learn about in shell scripting.
Here is an example illustrating the different types of shell variables (and assignments):
#!/bin/sh
## Illustrates the use of variables
# Special variables
echo "This script was called with $# parameters"
echo "The script's name is $0"
echo "The arguments are $@"
echo "The first argument is $1"
echo "The second argument is $2"
# Assigned Variables; Explicit declaration:
MY_VAR='some string'
echo 'the current value of the variable is:' $MY_VAR
echo
echo 'Please enter a new string'
read MY_VAR
echo
echo 'the current value of the variable is:' $MY_VAR
echo
## Assigned Variables; Reading (multiple values) from user input:
echo 'Enter two numbers separated by space(s)'
read a b
echo
echo 'you entered' $a 'and' $b '; Their sum is:'
## Assigned Variables; Command substitution
MY_SUM=$(expr $a + $b)
echo $MY_SUM
$\star$ Save this as a single `variables.sh` script.
$\star$ Now run this script without any arguments:
bash variables.sh
$\star$ And compare the output when run with two arguments:
bash variables.sh 1 two
And also type the following into another script file (save as `MyExampleScript.sh`) and run it:
#!/bin/sh
MSG1="Hello"
MSG2=$USER
echo "$MSG1 $MSG2"
echo "Hello $USER"
echo
This introduces you to the `$USER` environmental variable (on some systems the same value is also available as `$USERNAME`).
Let's write a shell script to transform comma-separated files (csv) to tab-separated files and vice-versa. This can be handy: for example, in certain computer languages (e.g., C), it is much easier to read tab or space separated files than csv.
To do this, in bash we can use `tr` (abbreviation of translate or transliterate), which deletes or substitutes characters. Here are some examples.
echo "Remove    excess      spaces." | tr -s " "
echo "remove all the a's" | tr -d "a"
echo "set to uppercase" | tr '[:lower:]' '[:upper:]'
echo "10.00 only numbers 1.33" | tr -d '[:alpha:]' | tr -s " " ","
Now write a shell script called `tabtocsv.sh` that substitutes all tabs with commas:
#!/bin/sh
# Author: Your name you.login@imperial.ac.uk
# Script: tabtocsv.sh
# Description: substitute the tabs in the files with commas
#
# Saves the output into a .csv file
# Arguments: 1 -> tab delimited file
# Date: Oct 2019
echo "Creating a comma delimited version of $1 ..."
cat "$1" | tr -s "\t" "," >> "$1.csv"
echo "Done!"
exit
Now test it (note where the output file gets saved and why). First create a text file with tab-separated text:
echo -e "test \t\t test" >> ../sandbox/test.txt # again, note the relative path!
Now run your script on it:
bash tabtocsv.sh ../sandbox/test.txt
Note that `$1` is the way a shell script refers to its first argument (in this case the filename); see the table of special variables above.
The new file gets saved in the same location as the original (why is that?).
The file got saved with a `.txt.csv` extension. That's not very nice. Later you will get an opportunity to fix this!
Here are a few more illustrative examples (test each one out, and save it in `week1/code/` with the given name).
Save this as `CountLines.sh`:
#!/bin/bash
NumLines=`wc -l < $1`
echo "The file $1 has $NumLines lines"
echo
The `<` redirects the contents of the file to the stdin (standard input) of the command `wc -l`. It is needed here because without it, you would not be able to catch just the numerical output (number of lines). To see this, try deleting `<` from the script and see what the output looks like (it will also print the name of the input file, which you do not want).
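A quick way to compare the two forms side by side is sketched below, using a throwaway three-line file (created with `mktemp`, an assumption of this example). The exact spacing of the `wc` output can differ between systems.

```shell
#!/bin/sh
# Throwaway three-line file for the comparison
tmpfile=$(mktemp)
printf 'one\ntwo\nthree\n' > "$tmpfile"

with_redirect=$(wc -l < "$tmpfile")    # wc reads stdin: line count only
without_redirect=$(wc -l "$tmpfile")   # wc is given a name: count plus file name

echo "With <:    $with_redirect"
echo "Without <: $without_redirect"

# Clean up
rm "$tmpfile"
```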
Save this as `ConcatenateTwoFiles.sh`:
#!/bin/bash
cat $1 > $3
cat $2 >> $3
echo "Merged File is"
cat $3
The next script assumes you have installed ImageMagick with `apt install imagemagick` (remember `sudo`!).
Save this as `tiff2png.sh`:
#!/bin/bash
for f in *.tif;
do
echo "Converting $f";
convert "$f" "$(basename "$f" .tif).png";
done
Along with the completeness of the practicals/exercises themselves, you will be marked on the basis of how complete and well-organized your directory structure and content is.
Review (especially if you got lost along the way) and make sure all the shell scripts you created in this chapter are functional.
Make sure you have your weekly directory organized with `data`, `sandbox`, and `code` directories containing the necessary files, under `CMEECourseWork/week1`.
All scripts should run on any other Unix/Linux machine; for example, always call data from the `data` directory using relative paths.
Make sure there is a `readme` file in every week's directory. This file should give an overview of the weekly directory contents, listing all the scripts and what they do. This is different from the `readme` for your overall git repository, of which `Week 1` is a part. You will write a similar `readme` for each subsequent weekly submission.
Don't put any scripts that are part of the submission in your `home/bin` directory! You can put a copy there, but a working version should be in your repository.
Note that some of the shell scripts that you have created in this chapter require input files. For example, `tabtocsv.sh` needs one input file, and `ConcatenateTwoFiles.sh` needs two (plus the name of an output file). When you run any of these scripts without inputs (e.g., just `bash tabtocsv.sh`), you either get no result, or an error.
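One possible way to guard against missing or unreadable inputs is sketched below. It relies on the special variable `$#` and on the file test `-r`; the function name `check_input` and the status values 1 and 2 are hypothetical choices for this example, not a prescribed solution.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 only if exactly one argument was
# given and it names a readable file; otherwise prints a message to
# stderr and returns a non-zero status.
check_input() {
    if [ "$#" -ne 1 ]; then
        echo "Expected exactly one input file, got $#" >&2
        return 1
    fi
    if [ ! -r "$1" ]; then
        echo "Cannot read input file: $1" >&2
        return 2
    fi
    return 0
}

# Try it on no input, a missing file, and a real (temporary) file
real_file=$(mktemp)

check_input;                      status_none=$?
check_input "/no/such/file.txt";  status_missing=$?
check_input "$real_file";         status_ok=$?

echo "no input: $status_none, missing file: $status_missing, real file: $status_ok"
rm "$real_file"
```

In a real script, a line such as `check_input "$@" || exit 1` near the top would stop the script before it tries to process anything.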
Write a `csvtospace.sh` shell script that takes a comma separated values file and converts it to a space separated values file. However, it must not change the input file; it should save the result as a differently named file.
This script should be able to handle wrong or missing inputs (similar to the previous exercise).
Save the script in `CMEECourseWork/week1/code`, and run it on the `csv` data files that are in `Temperatures` in the master repository's `Data` directory.
{hint}
In these shell scripting practicals, to strip out and/or change file extensions, you may need to use parameter expansions, and specifically, parameter substitutions, along with pattern matching. Read about these and try out the examples [here](https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html) and [here](https://www.gnu.org/software/bash/manual/html_node/Pattern-Matching.html).
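As a small taste of what these expansions do, here is a sketch on a made-up path (the file does not need to exist, since only the text of the variable is manipulated). The `.tsv` extension is just an illustrative choice.

```shell
#!/bin/sh
# Hypothetical path, for illustration only
file="../data/sample.results.txt"

no_ext="${file%.*}"           # remove the shortest match of ".*" from the end
name_only="${file##*/}"       # remove the longest match of "*/" from the start
new_name="${file%.txt}.tsv"   # swap one extension for another

echo "$no_ext"      # ../data/sample.results
echo "$name_only"   # sample.results.txt
echo "$new_name"    # ../data/sample.results.tsv
```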
The bash reference manual: https://www.gnu.org/software/bash/manual/bash.html
There are plenty of shell scripting resources and tutorials out there; in particular, look up http://www.tutorialspoint.com/unix/unix-using-variables.htm
This is a relatively intuitive set of notes on shell scripting: https://www.shellscript.sh/
Some shell scripting examples