What is a shell?
User interfaces are the abstraction through which human beings interact with computers. We can partition the majority of user interfaces into two types: the graphical user interface, which is familiar to most people due to its ubiquity in desktop operating systems (icons, windows, mouse, etc.), and the character-based user interface, where input and output are both purely text-based (characters in a terminal, keyboard). The command line available under most operating systems is a character-based interface. Rather than pointing at a visual representation of a program and clicking it in order to execute it, we run a program by typing its name and providing any additional arguments that it may require in order to specify the desired behaviour. Note that these can both be seen as different abstractions of the same underlying behaviour: telling the operating system to execute a program.
A shell is a character-based user interface that interprets the text typed in by the user and translates it into instructions to the operating system; similarly, any response from the operating system or the program being executed is provided to the shell, which displays it as plain text. Anyone who has already used a SHARCNET system is familiar with the shell whether they realize it or not: the command line at which you enter commands to the system is provided by the shell (which for most users is bash by default).
Although we tend to see the shell purely as a user interface, it is also possible to use it as a programming environment. A text file written to execute shell commands is called a shell script; it can be used to automate the execution of commands and to take appropriate action based on their outcome. Writing bash shell scripts is the focus of this tutorial.
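As a minimal sketch of what such a script can look like, the following runs a command and then acts on its outcome. The directory name, variable names and messages are purely illustrative assumptions and are not tied to any particular system.

```shell
#!/bin/bash
# Hypothetical example: create a scratch directory and report the outcome.
workdir="${TMPDIR:-/tmp}/shell_demo_$$"   # illustrative path; $$ is the shell's process ID

if mkdir -p "$workdir"; then
    # mkdir returned exit status 0 (success)
    outcome="created"
    echo "Created $workdir"
else
    # a non-zero exit status means the command failed
    outcome="failed"
    echo "Could not create $workdir" >&2
fi

rmdir "$workdir" 2>/dev/null   # clean up the empty scratch directory
```

Saved to a file and made executable with chmod +x, a script like this can be run by name just like any other program.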
A Brief History of the Major UNIX Shells
1979: Bourne shell (sh)
- the first widely adopted UNIX shell (it replaced the earlier Thompson shell)
- still widely used as the "lowest common denominator" for shell scripts: all UNIX systems support Bourne shell syntax
1981: C shell (csh - evolved into tcsh)
- originally part of BSD UNIX
- modified and expanded Bourne shell syntax to more closely resemble the C programming language
- introduced aliases and job control
1988: Bourne again shell (bash)
- developed as part of the GNU project (default shell in Linux)
- incorporated much from previous shells (csh, ksh, etc.)
- introduced command-line editing, functions, integer arithmetic
As is common in computing, nothing ever truly dies. As time goes on, the various shells cannibalize features from one another and on some level any modern, maintained shell provides most of the functionality of any of the others. We choose to focus on bash in this tutorial as it is by far the most widespread shell in use today thanks to it being the default under Linux. It is well maintained, and can be expected to remain stable into the future.
This tutorial is concerned primarily with bash shell scripting. Given that the entire point of doing this is to automate commands that you would otherwise enter at the command line, there is a wealth of tools provided on the system that we can leverage in our scripts. It is beyond the scope of this tutorial to cover any quantity of system utilities in depth; however, we will point out some of the more generally useful tools to provide some raw material for your scripts, and to serve as a foundation with which to consider the use of other tools.
You may already be familiar with some of these concepts: I/O redirection, wildcard expansion for filenames, pipelining, etc. Note that anything you can type at a bash command line can be used in your scripts, so as you become familiar with more syntax and system tools, the potential sophistication of your scripts improves as a matter of course. To this end, it is helpful to note that the bash shell has built-in help, available by typing help at the command line. You can obtain more detailed information on a specific topic by typing help followed by the topic name (for example, help cd).
We'll first review a few key system tools that you'll find useful, and review basic bash syntax that will be helpful in leveraging these tools in combination with one another.
Anything that is usable on the system can be used in a script. What follows are some commonly used utilities that are invaluable in shell scripts (depending on what you're trying to do). We only discuss these tools briefly here; additional detail can be obtained from the man pages for the tools in question.
echo: output text to stdout
- echo "Hello world"
- echo -n "Hello world"
cat: copy (concatenate) input to output
- cat somefile.txt
cut: select columns from text
- cut -f 2 -d ' ' file.txt
sed: stream editor; performs edits on a stream of input text, emitting the modified text as output
- sed -e 's/\ \ */\ /g' file.txt
mv, cp, mkdir, ls, file, etc.: file management
- all the commands that you are familiar with for basic file operations can be useful in scripts
This is far from a comprehensive list of tools. Consider it a useful starting point in order to understand that all these text tools (most of which you still may not know exist) are the root of most text processing that you can do in shell scripts.
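To illustrate how a few of the utilities listed above fit together, here is a small sketch. The sample text and variable names are made up for illustration. The sed expression performs the same substitution as the sed example above (written here without the backslashes): it collapses each run of spaces into a single space, so that cut can then split the line on single spaces.

```shell
# Illustrative input: columns separated by a varying number of spaces
line="alpha    beta  gamma"

# Collapse each run of spaces into one space, then select the 2nd column
second=$(echo "$line" | sed -e 's/  */ /g' | cut -f 2 -d ' ')

echo "$second"    # prints: beta
```

Without the sed step, cut would treat every individual space as a delimiter and the 2nd field of this input would be empty.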
Any time we run a program, there is always the notion of standard input and standard output. When a program reads from the default character source, it is reading from its standard input (stdin), which for an interactive program is typically the keyboard. When it writes to the default character output, it is writing to its standard output (stdout), which is typically the terminal (screen). There is a third stream called standard error (stderr); by default it goes to the same location as standard output, but it is a separate stream of characters and can be handled separately. Being able to manipulate these streams as we construct commands is extremely helpful, as most basic system tools already expect to consume input from stdin and write output to stdout. By exercising control over how these fit together, we can chain one tool onto another so that the output of the first program becomes the input to the second program, and so on. This permits very complex pipelines for manipulating text.
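The three streams can be seen in a short sketch. The report function below is a hypothetical stand-in for a real program, and its messages are invented for illustration.

```shell
# Hypothetical helper that writes one line to each output stream
report() {
    echo "result: 42"                   # written to stdout
    echo "warning: check input" >&2     # written to stderr
}

# Command substitution captures stdout only; stderr is discarded here
captured=$(report 2>/dev/null)
echo "$captured"

# tr reads from its stdin and writes the converted text to its stdout
upper=$(printf 'hello\n' | tr 'a-z' 'A-Z')
echo "$upper"
```

Only the stdout line ends up in captured, which shows that stderr really is a separate stream even though both normally appear on the same terminal.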
Let's review some basic I/O redirection operations (note that these work at the command line as well, since your command line is provided by a bash shell):
./myprog arg1 arg2 > output.txt
- Output redirection: redirect stdout of a program to a file (overwrites the file if it already exists)
- Anything the program sends to stdout will appear in the specified file rather than on screen
./myprog arg1 arg2 < input.txt
- Input redirection: provide the contents of a file as the stdin of a program
- Note: by doing this, anything your program attempts to read from stdin will be consumed from the provided file (this can be handy for automating testing, or using files to provide parameters that would otherwise be typed in at a prompt provided by the program)
./myprog arg1 arg2 < input.txt > output.txt
- Redirect both input and output
./myprog arg1 arg2 >> output.txt
- Redirect output to a file, appending to the file if it already exists
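The operators above can be tried safely with standard tools in place of ./myprog. This is only a sketch: the file names and the use of a temporary directory are assumptions made for the demonstration.

```shell
# Work in a throwaway directory so no existing files are touched
redir_dir=$(mktemp -d)

echo "first"  >  "$redir_dir/output.txt"   # creates (or overwrites) the file
echo "second" >> "$redir_dir/output.txt"   # appends a second line

# Input and output redirection together: sort reads the file as its stdin
# and writes the reverse-sorted lines to a new file
sort -r < "$redir_dir/output.txt" > "$redir_dir/sorted.txt"

sorted=$(cat "$redir_dir/sorted.txt")
echo "$sorted"

rm -rf "$redir_dir"   # clean up
```

Replacing the >> on the second echo with > would leave only "second" in output.txt, which is the difference between overwriting and appending.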
Note that if you wish to redirect stdout separately from stderr, you can use the file descriptor numbers with the redirection operator (1 == stdout, 2 == stderr).
./myprog arg1 arg2 1> stdout.txt 2> stderr.txt
The following syntax redirects both stdout and stderr to the same file:
./myprog arg1 arg2 &> output.txt
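The following sketch exercises both forms. The myprog function is a hypothetical stand-in for ./myprog that writes one line to each stream. For the combined case it uses > file 2>&1, which bash treats as equivalent to &> file and which also works in other Bourne-style shells.

```shell
streams_dir=$(mktemp -d)

# Hypothetical stand-in for ./myprog: one line to stdout, one to stderr
myprog() {
    echo "normal output"
    echo "error message" >&2
}

# Separate files for the two streams
myprog 1> "$streams_dir/stdout.txt" 2> "$streams_dir/stderr.txt"

# Both streams into one file: send stdout to the file, then point stderr at stdout
myprog > "$streams_dir/both.txt" 2>&1

out=$(cat "$streams_dir/stdout.txt")
err=$(cat "$streams_dir/stderr.txt")
both_lines=$(grep -c '' "$streams_dir/both.txt")   # number of lines in both.txt

rm -rf "$streams_dir"   # clean up
```

Note that the order matters in the combined form: 2>&1 must come after the redirection of stdout, otherwise stderr is pointed at wherever stdout was going before it was redirected.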
System calls exist to allow a programmer to directly connect the stdout of one process to the stdin of another. The programming construct used to do this is called a "pipe". bash provides a means of performing this function directly on the command line, which we refer to as "piping" the output of the first program to the input of the second. This permits us to stage operations on text by further processing a character stream while it is being output.
The following demonstrates the use of a pipe by running the uptime utility, which provides basic information regarding the system and the current computational load. If we were interested in recording the time and the load average for the past 15 minutes, we could pipe the output from uptime into the cut utility. There are already spaces separating the fields of the output from uptime, so some quick counting (or trial and error) will lead us to select the 2nd and 15th fields of uptime's output, delimiting fields by spaces.
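The command described here would look something like the sketch below. The field numbers are the ones given in the text, but they depend on the exact spacing of uptime's output, which varies between systems and with how long the machine has been running, so verify them on your own system. The sample line is made up for illustration so that the result is reproducible.

```shell
# Pipe uptime's output into cut, keeping the 2nd and 15th space-delimited fields
uptime | cut -f 2,15 -d ' '

# The same cut applied to a saved sample line (invented for this example)
sample=' 14:05:11 up 100 days, 12:42,  5 users,  load average: 0.08, 0.03, 0.01'
fields=$(echo "$sample" | cut -f 2,15 -d ' ')
echo "$fields"    # prints: 14:05:11 0.01
```

Because cut counts every single space as a delimiter, each run of two spaces contributes an empty field, which is why the count reaches 15 for this sample.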
Note that the "net" output of the pipelined command is simply the fields desired. The output from uptime was passed as the input to cut; only the resulting output from cut appears as output from the entire command line.