Graduate Program KB

Bash

Video

  • Macro processor that executes commands
    • Macro: text and symbols expanded to create larger expressions
  • What exactly this means should become clearer over the course of this talk

Useful resources

  • POSIX standard

  • Linux Foundation Referenced Specifications

  • Freedesktop specifications

  • IETF RFCs

  • Bash Hackers Wiki

  • The Portable Operating System Interface (POSIX) standards define an environment for a user to interact with an operating system, including a C API, a shell, and utility programs.

  • POSIX was developed around the same time as the C standard library, and the POSIX C API is a superset of it.

  • The Linux Standard Base (LSB) is an extension of POSIX created to reduce the differences between distributions. Among other things, it specifies:

    • Required users and groups
    • How the system should be initialized
    • Additional utilities and libraries that should be installed
    • Etc...
  • The Freedesktop specifications focus on compatibility between different desktop applications and describe, for example:

    • How to drag and drop files between applications
    • How applications can find standard icons for buttons
    • How a desktop environment can find a list of applications, icons and descriptions that it can launch

Simple command

echo a b c
  • Command name followed by arguments passed to the command as a list
  • Some common commands
    • ls list files
    • cd change directory
    • rm remove files/directories
    • mv move files/directories
    • cat concatenate files

Variables

A variable definition looks like

myvar="some string"
  • There must be no spaces around the =; myvar = "some string" would instead run a command named myvar with the arguments = and some string
  • Some useful builtin variables
    • HOME Home directory
    • PATH Where to look for executables
    • LINENO Line number being run in script - useful for error reporting
    • PWD Current directory
    • OLDPWD Last directory
    • RANDOM A new random integer from 0 to 32767 each time it is read

Conditionals and Control Operators

  • Unlike most languages, bash treats an exit status of zero as true and any non-zero status as false
  • The reason for this is that programs exit with status zero on success and a non-zero status on failure, so the non-zero value can describe the error
  • The exit status of the last command is stored in the special parameter ?, read as $?
  • true and false are builtins (and are also installed on your system as programs) which only set the exit status
    • true sets the exit status to 0
    • false sets it to a non-zero value not specified by the POSIX standard or the bash manual
  • The control operators && and || act like in other languages with short circuit evaluation
    command1 && command2
    
    • Will only run command2 if command1 succeeds
    command1 || command2
    
    • Will only run command2 if command1 fails
    • Note, though, that && and || have equal precedence in bash and group left to right, so a || b && c runs as (a || b) && c, unlike in C
  • The control operator ; will run one command and then the next
    command1; command2
    
    • Will run both commands
  • A list is a series of commands (more precisely, pipelines) separated by control operators
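A short bash sketch of exit statuses and short-circuiting. It only uses the true and false builtins and echo; the comments show what each line should print.

```shell
#!/usr/bin/env bash
true;  echo "true  -> $?"   # true  -> 0
false; echo "false -> $?"   # false -> 1 with bash's builtin (only "non-zero" is specified)

true && echo "ran: the left side succeeded"
false || echo "ran: the left side failed"

# && and || have equal precedence and group left to right:
# this is (false && echo a) || echo b, so only b is printed
false && echo a || echo b

# this is (true || echo a) && echo b, so b is printed;
# with C-like precedence, true || (a && b), nothing would be printed
true || echo a && echo b
```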

Compound commands

  • Compound commands begin and end with reserved words (or, for the subshell form of grouping, parentheses)
  • Bash has looping, conditional and grouping compound commands
  • How these are actually read and pitfalls with writing them will be discussed in the next section

If

if test-commands; then
  consequent-commands;
elif more-test-commands; then
  more-consequents;
else alternate-consequents;
fi

While

while test-commands; do consequent-commands; done

For

for name in words... ; do commands; done
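Concrete versions of the three templates above, as a minimal sketch with made-up values (n, i, word). The conditions use the [ command, covered in the exercises below, and $(( )) arithmetic, covered later; any command's exit status would work as a condition.

```shell
#!/usr/bin/env bash
n=3
if [ "$n" -gt 2 ]; then
  echo "big"
elif [ "$n" -eq 2 ]; then
  echo "two"
else
  echo "small"
fi

# while: the body runs as long as the test command exits with 0
i=0
while [ "$i" -lt 3 ]; do
  i=$((i + 1))
done
echo "$i"    # 3

# for: iterates over the words after "in"
for word in alpha beta gamma; do
  echo "$word"
done
```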

Grouping commands

  • A grouping command tells bash to run the commands as a unit; redirections may be applied to the entire group
  • There are 2 grouping commands
    { command-list; }
    
    ( command-list )
    
  • Parentheses cause bash to execute the command list in a subshell (discussed later), while curly braces do not
  • Also note that while parentheses are metacharacters, curly braces are not; they are reserved words (discussed later)
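A sketch of the difference between the two grouping commands, assuming a system with mktemp for a scratch file. The assignment inside braces survives, the one inside parentheses does not, and a single redirection after the braces collects the output of both commands.

```shell
#!/usr/bin/env bash
x=1
{ x=2; echo "in braces: $x"; }    # runs in the current shell
echo "after braces: $x"           # after braces: 2
( x=3; echo "in parens: $x" )     # runs in a subshell (a forked copy)
echo "after parens: $x"           # after parens: 2, the change was lost

# A redirection after the group applies to every command inside it
out=$(mktemp)
{ echo one; echo two; } > "$out"
cat "$out"                        # one, then two
```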

Functions and parameters

A function declaration looks like

function name() compound-command
  • You can leave off the word function or the parentheses, but not both
  • Commonly the compound command is a grouping command with curly braces
  • Parameters are not named and are instead given numbers based on their position in the command line
  • Note: the 0th parameter, $0, is the name used to start the shell or script (e.g. the script name); it does not change to the function name inside a function
    echoparam() {
      echo $1
    }
    echoparam hi
    
  • Variables assigned in functions are global unless declared with the local builtin
  • Functions return an exit status with return, which must be an integer from 0 to 255; values are output through standard output instead
  • If return isn't specified the function will return the exit status of the last command
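A minimal sketch of these rules, using a made-up function called double. The value comes back through standard output (captured with command substitution, covered later), while return only sets the exit status.

```shell
#!/usr/bin/env bash
# double is a hypothetical example: it "returns" a value by printing it
# and reports success or failure through its exit status
double() {
  local n="$1"            # local: gone once the function returns
  if [ -z "$n" ]; then
    return 1              # an exit status, not a value
  fi
  echo $((n * 2))         # the result goes to standard output
}

result=$(double 21)       # capture standard output
echo "$result"            # 42
double "" || echo "double failed with status $?"   # ... status 1
echo "n outside: '$n'"    # n outside: '' (the local did not leak)
```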

REPL

An interactive language session is typically called a REPL, which stands for

  • Read, eval, print, loop
    • Read a command
    • Run a command
    • Print output
    • Repeat
  • In most languages, such as JavaScript or Python, a line is read and evaluated to a value of some data type
  • That value is then converted to a string representation and printed to the console
  • Bash works a bit differently and the loop looks more like
    • Read a command
    • Setup stream redirection
    • Run command
    • Repeat
  • One of the main differences here is that bash commands do not return data; they set an exit status and write to streams
  • Bash executes other programs, connecting them to each other and to files (or other streams)
  • The other main difference is that a lot of the work bash does itself happens while reading the command and expanding macros

Reading

  • When bash reads a line it first splits the line into words at blanks and metacharacters

    • Metacharacters: space, tab, newline, | & ; ( ) < >
  • Splitting follows quoting rules

  • Quoting is a mechanism to disable the special meaning of characters and reserved words

  • Bash has 3 main types of quoting

    1. Escape character
    2. Single quotes
    3. Double quotes
  • Escape character

    • The escape character is an unquoted backslash
    • It removes the special meaning of the next character
    • The manual treats a backslash followed by a newline as an exception (the pair is removed, continuing the line), but you can think of it as the same effect
    • The normal special meaning of a newline is to end the current command; the backslash disables that meaning, so the command continues on the next line
      touch a\ b
      
    • The space's meaning of ending a word is disabled so touch is called with a single word argument 'a b' and makes a file of that name (with a space in it)
  • Single quotes

    • Single quotes remove the special meaning of everything inside them
      touch 'a b'
      
    • Including backslashes, so there is no way to include a single quote inside single quotes
    • e.g.
      touch 'a\'b'
      
      • Backslash does not escape the single quote, so we have a literal a\b followed by an unclosed quote
      • Note that if you need a single quote you can do something like this
        touch 'a '\'' b'
        
      • Will create a file called a ' b
      • Quotes are not metacharacters and do not end words, so that is all 1 argument to the command
  • Double quotes

    • Remove special meaning of all characters except: $, `, !, and \
    • This means that all the expansions except brace expansion, tilde expansion, filename expansion, and alias expansion will occur in double quotes, and their results are not word split
    • Double quotes will perform parameter and variable expansion, arithmetic expansion, command substitution, and history expansion
    • Also note that * and @ keep their special meaning in a parameter expansion such as "$@"
  • After splitting into words bash loops over the words performing macro expansion

  • Expansions performed in order

    1. Brace expansion
    2. Tilde expansion
    3. Parameter and variable expansion
    4. Arithmetic expansion
    5. Command substitution
    6. Word splitting
    7. Filename expansion
  • Only brace expansion, word splitting and filename expansion can increase the number of words

    • Note: Even though variable expansion results in a single word it may be split during word splitting
  • The bash manual does not list aliases in the expansion section, but you can think of them as being done first since all the other expansions will be performed on the result
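One way to see how bash split a line into words is a tiny helper, here called args (a hypothetical name), that prints each argument in brackets on its own line. printf reuses its format string once per argument, which is what makes this work.

```shell
#!/usr/bin/env bash
# args is a made-up helper: one bracketed line per argument it receives
args() { printf '[%s]\n' "$@"; }

args a\ b            # [a b]: the escaped space does not split
args 'a $HOME b'     # [a $HOME b]: nothing expands in single quotes
args "a $HOME b"     # [a <your home directory> b]: $ still expands
args 'a '\'' b'      # [a ' b]: three quoted pieces, one word
args {x,y}z          # [xz] and [yz] on separate lines: brace expansion made two words
args "{x,y}z"        # [{x,y}z]: no brace expansion in double quotes
```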

Exercises

The following function definition gives an error when run in bash

myfun(){echo test}

What's the minimum number of changes to fix the problem?

Answer
  • Bash only splits into words at spaces and metacharacters so, using commas to delimit words, this line is read as
    myfun,(,),{echo,test}
    
  • Curly braces are not metacharacters
  • So the command name bash is reading is {echo instead of echo
  • We have the same problem with the closing brace, since test} is read as a single word
    myfun(){ echo test }
    
  • But this still doesn't work
  • Because the end brace is not a metacharacter it also does not end the command
  • So this is read as the start of a function body that runs echo with 2 arguments, test and }, and bash keeps waiting for the closing brace
  • We can end the command with a semicolon
  • This also removes the need for a space between test and the closing brace
    myfun(){ echo test;}
    

The following code results in bash expecting more input instead of running the command. What are the problems with it?

a=2
if [$a==1] then echo "true" fi
Answer
  • The first problem is that there is nothing to end the commands in the condition or the body, similar to the last question
  • There also isn't anything special about the square bracket characters
  • Bash is interpreting the condition for the if as a command called [$a==1] with the arguments then, echo, "true", fi
  • We can fix this with a semicolon, but the echo has the same issue: it needs to be ended before the fi
  • So we need 2 semicolons
    if [$a==1]; then echo "true";fi
    
  • Now we get an error bash: [2==1]: command not found
  • Since none of those are metacharacters it's read as one word for the command name
  • Need to add spaces
    if [ $a==1 ]; then echo "true"; fi
    
  • Note that the only thing special about the closing bracket is that the [ command will complain if ] is not passed as its last argument
  • The next problem is that it always evaluates to true
  • This is because we are giving [ a single non-empty string argument, 2==1, and a test with one non-empty argument succeeds
  • These need to be separate arguments to the command
    if [ $a == 1 ];then echo "true";fi
    

Alias expansion

  • An alias can be created with the alias builtin, like this
    alias myalias='echo 123'
    
  • If the first word of a command is unquoted, bash will check whether it matches an alias and, if so, replace it with the string assigned to the alias
  • Bash will recursively expand aliases, but will stop at cycles
  • This is so you can make an alias like
    alias ls='ls --color=auto'
    
  • The ls produced by the ls alias will not be expanded again; later, when executing the command, bash will find the ls program and run it
  • By default, aliases are not expanded when the shell is non-interactive, e.g. in scripts, unless the expand_aliases option is enabled with shopt
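A sketch of using an alias in a script anyway, assuming bash: the expand_aliases option re-enables alias expansion, and the alias has to be defined on an earlier line than the one that uses it. greet is a made-up alias name.

```shell
#!/usr/bin/env bash
# Scripts do not expand aliases unless this shell option is enabled
shopt -s expand_aliases
alias greet='echo hello'
# The alias was defined when the previous line ran, so it expands here
greet world        # hello world
```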

Brace expansion

  • A brace expansion consists of
    1. A preamble before the opening brace
    2. The brace expression
    3. A postscript after the closing brace
  • Bash will expand the word to multiple words, with the preamble and postscript around every value from the brace expression
  • You can use ranges with ..
    • Format: start..stop..step
    • The range is inclusive
    > echo a{d,c,b}e
    ade ace abe
    > echo a{2..8..3}e
    a2e a5e a8e
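A few more brace expansions (ranges with a step need bash 4 or newer). echo only shows the words bash produced, so nothing is created; the file names are made up. The empty first item in the last line is the idiom behind quick backups such as cp config{,.bak}.

```shell
#!/usr/bin/env bash
echo file{1..3}.txt      # file1.txt file2.txt file3.txt
echo {a..e..2}           # a c e
echo {5..1}              # 5 4 3 2 1 (ranges can count down)
echo config{,.bak}       # config config.bak (empty first item)
```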
    

Tilde expansion

  • If a word begins with an unquoted ~, the characters from the ~ up to (but not including) the first / form a tilde prefix
    • ~name
      • name's home directory
    • ~
      • Shorthand for the current user's home directory
    • ~+
      • $PWD
    • ~-
      • $OLDPWD
  • There are some other prefixes you can read in the manual
  • If expansion fails the word is left unchanged, e.g.
    echo ~foo/
    
    Outputs
    ~foo/
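A sketch of the other prefixes, assuming /tmp exists. cd is run twice so that $OLDPWD has a known value.

```shell
#!/usr/bin/env bash
echo ~          # the same as $HOME
cd /tmp && cd /
echo ~+         # / (the current directory, $PWD)
echo ~-         # /tmp (the previous directory, $OLDPWD)
echo "~"        # ~ (a quoted tilde is not expanded)
```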
    

Parameter and variable expansion

  • Basic format ${name}
  • You can omit the braces in most situations ($name), except
    1. Positional parameters with more than one digit, e.g. ${10}
    2. When the name is followed by a character that could be part of a name, e.g.
    myvar=1
    echo $myvarAAA
    
    Will output an empty line, since the variable name is read as myvarAAA, which is not set
  • There are more complex forms which perform operations on the parameter, e.g.
    • Substrings
    • Length
    • Pattern matching
    • Array indexing
  • An example
    • To split a file name at the first dot
      > myfile='file.txt'
      > echo ${myfile%%.*}
      file
      > echo ${myfile#*.}
      txt
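A few of the more complex forms, applied to a made-up file name. The operators shown (length, substring, #, ##, %, %%, replacement, :- defaults, and array indexing) are standard bash; the comments give the expected output.

```shell
#!/usr/bin/env bash
path='archive.tar.gz'
echo "${#path}"              # 14: length
echo "${path:0:7}"           # archive: substring from offset 0, length 7
echo "${path%.*}"            # archive.tar: shortest suffix matching .* removed
echo "${path%%.*}"           # archive: longest suffix matching .* removed
echo "${path##*.}"           # gz: longest prefix matching *. removed
echo "${path/tar/zip}"       # archive.zip.gz: first match replaced
unset missing
echo "${missing:-default}"   # default: used when unset or empty

letters=(a b c)
echo "${letters[1]}"         # b: indexes start at 0
echo "${#letters[@]}"        # 3: number of elements
```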
      

Command substitution

  • Command substitution runs a command in a subshell and replaces the expression with what the command writes to standard output, with trailing newlines removed
  • There are 2 forms
    • $(command), or
    • `command`
  • Both forms can be nested, but with backquotes the inner backquotes must be escaped with a backslash
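A sketch showing that trailing newlines are stripped from command substitution and that both forms nest; the backquote form needs its inner backquotes escaped.

```shell
#!/usr/bin/env bash
# Trailing newlines in the output are removed
lines=$(printf 'a\nb\n\n\n')
echo "${#lines}"                  # 3: "a", a newline, "b"

# $( ) nests directly; backquotes need the inner pair escaped
outer=$(echo "inner: $(echo hi)")
old=`echo "inner: \`echo hi\`"`
echo "$outer"                     # inner: hi
echo "$old"                       # inner: hi
```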

Arithmetic expansion

  • $((1+1))
  • All tokens undergo parameter and variable expansion, command substitution, and quote removal
  • Arithmetic expression is evaluated and the expansion is replaced by the result
  • May be nested
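A few arithmetic expansions. Inside $(( )) a variable can be used without $, and only integer arithmetic is supported, so division truncates.

```shell
#!/usr/bin/env bash
x=5
echo $((x * 2 + 1))          # 11: variables can be used without $
echo $(( 7 / 2 ))            # 3: integer division
echo $(( 7 % 2 ))            # 1: remainder
echo $(( $((1 + 1)) * 3 ))   # 6: nested expansion
```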

Word splitting

  • Bash looks at the results of parameter expansion, command substitution and arithmetic expansion that did not occur within double quotes
  • It splits them into words, using the characters in $IFS as delimiters
  • The default IFS is <space><tab><newline>
  • We can write a function to more easily see what its parameters were by joining them with commas
    join() {
      local IFS=','
      echo "$*"
    }
    
  • $* expands to all the parameters given to the function (starting at 1)
  • When in quotes, bash joins them together with the first character of IFS
  • Due to word splitting, when parameter expansion, command substitution, or arithmetic expansion is used without quotes, the output is split at spaces (by default) and passed as separate parameters
    > join $(echo a b c)
    a,b,c
    > myvar="a b"
    > join $myvar
    a,b
    
  • Double quotes stop this behaviour and the result is given to join as a string with spaces
    > join "$(echo a b c)"
    a b c
    > myvar="a b"
    > join "$myvar"
    a b
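The join function shows the joined string; a hypothetical count helper, which prints the number of arguments it received ($#), makes the difference between "$@", "$*" and their unquoted forms explicit. set -- replaces the current positional parameters with the given words.

```shell
#!/usr/bin/env bash
count() { echo "$#"; }

set -- "a b" c     # two positional parameters: "a b" and "c"
count "$@"         # 2: each parameter stays one word
count "$*"         # 1: joined with the first character of IFS
count $*           # 3: unquoted, so word splitting applies
count $@           # 3: the same as unquoted $*
```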
    

Filename expansion

  • Also called globbing
  • Bash checks each word for *, ?, and [
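  • If one of these characters appears unquoted, the word is treated as a pattern
  • The word is replaced by an alphabetically sorted list of file names matching the pattern; if nothing matches, it is left unchanged (unless the nullglob or failglob options are set)
  • Pattern
    • * Matches any string, including the empty string
    • ? Matches any single character
    • [...] Matches any one of the characters in the brackets
    • [b-e] Matches any character in the range
    • [[:class:]] Matches any character in the class
      • Examples of classes: alnum, alpha, ascii, blank, digit, lower, upper, space
  • By default, a . at the start of a file name must be matched explicitly by the pattern (e.g. .* rather than *), unless the dotglob option is set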
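A sketch of globbing in a throwaway directory created with mktemp -d; the file names are made up. The unmatched pattern is left as-is because nullglob and failglob are off by default.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d) && cd "$dir" || exit 1
touch a.txt b.txt c.log .hidden

echo *.txt      # a.txt b.txt
echo ?.log      # c.log
echo [ab].*     # a.txt b.txt
echo *          # a.txt b.txt c.log (.hidden is not matched)
echo .h*        # .hidden
echo *.md       # *.md (no match, so the pattern is left unchanged)
```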

Quote removal

  • All unquoted \, ', and " characters that did not result from an expansion are removed
  • This means that the quotes themselves will not be passed as an argument unless they were quoted themselves or came from an expansion
  • e.g.
echo "a"'b'"c"
  • Results in
abc

Not

"a"'b'"c"

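A sketch contrasting quotes written on the command line, which are removed, with quotes that come out of an expansion, which are kept. q is a made-up variable.

```shell
#!/usr/bin/env bash
echo "a"'b'"c"   # abc: the quotes were removed
q='"a"'          # the value contains two literal double quotes
echo $q          # "a": quotes produced by an expansion are kept
echo \"a\"       # "a": escaped quotes are ordinary characters
```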
Exercises

What will the following print?

a=1; echo $a
Answer
1 

Variable is expanded

a=1; echo ${a}string
Answer
1string
a=1; echo '${a}string'
Answer
${a}string

The single quotes stop the expansion

a=1; echo "${a}string"
Answer
1string

Variable is expanded within double quotes

a=1; alias test="echo $a"
a=2; test
Answer
1
  • $a is expanded in the string assigned to the alias
  • test expands to echo 1 which outputs 1
a=1; alias test='echo $a' 
a=2; test 
Answer
2
  • $a is not expanded in the string assigned to the alias
  • test expands to echo $a
  • Which then expands to echo 2
alias a="echo 123"
b=a
$b
Answer
bash: a: command not found

Alias expansion was performed before parameter expansion, and bash does not check for aliases again after $b has been expanded

alias myalias="echo 123"; myalias
Answer
bash: myalias: command not found
  • Bash processes one line at a time; the myalias following the ; was checked for aliases when the line was read
  • But the alias won't be defined until the line is executed and the alias builtin is run
a=1
echo $(echo '$a')
Answer
$a

The single quotes make the inner echo print a literal $a, and bash does not perform variable expansion again on the output of a command substitution

echo file{a, b, c}
Answer
file{a, b, c}
  • You can't use spaces in a brace expansion
  • Interestingly though, spaces are allowed inside an arithmetic expansion, e.g. $(( 1 + 1 ))
echo "file{a,b,c}"
Answer
file{a,b,c}
  • Brace expansion is not performed in double quotes

What is wrong with the following code?

if [ 1 <  2 ]; then
  echo "true";
fi
Answer
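  • The unquoted < is interpreted as a redirection (standard input from a file named 2) and needs to be quoted
if [ 1 \< 2 ]; then
  echo "true"
fi
  • Note that \< compares strings in ASCII order rather than numbers; for a numeric comparison use -lt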
a=""
if [ $a == 1 ]; then
  echo "true"
else
  echo "false"
fi
Answer
[ $a == 1 ]

Is expanded to

[ == 1 ]
  • Which is an invalid expression
  • Need to use quotes
[ "$a" == 1 ]
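A sketch of the fixes above, plus the string versus number distinction behind \<. It uses =, the POSIX spelling of string equality; bash's [ also accepts ==.

```shell
#!/usr/bin/env bash
a=""
# Quoted, the empty value is still an argument: [ "" = 1 ] is a valid test
[ "$a" = 1 ] || echo "a is not 1"
# \< compares strings in ASCII order, so "10" sorts before "9"
[ 10 \< 9 ] && echo "10 < 9 as strings"
# -lt compares integers (as do -eq, -ne, -le, -gt, -ge)
[ 10 -lt 9 ] || echo "10 is not less than 9 as numbers"
```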

Processes

  • A process can be thought of as an instance of an executable, similar to how an object is an instance of a class
  • A class can be instantiated multiple times into a new object each time, each with its own data
  • Similarly, an executable can be run multiple times to create new processes, each given its own resources by the OS
  • Each process is given a process identifier (PID) by the OS to uniquely identify it
  • Resources the process can request from the OS include memory and files
  • A process makes requests to the OS by performing system calls (syscalls)
  • Some system calls include
    • open - open a file
    • read - read from a file
    • write - write to a file
  • A process can tell the os to fork it by invoking the fork syscall
  • This tells the OS to make a new process which is a copy of the old one with all the same resources
  • The process that invoked the fork is the parent process, and the resulting process is the child
  • Even though they are copies they can tell themselves apart (and change their behaviour) because fork will return 0 in the child, but the child's pid in the parent
  • The exec family of functions (wrappers around the execve syscall) tells the OS to replace the calling process's image with another executable file, with the option to specify new parameters and environment variables
  • Forking and having the child call exec to run another program is a common pattern and how bash executes (non function) commands
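A bash sketch of these ideas: every run of an executable gets a fresh PID ($$ holds the current shell's), and the exec builtin replaces the current process image instead of creating a new process, so nothing after it in that shell runs.

```shell
#!/usr/bin/env bash
echo "this shell: $$"
# Each run of the bash executable is a new process with its own PID;
# the single quotes make $$ expand in the child, not here
bash -c 'echo "child: $$"'
bash -c 'echo "another child: $$"'

# exec replaces the child's image with echo, so the second command never runs
bash -c 'exec echo replaced; echo never printed'
```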

Streams and files

  • A stream is a construct which represents a flow of data, for example to/from a file or a network socket
  • Streams can be unidirectional (read or write) or bidirectional (both)
  • A file descriptor (fd) is a number given to a process by the os that refers to a stream
  • The POSIX standard specifies that three streams should be made available when a program starts
    • stdin: standard input (fd = 0)
      • For reading input from the user's console (or another program)
    • stdout: standard output (fd = 1)
      • For writing output to the user's console (or another program)
    • stderr: standard error (fd = 2)
      • For writing errors
      • It can be redirected separately from stdout, e.g. to a log file for a service started with systemd
  • When calling fork, the file descriptors of the parent process are inherited by the child, and exec keeps them open (unless they are marked close-on-exec)
  • A terminal emulator is a program which drives a shell, passes input to the shell and draws output
    • Also accepts control sequences to change how it operates (move cursor, colour, etc...)
  • The terminal emulator creates the standard streams for the shell and then executes it with fork + exec
  • It is then responsible for getting your input from the keyboard and passing it to the shell's stdin as well as drawing the shell's stdout on the screen
  • When bash runs a command using fork + exec those streams are retained in the executed command unless they were redirected (next section)
  • Programs can detect whether they are connected to a terminal and adjust the output
    • As an example, ls will arrange the output into columns and possibly colour the output when connected to a terminal
    • When not connected to a terminal (e.g. a file, or piped to another command) by default all the filenames will be separated by newlines and there will be no colour
  • Not all programs will do this
    • May have to specify parameters to format the output one way or another
    • Check the manual for the specific program
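A sketch of the terminal check from a script: [ -t 1 ] succeeds when file descriptor 1 is a terminal. stdout_kind is a made-up helper name.

```shell
#!/usr/bin/env bash
stdout_kind() {
  if [ -t 1 ]; then
    echo "terminal"
  else
    echo "not a terminal"
  fi
}

stdout_kind          # "terminal" when run directly in a terminal
stdout_kind | cat    # "not a terminal": stdout is now a pipe
```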

Redirection

  • After forking, but before the child calls exec to replace its code with the command's program, bash can change what the file descriptors point to
  • This is called redirection
  • The format for a standard redirection is
  • <file descriptor><direction><word>
  • Directions
    • < Input
    • > Output
    • >> Append
    • <> Input and output
  • The file descriptor is the one in the new process that we are changing
  • The final word is the file we are setting it to
  • Shorthand
    • You can leave off the file descriptor
      • For input it defaults to stdin
      • For output it defaults to stdout
    • You can use & instead of the file descriptor to redirect both stdout and stderr
      • e.g. &> file.txt redirects both to a file
    • >& file.txt also redirects both to a file, but is not recommended; the bash manual prefers &>
  • Also note that redirections are applied in order

Input

Redirecting a file to standard input

cat < file.txt

Shorthand for

cat 0< file.txt

Output

Redirecting standard output to a file

echo somestring > file.txt

Shorthand for

echo somestring 1> file.txt

Appending to a file

echo somestring >> file.txt

Shorthand for

echo somestring 1>> file.txt

Redirecting standard error to a file

echo somestring 2> file.txt

Redirecting stdout and stderr

echo somestring &> file.txt

Duplicating

  • The form
fd>&fd
  • Will set the first file descriptor to refer to the same stream as the second file descriptor
  • For example
echo abc 2>&1
  • Will set stderr to use the same stream as stdout
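A sketch of these redirections using a hypothetical helper, both, that writes one line to each standard stream, and a scratch directory from mktemp -d. Since 2>&1 makes fd 2 a copy of fd 1, both lines land in the same file in the order they were written.

```shell
#!/usr/bin/env bash
both() { echo out; echo err >&2; }    # >&2 is shorthand for 1>&2

dir=$(mktemp -d)
both > "$dir/out.txt" 2> "$dir/err.txt"   # each stream to its own file
both > "$dir/all.txt" 2>&1                # fd 2 becomes a copy of fd 1
both >> "$dir/all.txt" 2>&1               # the same, appending
cat "$dir/all.txt"                        # out, err, out, err (one per line)
```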

Exercise

What's the difference between

ls > files.txt 2>&1

And

ls 2>&1 > files.txt
Answer
  • Redirections are performed in order
  • In the first command
    • Start with file descriptors

      process stdout (fd 1) = bash's stdout
      process stderr (fd 2) = bash's stderr

    • The first redirection (> files.txt) sets the process's stdout to refer to a file stream

      process stdout (fd 1) = files.txt
      process stderr (fd 2) = bash's stderr

    • The second redirection (2>&1) sets the process's stderr to a copy of the stdout stream

      process stdout (fd 1) = files.txt
      process stderr (fd 2) = files.txt

    • The result is that both stdout and stderr are redirected to the file
  • In the second command
    • Start with file descriptors

      process stdout (fd 1) = bash's stdout
      process stderr (fd 2) = bash's stderr

    • The first redirection (2>&1) sets the process's stderr to a copy of the stdout stream

      process stdout (fd 1) = bash's stdout
      process stderr (fd 2) = bash's stdout

    • The second redirection (> files.txt) sets the process's stdout to refer to a file stream

      process stdout (fd 1) = files.txt
      process stderr (fd 2) = bash's stdout

    • The result is that only stdout is redirected to the file, and stderr goes to bash's standard output

Here documents

  • A here document is a type of redirection which reads lines from the shell input until a line containing only the terminating word, and uses those lines as the standard input of a command
    cat <<EOF
    some line
    another line
    EOF
    
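  • The terminator must be on its own line with no blanks around it
  • The lines inside the heredoc undergo parameter expansion, command substitution and arithmetic expansion, as if they were in double quotes, unless any part of the terminating word is quoted (e.g. <<'EOF')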
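A sketch of both kinds of here document. The bodies are wrapped in command substitution only so the results can be stored and printed; name is a made-up variable.

```shell
#!/usr/bin/env bash
name="world"
# Unquoted terminator: the body is expanded like a double-quoted string
expanded=$(cat <<EOF
hello $name, 2 + 2 = $((2 + 2))
EOF
)
# Quoted terminator: the body is passed through literally
literal=$(cat <<'EOF'
hello $name
EOF
)
echo "$expanded"   # hello world, 2 + 2 = 4
echo "$literal"    # hello $name
```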

Piping

  • Connects standard output of one command to standard input of another
    cat file | wc -l
    
  • To pipe stderr as well as stdout (shorthand for 2>&1 |)
    cat file |& wc -l
    
  • You can continue piping as many commands as you want
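A sketch of a longer pipeline and of |& (bash 4 or newer). both is a hypothetical helper that writes one line to each standard stream, and grep -c . counts the non-empty lines it receives.

```shell
#!/usr/bin/env bash
# Three processes, each stdout connected to the next stdin
printf 'b\na\nb\n' | sort | uniq   # a, then b

both() { echo out; echo err >&2; }
both 2>/dev/null | grep -c .       # 1: only stdout goes through the pipe
both |& grep -c .                  # 2: stderr is piped too
```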

Tee

  • A potential issue with redirection is that the user gets no feedback when output is silently sent to a file
  • The tee command writes its standard input to its standard output while also writing it to one or more files
  • You may see this in shell scripts to show what was written to a file
    echo somestring | tee output.txt
    
  • tee can append with -a
    echo somestring | tee -a output.txt
    

Execution

  • First word after any variable assignments is interpreted as a command name, following words are arguments
  • If the word contains slashes it's interpreted as a file path to execute
  • Otherwise word is looked up in (in order)
    1. Functions
    2. Builtins
    3. PATH variable
  • Variable assignments before the command name are set as environment variables in the executed command
  • Note that which is not a bash builtin, so it has no idea about builtins
    which echo
    /usr/bin/echo
    
    • which returned the location of echo on the system, but if the first word of your command expands to echo, the shell builtin will be called instead
    • There may be differences in parameters between the builtin and the installed version on the system
    • To find information on a builtin use help instead of man
    • You can also tell bash to run the non-builtin by using a path
    $(which echo) a b c
    
    • Will run echo installed on the system instead of the builtin
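The type builtin, unlike which, knows about builtins and functions, so it shows what bash will actually run. The sketch also defines a throwaway echo function to show that functions are found before builtins, and uses the command builtin to skip function lookup.

```shell
#!/usr/bin/env bash
type -t echo        # builtin
type -a echo        # lists every echo bash can find, the builtin first

# A throwaway function named echo: functions are looked up before builtins
echo() { builtin echo "wrapped: $*"; }
echo hi             # wrapped: hi
command echo hi     # hi (command skips functions and uses the builtin)
unset -f echo       # remove the function again
type -t echo        # builtin
```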

Exercises

Why aren't aliases in the resolution order?

Answer

Aliases have already been expanded by the time the line is read. The alias itself isn't executed; the command it expands to is

What do the following output?

myvar=1 echo $myvar
Answer
  • An empty line
  • $myvar is expanded by the shell before the command runs, and myvar=1 is only set in the environment of the echo command, not as a shell variable
  • So $myvar expands to an empty string
myvar=1 env | grep myvar
Answer
myvar=1

myvar is an environment variable in the env process which it outputs and gets filtered by grep

myvar=1; env | grep myvar
Answer

Nothing, myvar is a shell variable, not exported to subprocesses as an environment variable

Does this work? what does it do?

var="echo"
"$var" test
Answer
  • Echoes test
  • First word is expanded to echo before it is executed

Subshells

  • When bash runs a function or control flow construct as a simple command, it runs in the same bash process; however, some situations require bash to fork
  • This is called running in a subshell
  • The fork causes the subshell to have the same bash variables since it's a copy of the parent's memory
  • Any assignment to variables is assigning to that copy and will not be seen by the parent shell
  • Backgrounding
    • Bash forks before running the backgrounded command so it can run at the same time
  • Piping
    • Each command in the pipe runs at the same time in separate processes
  • Command substitution
  • Grouping construct
    • Surrounding a command with parentheses forces bash to run it in a subshell
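A sketch of these cases (BASHPID needs bash 4 or newer). x and y are made-up variables; the pipeline case assumes the lastpipe option is off, which is the default.

```shell
#!/usr/bin/env bash
x=1
( x=2 )                   # grouping with parentheses: a forked copy
echo "$x"                 # 1
y=$(x=3; echo "$x")       # command substitution is a subshell too
echo "$x $y"              # 1 3
echo 4 | read -r x        # read runs in a subshell at the end of the pipe
echo "$x"                 # 1
# $$ keeps the parent's PID inside a subshell; $BASHPID does not
echo "$$ $BASHPID $(echo "$BASHPID")"
```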

Reading lines from a file

Create a file with a few lines

printf '1\n2\n3\n4\n' > file.txt

The following outputs 0, why?

result=0
cat file.txt | while read number; do
  result=$(( "$result" + "$number" ))
done
echo $result
Answer

The file is piped into while, so while is running in a subshell with a copy of the result variable

You can redirect input into the loop instead of piping

result=0
while read number; do
  result=$(( "$result" + "$number" ))
done < file.txt
echo $result

Or you could use a for loop. But the following doesn't work, why?

result=0
for number in "$(cat file.txt)"; do
  result=$(( "$result" + "$number" ))
done;
echo $result
Answer
  • The for loop iterates over words following the word in
  • The quotes around the command substitution cause it to result in a single word with the entire contents of the file
  • Without the quotes the contents of the file will be split into words based on the value of IFS (includes space, newline, tab by default) when bash performs word splitting
result=0
for number in $(cat file.txt); do
  result=$(( "$result" + "$number" ))
done;
echo $result