Graduate Program KB

  1. Background
  2. Command Types
  3. Parameters
    1. Variables
    2. declare
    3. Variables Provided by Bash
    4. Special Parameters
    5. Parameter Referencing and Expansion
    6. Referencing Arrays
  4. Compound Commands
    1. Subshells
    2. Command Substitution and Process Substitution
    3. Arithmetic Expansion
    4. Functions
  5. Conditionals
    1. Truth in Bash
    2. Logical Operators
    3. if statements
  6. Predicates
  7. Loops
    1. While and until
  8. For
    1. Case and Select
  9. Redirection
    1. File Descriptors
    2. File Redirection
    3. Pipelines
    4. Heredocs and Herestrings
  10. Sources and Further Readings

Background

Bash is a shell program and command language originally released in 1989 as a free software alternative to sh (aka the Bourne shell). In 1991 Bash was one of the first programs Linus Torvalds ported to his free hobby project operating system 'Freax', which would eventually become Linux. Because of this history, Bash is included by default in most modern Linux distributions and sister OSes. This talk aims to cover the basics and showcase some of the powerful features of Bash you might not be aware of, to help you develop the skills necessary to write powerful and portable scripts you can use to automate systems administration tasks in a huge number of environments.

Command Types

Bash is a command language. Ignoring compound commands, every line processed by bash will be interpreted as a list of space-separated words, with the first word always being the command and the rest being arguments. Command name binding is evaluated in this order:

  1. Aliases (eg. your aws profile commands)
  2. Keywords (eg. if, then)
  3. Functions
  4. Builtins (eg. alias, declare, echo, cd, help, exit, etc)
  5. Executables (eg. ls, pwd, pnpm, aws, etc.)

⚠️ Common Issue!

The final step involves checking each directory in the PATH variable for an executable file with a matching name. This can cause issues when running certain scripts with sudo, as the root user will have a separate PATH variable.
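To check which category a name resolves to, the type builtin can help. A minimal sketch, assuming a throwaway function called greet (hypothetical, defined only for this demo):

```shell
#!/usr/bin/env bash
# greet is a hypothetical function defined only for this demo.
greet() { echo "hello"; }

type -t if
    # Output: keyword
type -t greet
    # Output: function
type -t cd
    # Output: builtin
type -t ls
    # Output: file (an executable found by searching PATH)
```

Running type -a with a command name lists every match rather than just the first, which can be handy when a script behaves differently under sudo.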

Parameters

Parameters are named spaces in memory used to store information. They are essentially variables but are categorized into two types:

  • Variables, as we know them in languages like JavaScript
  • Special Parameters, covered below

Variables

Variables are set using the assignment operator = and unset using the unset command:

my_var="Lorem ipsum"
echo $my_var
    # Output: Lorem ipsum
unset my_var
echo $my_var
    # Output: (none)

Valid identifiers consist only of alphanumeric characters and underscores and have to begin with either a letter or an underscore.

⚠️ Common Issue!

Notice that there are no spaces around the assignment operator. Remembering how bash interprets input lines, this is mandatory so as to stop bash from seeing 'my_var' as a command that it needs to run with the arguments '=' and 'Lorem ipsum'.

declare

Bash actually has a hidden type system that governs how you can interact with variables. You can set a variable of a certain type by using 'declare' with the associated option:

declare -a arr=('foo' 'bar' 'baz') # array
declare -A map=([key1]=value1 [key2]=value2) # associative array
declare -r my_const='lorem ipsum' # read-only variable
declare -x my_export='lorem ipsum' # exported variable
  • While you can create arrays without using declare, associative arrays (which store key-value pairs) must have their type assigned by declare to work properly.
  • Read-only variables can't be unset or set to another value.
  • Exported variables are carried forward into any child processes you might create.

You can also determine the type and composition of an object by using the print option -p.

declare -p SHELL
    # Output: declare -x SHELL="/bin/bash"
declare -p arr
    # Output: declare -a arr=([0]="foo" [1]="bar" [2]="baz")
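The attributes above can be seen in action with a short sketch. All names here (ports, DEMO_EXPORTED, demo_plain) are hypothetical and chosen only for illustration:

```shell
#!/usr/bin/env bash
# ports, DEMO_EXPORTED and demo_plain are hypothetical names for this demo.
declare -A ports=([http]=80 [https]=443)
echo "${ports[https]}"
    # Output: 443

declare -x DEMO_EXPORTED="visible"
demo_plain="hidden"
# The child bash process only inherits the exported variable.
bash -c 'echo "${DEMO_EXPORTED:-unset} ${demo_plain:-unset}"'
    # Output: visible unset
```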

Variables Provided by Bash

Bash provides some variables you can use if you need environment information or a generated value (eg. $RANDOM).

  • BASH_VERSION
  • HOSTNAME
  • PPID - PID of the parent process of the current shell
  • PWD - Working directory
  • RANDOM - A (pseudo-)random number between 0 and 32767
  • UID - ID number of the current user
  • HOME - The current user's home directory
  • PATH - A colon-separated list of paths that will be searched for executables matching the command name

For the full list, visit this section of the GNU Bash reference manual.

Special Parameters

Special parameters are also set automatically by Bash but store more data about current and recent execution rather than the execution environment.

  • "$0" - Name of the script
  • "$1", "$2", "$3", etc - Positional parameters containing arguments passed to the script or function
  • "$*" - Expands to a single string containing all positional parameters
  • "$@" - Expands to all the words of the positional parameters
  • $# - The number of set positional parameters
  • $? - The exit code of the most recently completed foreground command
  • $$ - The PID of the current shell
  • $! - The PID of the command most recently executed in the background
  • "$_" - The last argument of the last command that was executed

For the full list, visit this section of the GNU Bash reference manual.
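The difference between "$@" and "$*" is easiest to see by counting arguments. A small sketch, where count_args and demo are hypothetical helper functions written for this illustration:

```shell
#!/usr/bin/env bash
# count_args and demo are hypothetical helpers for this illustration.
count_args() { echo "$#"; }

demo() {
    count_args "$@"   # each positional parameter stays a separate word
    count_args "$*"   # all positional parameters are joined into one word
}

demo "one two" three
    # Output: 2
    #         1

false
echo "$?"
    # Output: 1
```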

Parameter Referencing and Expansion

To expand the value of a variable into the string value it is bound to, use the variable name preceded by a dollar sign:

echo "Hi!" $my_var
    # Output: Hi! Lorem ipsum
echo "Hi!" "${my_var}"
    # Output: Hi! Lorem ipsum

⚠️ Common Issue!

Most of the time you can leave the expansion unquoted as in the first example, but this can be an issue when using commands with positional parameters. In the first example echo is actually being given 3 arguments ('Hi!', 'Lorem' and 'ipsum') due to word splitting, whereas the quoted form in the second example passes 'Lorem ipsum' as a single argument.

Keep in mind that using single-quotes prevents parameter expansion:

echo "Hi!" '${my_var}' # Hi! ${my_var}
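Word splitting is hard to spot with echo because it joins its arguments with spaces anyway. As a rough illustration, printf repeats its format once per argument, which makes each word visible:

```shell
#!/usr/bin/env bash
my_var="Lorem ipsum"

printf '[%s]\n' "Hi!" $my_var
    # Output: [Hi!]
    #         [Lorem]
    #         [ipsum]
printf '[%s]\n' "Hi!" "${my_var}"
    # Output: [Hi!]
    #         [Lorem ipsum]
printf '[%s]\n' "Hi!" '${my_var}'
    # Output: [Hi!]
    #         [${my_var}]
```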

Modifying the syntax of a parameter expansion can allow you to do more than just reference the bare value. Check out these examples:

echo "${#my_var}" # Prints the length of my_var
echo "${my_var:-word}" # Use a default value 'word' if my_var is unset or null
echo "${my_var:=word}" # If my_var is unset or null, use the default value and assign it to my_var
echo "${my_var:offset:length}" # Expands to a substring of my_var starting at 'offset'

For the full list of alternative parameter expansion syntax, visit this section of the GNU Bash reference manual.
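Here is the same set of expansions with concrete values, as a minimal sketch. The variable names greeting and name are hypothetical:

```shell
#!/usr/bin/env bash
# greeting and name are hypothetical variables for this demo.
greeting="Lorem ipsum"
echo "${#greeting}"
    # Output: 11
echo "${greeting:6:5}"
    # Output: ipsum

unset name
echo "${name:-anonymous}"   # default is used, name stays unset
    # Output: anonymous
echo "${name:=guest}"       # default is used and assigned to name
    # Output: guest
echo "$name"
    # Output: guest
```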

Referencing Arrays

Arrays are accessed like you might expect with the usual square bracket notation. You can expand each value in an array to a word by using the at symbol in place of an index.

echo "${arr[0]}" # foo
echo "${arr[@]}" # foo bar baz
echo "${#arr[@]}" # get the length of an array

You can remove values from arrays using unset but you have to use quotes as otherwise they would be parsed as a glob (covered here).

unset "arr[2]" # remove the third array element
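One thing worth knowing is that unset leaves a gap in the indices rather than shifting later elements down. A short sketch, assuming a fresh array named arr:

```shell
#!/usr/bin/env bash
arr=('foo' 'bar' 'baz')
arr+=('qux')        # append an element
unset "arr[1]"      # remove 'bar'

echo "${#arr[@]}"
    # Output: 3
echo "${arr[@]}"
    # Output: foo baz qux
echo "${!arr[@]}"   # list the indices that are still set
    # Output: 0 2 3
```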

Compound Commands

Subshells

Subshells are, just as the name implies, child execution contexts. Commands put inside parentheses will inherit information from the current execution context but won't affect it. In this code block, the subshell changes directory but doesn't change the directory of the current context.

pwd
    # Output: /home/username
(cd /usr; pwd; exit 1)
    # Output: /usr
pwd
    # Output: /home/username
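The same isolation applies to variables. A minimal sketch, using a hypothetical variable named counter:

```shell
#!/usr/bin/env bash
# counter is a hypothetical variable for this demo.
counter=1
( counter=99; echo "inside: $counter" )
    # Output: inside: 99
echo "outside: $counter"
    # Output: outside: 1

# The subshell's exit code is still visible to the parent.
( exit 3 )
echo "$?"
    # Output: 3
```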

Command Substitution and Process Substitution

Output to the stdout stream from subshells can be used as input to commands in the main context through substitution.

  • Command substitution will insert the output of an invocation as a word:
var=$(echo "hi from the subshell!")
echo "${var}"
    # Output: hi from the subshell!
  • Process substitution allows you to use output as a file that other commands can use
nvim <(ls -la) # Opens nvim using the output of ls -la as a 'file'
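The two substitutions can be combined. As a rough sketch, comm expects two file arguments, so process substitution lets it compare the output of two commands, and command substitution captures the result (the variable name common is hypothetical):

```shell
#!/usr/bin/env bash
# comm -12 prints only the lines present in both (sorted) inputs.
common=$(comm -12 <(printf '%s\n' apple banana cherry) \
                  <(printf '%s\n' banana cherry date))
echo "$common"
    # Output: banana
    #         cherry
```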

Arithmetic Expansion

There are two main ways to get bash to evaluate arithmetic. The first is with the double parentheses notation and the second is through the let builtin.

var=$(( (2+5) / 7 ))
echo $var # Output: 1

let a="(2 + 5) / 7" "b =1+1" "c = a + b"
echo $c # Output: 3

Once you're in this calculation mode the syntax becomes much more friendly. Variables can be both assigned and referenced without dollar signs and assignment can be done without strictly correct spacing. Any mathematical operators you're familiar with from JavaScript will work (full list in this section of the GNU Bash Reference Manual).
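A few more cases, as a minimal sketch with a hypothetical variable x. Note that arithmetic here is integer-only, so division truncates:

```shell
#!/usr/bin/env bash
x=7
(( x += 3 ))        # assign inside (( )) without a dollar sign
echo "$x"
    # Output: 10
echo $(( x / 3 ))   # integer division truncates
    # Output: 3
echo $(( x % 3 ))
    # Output: 1

# (( )) can also be used as a condition.
if (( x > 5 )); then
    echo "x is greater than 5"
fi
    # Output: x is greater than 5
```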

Functions

function print_ls_to_file() {
    local directory_name=$1
    local file_name=$2
    # Refuse to overwrite an existing file
    if [[ -f "$file_name" ]]; then
        echo "File already exists!" 1>&2
        return 1
    fi

    ls "$directory_name" > "$file_name"
}

print_ls_to_file /home/you ls.log

⚠️ Common Issue!

You should create variables local to the function context using local or declare to avoid inducing side effects like changing a global variable.
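The side effect is easy to reproduce. In this sketch, leaky and tidy are hypothetical functions that differ only in their use of local:

```shell
#!/usr/bin/env bash
# leaky and tidy are hypothetical functions for this demo.
name="global"

leaky() { name="changed by leaky"; }
tidy()  { local name="changed by tidy"; }

tidy
echo "$name"
    # Output: global
leaky
echo "$name"
    # Output: changed by leaky
```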

Conditionals

Truth in Bash

In Bash every command, whether it's a function or an executable, will return an exit code, which is an int between 0 and 255. Conventionally 0 indicates the program was successful and any other number indicates failure, but what each number means will vary from program to program. Bash uses exit codes to represent truth following that convention, where anything returning 0 is true and anything returning anything else is false.
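Because truth is just an exit code, any command or function can act as a condition. A small sketch, where is_even is a hypothetical function whose exit code comes from its last command:

```shell
#!/usr/bin/env bash
# is_even is a hypothetical helper; it returns the status of (( ... )).
is_even() { (( $1 % 2 == 0 )); }

if is_even 4; then
    echo "4 is even"
fi
    # Output: 4 is even

is_even 3
echo "exit code: $?"
    # Output: exit code: 1
```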

Logical Operators

You can use the logical operators && and || to execute a command upon a command's success or failure. This can be particularly useful in scripts as a simple implementation of exception handling. Remember as well it's good practice to return a failing exit code yourself if something went wrong.

mkdir my_new_dir && cd my_new_dir
    # If mkdir returned 0 (success), cd into the new dir
rm "$file" || { echo "Couldn't delete ${file} :(" >&2; exit 1; }
    # If file couldn't be removed, log that to stderr and get tf outta there
! aws s3 rb $my_bucket_uri
    # Negate exit code. If command was successful, returns a failure code and vice versa

if statements

if [[ predicate ]]; then
    # commands
elif my_command "hope this works"; then
    # commands
else
    # commands
fi

Predicates

[[ "${val}" == "a" ]] # String Equality
[[ "${val}" != "a" ]] # String Inequality
[[ "${val}" < "a" ]] # Alphabetically sorts before
[[ "${val}" > "a" ]] # Alphabetically sorts after
[[ "${val}" =~ $regex ]] # Matches regex expression

[[ $val -eq 1 ]] # Numeric Equality
[[ $val -ne 2 ]] # Numeric Inequality
[[ $val -lt 2 ]] # Less than
[[ $val -le 2 ]] # Less than or equal
[[ $val -gt 2 ]] # Greater than
[[ $val -ge 2 ]] # Greater than or equal

[[ -z "${val}" ]] # Variable is null
[[ -n "${val}" ]] # Variable is not-null

[[ -f "${file}" ]] # File exists
[[ -d "${file}" ]] # Directory exists
[[ -e "${file}" ]] # File/Directory exists

[[ -r "${file}" ]] # File is readable
[[ -w "${file}" ]] # File is writable
[[ -x "${file}" ]] # File is executable

⚠️ Common Issue!

Notice that there's a mandatory space between the square brackets and the predicate components. This is because the double square opening bracket is actually a keyword that bash reads like a command name, taking every word until the closing square brackets as its arguments. If you were to leave out the space, either bash would look for the wrong command or wouldn't be able to find the closing bracket keyword. Bash doesn't word-split variables inside double square brackets, but the older single-bracket [ command does, so it's still a good habit to use variables with the double quote and curly braces format to make sure your arguments are supplied correctly without unneeded splitting.
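The regex predicate also fills the BASH_REMATCH array with any captured groups. A minimal sketch, assuming a made-up version string and pattern:

```shell
#!/usr/bin/env bash
# version and regex are hypothetical values for this demo.
version="bash-5.2"
regex='^bash-([0-9]+)\.([0-9]+)$'

if [[ "${version}" =~ $regex ]]; then
    echo "major=${BASH_REMATCH[1]} minor=${BASH_REMATCH[2]}"
else
    echo "no match" >&2
fi
    # Output: major=5 minor=2
```

Keeping the pattern in a variable and leaving it unquoted on the right-hand side avoids it being treated as a literal string.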

Loops

While and until

# executes until [[ returns false
while [[ predicate ]]; do
    # commands
done

# executes until [[ returns true
until [[ predicate ]]; do
    # commands
done
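With concrete predicates the two loops look like this. A short sketch using hypothetical counters n and attempts:

```shell
#!/usr/bin/env bash
# n and attempts are hypothetical counters for this demo.
n=3
while [[ $n -gt 0 ]]; do
    echo "$n"
    n=$(( n - 1 ))
done
    # Output: 3
    #         2
    #         1

attempts=0
until [[ $attempts -ge 3 ]]; do
    attempts=$(( attempts + 1 ))
done
echo "$attempts"
    # Output: 3
```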

For

#for-each
for item in "${arr[@]}"; do
    echo "$item"
done

#c-style
for ((i = 0; i < 10; i++)); do
    echo $i
done

#range
for item in {1..10}; do
    echo $item
done
done

#command result
for item in $(compgen -c); do
    echo $item
done

#glob
for item in ./content/*.md; do
    echo $item # echoes path of all markdown files in content directory
done
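Quoting the array expansion matters as soon as an element contains a space. A rough sketch, with a hypothetical files array, that counts iterations both ways:

```shell
#!/usr/bin/env bash
# files is a hypothetical array; the first element contains a space.
files=("my report.txt" "notes.md")

quoted=0
for f in "${files[@]}"; do
    quoted=$(( quoted + 1 ))
done

unquoted=0
for f in ${files[@]}; do    # word splitting breaks "my report.txt" in two
    unquoted=$(( unquoted + 1 ))
done

echo "$quoted $unquoted"
    # Output: 2 3
```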

Case and Select

landmarks=("Opera House" "Statue of Liberty" "Empire State Building" "Burj Khalifa" "Burj Al Arab")
echo "Which landmark do you want to see?"
select choice in "${landmarks[@]}"; do
    case $choice in
        "Opera House") echo "You should go to Sydney"; break ;;
        "Statue of Liberty") ;&
        "Empire State Building") echo "You should go to New York"; break ;;
        Burj*) echo "You should go to Dubai"; break ;;
        *) echo "uhh I dunno that one" ;;
    esac
done
  • select is a looping construct that will prompt the user with a list of numbered choices users can pick. Select will keep looping until it reaches a break command like we have in our case statement.
  • case is sort of like a switch-case but matches based on glob patterns or direct string equality. All cases end with two semicolons if you want to leave the case statement, or a semicolon with an ampersand if you want execution to fall through to the next case. This can be done even if the next case isn't a match. Since matches are based on globs, you can create a default case by using the star glob.
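case is also useful without select. As a minimal sketch, this hypothetical classify function matches a filename against glob patterns, using | to share one branch between several patterns:

```shell
#!/usr/bin/env bash
# classify is a hypothetical helper for this demo.
classify() {
    case "$1" in
        *.md|*.txt) echo "text" ;;
        *.sh)       echo "script" ;;
        *)          echo "unknown" ;;
    esac
}

classify notes.md
    # Output: text
classify deploy.sh
    # Output: script
classify photo.png
    # Output: unknown
```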

Redirection

File Descriptors

File descriptors are numbers that refer to sources of data, like files or streams. Every Linux process, even if it doesn't actually use them, will start with these 3 file descriptors:

  • stdin - File Descriptor 0
  • stdout - File Descriptor 1
  • stderr - File Descriptor 2

Redirection is the practice of changing a file descriptor to point to a different location, so input can come from and output can go to somewhere other than straight to the terminal.

File Redirection

To redirect stdout to a file, use the right facing angle bracket with the filename. This redirects file descriptor 1 by default, but you can choose to redirect a different file descriptor instead by prepending the operator with the file descriptor number. You can redirect stdin by using the left facing bracket so a command will read stdin from somewhere else, and redirect other input streams in the same way:

echo "Lorem ipsum" > placeholder.txt
echo "$(< placeholder.txt)"
    # Output: Lorem ipsum

rm 2> /dev/null

⚠️ Common Issue!

One thing to note here is that by default, redirecting output to a file will replace the file's contents with the contents of the stream. If you need to append instead, use the double angle bracket operator instead:

echo "dolor sit amet" >> placeholder.txt
echo "$(< placeholder.txt)"
    # Output: Lorem ipsum
    #         dolor sit amet

⚠️ Common Issue!

While you may be tempted to use cat to get input from a file and pipe it to another command, using cat for any other purpose than its intended use of file concatenation is actually considered bad practice, and you should use redirection instead.

⚠️ Common Issue!

So what do we do when we need both stdout and stderr to write to the same place? Redirecting both outputs to the same file separately, as in the first line, can make stdout and stderr compete for writes, which can cause unwanted behaviour and incorrect output. You might also think to redirect one of the outputs to the file descriptor of the other as in the second line, but this usage will actually cause stderr to write to a file named '1'. Instead, you should duplicate the file descriptor with the ampersand like in the third line, so stdout writes to ls.log and stderr writes to wherever stdout is pointing.

ls / not-a-file > ls.log 2> ls.log # wrong
ls / not-a-file > ls.log 2>1 # wrong

ls / not-a-file > ls.log 2>&1 # right!
ls / not-a-file &> ls.log # shorthand for above. not portable to sh
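Redirections are processed left to right, so the position of 2>&1 matters. A small sketch, where emit is a hypothetical function that writes one line to each stream and log_file is a temporary file created for the demo:

```shell
#!/usr/bin/env bash
# emit and log_file are hypothetical names for this demo.
emit() { echo "to stdout"; echo "to stderr" >&2; }
log_file=$(mktemp)

# stdout goes to the file first, then stderr is pointed at the same place.
emit > "$log_file" 2>&1
echo "$(< "$log_file")"
    # Output: to stdout
    #         to stderr

# Here stderr is pointed at the current stdout (the terminal) before
# stdout is redirected, so only "to stdout" lands in the file.
emit 2>&1 > "$log_file"
echo "$(< "$log_file")"
    # Output: to stdout

rm -f "$log_file"
```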

Pipelines

command1 | command2 | command3

Piping is a method for redirecting one command's stdout to another's stdin for each command from left to right. We call these chains of piped commands pipelines.

⚠️ Common Issue!

Be wary of overusing pipelines in your scripts though. Each command in a pipeline is executed in its own subshell, adding some overhead and potentially causing unintended behaviour if you're somehow changing the execution context in one of the commands. They can also end up difficult to follow if your pipelines get too long.
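The most common surprise is a variable set inside a piped loop disappearing afterwards. A sketch under bash's default settings, with hypothetical counters piped_count and redirected_count:

```shell
#!/usr/bin/env bash
# piped_count and redirected_count are hypothetical names for this demo.
piped_count=0
printf '%s\n' a b c | while read -r _; do
    piped_count=$(( piped_count + 1 ))   # runs in a subshell
done
echo "$piped_count"
    # Output: 0

redirected_count=0
while read -r _; do
    redirected_count=$(( redirected_count + 1 ))   # runs in the current shell
done < <(printf '%s\n' a b c)
echo "$redirected_count"
    # Output: 3
```

Feeding the loop through redirection from a process substitution keeps the loop body in the current shell, so the counter survives.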

Heredocs and Herestrings

If you find yourself piping an echo into another program's stdin, it's usually a better idea to use something called a 'herestring' as they don't involve creation of a subshell and can be easier to interpret.

echo "$target_directory" | wc -c # pipes echo into a command that reads stdin
wc -c <<< "$target_directory" # same result using a herestring

You can also use here documents or 'heredocs' to write multiline strings to stdin. This is done with 2 left angle brackets instead of 3 and a 'delimiter string' which can be any sequence of characters. It's bad practice to embed larger blocks of data into your scripts this way, but it's good for cases like outputting usage information.

cat <<END
usage: mycommand [-o] [file ...]
Lorem ipsum dolor sit amet
END
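Two details that are easy to miss: quoting the delimiter switches off expansion inside the heredoc, and a herestring pairs well with read for splitting a string into variables. A minimal sketch with hypothetical variable names:

```shell
#!/usr/bin/env bash
# name, first and rest are hypothetical variables for this demo.
name="world"

cat <<END
Hello, $name
END
    # Output: Hello, world

cat <<'END'
Hello, $name
END
    # Output: Hello, $name

read -r first rest <<< "Lorem ipsum dolor"
echo "$first"
    # Output: Lorem
echo "$rest"
    # Output: ipsum dolor
```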

Sources and Further Readings