POSIX Shell

POSIX Shell Scripting

This page is meant as a repository for useful tricks I've found (and some I've perhaps invented) for scripting the POSIX shell (with some attention, scattered here and there, to portability to non-conformant shells as well). I am a strong believer that Bourne-derived languages are extremely bad for programming, on the same order of badness as Perl, and I consider programming in sh for any purpose other than as a super-portable, lowest-common-denominator platform for build or bootstrap scripts and the like to be an extremely misguided endeavor. As such, you won't see me spending many words on extensions particular to ksh, Bash, or whatever other shells may be popular.

Printing the value of a variable

printf %s\\n "$var"

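The \\n may be omitted if a trailing newline is not desired. The quotation marks around "$var" are essential: without them the value undergoes field splitting and pathname expansion before printf ever sees it. The following is NOT a valid substitute: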

echo "$var"

NEVER use echo like this. According to POSIX, echo has unspecified behavior if any of its arguments contain "\" or if its first argument is "-n". Unix™ standards fill in this unspecified area for XSI-conformant implementations by specifying nasty, undesirable behavior that no one wants ("\" is interpreted as a C-string-literal style escape), and other popular implementations such as Bash interpret argument values other than "-n" as special options even when in "POSIX compatibility" mode, rendering them nonconformant.

You never imagined printing the value of a variable could be so difficult, eh? Now you see why I say Bourne-derivative languages should never be used for serious programming...
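If you miss the convenience of echo, one option is a tiny wrapper built on printf. The sketch below defines a hypothetical function named say (the name is made up, not a standard command). It joins its arguments with single spaces, as echo does, but never treats "-n" or backslashes specially. The function body is a ( ) subshell so that the IFS assignment does not leak into the caller.

```shell
# Hypothetical echo replacement (the name "say" is made up): joins its
# arguments with single spaces and adds a newline, with no option or
# backslash processing. The ( ) body is a subshell, so setting IFS there
# does not affect the caller.
say () ( IFS=' '; printf %s\\n "$*" )

say -n 'C:\temp' done    # prints: -n C:\temp done
var='\\server\share'
say "$var"               # prints: \\server\share
```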

Reading input line-by-line

IFS= read -r var

This command reads a line of input, terminated by a newline or end of file or error condition, from stdin and stores the result in var. Exit status will be 0 (success) if a newline is reached, and nonzero (failure) if a read error or end of file terminates the line.
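A common use is a loop over every line of input. The sketch below uses a hypothetical helper, show_lines. The extra || [ -n "$line" ] test keeps the loop from silently dropping a final line that lacks its newline: read still stores that line but returns nonzero, as described above.

```shell
# Hypothetical helper: print every input line in brackets, unmodified.
# "|| [ -n "$line" ]" handles a last line with no trailing newline.
show_lines () {
    while IFS= read -r line || [ -n "$line" ]; do
        printf '[%s]\n' "$line"
    done
}

printf '  indented\nback\\slash\nno newline at end' | show_lines
# [  indented]
# [back\slash]
# [no newline at end]
```

Leading blanks survive because IFS is empty, and the backslash survives because of -r.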

One common pitfall is trying to read output piped from commands, such as:

foo | IFS= read -r var

POSIX allows any or all commands in a pipeline to be run in subshells, and which command (if any) runs in the main shell varies greatly between implementations. In particular, Bash by default runs every command of a pipeline in a subshell, while ksh runs the last one in the main shell, so whether var is set afterwards depends on the shell. The standard idiom for overcoming this problem is to use a here document:

IFS= read -r var << EOF
$(foo)
EOF
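The sketch below makes the idiom concrete with a hypothetical foo that prints two lines. Since read is not part of a pipeline, the assignment happens in the current shell. Note that only the first line of foo's output is read, and that the command substitution strips foo's trailing newlines before read sees them.

```shell
# Hypothetical stand-in for the command "foo" in the text.
foo () { printf 'first line\nsecond line\n' ; }

# read is not part of a pipeline here, so var is set in the current shell.
IFS= read -r var << EOF
$(foo)
EOF
printf '%s\n' "$var"    # prints: first line
```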

Reading input byte-by-byte

read dummy oct << EOF
$(dd bs=1 count=1 2>/dev/null | od -b)
EOF

This command leaves the octal value of a byte of input in the variable oct, or leaves oct empty at end of input. The 2>/dev/null discards the transfer statistics dd writes to stderr. Note that dd is the only standard command which can safely read exactly one byte of input with a guarantee that no additional bytes will be buffered and lost.
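Wrapping the trick in a function and looping until oct comes back empty gives a byte-by-byte reader. The sketch below uses two hypothetical helper names, readbyte and dump_bytes. It converts each octal value to decimal by prefixing a 0, which printf treats as an octal constant.

```shell
# Hypothetical helper: read one byte from stdin into oct (empty at EOF).
readbyte () {
    read dummy oct << EOF
$(dd bs=1 count=1 2>/dev/null | od -b)
EOF
}

# Hypothetical helper: print the decimal value of every byte on stdin.
dump_bytes () {
    while readbyte && [ -n "$oct" ]; do
        printf '%d\n' "0$oct"    # leading 0: printf reads the value as octal
    done
}

printf 'Hi\n' | dump_bytes    # prints 72, 105 and 10, one per line
```

Since each call runs dd and od, this costs two processes per byte, which is tolerable only for small inputs.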

Writing bytes to stdout by numeric value

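writebytes () { printf %b "$(printf '\\0%03o' "$@")" ; }
writebytes 65 66 67 10

This function allows specification of byte values in base 8, 10, or 16. Octal and hex values must be prefixed with 0 or 0x, respectively. The inner printf turns each value into an escape of the form \0ddd, which is the octal escape form POSIX specifies for %b arguments (a bare \ddd is unspecified there).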
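To check the function, write a few bytes and compare them with what you expect. The sketch below uses a variant of the definition that emits \0ddd escapes, the octal form POSIX specifies for %b, so it runs on its own. The values 72, 0x69, and 012 are arbitrary examples given in decimal, hex, and octal.

```shell
# Each number becomes a \0ddd escape, which printf %b turns into that byte.
writebytes () { printf %b "$(printf '\\0%03o' "$@")" ; }

writebytes 72 0x69 012    # writes "Hi" followed by a newline
```

Because the bytes never pass through a shell variable, values that the shell cannot hold in a string are still written correctly. The newline byte (012) is one example: it would be stripped if the output were captured and re-printed.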

Using find with xargs

GNU fans are accustomed to using the -print0 and -0 options to find and xargs, respectively, for robust and efficient application of a command to all results of the find command. Without GNU extensions, the output of find is newline-delimited, meaning there is no way to recover the actual pathnames found if some of the pathnames contain embedded newlines.

If you don't mind having your script break when pathnames contain newlines, at least make sure that the misprocessing that will result cannot lead to a compromise of privilege, and then try the following:

find ... | sed 's/./\\&/g' | xargs command

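The sed command here is mandatory. Contrary to popular belief, xargs does NOT accept newline-delimited lists. Rather, it accepts shell-like quoted lists: the input is split at blanks and newlines, and single quotes, double quotes, and backslashes act as quoting characters, so any of these characters occurring inside a pathname must itself be quoted. Prefixing every character with a backslash, as the sed command does, quotes them all.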
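The sketch below exercises the pipeline on filenames containing a blank, a single quote, and a backslash, all of which a bare find | xargs would mangle. It deletes them with xargs rm -f, and the final rmdir succeeds only if every name reached rm intact. The scratch directory name is an arbitrary assumption.

```shell
# Scratch directory for the demo; the name is an arbitrary assumption.
dir=${TMPDIR:-/tmp}/xargs-demo.$$
mkdir "$dir" || exit 1

# Names that a bare "find | xargs" would mangle: a blank, a quote, a backslash.
: > "$dir/two words"
: > "$dir/it's"
: > "$dir/back\\slash"

# sed prefixes every character with a backslash, so xargs passes each name intact.
find "$dir" -type f | sed 's/./\\&/g' | xargs rm -f

rmdir "$dir"    # succeeds only if every file was removed
```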

Using find with +

Of course the much smarter way to use find to efficiently apply commands to files is with -exec and a "+" replacing the ";":

find path -exec command '{}' +

This causes find to place as many filenames as will fit on the command line in place of the "{}", each as its own argument, running the command as many times as needed. The "{}" must immediately precede the "+". There is no issue with embedded newlines being misinterpreted.
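The following sketch shows that a name containing a newline still arrives as a single argument. The command run by find is sh -c, which receives each batch of names as "$@" and prints one x per name. The count therefore does not depend on how find splits the names into batches. The scratch directory name is an assumption.

```shell
# Scratch directory for the demo; the name is an arbitrary assumption.
dir=${TMPDIR:-/tmp}/find-demo.$$
mkdir "$dir" || exit 1
: > "$dir/plain"
: > "$dir/two words"
: > "$dir/line
break"

# Each batch of names arrives in sh -c as "$@"; print one x per name.
xs=$(find "$dir" -type f -exec sh -c 'for f in "$@"; do printf x; done' sh '{}' +)
printf '%s\n' "$xs"    # prints xxx: three files, although one name spans two lines

rm -f "$dir"/*
rmdir "$dir"
```

By contrast, counting the lines of plain find output would report four files here.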

Getting non-clobbered output from command substitution

The following is not safe:

var=$(dirname "$f")

Because most commands write a newline at the end of their output, Bourne-style command substitution was designed to strip trailing newlines from the output. But it doesn't just strip one trailing newline; it strips them all. If the directory name here happens to end in a newline, var silently receives a different name.

The solution to this problem is very simple: add a safety character after the last newline, then use the shell's parameter expansion to remove the safety character:

var=$(command ; echo x) ; var=${var%?}
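For dirname specifically, one more character has to go. dirname itself appends exactly one newline, so after the x is removed that newline must be removed too, which ${var%??} does. The sketch below uses a made-up path whose directory name ends in a newline to show the naive form, the dirname form, and the generic form.

```shell
# A pathname whose directory part ends in a newline.
f='/tmp/dir
/file'

naive=$(dirname "$f")                          # "/tmp/dir": its newline is lost
var=$(dirname "$f" ; echo x) ; var=${var%??}   # "/tmp/dir" plus its newline

# The generic form keeps every newline the command printed.
out=$(printf 'a\n\n' ; echo x) ; out=${out%?}  # "a" followed by two newlines
```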

Returning strings from a shell function

As can be seen from the above pitfall of command substitution, stdout is not a good avenue for shell functions to return strings to their caller, unless the output is in a format where trailing newlines are insignificant.

Try this:

func () {
# body here: compute the result and store it in foo
eval "$1=\${foo}"
}

The key trick here is the eval line and the use of escaping. The "$1" is expanded when the argument to eval is constructed by the main command parser. But the "\${foo}" is not expanded at this stage, because the "\$" has been quoted. Instead, it's expanded when eval evaluates its argument.
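Here is a filled-in instance of the pattern, using a hypothetical function lastcomp that returns the last component of a pathname. foo is an ordinary global variable here, so the function clobbers any foo the caller had. Since $1 is pasted into eval, it must be a trusted variable name and never untrusted input.

```shell
# Hypothetical example: store the last component of pathname $2 in the
# variable whose name is $1. foo is an ordinary global used as scratch.
lastcomp () {
    foo=${2##*/}
    eval "$1=\${foo}"
}

lastcomp base /usr/local/bin
printf '%s\n' "$base"      # prints: bin

# A trailing newline in the result survives, unlike with base=$(...).
lastcomp last 'dir/ends in newline
'
```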

Shell-quoting arbitrary strings

Sometimes it's necessary to put a string in a shell-quoted form, for instance if it needs to be expanded into a command that will be evaluated with eval, written into a generated script, or similar. Here is a version that works:

quote () { printf %s\\n "$1" | sed "s/'/'\\\\''/g;1s/^/'/;\$s/\$/'/" ; }

This function simply replaces every instance of «'» (single quote) within the string with «'\''» (single quote, backslash, single quote, single quote), then puts single quotes at the beginning and end of the string.
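A quick round trip through eval shows that the quoted form expands back to exactly the original string. The sketch below uses a test string chosen to contain a single quote, a dollar sign, double quotes, and an embedded newline.

```shell
quote () { printf %s\\n "$1" | sed "s/'/'\\\\''/g;1s/^/'/;\$s/\$/'/" ; }

# An awkward string: a single quote, a $, double quotes, and a newline.
s="it's \$HOME \"quoted\"
second line"

q=$(quote "$s")
eval "t=$q"    # t is now identical to s
```

Capturing the output with $(...) is safe here because the final single quote follows any newlines that belong to the string. Only the newline that sed prints after it is stripped.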

Working with arrays

Unlike "enhanced" Bourne shells such as Bash, the POSIX shell does not have array types. However, with a bit of inefficiency, you can get array-like semantics in a pinch using pure POSIX sh. The trick is that you do have one (and only one) array — the positional parameters "$1", "$2", etc. — and you can swap things in and out of this array.

Replacing the contents of the "$@" array is easy:

set -- foo bar baz boo

Or, perhaps more usefully:

set -- *
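Because there is only one such array, reusing it means saving its old contents first. One way to do that, sketched below with a hypothetical helper save_args, is to serialize "$@" into a single string of words quoted with the quote function from the previous section. eval "set -- $saved" then restores the array, including empty elements and elements containing blanks or quotes.

```shell
quote () { printf %s\\n "$1" | sed "s/'/'\\\\''/g;1s/^/'/;\$s/\$/'/" ; }

# Hypothetical helper: print "$@" as one string of quoted words that
# eval can turn back into the same positional parameters.
save_args () {
    for arg in "$@"; do
        printf '%s ' "$(quote "$arg")"
    done
}

set -- 'two words' "it's" ''
saved=$(save_args "$@")    # keep a copy of the current array
set -- *                   # reuse "$@" for something else
eval "set -- $saved"       # restore the three original elements
printf '[%s]\n' "$@"       # prints [two words], [it's] and [] on separate lines
```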

Does a given string match a given filename (glob) pattern?

fnmatch () { case "$2" in $1) return 0 ;; *) return 1 ;; esac ; }

Now you can do things like:

if fnmatch 'a??*' "$var" ; then ... ; fi

So much for needing Bash's "[[" command...
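Since $1 is left unquoted in the case pattern, its pattern characters stay active, while the quoted "$2" is matched literally. Unlike pathname expansion, * and ? in a case pattern also match "/" and a leading ".". The sketch below builds a hypothetical filter, grep_args, on top of fnmatch.

```shell
fnmatch () { case "$2" in $1) return 0 ;; *) return 1 ;; esac ; }

# Hypothetical helper: print each argument after the first that matches
# the pattern given as the first argument.
grep_args () {
    pat=$1
    shift
    for arg in "$@"; do
        if fnmatch "$pat" "$arg"; then
            printf '%s\n' "$arg"
        fi
    done
}

grep_args '*.c' main.c util.h lib.c     # prints main.c and lib.c
fnmatch 'a*' 'a/b' && printf 'match\n'  # prints match: * also matches /
```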

Final remarks

Expect this page of tricks to grow over time as I think of more things to add. It is my hope that these tricks serve to show that it IS possible to write correct, robust programs using the plain POSIX shell, despite common pitfalls, but also that the lengths needed to do so are often extremely perverse and inefficient. If seeing the above hacks has inspired anyone to write a program in a real language rather than sh/Bash/whatever, or to fix corner case bugs arising from the badness of the shell language, I will be happy.