The Bourne Ambiguity

I am deeply ambivalent about the Bourne shell. It is available on every Unix system, which makes it one of the only truly portable ways of expressing behaviour in a Turing-complete language: a script will run on UNIX without regard to architecture, binary executable formats, ABIs, what other shells or scripting languages are installed, etc. (This property sees it used somewhat byzantinely in tools such as the makeself self-extracting archive generator, where binary data is appended to a Bourne shell script. Not a lot of options.)

It is somewhat tragic, then, that the Bourne shell has such fundamental issues as not supporting lists. It relies on the horrendous $IFS hack, splitting unquoted expansions on whatever characters $IFS contains, to recover lists from its everything-is-a-string type unsystem. Particularly troublesome are the potential security implications, and the tendency of scripts to spontaneously break when, years after they are written, someone decides to put a space in a filename.

The Bourne shell essentially admits this failing by applying special cases without which it would be unusable. Namely, the quoted string "$@" expands not to one string but to a series of quoted strings, one per argument; this complete divergence from normal quoting behaviour applies only to $@. This makes it practical to execute programs with the verbatim arguments passed to a shell script.
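To make the special case concrete, here is a minimal sketch in plain sh. The helper functions (count_args and the three forward_* wrappers) are hypothetical names invented for illustration, and the default $IFS is assumed. It compares forwarding arguments via "$@", via unquoted $*, and via "$*".

```shell
#!/bin/sh
# count_args is a hypothetical helper: it prints how many arguments it got.
count_args() {
    echo "$#"
}

# Forward the arguments verbatim using the "$@" special case.
forward_quoted() {
    count_args "$@"
}

# Forward them unquoted: every argument is re-split on $IFS.
forward_unquoted() {
    count_args $*
}

# Forward them as "$*": all arguments are joined into one string.
forward_joined() {
    count_args "$*"
}

quoted=$(forward_quoted "John Smith" "Jane Doe")
unquoted=$(forward_unquoted "John Smith" "Jane Doe")
joined=$(forward_joined "John Smith" "Jane Doe")

echo "quoted=$quoted unquoted=$unquoted joined=$joined"
```

With the default $IFS this should print quoted=2 unquoted=4 joined=1. Only "$@" hands over the two arguments as they were given.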

I've written a lot of Bourne shell scripts, and although I am downright obsessive about quoting every variable expansion, there are still severe ambiguities. For example, if you glob into a variable and then want to iterate through the items, you're screwed; spaces in filenames strike again. I am resigned to assuming any shell script I write contains unknown security disasters, especially when you consider that in the UNIX filesystem, the only illegal characters for a filename are / and the NUL byte. Filenames can contain spaces, asterisks, newlines, ASCII control codes, Unicode control codes...
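The glob-into-a-variable problem can be sketched as follows. This is an illustration under assumptions, not a recipe: the file names are made up, a scratch directory is created with mktemp -d (widely available, though not strictly part of POSIX), and the default $IFS is assumed. It also shows two ways to sidestep the problem without leaving sh: iterating over the glob directly, and borrowing the positional parameters as a list.

```shell
#!/bin/sh
# Work in a scratch directory so the example does not touch real files.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch "plain.txt" "two words.txt"

# Broken: the glob result is flattened into one string, so the boundary
# between filenames is lost and can only be guessed at via $IFS.
files=$(echo *.txt)
broken=0
for f in $files; do
    broken=$((broken + 1))
done

# Safer: iterate over the glob directly; each match stays one word.
direct=0
for f in *.txt; do
    [ -e "$f" ] || continue   # the pattern stays literal if nothing matches
    direct=$((direct + 1))
done

# Also safe: borrow the positional parameters, the shell's one real list.
set -- *.txt
borrowed=$#

echo "broken=$broken direct=$direct borrowed=$borrowed"

# Clean up the scratch directory.
cd / && rm -rf "$dir"
```

The broken loop sees three items (plain.txt, two, words.txt) where only two files exist, while the other two approaches count two. The catch is that neither safe form gives you a list you can store under a name and pass around, which is rather the point.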

This page lists various common errors when writing sh or bash. Perhaps most hilariously, it correctly notes that the following is wrong for an interactive shell, where, depending on the bash version, the ! inside double quotes can trigger history expansion:

echo "Hello, world!"

Alternatives. There are few alternatives to the Bourne shell with comparably wide availability: perhaps awk, expect (sometimes useful, but often not installed by default), perl (less and less commonly installed by default), and bash.

bash is commonly available, and supports some manner of array, which is perhaps safer. A cursory investigation suggests that iterating through a list requires the following syntax:

a=("John Smith" "Jane Doe" "Mary Jones")
for x in "${a[@]}"; do echo "$x"; done

The syntax "${a[@]}" is again special-cased here, expanding to a series of quoted items, one per array element. bash doesn't escape being a Bourne shell with arrays; it still comes down to idiosyncratic special casing. Still, the ability to handle filenames with spaces safely in circumstances other than argument processing is at least welcome.
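As a rough sketch of what that buys you for filenames, the following bash fragment globs into an array and keeps each match as a single element. The file names and the scratch directory (created with mktemp -d) are assumptions for illustration, and this needs bash rather than plain sh.

```shell
#!/bin/bash
# Create a scratch directory with one awkward filename in it.
dir=$(mktemp -d) || exit 1
touch "$dir/plain.txt" "$dir/two words.txt"

# Each glob match becomes exactly one array element, spaces and all.
# (If nothing matches, the array holds the literal pattern unless the
# nullglob option is set.)
files=("$dir"/*.txt)
count=${#files[@]}

# Iterate with the quoted "${files[@]}" form and collect the base names.
names=()
for f in "${files[@]}"; do
    names+=("${f##*/}")
done

echo "count=$count second=${names[1]}"

# Clean up the scratch directory.
rm -rf "$dir"
```

Here the array holds two elements and the second base name is "two words.txt", intact. Drop the quotes around "${files[@]}" and you are back to $IFS splitting.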

Obscure shells. If the availability requirement is discarded, there are some interesting possibilities. rc, the Plan 9 shell, does, I understand, support lists, hallelujah. Some manner of Lisp could probably be conceived that feels “native” to the Unix filesystem.