The Straight Line Principle
Which of the following pseudocode examples do you prefer?
// Example 1a
func someFunc() int {
    err = doStuff()
    if err {
        return 0
    }
    err = doMoreStuff()
    if err {
        return 0
    }
    err = doYetMoreStuff()
    if err {
        return 0
    }
    return 1
}
// Example 1b
func someFunc() int {
    err = doStuff()
    if !err {
        err = doMoreStuff()
        if !err {
            err = doYetMoreStuff()
            if !err {
                return 1
            }
        }
    }
    return 0
}
Certainly I prefer 1a. In the latter example, you have to keep track of your position in the control flow at all times while reading or writing the function. In the former, you deal with the deviant cases and get them out of the way, which means that you can make more assumptions at later points in the function. In the latter case, your cognitive burden grows as you move down the function; in the former, it shrinks, because you can assume a larger number of preconditions.
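For concreteness, here is a runnable sketch of the 1a style in JavaScript; the doStuff functions are hypothetical stand-ins that return an error value on failure and null on success:

```javascript
// Hypothetical steps: each returns null on success or an error value.
function doStuff() { return null; }
function doMoreStuff() { return null; }
function doYetMoreStuff() { return "step 3 failed"; }

function someFunc() {
    let err = doStuff();
    if (err) return 0;   // deviant case handled; it no longer concerns us
    err = doMoreStuff();
    if (err) return 0;   // each guard discharges one more precondition
    err = doYetMoreStuff();
    if (err) return 0;
    return 1;            // everything before this line is known to have succeeded
}

console.log(someFunc()); // prints 0, because the third step fails
```

Each early return removes a case from consideration, so the reader's set of live possibilities shrinks rather than grows as the function proceeds.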
This leads, I think, to a general principle: an increase in the length of a function should ideally not lead to a corresponding increase in the “2D” complexity, or nesting, of the code. That is, the nesting depth of any given line of code in a function should not depend on how far down the function it is, only on the function's natural complexity. Where such expression is possible, it should be preferred. I call this the straight line principle.
Example 1b violates this principle: as the function becomes arbitrarily long, the calls become arbitrarily deeply nested. Were there no alternative to the 1b style, this would perhaps be an indication of inadequate syntactic expressiveness.
Imagine, for a moment, that the exponentiation operator had never been invented. If you wanted to write 2^4, you had to write 2×2×2×2. Thus for sufficiently large values of n in 2^n, you end up writing arbitrarily many ×2s. This is a problem, so a higher-order construct is invented, index notation, which collapses linear growth in syntax with problem size to essentially zero growth. (I say essentially because the larger number of digits needed to represent a larger index in base 10 could itself be argued to be an increase in syntactic complexity; of course, standard form can be used if precision is not important.)
The logical response to such situations is to increase the expressive power of the syntax by introducing higher-order constructs that render linear increases in syntax sublinear or better. This is done in mathematics: if you need “repeated exponentiation”, you have tetration.
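The same collapse can be seen in code. In JavaScript, the standard ** operator plays the role of the higher-order construct that replaces the repeated ×2:

```javascript
// Without exponentiation, the syntax for 2^10 grows linearly with the exponent:
const longForm = 2 * 2 * 2 * 2 * 2 * 2 * 2 * 2 * 2 * 2;

// With the higher-order construct, the syntax size is independent of the exponent:
const shortForm = 2 ** 10;

console.log(longForm === shortForm); // prints true (both are 1024)
```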
Of course, the above example is rather a strawman, as the 1a style demonstrates that the complexity of 1b is readily avoided. But the issues I raise appear far too often in popular languages, most commonly in asynchronous I/O programming. For the reasons above, I find the continuation-passing style, at least as it must be expressed in most languages, to be quite distasteful. For example, a skeleton of a JavaScript function:
// Example 2
function doSomething(onDone) {
    doSomething1(..., function() {
        doSomething2(..., function() {
            doSomething3(..., function() {
                doSomething4(..., onDone);
            });
        });
    });
}
Here we have the same problem again, and this time it is a real instantiation of the problem that is far less avoidable. Utterly fundamental operations like I/O calls must be made asynchronously, which is fine, but the language was not designed around this, and so the continuation-passing style must be used. Because the language does not provide adequate syntactic power to linearize (in the 2D sense) the expression of several sequenced I/O operations, merely adding steps to an algorithm increases the “2D” complexity of the function and thus the programmer's cognitive load in reading or writing it. This is a disaster! It is like writing mathematics in which no multiplication operator has been invented, so that repetitive additions must be expressed manually, let alone exponentiation.
What is most outrageous about the fact that this style of programming is still practiced is how unnecessary it is. It has been possible to write compilers that take care of these matters since at least 1986 (when Erlang was created). Either actual coroutines/green threads/etc. can be used, as in Go, or a compiler can convert a program written in a linear form into the continuation-passing style internally. Thus even runtime limitations or design decisions are no excuse for making people use a syntactically nonlinear continuation-passing style to do I/O. In Haskell, monads are sometimes described as a “programmable semicolon”: statements expressed in a syntactically linear fashion are remapped by the compiler to a chain of lambdas combined with the bind operator, an extremely powerful technique which can be used for far more than just asynchronicity.
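Even without compiler support, a library construct can do part of this remapping. JavaScript's promise chaining is the familiar example, and it is close in spirit to the monadic bind just described: each step is still a continuation, but the chain reads top to bottom rather than nesting to the right. (The step helper below is hypothetical, purely for illustration.)

```javascript
// then() acts as a "programmable semicolon": it accepts the next
// continuation and sequences it, so adding a step adds one line of
// chaining rather than one level of nesting.
const step = (label, out) => () => Promise.resolve().then(() => out.push(label));

const out = [];
Promise.resolve()
    .then(step("a", out))
    .then(step("b", out))
    .then(step("c", out))
    .then(() => console.log(out.join(""))); // prints abc
```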
.NET solved this problem by introducing the async and await keywords, which provide enough of a hint to the compiler to allow it to rewrite to the continuation-passing style internally. ECMAScript is adopting a similar convention. Even Python, which for years totally ignored the gift horse of Stackless Python, is now with Python 3 introducing continuation-passing-style-based “coroutines”, bizarrely after having accidentally implemented them previously as part of the generators feature.
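The “accidental” implementation is easy to demonstrate: a few lines of driver code turn an ordinary generator into a coroutine that waits on promises, which is exactly the trick early async frameworks used before dedicated keywords arrived. (The run helper below is a hypothetical sketch, not any particular library's API.)

```javascript
// A tiny trampoline: drive a generator, treating each yielded value as
// a promise to wait on before resuming the generator with its result.
function run(genFn) {
    const gen = genFn();
    return (function step(input) {
        const { value, done } = gen.next(input);
        return done ? Promise.resolve(value) : Promise.resolve(value).then(step);
    })();
}

const results = [];
run(function* () {
    results.push(yield Promise.resolve("x")); // reads like blocking code,
    results.push(yield Promise.resolve("y")); // but each yield suspends
}).then(() => console.log(results.join(""))); // prints xy
```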
CPS-style coroutines are slightly poorer than real, stack-based coroutines insofar as the program must be annotated so that the compiler can transform it appropriately; .NET, ECMAScript and Python all necessitate this. But at least for the expression of completely routine tasks such as I/O, the addition of linear steps causes only corresponding linear (“1D”) increases in syntactic complexity.
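With async/await-style annotations, the skeleton of Example 2 flattens back into a straight line. A sketch, with hypothetical async stand-ins for the doSomethingN calls:

```javascript
// Each stand-in just records that it ran, so the ordering is observable.
const trace = [];
const doSomething1 = async () => { trace.push(1); };
const doSomething2 = async () => { trace.push(2); };
const doSomething3 = async () => { trace.push(3); };
const doSomething4 = async () => { trace.push(4); };

// The compiler rewrites this into continuation-passing style internally;
// adding a fifth step would add one line, not one level of nesting.
async function doSomething() {
    await doSomething1();
    await doSomething2();
    await doSomething3();
    await doSomething4();
}

doSomething().then(() => console.log(trace.join(","))); // prints 1,2,3,4
```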
The straight line principle is, then, this: in a given language, adding steps to a function in a way that does not increase its cyclomatic complexity should not result in an increase in the “2D” syntactic complexity of the function. As such steps are added, the expression of the program should not veer off to the right along the x-axis, but should remain roughly constant in the x-axis and grow in length along the y-axis.
The later stages of the function should come after the former stages of the function, just as the later paragraphs in a novel come after the former paragraphs. If one makes, for the sake of example, the simplifying assumption that in a novel the events expressed by any given paragraph happen strictly after those of all previous paragraphs, then novels are in chronological order. Nobody would read a novel where the relationship from one moment to the next is not expressed merely implicitly by the ordering of sentences, but where every sentence ends in a parenthetical starting with “and then, after that” containing the entire rest of the book; and then the last few pages are filled only with closing parentheses.
This is not how humans like to ingest and understand information which involves some form of progression over time. Since the technology to spare programmers from having to do this has existed for decades, there shouldn't be any reason for anyone to suffer the use of a syntactically nonlinear continuation-passing style.