Psychological effects of coding style
One of the weird things about adopting a coding style is its psychological impact. I can't describe how significant that impact is. In some ways, it's about creating your own language within a language and coming to feel familiar in it. In some ways, it's having enough trust in your own consistent use of the style that you trust the information conveyed by that style completely when reading and reasoning about code.
However, there also seem to be psychological differences between different styles. For me, and somewhat surprisingly, the psychological imprint of `foo_bar_baz` is very different to that of `FooBarBaz`, even ignoring the different meanings I have assigned to these different styles; the inherent pause in `_` gives `foo_bar_baz` a fundamentally slower, lower-energy psychological footprint.
In C, `foo_bar_new` is common as a naming style. `NewFooBar` is also seen, but `FooBarNew`, curiously, is very rare. This would seem to affirm the idea that there's a substantial psychological difference between these styles; with `snake_case`, the connotation is closer to grouping, an ordered hierarchy of functions. With `CamelCase`, there's a greater appeal to the rules of the programmer's natural language and away from grouping; it's very rare to see grouping of `CamelCase` such as `FooLib_BarBaz_New`. Even in languages without namespacing, when `CamelCase` is used at global scope, grouping will tend to be limited to a short prefix such as `gl` (OpenGL) or `Rtl`, etc. (Windows NT kernel); these prefixes seem psychically compressed, rather than being intended to explicitly psychically impress the implications of the grouping.
A quick search shows that there's been some discussion on the impact of choosing a consistent coding style on productivity, but it would be interesting to know whether there are differences in productivity offered by particular different coding styles, assuming each is consistently adopted.
Different people want different information offered in their coding style. Some people like Hungarian notation. I usually don't care for it, though I adopt similar mechanisms to convey things other than type.
In C++, here's a very concise but actually very complete summary of my style:
- Two spaces.
- Opening braces on the same line.
- Class names: `CamelCase`.
- Method names: `CamelCase`.
- Fields: `camelBack`.
- Static class member constants, variables and functions: `snake_case`.
- Functions: `CamelCase`.
- Globals: `g_camelBack`.
- Locals: `camelBack`.
- Statics: `camelBack`.
- Macros: `SNAKE_CASE`, of course.
- Macros, method names, fields, static class members and static (i.e., unexported) variables in function or translation unit scope are all prefixed with `_` if they are internal to the class, i.e. if they cannot or should not be accessed by other code.
- Interface classes are prefixed with `I`.
- Enumeration types are prefixed with `E`, and their values are named using `CamelCase`. `enum class` is always used, so prefixing the values is not necessary.
- Pointer and reference sigils on the right, to reflect the actual syntax (think `int foo, *bar;`).
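As a rough illustration, the conventions listed above might look something like this when put together. Every name here (`IShape`, `Rect`, `EColor`, `TotalArea`, `g_shapeCount`, `MAX_SHAPES`) is hypothetical and exists only for the sketch. One assumption worth flagging: C++ reserves identifiers that begin with an underscore followed by an uppercase letter, and any underscore-leading identifier in the global namespace, so the sketch shows the `_` prefix only on a class field.

```cpp
#include <cassert>
#include <vector>

// All names here are hypothetical; only the naming conventions come from the text.

#define MAX_SHAPES 16u  // Macro: SNAKE_CASE.

int g_shapeCount = 0;  // Global: g_camelBack.

// Enumeration type: E prefix; enum class scopes the CamelCase values.
enum class EColor { Red, Green, Blue };

// Interface class: I prefix.
class IShape {
public:
  virtual ~IShape() = default;
  virtual int Area() const = 0;  // Method: CamelCase.
};

// Class: CamelCase, two-space indent, opening brace on the same line.
class Rect : public IShape {
public:
  static constexpr int default_side = 1;  // Static class member: snake_case.

  int width;   // Fields: camelBack.
  int height;

  Rect(int initialWidth, int initialHeight, EColor color)
    : width(initialWidth), height(initialHeight), _color(color) {
    ++g_shapeCount;
  }

  int Area() const override { return width * height; }
  EColor Color() const { return _color; }

private:
  EColor _color;  // Internal field: _ prefix (lowercase after the underscore).
};

// Function: CamelCase. Pointer and reference sigils sit on the right.
int TotalArea(const std::vector<const IShape *> &shapes) {
  assert(shapes.size() <= MAX_SHAPES);
  int totalArea = 0;  // Local: camelBack.
  for (const IShape *shape : shapes) {
    totalArea += shape->Area();
  }
  return totalArea;
}
```

Even in a fragment this small, the casing alone says whether a name is a type, an interface, a global or a class-internal field.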
An exception to this is when implementing STL-esque types (containers, smart pointers), in which case I follow the style of the C++ standard library. This is rather inconsistent, but this in itself leads to a codebase having a sense of distinction between “utility types” and “domain types”.
My style for C codebases is quite different: I use `snake_case` for basically everything. This is probably because (a) I developed that style separately, (b) C is simpler, so there are fewer things to distinguish between, and (c) that style is more predominant in C codebases. Another factor may be the lack of namespacing in C, which essentially necessitates the ability to convey a hierarchy in an identifier (e.g. `foolib_barbaz_new`); by contrast, and as I mentioned above, it's very rare to see people grouping `CamelCase`. For this reason, globals drop the scope Hungarian prefix and are named in the same style as functions, with namespace-prefixed `snake_case`. Underscore prefixing for internals is still observed.
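A minimal sketch of that hierarchy-in-the-identifier style follows. It is written in C++ restricted to C-like constructs, purely to keep one language across the examples in this post; the library name `foolib`, the `barbaz` type and every function are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical library "foolib" with a "barbaz" object type.
// Identifiers read as library, then type, then operation.

struct foolib_barbaz {
  int value;
};

// Global: no g_ prefix, just the namespace-style prefix used by functions.
int foolib_barbaz_live_count = 0;

// Allocates a barbaz; returns nullptr if allocation fails.
foolib_barbaz *foolib_barbaz_new(int value) {
  foolib_barbaz *bb =
    static_cast<foolib_barbaz *>(std::malloc(sizeof(foolib_barbaz)));
  if (bb == nullptr) {
    return nullptr;
  }
  bb->value = value;
  ++foolib_barbaz_live_count;
  return bb;
}

int foolib_barbaz_get(const foolib_barbaz *bb) {
  assert(bb != nullptr);
  return bb->value;
}

// Releases a barbaz; passing nullptr is a no-op.
void foolib_barbaz_free(foolib_barbaz *bb) {
  if (bb == nullptr) {
    return;
  }
  --foolib_barbaz_live_count;
  std::free(bb);
}
```

The identifier itself carries the grouping that a namespace or class would otherwise provide.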
The `_` prefixing is important to me because I like to know with high confidence the scope of a variable or method, because that affects my ability to reason about it and the conditions which can occur. Interestingly, I developed this style before encountering Go, but Go is quite similar; package-internal identifiers start with a lowercase letter, so I get the same information from Go as I do from my style in C++, albeit at package and not class scope.
Other styles commonly use an `m_` prefix for member internals. For some reason that is very psychically noisy to me, though I'm sure I could get used to it. But it's also subtly different because it's emphasizing the fact that it's a member, and not the fact that it's internal.
So essentially all of the above are considerations about what information you want readily available with high confidence, and what is just noise to the part of your mind accustomed to reading, reasoning about and writing code. These are linguistic considerations.
Some people want data type (systems Hungarian); some want a semantic data type (applications Hungarian). Some want scope indication. Indication of identifier nature (class, method, function, variable, macro) is particularly common. Whether you use Hungarian (as in my `E` and `I` prefixes above) or different casing doesn't change the fact that you're deciding to convey that information.
One thing I'm surprised isn't seen more often is thread restrictions. Just as reasoning about scope restrictions is useful to me, so is having information about the threads on which a function is supposed to execute. When I do end up using multiple threads, I tend to end up naming them or their groups and putting abbreviations into function names. It's rarer to need to do this in Go with goroutines, but sometimes if the interaction between goroutines ends up hairier than one would like I still end up doing it.
This sort of information can also be conveyed by a documentation comment on the function or a pseudo-keyword:
/* Expands to nothing; it exists purely as an annotation for the reader. */
#define NATIVE_THREAD_ONLY
NATIVE_THREAD_ONLY int foo();
The Win32 API tends to like using these (IN, OUT, INOUT, etc.), but I'm not too fond of them.
Often one thread gets named “native”, with any other thread being deemed “xeno”. Xeno threads can only call xeno functions, and xeno functions will usually simply signal the native thread in some way. (A simple example of a case where these issues arise is when dealing with the Win32 GUI system, as the Win32 message queue is thread local and thus windows must be serviced on the thread which created them.)
(You could, I suppose, create a whole type system around this idea, but it would probably be more practical to add a compiler attribute for generating warnings for inter-category calls. clang seems to have a static analyser with similar ideas.)
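The native/xeno split described above could be sketched roughly as follows. The `X` and `N` method prefixes, the `Counter` class and its members are all hypothetical, chosen only to show one way of putting a thread abbreviation into a name. The sketch assumes the object is constructed on the native thread, and it checks that restriction at run time with `assert` rather than statically.

```cpp
#include <cassert>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical convention: an "X" prefix marks functions callable from any
// (xeno) thread; an "N" prefix marks functions restricted to the native thread.
class Counter {
public:
  // Assumption: the constructing thread is the native thread.
  Counter() : _nativeId(std::this_thread::get_id()) {}

  // Xeno: only queues a request for the native thread to apply later.
  void XRequestAdd(int amount) {
    std::lock_guard<std::mutex> lock(_mutex);
    _pending.push(amount);
  }

  // Native only: applies all queued requests.
  void NDrain() {
    assert(std::this_thread::get_id() == _nativeId);
    std::queue<int> batch;
    {
      std::lock_guard<std::mutex> lock(_mutex);
      std::swap(batch, _pending);
    }
    while (!batch.empty()) {
      _total += batch.front();
      batch.pop();
    }
  }

  // Native only: _total is never touched from another thread.
  int NTotal() const {
    assert(std::this_thread::get_id() == _nativeId);
    return _total;
  }

private:
  const std::thread::id _nativeId;
  std::mutex _mutex;
  std::queue<int> _pending;  // Shared between threads; guarded by _mutex.
  int _total = 0;            // Native thread only; needs no lock.
};
```

With the restriction in the name, a call to `NDrain` from inside an `X` function looks wrong at a glance, which is the point of the convention.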
Other examples of indicating function properties in function names can be found in the Windows NT kernel, with the `Nt` series of functions and their corresponding `Zw` functions, which are distinct only in whether they validate input. This is probably done more out of necessity, for the purposes of disambiguation, than as a reasoning aid, but it becomes a language anyway.
At any rate, returning to the idea of fundamental psychological differences between casing conventions, here's how I would psychoanalyse my own preferences:
- I use `CamelCase` for classes because classes are something advertised to the rest of the codebase for use. Likewise, you see `CamelCase` used in product marketing names a lot. It's “higher energy”.
- Using `CamelCase` for method names represents the imperative nature of the call, as well as its public advertisement. Possibly I'm more prone to use this style for methods than functions because linguistically, I lean towards imperative interpretations of language when there's a specific object in mind.
- Using underscore prefixes represents concealment and a desire to hide things or in some way declare them unofficial.
- Using `camelBack` inside functions: As I mentioned, `CamelCase` has higher psychic intensity than `snake_case`, which I prefer when reasoning about algorithms.
- `I`: no inherent connotations, but I like a clear division between interface and implementation.