Psychological effects of coding style

One of the weird things about adopting a coding style is its psychological impact, which I find hard to overstate. In some ways, it's about creating your own language within a language and coming to feel at home in it. In other ways, it's about using the style so consistently that you can completely trust the information it conveys when reading and reasoning about code.

However, there also seem to be psychological differences between different styles. For me, and somewhat surprisingly, the psychological imprint of foo_bar_baz is very different to that of FooBarBaz, even ignoring the different meanings I have assigned to these styles; the inherent pause in _ gives foo_bar_baz a fundamentally slower, lower-energy psychological footprint.

In C, foo_bar_new is common as a naming style. NewFooBar is also seen, but FooBarNew, curiously, is very rare. This would seem to affirm the idea that there's a substantial psychological difference between these styles; with snake_case, the connotation is closer to grouping, an ordered hierarchy of functions. With CamelCase, there's a greater appeal to the rules of the programmers' natural language and away from grouping; it's very rare to see grouping of CamelCase such as FooLib_BarBaz_New. Even in languages without namespacing, when CamelCase is used at global scope, grouping will tend to be limited to a short prefix such as gl (OpenGL) or Rtl, etc. (Windows NT kernel); these prefixes seem psychically compressed, rather than being intended to explicitly psychically impress the implications of the grouping.

A quick search shows that there's been some discussion on the impact of choosing a consistent coding style on productivity, but it would be interesting to know whether there are differences in productivity offered by particular different coding styles, assuming each is consistently adopted.

Different people want different information offered in their coding style. Some people like Hungarian notation. I usually don't care for it, though I adopt similar mechanisms to convey things other than type.

In C++, here's a very concise but actually very complete summary of my style:

  • Two spaces.
  • Opening braces on the same line.
  • Class names: CamelCase.
  • Method names: CamelCase.
  • Fields: camelBack.
  • Static class member constants, variables and functions: snake_case.
  • Functions: CamelCase.
  • Globals: g_camelBack.
  • Locals: camelBack.
  • Statics: camelBack.
  • Macros: SNAKE_CASE, of course.
  • Macros, method names, fields, static class members and static (i.e., unexported) variables in function or translation unit scope are all prefixed with _ if they are internal to the class, i.e. if they cannot or should not be accessed by other code.
  • Interface classes are prefixed with I.
  • Enumeration types are prefixed with E, and their values are named using CamelCase. enum class is always used, so the values themselves do not need a prefix.
  • Pointer and reference sigils on the right, to reflect the actual syntax (think int foo, *bar;).

An exception to this is when implementing STL-esque types (containers, smart pointers), in which case I follow the style of the C++ standard library. This is rather inconsistent, but this in itself leads to a codebase having a sense of distinction between “utility types” and “domain types”.
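As a rough illustration of how the conventions in the list combine, here is a minimal sketch. The names (ILog, RingLog, EOverflowPolicy, AppendAll, g_ringLogCount, RING_LOG_DEFAULT_CAPACITY) are all hypothetical and exist only to give each rule something to attach to.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Macro: SNAKE_CASE. (Hypothetical default, for illustration only.)
#define RING_LOG_DEFAULT_CAPACITY 4

// Enumeration type: E prefix. Values: CamelCase, unprefixed thanks to enum class.
enum class EOverflowPolicy { DropOldest, RejectNew };

// Global: g_camelBack. Counts how many RingLog objects have been constructed.
int g_ringLogCount = 0;

// Interface class: I prefix. Methods: CamelCase.
class ILog {
public:
  virtual ~ILog() {}
  virtual bool Append(int value) = 0;
  virtual std::size_t Size() const = 0;
};

// Class: CamelCase.
class RingLog : public ILog {
public:
  // Static class member constant: snake_case.
  static constexpr std::size_t max_capacity = 1024;

  explicit RingLog(std::size_t capacity = RING_LOG_DEFAULT_CAPACITY,
                   EOverflowPolicy policy = EOverflowPolicy::DropOldest)
      : _capacity(capacity), _policy(policy) {
    assert(capacity > 0 && capacity <= max_capacity);
    ++g_ringLogCount;
  }

  // Returns false only when the log is full and the policy is RejectNew.
  bool Append(int value) override {
    if (_items.size() == _capacity) {
      if (_policy == EOverflowPolicy::RejectNew)
        return false;
      _items.erase(_items.begin());  // DropOldest: discard the front entry.
    }
    _items.push_back(value);
    return true;
  }

  std::size_t Size() const override { return _items.size(); }

private:
  // Fields: camelBack, with a _ prefix because they are internal to the class.
  std::size_t _capacity;
  EOverflowPolicy _policy;
  std::vector<int> _items;
};

// Free function: CamelCase. Pointer sigils sit on the right, next to the name.
// Returns the number of values the sink accepted.
int AppendAll(ILog *sink, const int *values, std::size_t valueCount) {
  int acceptedCount = 0;  // Local: camelBack.
  for (std::size_t i = 0; i < valueCount; ++i) {
    if (sink->Append(values[i]))
      ++acceptedCount;
  }
  return acceptedCount;
}
```

Reading only the call site `AppendAll(&ring, values, 3)`, the casing already says that AppendAll is a function or public method and that values is a local or a field, which is the kind of at-a-glance information the style is meant to carry.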

My style for C codebases is quite different: I use snake_case for basically everything. This is probably because a) I developed that style separately, b) C is simpler, so there are fewer things to distinguish between, and c) that style is more predominant in C codebases.

Another factor may be the lack of namespacing in C, which essentially necessitates the ability to convey a hierarchy in an identifier (e.g. foolib_barbaz_new); by contrast, and as I mentioned above, it's very rare to see people grouping CamelCase. For this reason, in C, globals drop the Hungarian-style scope prefix (g_) and are named in the same style as functions, with namespace-prefixed snake_case. Underscore prefixing for internals is still observed.
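A small sketch of what that hierarchy looks like in practice. It is written in the common subset of C and C++ (hence the casts on malloc), and everything beyond the foolib_barbaz_new name from the text is hypothetical: the struct layout, the other functions and the global counter are assumptions made for illustration.

```cpp
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Library "foolib", type "barbaz": the hierarchy lives in the identifier. */
typedef struct foolib_barbaz {
  char *name;
  size_t _name_len; /* internal: _ prefix, callers should not touch it */
} foolib_barbaz;

/* Global: no g_ prefix, named like a function with the namespace prefix. */
int foolib_barbaz_live_count = 0;

/* Allocates a barbaz holding a copy of name; returns NULL on allocation failure. */
foolib_barbaz *foolib_barbaz_new(const char *name) {
  assert(name != NULL);
  foolib_barbaz *bb = (foolib_barbaz *)malloc(sizeof *bb);
  if (bb == NULL)
    return NULL;
  bb->_name_len = strlen(name);
  bb->name = (char *)malloc(bb->_name_len + 1);
  if (bb->name == NULL) {
    free(bb);
    return NULL;
  }
  memcpy(bb->name, name, bb->_name_len + 1); /* copies the terminator too */
  ++foolib_barbaz_live_count;
  return bb;
}

size_t foolib_barbaz_name_length(const foolib_barbaz *bb) {
  return bb->_name_len;
}

/* Releases a barbaz; passing NULL is a no-op. */
void foolib_barbaz_free(foolib_barbaz *bb) {
  if (bb == NULL)
    return;
  free(bb->name);
  free(bb);
  --foolib_barbaz_live_count;
}
```

Sorted alphabetically, these identifiers fall into the ordered grouping described earlier: everything belonging to barbaz sits together under foolib_barbaz_.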

The _ prefixing is important to me because I like to know with high confidence the scope of a variable or method, because that affects my ability to reason about it and the conditions which can occur. Interestingly, I developed this style before encountering Go, but Go is quite similar; package-internal identifiers start with a lowercase letter, so I get the same information from Go as I do from my style in C++, albeit at package and not class scope.

Other styles commonly use an m_ prefix for member internals. For some reason that is very psychically noisy to me, though I'm sure I could get used to it. But it's also subtly different because it's emphasizing the fact that it's a member, and not the fact that it's internal.

So essentially all of the above are considerations about what information you want readily available with high confidence, and what is just noise to the part of your mind accustomed to reading, reasoning about and writing code. These are linguistic considerations.

Some people want data type (Systems Hungarian); some want a semantic data type (Apps Hungarian). Some want scope indication. Indication of identifier nature (class, method, function, variable, macro) is particularly common. Whether you use Hungarian prefixes (as in my E and I prefixes above) or different casing doesn't change the fact that you're deciding to convey that information.
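To make the distinction between the two kinds of Hungarian concrete, here is a brief sketch. The variable and function names are hypothetical; the prefixes (dw, sz, row, col, c) follow commonly seen conventions rather than anything prescribed in this post.

```cpp
#include <cassert>
#include <cstdint>

// Systems Hungarian: the prefix restates the declared type.
std::uint32_t dwCellCount = 0;       // dw: a 32-bit "double word"
const char *szSheetName = "Sheet1";  // sz: a zero-terminated string

// Apps Hungarian: the prefix names the semantic kind. All three parameters
// are plain ints, but row, col and c (a count) mark values that should not
// be mixed up with each other.
int CellIndex(int rowCursor, int colCursor, int cCol) {
  assert(colCursor >= 0 && colCursor < cCol);
  return rowCursor * cCol + colCursor;
}
```

With the semantic prefixes, a call such as `CellIndex(colCursor, rowCursor, cCol)` looks wrong on sight even though the compiler accepts it, which is information the type alone does not carry.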

One thing I'm surprised isn't seen more often is thread restrictions. Just as reasoning about scope restrictions is useful to me, so is having information about the threads on which a function is supposed to execute. When I do end up using multiple threads, I tend to end up naming them or their groups and putting abbreviations into function names. It's rarer to need to do this in Go with goroutines, but sometimes if the interaction between goroutines ends up hairier than one would like I still end up doing it.

This sort of information can also be conveyed by a documentation comment on the function or a pseudo-keyword:

#define NATIVE_THREAD_ONLY
NATIVE_THREAD_ONLY int foo();

The Win32 API tends to like using these (IN, OUT, INOUT, etc.), but I'm not too fond of them.

Often one thread gets named “native”, with any other thread being deemed “xeno”. Xeno threads can only call xeno functions, and xeno functions will usually simply signal the native thread in some way. (A simple example of a case where these issues arise is when dealing with the Win32 GUI system, as the Win32 message queue is thread local and thus windows must be serviced on the thread which created them.)
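A minimal sketch of the native/xeno split, assuming a hypothetical Mailbox class and an N/X suffix scheme (N for native-only, X for callable from any thread); the post does not specify the actual abbreviations used. The X function only enqueues and signals, and the N function asserts that it is running on the thread that created the mailbox.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>

// The thread that constructs the mailbox is treated as its native thread.
class Mailbox {
public:
  Mailbox() : _nativeId(std::this_thread::get_id()) {}

  // X: callable from any thread. Only queues the message and signals the
  // native thread; it never touches native-only state.
  void PostX(int message) {
    {
      std::lock_guard<std::mutex> lock(_mutex);
      _pending.push_back(message);
    }
    _wake.notify_one();
  }

  // N: native thread only. Blocks until `count` messages have been received
  // and returns their sum.
  int DrainN(int count) {
    assert(std::this_thread::get_id() == _nativeId);
    int sum = 0;
    std::unique_lock<std::mutex> lock(_mutex);
    for (int received = 0; received < count; ++received) {
      _wake.wait(lock, [this] { return !_pending.empty(); });
      sum += _pending.front();
      _pending.pop_front();
    }
    return sum;
  }

private:
  std::thread::id _nativeId;
  std::mutex _mutex;
  std::condition_variable _wake;
  std::deque<int> _pending;
};

// X: intended to run on a xeno thread, so it only calls X functions.
void ProduceX(Mailbox *mailbox, int first, int last) {
  for (int value = first; value <= last; ++value)
    mailbox->PostX(value);
}
```

In use, the native thread would construct the Mailbox, start a std::thread running ProduceX, and then call DrainN itself. A call to DrainN appearing inside a function with an X suffix is then visibly out of place when reading the code.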

(You could, I suppose, create a whole type system around this idea, but it would probably be more practical to add a compiler attribute for generating warnings for inter-category calls. clang seems to have a static analyser with similar ideas.)
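For what the type-system version might look like, here is a small sketch using a capability token. NativeToken and Window are hypothetical names. The compiler only checks that a token was passed, and a token can still be copied to another thread, so a runtime assertion backs up the signature.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Holding a NativeToken stands for "I am running on the native thread".
class NativeToken {
public:
  // Call on the thread that is to be treated as native.
  static NativeToken Claim() { return NativeToken(std::this_thread::get_id()); }

  // True if the calling thread is the one that claimed the token.
  bool IsCurrent() const { return std::this_thread::get_id() == _owner; }

private:
  explicit NativeToken(std::thread::id owner) : _owner(owner) {}
  std::thread::id _owner;
};

class Window {
public:
  // Native only: the token parameter puts the restriction in the signature.
  void Resize(const NativeToken &native, int width) {
    assert(native.IsCurrent());
    _width = width;
  }

  int Width(const NativeToken &native) const {
    assert(native.IsCurrent());
    return _width;
  }

  // Callable from any thread: takes no token and only sets an atomic flag.
  void RequestRedraw() { _redrawRequested.store(true); }

  // Native only: returns true once per pending request and clears it.
  bool TakeRedrawRequest(const NativeToken &native) {
    assert(native.IsCurrent());
    return _redrawRequested.exchange(false);
  }

private:
  int _width = 0;
  std::atomic<bool> _redrawRequested{false};
};
```

Code that has no token in scope simply cannot spell a call to Resize, which gives roughly the same information as a name suffix but checked by the compiler.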

Other examples of indicating function properties in function names can be found in the Windows NT kernel, with the Nt series of functions and their corresponding Zw functions, which are distinct only in whether they validate input. This is probably done more out of necessity, for the purposes of disambiguation, than as a reasoning aid, but it becomes a language anyway.

At any rate, returning to the idea of fundamental psychological differences between casing conventions, here's how I would psychoanalyse my own preferences:

  • I use CamelCase for classes because classes are something advertised to the rest of the codebase for use. Likewise, you see CamelCase used in product marketing names a lot. It's “higher energy”.
  • Using CamelCase for method names represents the imperative nature of the call, as well as its public advertisement. Possibly I'm more prone to use this style for methods than functions because linguistically, I lean towards imperative interpretations of language when there's a specific object in mind.
  • Using underscore prefixes represents concealment and a desire to hide things or in some way declare them unofficial.
  • Using camelBack inside functions: As I mentioned, CamelCase has higher psychic intensity than snake_case, which I prefer when reasoning about algorithms.
  • I: no inherent connotations, but I like a clear division between interface and implementation.