Rethinking the filesystem as global mutable state, the root of all evil
This is a follow-on from Rethinking files; read that article first.
When you invoke a function in an imperative programming language, you pass arguments to that function. But that function may also operate on global mutable state, meaning that its behaviour depends on more than just the arguments you pass to it. To some extent this kind of contextual, implicit passing of state is a necessary practicality. But overuse of global mutable state is a menace, and good programming practice dictates avoiding excessive use of global variables.
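To make the analogy concrete, here is a trivial shell sketch (the function names and the WORKDIR variable are purely illustrative): the first function silently depends on a global variable, while the second takes everything it operates on as an argument.
$ count_entries() { ls "$WORKDIR" | wc -l; }     # behaviour depends on the global $WORKDIR
$ count_entries_in() { ls "$1" | wc -l; }        # operates only on what is passed to it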
It occurs to me, however, that this is a lesson that operating systems could take to heart. Whenever you invoke an executable, you are implicitly passing that executable a massive amount of global mutable state: namely, the filesystem.
This has its problems. For one thing, it's hard to execute a program without passing it your whole filesystem. You can do it with chroots or containers, but both are remarkably cumbersome and awkward. In fact, the development of “containers” is arguably largely a product of this deficiency in modern operating system design.
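For contrast, here is roughly what limiting a program's view of the filesystem looks like today (the jail path and image name are illustrative): the chroot requires a pre-populated directory tree and root privileges, and the container requires a daemon and an image.
$ sudo chroot /srv/jail /bin/ls /
$ docker run --rm -v "$PWD":/data alpine ls /data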
Consider the alternative. Suppose that whenever you spawn another process, that process must explicitly be passed some sort of handle to the filesystem, or otherwise (by default) have no access to the filesystem at all. (To be clear, I'm talking at the level of OS APIs here. A shell could certainly be designed to pass the filesystem by default to any program you execute, as a matter of convenience.)
I'm also importing some of the concepts from Rethinking files here, namely the idea that handles can be held by a shell as shell variables, shells can pass handles as command line arguments to programs, and programs can return handles in turn. In other words, they're capabilities.
So here are some examples of how this might work:
$ ls /
bin
usr
etc
We execute ls and it shows us some directories at the root of the filesystem.
$ FS=$(fs-empty) ls /
$
This time when we executed ls, it showed us an empty directory. This is because we told our shell to invoke the ls program with a specific filesystem passed to it; namely, the handle to a filesystem returned by the program fs-empty, which spawns a virtual filesystem which contains no files and returns the handle to it. (The shell handles the FS environment variable specially.)
$ touch a.txt
$ touch b.txt
$ tar cf foo.tar a.txt b.txt
$ FS=$(fs-tar foo.tar) ls /
a.txt
b.txt
This time we created a tar file containing files a.txt and b.txt. We then invoked ls, telling the shell to pass to the ls program the handle to a filesystem returned by fs-tar. The hypothetical fs-tar program takes the path to a tar file and yields a filesystem providing read-only access to the files in that tar file. Thus, the invoked ls thinks that the root of the filesystem contains a.txt and b.txt.
$ FS=$(fs-filter /bin /etc) ls /
bin
etc
This time the filesystem is passed from a hypothetical fs-filter program, which takes the filesystem it is called with and proxies that filesystem so that only the specified subsets of the tree are revealed.
$ FS=$(FS=$(fs-tar foo.tar) fs-filter /a.txt) ls /
a.txt
This time, we passed the filesystem from fs-filter, but overrode the filesystem passed to that program in turn, so that the filesystem fs-filter saw was the filesystem revealed by fs-tar. This shows that filesystems can be composed, purely for the invocation of a specific program.
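Since these handles are just values held by the shell, a handle can also, hypothetically, be captured in a shell variable and reused across several invocations:
$ TARFS=$(fs-tar foo.tar)
$ FS=$TARFS ls /
$ FS=$(FS=$TARFS fs-filter /a.txt) ls /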
Of course, as per Rethinking files, the filesystem needn't be restricted to just files. Instead, it can serve as the principal namespace for named resources. Network access, for example, can be provided via this namespace, which means that fs-filter can actually be used to restrict network access. What the above idea effectively creates is, in many ways, what you really want from containers, without the need for a container runtime. Most importantly of all, execution of “containers” works just like invoking a normal process. There are no daemons, no central repositories of container images.
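As a hypothetical example, suppose network endpoints are exposed under a path such as /net (in the style of Plan 9); a filter which omits that subtree then denies the program network access entirely:
# /net is not included in the filtered tree, so no network access
$ FS=$(fs-filter /home/user/src) ./untrusted-build-tool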
The underlying idea here is that, somewhat akin to a functional programming environment, the only resources passed to a process that you execute should be those resources that are explicitly passed (or at least, which you configure your shell to automatically pass).
Recall that in Rethinking files, I proposed that resource handles could be passed as command line arguments. The common stdin/stdout/stderr can also be made explicit:
$ FS= ls
error: no filesystem
$ FS= STDIN= STDOUT= STDERR= ls
$
Here we execute ls with no filesystem, no stdin handle, no stdout handle and no stderr handle! Because it doesn't even have a stderr handle, it can't output anything, although it probably tried to output the message complaining that it's been given no filesystem.
Note that this is distinct from the following in a conventional shell:
$ ls </dev/null &>/dev/null
In this case, stdin/stdout/stderr handles are still passed, they're just connected to /dev/null. In the above case however, the process receives no handles at all — not to a filesystem, nor to stdin/stdout/stderr.
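The closest conventional analogue would be to close the descriptors outright rather than redirect them, e.g. in bash:
$ ls <&- >&- 2>&-
Even then, though, the process still receives the entire filesystem; only in the handle-passing model does it receive nothing at all unless something is explicitly passed.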
Of course, we can take this approach further. By bringing in the idea that when we execute a process, we should get to declare what resources it inherits, we make sandboxing untrusted processes very easy. For example, a shell could use a pseudo-environment variable PRIV to communicate desired privileges in more beginner-friendly terms:
# Execute an untrusted binary which you only trust to use stdio (no filesystem access)
$ PRIV=stdio ./sketchy_binary_from_the_internet
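In this sketch, PRIV=stdio would be little more than beginner-friendly shorthand for the explicit form used earlier, passing the standard streams through but no filesystem handle at all:
$ FS= ./sketchy_binary_from_the_internet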