How secure boot and trusted boot can be owner-controlled

Background. In recent years, the computing and embedded systems industry has adopted a habit of implementing “secure boot” functionality. More often than not, this is implemented by allowing a hash of a cryptographic public key to be fused into eFuses in a main CPU or SoC. When such a design is adopted, the private keys are invariably held by a device vendor and not by the device owner1, which means that only the device vendor can provide updated firmware for the device. This is problematic for a number of reasons:

  • Firstly, because this negates owner control over the device and means that a vendor continues to exercise control over the device after its sale. Aside from being objectionable in and of itself, this also undermines the right to repair. Moreover, it effectively constitutes the circumvention of the First Sale Doctrine via technological means;

  • Secondly, because keyfusing is highly inflexible and only allows a key to be set once, there is no provision for key replacement or rollover, or, at best, such provisions can only be exercised a small number of times. This means that normal best practices around key rotation cannot be followed, and catastrophic key compromise may be unrecoverable without replacing all devices recognising that key.

Discerning a threat model. The actual value of such secure boot functionality appears vague. Specifically, what is the threat model? Only two present themselves:2

  1. OS compromise (e.g. via a zero-day) cannot remain resident after a reboot (that is, security against remote attacks).

  2. Security against physical attack.

The problem with (2) is that it is fundamentally unattainable, because in the worst case an attacker could simply remove the fused chip and replace it with an unfused chip. Thus any attempt to mitigate against threat (2), or any claim of effective protection against it, is a sham. In the author's view, the mitigation of supply chain attacks is an essentially insurmountable problem at this time, at least in terms of technical mitigations; only organisational/procedural controls, such as accounting for chain of custody, seem viable. TLDR: You can't trust (for example) an IP phone whose uninterrupted custody you can't account for, and any vendor which tells you otherwise is lying. Physical bugging devices could be inserted without even changing any of the firmware, after all. There is no silver bullet solution to supply chain attacks.

(1) is arguably more compelling, though it does feel like it often misses the point. For example, it does nothing for the security of the device configuration, as opposed to its firmware. For most devices, illicit changes to configuration can be just as dangerous as changes to firmware. For an IP phone for example, what is the point in securing against changes to firmware, if an attacker obtaining non-persistent execution on a device can simply change the device's configuration to send all its calls to a server the attacker controls? It is deeply amusing how often “secure boot” is added as a check-box feature, with barely any attention paid to the equally important question of secure configuration. How many IP phones will willingly get their configuration from DHCP servers and unencrypted HTTP connections? To my knowledge, the answer to this question is “all of them”.

Implementing owner-controlled secure boot. It should be noted that you don't actually need keyfusing to implement (1). For example, the “secure boot” functionality on x86 PCs allows users to change their own trust roots at any time. This is implemented by reserving a region of a nonvolatile storage device for boot firmware and trust configuration, which can be locked against mutation after boot. The only way to make this region writeable again is by resetting the system, which returns execution to said boot firmware.3 Thus, absent physical intervention, any mutation to the boot firmware or configuration must be approved by said boot firmware.

Although most SoC vendors offer keyfusing as their officially supported means of “secure boot”, it is actually possible to implement this owner-controlled secure boot design on most SoCs with only a small number of additional board components. This takes advantage of the fact that

  1. SoC-class devices almost never have onboard flash, and instead boot from an external flash device;
  2. external flash devices usually have a “Write Protect” pin; and
  3. many classes of flash device allow the “Write Protect” pin to be configured to write-protect some, but not all, of the device's memory.

The implementation looks like this:

  • The write protect pin of the flash is connected to the output of a simple fixed-function flip-flop chip.

  • A GPIO output of the SoC is connected to the set pin of the flip-flop, allowing the SoC to assert the write protect pin of the flash.

  • The clear pin of the flip-flop is not directly connected to the SoC.

  • When the SoC boots, it boots from a specific region of the flash. This region is configured as covered by the write-protect bit and contains the first-stage bootloader.

  • The first-stage bootloader does whatever it likes, and just before it jumps to the next stage bootloader, sets the GPIO output to latch the write protect bit. Thus, the subsequent bootloader cannot modify the first-stage bootloader in flash. Other regions of flash remain mutable.

  • An additional GPIO output is wired to both the SoC's own reset pin and the clear pin of the flip-flop (probably with a few other discrete parts to ensure that any assertion of the output is sustained for a minimum amount of time, to meet any reset timing requirements of the SoC and to ensure that the flip-flop is cleared if and only if the SoC is reset). Thus, the only way the bootloader region of flash can be modified after boot is by having the SoC reset itself, returning execution to the first-stage bootloader. In this way, the first-stage bootloader is in full control of what modifications to itself it chooses to make.

    Note that the flip-flop must also be reset to a valid state by some means when the board is powered up.

For example, a first-stage bootloader could choose to accept and apply signed updates to itself if presented during the boot process, before the Write Protect latch is set (but an owner could always restore control to a new key by reflashing the flash chip via physical intervention); or a bootloader could display a prompt on a display connected to the SoC, showing the cryptographic identity of the new bootloader and asking if the user wishes to install it; or a bootloader could simply refuse to ever modify itself, rendering itself effectively immutable barring physical intervention.
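To make the first of these policies concrete, here is a minimal sketch of what such a first-stage bootloader's main path might look like, in C. All of the helper functions, the GPIO number and the next-stage address are hypothetical placeholders rather than any real board support package; the point is simply the ordering: any self-update happens before the write-protect latch is set, and the latch is set before control leaves the first stage.

```c
/* Hypothetical first-stage bootloader sketch for the board design above.
 * gpio_set(), update_present(), staged_update(), signature_valid() and
 * flash_write_first_stage() are assumed helpers; WP_LATCH_GPIO and
 * NEXT_STAGE_ENTRY are assumed board-specific parameters. */
#include <stdbool.h>
#include <stdint.h>

#define WP_LATCH_GPIO    17            /* drives the SET pin of the flip-flop */
#define NEXT_STAGE_ENTRY 0x00040000u   /* flash offset of the second-stage bootloader */

void gpio_set(int pin);
bool update_present(void);                                  /* e.g. staged in a scratch region of flash */
const void *staged_update(uint32_t *len_out);
bool signature_valid(const void *image, uint32_t len);      /* owner-chosen policy */
void flash_write_first_stage(const void *image, uint32_t len);

void first_stage_main(void)
{
    /* The write-protect latch is still clear at this point, so the
     * first-stage region of flash is still writeable: this is the only
     * window in which we may update ourselves. */
    if (update_present()) {
        uint32_t len;
        const void *img = staged_update(&len);
        if (signature_valid(img, len))
            flash_write_first_stage(img, len);
    }

    /* Latch write protection. From here until the next SoC reset, nothing
     * we subsequently boot can modify the first-stage region of flash. */
    gpio_set(WP_LATCH_GPIO);

    /* Hand over to the (still-mutable) next stage. */
    ((void (*)(void))NEXT_STAGE_ENTRY)();
}
```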

In short, this creates an owner-controlled secure boot system which mitigates against threat (1) (but not threat (2); then again, all claims to mitigate against threat (2) are a sham anyway).

Implementing this scheme on MCU-class chips is probably a lot less feasible, because most MCU-class chips are designed to boot from onboard flash but do not provide lock bits or Write Protect-pin equivalent functionality to prevent regions of that flash from being modified after boot. In general, while MCU-class devices tend to offer assorted security logic, such features tend, like keyfusing, to be designed around the specific modes of application the vendor had in mind and are not adaptable to novel use cases like the above, so finding MCUs which can offer high levels of security and defence in depth while also remaining owner-controlled can be difficult.

However, there are some MCU-class devices without onboard flash which are designed to boot from external flash devices (some higher-end NXP MCUs, for example), which are thus of high interest to anyone seeking to implement the above functionality on MCU-class devices.

Implementing owner-controlled trusted boot. As mentioned above, it is absurdly common to see the industry promote “secure boot” functionality while paying orders of magnitude less attention to the issue of secure configuration. In large part, the security objective of most designs is ultimately to prevent malicious access to sensitive information. This suggests that trusted boot, rather than secure boot, may be a substantially better approach for actually accomplishing useful security objectives.

The difference between secure boot and trusted boot is simple: In secure boot, the hardware authenticates the software it runs. In trusted boot, the software authenticates the hardware running it. Sort of — the reality is that both secure and trusted boot involve measuring a piece of code to be booted, but whereas secure boot uses that measurement to decide whether to boot it or not, trusted boot uses that measurement to decide what cryptographic secrets that code may be allowed to obtain.
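Expressed as code, the distinction might look something like the following sketch. Every type and helper here (measure(), verify(), kdf(), boot(), halt()) is a placeholder rather than any real API; the only thing being illustrated is what each policy does with the measurement.

```c
/* Conceptual sketch only: all types and helpers below are placeholders. */
typedef struct { unsigned char b[32]; } hash_t;   /* a measurement, e.g. SHA-256 */
typedef struct { unsigned char b[32]; } key_t;    /* a symmetric key */

hash_t measure(const void *image);                    /* hash the code to be booted */
int    verify(hash_t m, hash_t fused_pubkey_hash);    /* signature / allow-list check */
key_t  kdf(key_t root_key, hash_t m);                 /* e.g. HMAC(root_key, m) */
void   boot(const void *image, const key_t *key);
void   halt(void);

void secure_boot(const void *image, hash_t fused_pubkey_hash)
{
    /* The measurement gates execution: unapproved code never runs. */
    if (!verify(measure(image), fused_pubkey_hash))
        halt();
    boot(image, 0);
}

void trusted_boot(const void *image, key_t root_key)
{
    /* The measurement parameterises key derivation: any code runs, but
     * only the expected code receives the expected secrets. */
    key_t k = kdf(root_key, measure(image));
    boot(image, &k);
}
```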

Of course, these can be combined and do in fact go well together. Yet in the embedded space there has been comparatively little interest in trusted boot relative to the interest in secure boot, despite trusted boot actually being more flexible and arguably better suited to protecting sensitive information. However, substantially all trusted computing implementations in the wild appear to me to be seriously flawed:

  • All trusted boot implementations that I am aware of in the x86 space rely on secure boot for their security invariants (signed Intel ME firmware blobs, etc.); thus if these keyfused secure boot mechanisms are broken, so is trusted boot.

  • Trusted boot implementations involving a discrete TPM (or MCU-ecosystem components serving similar roles, like the ATECC series of chips) are vulnerable to physical interception of the signals between the discrete chip and the host processor (see TPM reset attacks, etc.). Really, trusted boot needs to be implemented on the main SoC to be secure.

As I noted when discussing threat models above, attempts to mitigate against physical attacks are futile — in the context of secure boot, because in the worst case an attacker can simply replace a keyfused chip on the PCB with a fresh one. However this criticism does not apply to trusted boot, because trusted boot does not seek to prevent unauthorized code from running; it simply seeks to ensure that particular data, such as certain cryptographic secrets, are only available to specific code, based on the identity of the code running on the device.

Trusted boot can potentially provide some security against physical attackers. For example, if an attacker physically replaces the SoC with an unfused version, the attacker loses access to any secrets stored on the device; because all information secured via trusted computing is ultimately secured using a secret fused into the chip itself, throwing out the chip for a new one also throws out the very secret the attacker is trying to obtain.

It should be noted that this security is not absolute and a determined attacker almost certainly will be able to extract secrets from a device making use of trusted computing (voltage glitching, power analysis, side-channel attacks, etc.); however it does seem that some degree of security against casual physical attackers is gained here. (Note again that this simply provides security against the extraction of secrets, and once again one must consider whether other physical attacks — such as implanting bugging devices — are not equally undesirable, attacks for which no mitigation is possible other than maintaining uninterrupted chain of custody. When implementing either secure boot or trusted boot, care must be taken to ensure that the level of security provided against physical attacks is not rendered completely moot by the inability to mitigate other, equally or more damaging physical attacks; otherwise, it is just security theatre.)

How can trusted boot be implemented? Consider that keyfusing-based “secure boot” is usually implemented on SoCs by allowing the hash of a cryptographic public key to be fused into on-die eFuses, which is then read by the chip's mask ROM during boot and used to verify the code in flash. In other words, the immutability of an on-die mask ROM is used to enforce the desired security properties. Thus, it's not hard to conceive of a similar design for trusted boot:

  • A secret key (the “root key”) is fused into the SoC. A lock bit, which can be cleared only by resetting the SoC, prevents the secret key from being read. A PUF could also be used.

  • At boot time, the mask ROM takes a cryptographic hash of the bootloader stored in flash. It computes a key derivation function with the root key and the hash of the bootloader as its inputs. The resulting key is passed as an argument to the bootloader when it jumps to it. Just before it jumps to the bootloader, it sets the root key read lock bit.

  • Thus, you can change the bootloader in flash at any time, but it will receive a different derived key as an argument when it is booted. This derived key can be used as a cryptographic secret by the bootloader for any purpose it likes. For example, it could use it to encrypt and authenticate configuration data stored in flash. If the bootloader is changed at all, this data becomes inaccessible because the key passed to the bootloader will change.
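The following is a minimal sketch of what such a mask ROM might do, assuming a KDF built from HMAC-SHA-256. The helper functions (sha256(), hmac_sha256(), fuse_read_root_key(), set_root_key_lock_bit()) and the flash addresses are assumptions for illustration, not any real vendor ROM API.

```c
/* Hypothetical trusted-boot mask ROM logic. All helpers and addresses are
 * assumed placeholders; real SoC ROMs differ. */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BOOTLOADER_BASE ((const uint8_t *)0x08000000)  /* assumed flash mapping */
#define BOOTLOADER_SIZE (64 * 1024)

/* Assumed ROM-internal primitives. */
void sha256(const uint8_t *msg, size_t len, uint8_t out[32]);
void hmac_sha256(const uint8_t key[32], const uint8_t *msg, size_t len, uint8_t out[32]);
void fuse_read_root_key(uint8_t out[32]);
void set_root_key_lock_bit(void);   /* cleared only by resetting the SoC */

typedef void (*bootloader_entry_t)(const uint8_t derived_key[32]);

void mask_rom_boot(void)
{
    uint8_t root_key[32], bl_hash[32], derived_key[32];

    /* Measure the bootloader currently in flash. */
    sha256(BOOTLOADER_BASE, BOOTLOADER_SIZE, bl_hash);

    /* Derive a key bound to that exact bootloader image:
     * derived_key = HMAC-SHA-256(root_key, bl_hash). */
    fuse_read_root_key(root_key);
    hmac_sha256(root_key, bl_hash, sizeof bl_hash, derived_key);
    memset(root_key, 0, sizeof root_key);   /* scrub the ROM's copy */

    /* Make the root key unreadable until the next reset, then hand over.
     * A different bootloader image yields a different derived key. */
    set_root_key_lock_bit();
    ((bootloader_entry_t)(uintptr_t)BOOTLOADER_BASE)(derived_key);
}
```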

Note that while on the face of it this seems to commit you to never changing a single bit of the bootloader, ever, the scheme is actually more flexible than it appears, because the bootloader can choose whether to pass its key on to another bootloader. For example, if the bootloader stores an encrypted configuration file, it could facilitate an update to itself in the following way:

  • Cryptographically authenticate the new bootloader. (Since the bootloader is in flash and not the mask ROM of a vendor's SoC, it's completely flexible what kind of validation logic you use.)
  • Decrypt the encrypted data using the bootloader's key. Erase the bootloader's key from memory.
  • Reset the SoC, leaving a flag in SRAM requesting that the mask ROM boot the new bootloader staged in SRAM, rather than booting from flash. The usual hashing and key derivation is still done on the booted image.
  • The new bootloader locates the decrypted configuration in SRAM and has also received its own, different bootloader key from the mask ROM, because its measurement differs. It re-encrypts the configuration with its new key and stores it in flash, and also copies itself to flash.

In other words, a bootloader authenticated via trusted boot can choose the parameters and terms of its own successorship, and choose to yield control of data to other code, after having authenticated that code on its own terms. Alternatively, if the loss of all information secured via trusted computing is acceptable, the bootloader can simply be changed by force. Thus the owner of, for example, a second hand device, can always take control of the device by “wiping it down” in this way and installing a new bootloader of their choice, in the process losing access to any secrets stored on it by the previous device owner.
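As a rough illustration of this handover, here is a sketch of what the old bootloader's side of the process might look like. The handoff structure, the SRAM address and every helper function are assumptions for illustration only; a real design would also have to consider what happens if power is lost in the middle of the handover.

```c
/* Hypothetical successorship handover, as seen from the OLD bootloader.
 * The handoff layout, addresses and helpers are all assumed placeholders. */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

struct handoff {
    uint32_t magic;                 /* asks the mask ROM to boot from SRAM */
    uint32_t new_bl_len;
    uint8_t  new_bl[64 * 1024];     /* candidate successor bootloader */
    uint8_t  plaintext_cfg[4096];   /* configuration decrypted under the old key */
};

#define HANDOFF       ((struct handoff *)0x20000000)   /* assumed SRAM address */
#define HANDOFF_MAGIC 0x424F4F54u

/* Assumed helpers. */
bool owner_policy_accepts(const uint8_t *img, uint32_t len);  /* owner-chosen check */
bool aead_decrypt_config(const uint8_t key[32], uint8_t out[4096]);
void soc_reset(void);

void install_successor(uint8_t old_key[32], const uint8_t *img, uint32_t len)
{
    /* Authenticate the successor on whatever terms the owner has chosen. */
    if (len > sizeof HANDOFF->new_bl || !owner_policy_accepts(img, len))
        return;

    /* Yield the secrets: decrypt the configuration under the old key,
     * then destroy the old key. */
    if (!aead_decrypt_config(old_key, HANDOFF->plaintext_cfg))
        return;
    memset(old_key, 0, 32);

    /* Stage the successor and reset; the mask ROM measures it and hands it
     * its own, different derived key, with which it re-encrypts the
     * configuration and writes itself to flash. */
    memcpy(HANDOFF->new_bl, img, len);
    HANDOFF->new_bl_len = len;
    HANDOFF->magic = HANDOFF_MAGIC;
    soc_reset();
}
```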

Other enhancements are possible. Most obviously, a captive-key cryptographic block with write-only key registers, as commonly found in many popular SoCs, could be used to hold the derived bootloader key, rather than simply passing the bootloader its key in memory; this may be desirable in some circumstances. For example, it prevents key leakage in the event of remote exploitation: a malicious actor compromising a device can make use of the key only for as long as they control the device, but cannot escape with it. (I like to think of these hardware units as “cryptographic glove boxes”.)
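A sketch of this “glove box” pattern follows. The register layout, addresses and lock semantics below are hypothetical placeholders for whatever captive-key AES engine a given SoC actually provides; the point is only that the key is written into hardware, locked, and then scrubbed from RAM, so that later code can use it but never read it back.

```c
/* Hypothetical captive-key ("glove box") loading sequence. The MMIO base,
 * register offsets and lock behaviour are assumed placeholders; consult the
 * actual SoC reference manual. */
#include <stdint.h>
#include <string.h>

#define AES_ENGINE_BASE 0x40010000u                       /* assumed MMIO base */
#define AES_KEY_WORD(n) (*(volatile uint32_t *)(AES_ENGINE_BASE + 4u * (n)))
#define AES_KEY_LOCK    (*(volatile uint32_t *)(AES_ENGINE_BASE + 0x20u))

/* Load the derived bootloader key into the engine's write-only key slot,
 * lock the slot until the next reset, and scrub the RAM copy. */
void load_captive_key(uint8_t key[32])
{
    for (unsigned i = 0; i < 8; i++) {
        uint32_t w;
        memcpy(&w, key + 4 * i, 4);
        AES_KEY_WORD(i) = w;
    }
    AES_KEY_LOCK = 1;   /* assumed: blocks further key reads/writes until reset */

    /* Scrub the only software copy; a real implementation should use an
     * explicit_bzero-style primitive the compiler cannot optimise away. */
    memset(key, 0, 32);
}
```

From this point on, compromise of later-running code lets an attacker ask the engine to encrypt or decrypt with the key, but not exfiltrate the key itself.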

Sadly, I'm unaware of any SoC which implements logic like this in its boot ROM. Even where trusted computing functionality is implemented, it tends to be dependent on the security of a (keyfusing-based) secure boot process, rather than being orthogonal to it from a security perspective.

Short of some SoC vendor deciding to adopt this model, this leaves a few options for implementation:

  • Use a SoC vendor's keyfusing secure boot options (or, for MCUs, permanent readout protection) to implement a “virtual mask ROM”. Specifically, generate a private key, sign a bootloader with it, fuse the public key into the SoC, and then throw away the private key. This prevents anyone from changing the bootloader, ever, essentially faking an immutable mask ROM. This is also applicable to FPGAs which support bitstream encryption (e.g. ECP5) or FPGAs which support one-time programming of a bitstream (e.g. iCE40). (A sketch of the sign-once-and-discard flow is given after this list.)

    This does require the ability to store some sort of secret key material in fuses on the SoC. Most SoCs offering keyfusing for secure boot should be able to do this.

    Pros:

    • This provides security against remote attacks, and against casual physical attackers (that is, attackers who don't know how to do power glitching, exploit physical side channels, or decap chips).

    Cons:

    • Since the bootloader can never be changed, it must be very thoroughly verified as being bug-free, though this is not an insurmountable problem; SoC vendors do it for their mask ROMs every day.

    • If a third party, like a board manufacturer, wanted to ship this as an official solution, it would require end-users to trust that they really had deleted the private key.

      (Alternatively the device itself could sign the bootloader and fuse itself on first boot, perhaps after user prompting. However, this means that the signed bootloader image is unique to the device, and the device is bricked if that image is lost.)

  • Use of the owner-controlled secure boot approach described above, with a discrete flash chip and use of write protect bits. The write-protected bootloader implements the trusted computing logic.

    This does require the ability to store some sort of secret key material in fuses on the SoC and disable access to it with a lock bit until reset. Many SoCs can do this.

    Pros:

    • Can be implemented on practically any SoC-class device with only board changes.

    Cons:

    • This provides security against remote attacks only, and no protection against physical attacks.

      Arguably this is still a good solution in many circumstances, namely those where the most severe consequences of physical attack (e.g. planting a bugging device in an IP phone) cannot be mitigated by technological means anyway; in such cases technological mitigations must focus on remote attacks, and physical attacks must be addressed via procedural/organisational chain-of-custody controls.

      On the other hand, this is not a suitable solution if, for example, the object of the application of trusted computing is to (attempt to) secure data (to a degree) against device loss.
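For the first option above, the “sign once, then destroy the key” step might look something like the following sketch, using libsodium's Ed25519 signatures purely as an illustration. A real SoC's boot ROM dictates its own key format, signature scheme, image layout and fusing procedure, none of which is modelled here; the hash of the public key is simply shown as the value that would be burned into the key fuses.

```c
/* Illustrative "virtual mask ROM" signing flow: generate a keypair, sign the
 * bootloader, record what would be fused, then destroy the private key. */
#include <sodium.h>
#include <stddef.h>

int make_virtual_mask_rom(const unsigned char *bootloader, size_t len,
                          unsigned char sig[crypto_sign_BYTES],
                          unsigned char pk_hash[crypto_hash_sha256_BYTES])
{
    unsigned char pk[crypto_sign_PUBLICKEYBYTES];
    unsigned char sk[crypto_sign_SECRETKEYBYTES];

    if (sodium_init() < 0)
        return -1;

    /* One-shot key: generate, sign the bootloader image, hash the public key. */
    crypto_sign_keypair(pk, sk);
    crypto_sign_detached(sig, NULL, bootloader, len, sk);
    crypto_hash_sha256(pk_hash, pk, sizeof pk);

    /* pk_hash is what would be burned into the SoC's key fuses; sig ships
     * alongside the bootloader image in flash. */

    /* Destroying the secret key is the whole point: afterwards nobody,
     * including the signer, can ever sign a different bootloader that
     * devices fused with pk_hash will accept. */
    sodium_memzero(sk, sizeof sk);
    return 0;
}
```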


1. Theoretically, if a vendor avoided doing the fusing themselves, the end user could take control of the fusing process and do it themselves, but since this requires a user to create and hold their own signing keys and sign all firmware updates with them, and because there is no way to subsequently change keys in the event of loss or compromise, this is utterly impractical for all but the most sophisticated device owners.

2. Actually, there is a third threat model: Protecting the device *against the device owner.* An alarmingly large number of modern products which are “sold” to consumers consider this to be part of their threat model. See [(1)](https://boingboing.net/2012/01/10/lockdown.html), [(2)](https://boingboing.net/2012/08/23/civilwar.html) for further discussion.

3. Actually, nowadays, modern Intel/AMD x86 systems do tend to allow the protected regions of the boot flash to be updated after boot, namely via UEFI calls which trap into System Management Mode. These platforms have a flash controller designed to allow greater access to flash if the system is running in System Management Mode.