Client certificates aren't universally more secure

This is a bit of a follow-up to Memoirs from the old web: The KEYGEN element, which touched on deployment issues around client certificates. While client certificates aren't popular for web browser use today due to their terrible UI, at least outside of organisations' internal networks, they perhaps enjoy a bit more popularity for non-web-browser usage, such as for securing traffic to cloud APIs.

On the face of it, the case for using client certificates to authenticate access to a cloud API seems clear: the authentication mechanism is built into your existing TLS stack, which has probably received a lot more scrutiny and auditing than any method you might care to invent yourself. By the time the TLS connection has been successfully opened, authentication is in fact already complete.

Moreover, because client certificate authentication is effected using public key cryptography, the cloud service doesn't need to possess the information needed to authenticate as a given client (the private key), only the corresponding public key. This means that if the cloud service experiences a catastrophic data leak of credential information, the attacker can't impersonate clients. By contrast, most alternative authentication solutions involve symmetric shared secrets, which both the cloud service and the user need to possess, making this attack more of a risk.

Let's leave aside the UI issues around the use of client certificates on websites and instead consider the usage of client certificates for a web-based API. There are then three significant issues around the deployment of client certificates:

  1. Backend servers must trust TLS terminators/load balancers.

  2. Privacy issues.

  3. Use of ambient authority.

Backend trust. When a user uses a TLS client certificate to authenticate to a service, they prove ownership of the corresponding private key to the TLS server terminating that TLS connection.

If your service is small and run only on a single machine, there is no issue here. However, if, as is more likely, you have separate backend services handling requests after TLS has been terminated, these backend services don't automatically know what client certificate has been used to send a request. This is generally solved by having the TLS terminator add an X-Client-Certificate header to the incoming request with information about the client certificate which is being used.

There are some basic hazards around this, such as header injection, which need to be prevented (the same issue arises with the more common use of an X-Real-IP header or similar so that backend servers know the original client IP), but these are readily mitigated. However, there is a more pressing concern: if your TLS terminator is compromised, it can impersonate any client.

When a backend server receives an HTTP request via your TLS terminator/load balancer front end, it has no choice but to blindly trust the X-Client-Certificate header, which after all does not provide any proof that a given client certificate was used, only a simple assertion. If a machine responsible for terminating TLS became compromised, this means that the attacker could impersonate any client certificate, and thus any user, without limitation.
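
To illustrate the shape of the problem, here is a minimal Python sketch of a backend handler consuming such a header; the X-Client-Certificate name follows the example above, and the handler logic is purely illustrative rather than any particular product's interface.

from http.server import BaseHTTPRequestHandler, HTTPServer

class BackendHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The backend cannot verify that this certificate was ever actually
        # presented and proven during a TLS handshake; the header is a bare
        # assertion made by whoever managed to connect to this server.
        asserted_cert = self.headers.get("X-Client-Certificate")
        if asserted_cert is None:
            self.send_error(401, "no client certificate asserted")
            return
        # ... map the asserted certificate to a user and handle the request
        # as that user ...
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"hello, allegedly-authenticated client\n")

HTTPServer(("127.0.0.1", 8080), BackendHandler).serve_forever()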

Moreover, this is not easily fixed. TLS client certificate authentication is designed to prove control of the private key to the TLS server, but it is not designed to allow that control to be transitively proven to third parties. The design of TLS more or less precludes this.1

Another issue is that it is not just a compromised TLS terminator which can impersonate any client: so can any other entity on your internal network which can successfully connect to a backend server and make requests; therefore, it is essential that backend servers be configured so that only a TLS terminator can successfully connect. In other words, the use of client certificates implicitly creates the requirement for backend servers to actually authenticate the TLS terminator somehow. Since the object of the TLS terminator is to terminate TLS, and using TLS from the TLS terminator to the backend server would seem to defeat the point, it seems like a given that this authentication mechanism will actually be far weaker than that offered by TLS, such as an IP whitelist.
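
In practice such a check often amounts to little more than a source-address allowlist; a rough Python sketch (the network range here is hypothetical) might look like this:

import ipaddress

# Hypothetical subnet housing the TLS terminators; note how much weaker this
# is than the cryptographic client authentication the terminator itself performs.
TERMINATOR_NETWORKS = [ipaddress.ip_network("10.0.1.0/24")]

def connection_is_from_terminator(peer_ip: str) -> bool:
    addr = ipaddress.ip_address(peer_ip)
    return any(addr in net for net in TERMINATOR_NETWORKS)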

Backend trust: comparison to alternatives. What is interesting is that, for this reason, client certificates in practice seem to be much less secure than a typical contemporary request signature scheme used to authenticate requests to a cloud service. As an example, let's use the AWS “v4” signature scheme, which is used to authenticate API requests to AWS services.

The AWS v4 signature scheme uses symmetric, rather than asymmetric, cryptography and is based around HMAC. The client computes an authentication value over the details of the request (method, path, headers and a hash of the body) using a shared secret known to both the client and AWS, and places this authentication value in an HTTP header or query string field.
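
The following Python sketch shows the general shape of such a scheme in deliberately simplified form; it is not the actual AWS v4 construction (which builds a full canonical request and a scoped signing key, as discussed below), and the header format is invented purely for illustration.

import hashlib, hmac, time

def sign_request(shared_secret: bytes, method: str, path: str, body: bytes) -> str:
    # Simplified stand-in for a canonical request; real schemes such as AWS
    # v4 also cover the query string and selected headers.
    timestamp = str(int(time.time()))
    string_to_sign = "\n".join(
        [method, path, timestamp, hashlib.sha256(body).hexdigest()])
    signature = hmac.new(shared_secret, string_to_sign.encode(),
                         hashlib.sha256).hexdigest()
    # The client sends something like this in an Authorization header; the
    # server recomputes the same value and compares.
    return f"Example-HMAC-SHA256 Timestamp={timestamp}, Signature={signature}"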

Since the details of a request are covered by a signature, a compromised load balancer cannot impersonate a legitimate user, other than (at most) by replaying a request exactly unmodified until the signature expires. It cannot forge requests or modify legitimate requests made by a user, though it can obviously compromise the confidentiality of those requests, which are only protected by TLS. This is in stark contrast to if authentication were based solely on client certificates.

Because the AWS signature scheme is based on symmetric cryptography, at least one internal server somewhere needs to know the correct shared secret in order to be able to verify the signature. Here is one aspect where admittedly the scheme is weaker than the use of client certificates — at least one server somewhere needs to be given the shared secret, which also means it could forge legitimate requests if it is compromised.

Application servers could be allowed to retrieve the shared secret directly and perform their own signature verification; alternatively, application servers could be required to pass a request signature to some internal central authentication service which holds the shared secrets for a “yes/no” decision. This avoids the need for an application server to ever see a shared secret itself and reduces the scope of what has access to secrets, but does create dependence on another centralised service.

However, the AWS v4 signature scheme has an interesting design in this regard, as it uses a hierarchical construction, in which signing uses a scope-restricted subkey derived from the master shared secret:

DateKey               = HMAC-SHA256("AWS4" + "<SecretAccessKey>", "<yyyymmdd>")
DateRegionKey         = HMAC-SHA256(DateKey, "<aws-region>")
DateRegionServiceKey  = HMAC-SHA256(DateRegionKey, "<aws-service>")
SigningKey            = HMAC-SHA256(DateRegionServiceKey, "aws4_request")

When an application server needs to authenticate an incoming request, it does not need to look up the actual shared secret which constitutes the AWS access key; instead, it can be given only a subkey which is specific to the current date, the applicable region, and the service in question. Most likely, AWS has an internal service consumed by user-facing applications which allows the application to retrieve the correct subkey for itself, but which does not allow an application to retrieve subkeys for other services.2
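
For concreteness, the derivation quoted above translates directly into code. The Python sketch below mirrors that chain; the signature_with_subkey helper is my own illustration of how a service given only the date/region/service subkey can still complete the chain and check signatures without ever holding the master secret.

import hashlib, hmac

def hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode(), hashlib.sha256).digest()

def derive_subkey(secret_access_key: str, date: str, region: str, service: str) -> bytes:
    # DateKey -> DateRegionKey -> DateRegionServiceKey, as quoted above.
    date_key = hmac_sha256(("AWS4" + secret_access_key).encode(), date)
    date_region_key = hmac_sha256(date_key, region)
    return hmac_sha256(date_region_key, service)

def signature_with_subkey(date_region_service_key: bytes, string_to_sign: str) -> str:
    # The holder of the scoped subkey can finish the derivation and verify
    # signatures, but cannot recover the master secret or derive subkeys
    # for any other date, region or service.
    signing_key = hmac_sha256(date_region_service_key, "aws4_request")
    return hmac.new(signing_key, string_to_sign.encode(), hashlib.sha256).hexdigest()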

Thus:

  • if a load balancer is compromised, it cannot forge or modify requests;

  • if an application server is compromised, it can only forge requests for the service it is part of, not for any other service which an access key might be used to access (but even this is rather moot, since if an application server is compromised, that application is obviously compromised anyway, being the entity responsible for verifying such signatures in the first place, so there is essentially no gain here for an attacker).

This does not totally eliminate the advantage of client certificates not requiring the service provider to store a shared secret; a compromise of the master credential database would still lead to the ability to forge requests for any user. However, access to this database can be restricted via an internal centralised service, with only subkeys being handed out to user-facing services. The master shared secret for an API key thus can only be compromised by compromising the internal authentication service itself, which can be hardened against such prospects in much the same way that credit card storage “vaults”, or HSMs used by CAs to issue certificates, are.

The risk of load balancer compromise seems much higher than the risk of compromise of such a service, being that a load balancer is a public-facing service exposed directly to the internet. So on the whole, it seems to me that for a typical modern cloud service, MAC-signed requests are a much better security tradeoff for most applications than client certificates, as the blast radius of a compromise can be much more limited if a well-designed authentication scheme is used.

There is one benefit of client certificates, which is that their use makes corporate MitM setups using custom root CAs impossible, as the corporate MitM can't successfully feign ownership of the client certificate to the service provider. Thus, from the perspective of the service provider, client certificates do have the potential to provide a greater guarantee of end-to-end confidentiality for a TLS connection. (It is interesting to note that this is a benefit to confidentiality rather than to authentication, despite being the result of employing what is nominally an authentication mechanism.)

Thus, if confidentiality is considered a higher priority than preventing forgery of legitimate requests, client certificates might be considered preferable; whereas if preventing forgery of legitimate requests is considered a higher priority than confidentiality, non-client-certificate based authentication schemes are probably preferable for the reasons given above. Note that in both cases, confidentiality depends absolutely on one's TLS terminator not becoming compromised; the only added benefit to confidentiality in adopting client certificates is for service providers who specifically want to prevent corporate MitM.

Of course, in actuality, nothing prevents one from combining both schemes, obtaining the best of both worlds and some defence in depth, at the cost of a degree of increased complexity. This is probably the best solution; though the client certificate component of such a solution isn't really buying that much more than an HTTP-level request signature scheme already provides, mainly just corporate MitM mitigation. (If the set of users allowed to access the service is relatively restricted, it may also have defence-in-depth advantages — namely, that an attacker can't exploit any potential vulnerabilities in request signature processing or other vulnerabilities in a service's API unless they have a valid client certificate.)

Privacy issues. There are also privacy issues with the use of client certificates, as client certificates aren't encrypted on the wire prior to TLS 1.3.3 This means that the contents of a client certificate will be sent in the clear over the internet if a client ever uses TLS 1.2 or earlier to connect. At a minimum this leaks a unique identifier for every client, and possibly more information, if the client certificates used contain meaningful textual information, like the name of the user. This issue is so serious that it seems to me client certificates can't reasonably be recommended unless you are willing to mandate use of TLS 1.3.4

Ambient authority. A third issue with client certificates relates to the dangers caused by ambient authority. When a client connects to an HTTPS server using a client certificate, the server might consider all requests made over that connection to be automatically authenticated as being made by the user to whom that client certificate is assigned. This is an example of ambient authority, the security hazards of which are well documented and which can lead to the confused deputy problem.

This issue is readily rectified, however, so it's not really an intrinsic issue with the use of client certificates — one need merely require that a client also specify a username or similar principal identifier in each request, for example using an HTTP Authorization header or similar. (Of course, this also has the possible advantage of allowing the same client certificate to be used for more than one principal, though it is perhaps doubtful if this is really advisable from a security standpoint.)
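
A minimal Python sketch of such a check, assuming a hypothetical mapping from certificate fingerprints to the principals each certificate may act as (the names and data structures here are illustrative, not any particular API):

import hashlib

# Hypothetical: which principals each client certificate may act as,
# keyed by the SHA-256 fingerprint of the certificate (DER encoding).
ALLOWED_PRINCIPALS = {
    "f3a1c2...": {"alice", "reporting-bot"},
}

def authorise(cert_der: bytes, requested_principal: str) -> bool:
    # The request must name its principal explicitly (e.g. in an
    # Authorization header); the certificate alone confers no ambient
    # authority over every principal it might be associated with.
    fingerprint = hashlib.sha256(cert_der).hexdigest()
    return requested_principal in ALLOWED_PRINCIPALS.get(fingerprint, set())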


1. It may be possible to construct some custom protocol based on parts of TLS which would sort of allow this. During client certificate authentication, a client obviously has to prove control over a client certificate's private key by signing a challenge value. This signature could be shown to internal third parties. However, this solution already starts to look horribly complex and would involve custom hacking on TLS stacks, undermining the turn-key benefit of using client certificates. Moreover, the scheme is still not really secure, as the question arises of how to prevent the reuse of the signature in perpetuity; an attacker compromising a TLS server could just wait for the legitimate client to connect and then reuse the signature forever. Things are already complex enough at this point in the contemplation of this hypothetical possibility, so I will stop here — it seems like a dead end to me.

2. Of course, nothing stops a client from also taking advantage of this subkey structure. You could for example keep an AWS access key on an air-gapped machine, and when needed export a subkey from it valid only for a given date, or a given date and region, or a given date, region and service. This might be useful for very rarely needed and potentially dangerous operations. However, I don't suggest this in practice, as it is far more practical simply to create an IAM user or role restricted to said scopes with appropriate policies; so in practice, this is not that useful for clients. Moreover, since basically no existing AWS API library to my knowledge supports using these subkeys directly, you would have to modify such a library or write your own to do this. The design of this scheme was almost certainly motivated by restricting blast radius on the server end, not on the client end.

3. Theoretically, it is possible for TLS 1.2 servers to use TLS renegotiation to demand a client certificate from the client only after the TLS connection is established, in which case the client certificate is encrypted. Can you rely on your server to do this correctly? Probably not. Moreover, the hazards inherent in TLS's renegotiation functionality have led to a number of vulnerabilities, so encouraging further use of it doesn't seem like a great idea. Renegotiation was removed from TLS 1.3.

4. One protocol where client certificates have been commonly used in the past is EAP-TLS, sometimes used to authenticate to “enterprise” WPA-protected Wi-Fi networks. EAP-TLS uses TLS solely for its authentication properties and as such, the TLS connection is torn down without sending any application data immediately after the TLS handshake succeeds, as this implies that authentication has succeeded. However, the fact that client certificates were sent in the clear prior to TLS 1.3 made constructions such as PEAP-EAP-TLS common, in which a TLS connection is established and then *another* TLS channel is run over the top of it to authenticate via client certificate.