Mitigating the Hetzner/Linode XMPP.ru MitM interception incident

(If you just want some recommendations for what to do, skip down to the Recommendations section below.)

Today, the operator of jabber.ru and xmpp.ru reported that their service had been successfully subject to a man-in-the-middle attack via a combination of

  • their hosting providers, Hetzner and Linode, intercepting traffic to their machines; and
  • the unauthorized issuance of a Domain Validation certificate for their service by an attacker.

It seems likely that this attack was orchestrated by the state of Germany (or Germany acting in concert with one or more other nation states). There are other possibilities; for example, both Hetzner and Linode might have decided to voluntarily comply with a wiretapping request from a foreign power that was not binding upon them, but this would reflect extremely badly on them, might well be illegal, and seems unlikely.

Detection. This attack could have been mitigated. It could also (potentially) have been detected:

  • The first way to detect this attack would be for the operators of the service to monitor Certificate Transparency logs to detect the issuance of certificates they did not request. There are some services which can do this for you, but we could probably still stand to have better tools here (e.g. tools which are good at notifying you only of certificates you didn't request).

  • The second way to detect this attack would be to periodically connect to the service and check that the public key used by the TLS server matches that expected.
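As a rough illustration of the second method, the following Python sketch fetches the certificate a server presents and compares its SHA-256 fingerprint against a set the operator recorded when requesting their own certificates. It is a simplification under stated assumptions: it pins whole certificates rather than the bare public key (extracting the key needs an X.509 parser outside the standard library), it assumes a direct-TLS port rather than XMPP's STARTTLS, and all function names are illustrative. Note that an illicit certificate issued by a trusted CA passes ordinary validation, so the fingerprint comparison is the part doing the detecting.

```python
import hashlib
import socket
import ssl


def cert_fingerprint(der: bytes) -> str:
    """Return the lowercase hex SHA-256 fingerprint of a DER-encoded certificate."""
    return hashlib.sha256(der).hexdigest()


def is_expected(der: bytes, expected_fingerprints) -> bool:
    """True if the certificate is one the operator knows they requested.

    Fingerprints may be given in upper or lower case, with or without colons.
    """
    expected = {fp.replace(":", "").lower() for fp in expected_fingerprints}
    return cert_fingerprint(der) in expected


def fetch_leaf_der(host: str, port: int, timeout: float = 10.0) -> bytes:
    """Connect with normal certificate validation and return the leaf certificate (DER)."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert(binary_form=True)


# Hypothetical usage (host, port and fingerprint are placeholders):
#   der = fetch_leaf_der("xmpp.example.org", 5223)
#   if not is_expected(der, {"<sha256 hex of a certificate you requested>"}):
#       ...raise an alert...
```

The expected set has to be updated on every renewal, otherwise each legitimate renewal will look like an attack.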

Both of these detection methods have some issues and potential gaps:

  • CT is optional. A certificate issued by a legitimate CA isn't necessarily logged to a CT log. Surprisingly, CT logging is still not a requirement of the CA/Browser Forum Baseline Requirements (which set the rules all CAs must follow).

    What forces CAs to log certificates in CT is that web browsers now reject certificates unless they contain a cryptographic proof that they were logged to a CT log. Some CAs will sell you a certificate that isn't logged to CT (e.g. for “privacy” reasons) if you request one. Browsers may reject this, but there are many other kinds of client application (most of them, in fact) which don't check for or require a certificate to contain a proof of having been logged. So in this hypothetical scenario, the adversary could have procured an unlogged certificate.

  • Selective MitM. Trying to detect the MitM by probing the service could be worked around by detecting which connections are probing connections and not MitM'ing them. At a minimum it would be necessary to do something like perform the probe through Tor to prevent it from being trivially identified; however, this probably isn't perfect either. TLS stacks can be easily fingerprinted, as things like the order that TLVs are listed in give telltale signs of which TLS implementation is being used; it's quite likely that a service could have some level of success distinguishing between connections made by a real XMPP client and a probe agent. There are a million signals that could potentially give away that a connection isn't a “real” connection. An adversary could also just target specific persons it knows it is interested in intercepting (i.e., only MitM traffic on a whitelist, rather than exempting traffic on a blacklist of known probes). This therefore probably can't be considered too reliable either.

So neither of these detection methods is perfect.

Mitigation. The second area of consideration is mitigation, in which the unauthorized issuance of TLS certificates is prevented from happening in the first place. The entire point of a TLS certificate is, of course, to prevent a man-in-the-middle attack. The fundamental problem here is that the “Domain Validation” model by which CAs validate control of a domain name is ironically itself vulnerable to man-in-the-middle attacks, especially if an attacker can intercept not just some but all traffic to a victim site (as happened in this case).

Some years ago I authored ACME-CAA (RFC 8657), now implemented by Let's Encrypt, which can mitigate this in some circumstances. The basic idea is that you can configure a DNS record which specifies that only a specific account of a specific CA is authorised to issue certificates for a domain. Thus simply using the same CA isn't enough; you must gain access to the same account at the CA. With Let's Encrypt, this means gaining access to the ACME private key used to request a certificate. Based on what we know about the attack, it would have been prevented by deploying this extension.
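To make the shape of such a record concrete, here is a small Python sketch that formats a CAA "issue" record carrying the RFC 8657 accounturi parameter (and, optionally, the validationmethods parameter the same RFC defines). The helper name is hypothetical and the account URI in the usage comment is a placeholder; substitute the account URL your own ACME client reports.

```python
def caa_issue_record(domain, ca_domain, account_uri, validation_methods=()):
    """Format a zone-file CAA record restricting issuance to one CA account.

    domain             -- the name being protected, e.g. "example.org"
    ca_domain          -- the CA's issuer domain name, e.g. "letsencrypt.org"
    account_uri        -- URI of the ACME account allowed to request certificates
    validation_methods -- optional iterable of permitted methods, e.g. ["dns-01"]
    """
    params = [f"accounturi={account_uri}"]
    if validation_methods:
        params.append("validationmethods=" + ",".join(validation_methods))
    value = "; ".join([ca_domain] + params)
    # Flags value 0 and the "issue" tag; the parameters live inside the quoted value.
    return f'{domain.rstrip(".")}. IN CAA 0 issue "{value}"'


# Hypothetical usage (the account number is a placeholder):
#   print(caa_issue_record(
#       "example.org",
#       "letsencrypt.org",
#       "https://acme-v02.api.letsencrypt.org/acme/acct/123456789",
#       ["dns-01"],
#   ))
```

Restricting validationmethods to dns-01 is a separate design choice: it narrows issuance to a validation method that does not depend on traffic reaching the server itself.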

There are a fair number of caveats here, which are explained in full in the RFC and my deployment advice (recommended reading). The RFC is a lot more readable than most, so flipping through it is highly recommended for those interested in deploying ACME-CAA. Some caveats are as follows:

  • You do need to deploy DNSSEC for this to work, otherwise the DNS requests made by a CA can simply be intercepted.

    Anyone who can get control of your DNSSEC signing key can also overcome this hurdle. So for example, a nation-state might simply serve a wiretap order on a hosting company like Hetzner or Linode, and serve a similar order on your DNS service provider (if it holds your signing key).

    It should be noted that it is possible to run a DNSSEC-secured DNS zone without giving your signing keys to anybody else; in this case the DNS hosting provider has no power to compromise the zone, so this seems like the best deployment strategy.

  • An adversary might be able to successfully compel your domain name registrar, or the TLD registry, to change the DNSSEC signing keys registered for a domain. This at least has the potential to be a “noisy” operation, and due to the nature of DNS caching, it may be hard for an adversary to prevent a recurring probe from detecting the change of key (unlike the selective MitM of a TLS connection discussed under Detection above).

  • An adversary might be able to successfully compel your CA to mis-issue a certificate.

  • You remain vulnerable to third party CAs which screw up or break the rules. The CA/Browser Forum Baseline Requirements now require CAA to be checked by CAs, but a third party CA might mess up and issue a certificate anyway even if it's not listed by a domain's CAA record as authorised to issue certificates for the domain. Because logging to CT logs isn't a requirement, such certificates may never even be detected.
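The recurring probe for a change of DNSSEC keys mentioned in the caveats above could be sketched roughly as follows. This Python fragment only covers the comparison step: it takes DS record data in presentation format (key tag, algorithm, digest type, digest) and reports differences from the set you expect. Fetching the records from the parent zone, ideally from several vantage points, is deliberately left out, and the function names are hypothetical.

```python
def normalize_ds(rdata: str):
    """Parse 'keytag algorithm digesttype digest' into a comparable tuple.

    Some tools print long digests split by whitespace, so the remaining
    fields are joined back together and upper-cased before comparison.
    """
    parts = rdata.split()
    if len(parts) < 4:
        raise ValueError(f"not a DS record: {rdata!r}")
    key_tag, algorithm, digest_type = (int(p) for p in parts[:3])
    digest = "".join(parts[3:]).upper()
    return (key_tag, algorithm, digest_type, digest)


def ds_changes(expected, observed):
    """Return (unexpected, missing) sets of DS records.

    unexpected -- records seen in the parent zone that you never published
    missing    -- records you published that are no longer being served
    """
    exp = {normalize_ds(r) for r in expected}
    obs = {normalize_ds(r) for r in observed}
    return obs - exp, exp - obs
```

Either set being non-empty outside a planned key rollover is worth an alert.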

This is not an exhaustive list of caveats and you should refer to the RFC for the full details. Nonetheless, deploying ACME-CAA can offer a real level of mitigation here. It increases the number of hurdles for an attacker, especially when you spread different services around different jurisdictions. The game here is jurisdictional arbitrage and utilising the relative difficulty of international cooperation between adversarial powers. For example, if we assume that this incident is the product of coercion on the part of the German state, it doesn't necessarily follow that this adversary would be able to also coerce Let's Encrypt, for instance. Increasing the cost of attacks and the risk of them being detected also discourages nation-state adversaries, particularly as they are often loath to have attention drawn to their espionage activities.

What would a perfect attacker do? While the core aspects of this attack may have been readily mitigated with technologies which were available but undeployed, it also has highlighted some serious gaps in the TLS infrastructure as it is deployed today.

In this particular incident, the adversary was slapdash, and let their illicit certificate expire. In this regard, they are less than a perfect adversary; but we should expect attacks such as these to get better, not worse, and to become more frequent, as nation-states become more frustrated by the presence of cheap and easy encryption.

As such, it's useful to consider what a more competent nation-state adversary (whom I'll name Mallory for our purposes) would do. Here, I'll assume that Mallory can do anything other than actually compromise the victim machine itself or its operator (which is not a good assumption, but bear with it):

  1. Mallory would take advantage of the fact that CAs aren't required to log certificates to Certificate Transparency logs, and request an unlogged certificate from a CA.

  2. Mallory would compel the hosting provider to MitM all traffic going to the victim machine.

  3. Mallory would use TLS stack fingerprinting and source IPs to heuristically identify traffic likely to be of interest and exclude traffic likely made to probe whether the victim service has been compromised.

  4. Mallory would use the MitM to trick the CA into thinking Mallory is the legitimate controller of the victim domain.

  5. If the domain uses ACME-CAA with DNSSEC, this attack is foiled, so Mallory would attempt to compel the DNS hosting provider (if it holds the DNSSEC signing key, which it may not). If Mallory fails to do so, Mallory might try and coerce the domain registrar or TLD registry, but might have difficulty preventing this from being detected (if anybody is monitoring it, which usually isn't the case).

  6. Mallory also might attempt to coerce the CA itself.

Holes in the TLS infrastructure. This hypothetical attack illustrates the following holes in the present state of the public TLS infrastructure, ordered in descending order of severity (in my opinion):

  • Lack of CT logging enforcement by non-web TLS clients. Point (1) here is interesting, even if the actual attacker here did not take advantage of it. Web browsers now require that certificates issued by CAs include cryptographic proof of having been logged to a Certificate Transparency log, so while you can legitimately get an unlogged certificate from a CA, it's not so useful on the public web.

    However, it turns out there is a degree of inconsistency here. While web browsers enforce this requirement, most other TLS clients don't, and will accept an unlogged certificate. This probably includes most XMPP clients, as this CT validation is something that needs to be specially implemented. It is not something that is enabled “automatically” just by linking your application against OpenSSL.

    This means that a large amount of internet infrastructure which supposedly benefits from the security provided by contemporary TLS, actually receives a lower standard of protection than that of a web browser. Such infrastructure is easier to exploit because it will accept an unlogged TLS certificate, which CAs are legitimately allowed to issue.

    This seems to me a highly undesirable state of affairs, and thought should be given to how CT proof enforcement can be enabled by default in the future. Software which is liable to be used for highly sensitive communications (such as XMPP clients) should also consider reviewing how they are currently handling this issue and consider adding support for CT enforcement.

  • Lack of requirement to log certificates. The CA/Browser Forum Baseline Requirements, which is the set of industry rules CAs are required to follow (lest they be given the CA death penalty), does not require certificates to be logged to CT logs. This is arguably a surprising omission and I would argue this should be required.

    Of course, in reality, this omission is surely not accidental; there are too many companies which think hiding a hostname from a CT log is a meaningful form of security by obscurity, and I assume it's the advocacy of these organisations that has kept CT from being mandatory. But it is certainly something I would advocate for, and I would hope the CA/Browser Forum reconsiders this in the future.

    In any case, this issue can be rendered moot by universal deployment of the technical enforcement of CT logging, as per the above paragraph.

  • Need for more CT monitoring services. This is not really a gap in the TLS infrastructure as such, as anyone could create such a service, but the public would benefit from more CT monitoring services. The current major service available for CT alerting is SSLMate's Cert Spotter, but its pricing probably puts it out of reach for many services, especially those operated on a voluntary basis.

    It should be emphasised that any CT monitoring solution needs to think through how to avoid false alarms and only alert when observing unusual or seemingly unauthorized certificate issuance.

  • Lack of DNSSEC transparency. Although the idea has been mulled in the past, there is currently no deployed cryptographic infrastructure for detecting changes to the DNSSEC keys configured for a domain. This renders domain name registrars and TLD registries vulnerable to coercion or compromise to change the DNSSEC keys registered for a domain, which allows the protection which ACME-CAA can offer to be undermined.

    It would be highly desirable to see a transparency solution for DNSSEC zone keys ensuring that all changes are publicly visible. This transparency solution would only need to log changes to a zone's keys (DS records), not all records in a zone. (The latter is also possible but I wouldn't consider it essential, and organisations who have grown accustomed to the contents of DNS zones being un-enumerable would inevitably complain.)
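The false-alarm concern raised for CT monitoring above can be sketched in Python as follows: keep a record of the serial numbers of certificates you actually requested, and alert only on logged certificates that are not in that record. The entry field names (issuer_name, serial_number) are assumptions modelled on the kind of JSON a CT search service returns; adjust them to whatever your data source provides. Matching on serial number alone is a simplification, since serials are only unique per issuer.

```python
def normalize_serial(serial: str) -> str:
    """Normalise a hex serial: drop colons, lowercase, strip leading zeros."""
    return serial.replace(":", "").lower().lstrip("0")


def unexpected_certificates(ct_entries, known_serials):
    """Return CT entries whose serial numbers you did not request.

    ct_entries    -- iterable of dicts with 'issuer_name' and 'serial_number'
    known_serials -- hex serials of certificates you know you requested
    A precertificate and its final certificate share one serial number, so
    duplicates per (issuer, serial) are reported only once.
    """
    known = {normalize_serial(s) for s in known_serials}
    seen = set()
    alerts = []
    for entry in ct_entries:
        serial = normalize_serial(entry["serial_number"])
        key = (entry["issuer_name"], serial)
        if serial in known or key in seen:
            continue
        seen.add(key)
        alerts.append(entry)
    return alerts
```

Recording serials at issuance time (for example from a renewal hook) is what keeps this quiet during normal operation.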

Recommendations

With all this in mind, here are some recommendations for various parties:

I operate a service which I am concerned may be targeted by a nation-state. What should I do?

The following are ordered in ascending order of effort required compared to gain offered — in other words, do the first things first and work your way down until you reach the item you decide is beyond your budget.

  • Deploy ACME-CAA. Deploying ACME-CAA is likely to make it harder for attackers to trick CAs into issuing a certificate for your domain. For guidance for those unfamiliar with this area, read my blogpost about deploying ACME-CAA. I also recommend reading the Security Considerations section of the RFC, which is more readable than usual and contains a lot of good advice and important caveats.

  • If deploying ACME-CAA, also deploy DNSSEC. ACME-CAA is not maximally effective without use of DNSSEC. If possible, choose a DNSSEC provider that does not require you to give them your signing keys.

  • Don't use Cloudflare or similar services. See my article here for an explanation on why. If you use a service like this, you're basically already MitMing yourself.

  • Sign up with a CT log monitoring service, or manually check logs regularly. SSLMate's Cert Spotter offers an automated monitoring service, for a price. Their service is open source, so you can also self-host, and it is reportedly not too much trouble to do so. crt.sh offers a search engine which allows you to manually search the CT logs.

  • Make use of jurisdictional arbitrage. For example, your domain registrar, TLD registry, DNS hosting service, CA and hosting provider might all be situated in different countries. Consider which nation-states and geopolitical blocs are least likely to cooperate with one another and choose jurisdictions accordingly. Play jurisdictions off against one another. You can never count on this, but it may help.

  • Provide a Tor hidden service. One potential solution to these kinds of issues is simply to sidestep TLS altogether by providing a Tor hidden service instead and encouraging users to use it. The advantage of a Tor hidden service is that the service's public key is the address, so there is no CA or other centralised infrastructure to be fooled or compromised (and the problem of most non-web clients not requiring CT proofs becomes moot). Encourage users who consider that they may be at high risk of being targeted by a nation-state to use the hidden service (of course, encouraging such users to use Tor is good advice in general). If the nature of your service is such that extreme risk is posed to your users in particular, consider only offering a hidden service. Remember that your users will need a trustworthy channel to obtain the correct onion address, and third parties might try to trick them into using the wrong one.

  • Set up automated probing of your service via Tor. This is not a panacea as discussed above, but it is some icing on the cake you can perform if you really want to, which will catch amateur mistakes (which, as we have seen from this incident, often occur in the real world). It's essential to use Tor to prevent ready identification as a probe (a nation-state adversary is probably interested in users connecting via Tor, so it is unlikely to want to skip this traffic unless it already has a specific non-Tor-using user in mind). Try to minimise the prospect of TLS fingerprinting by using a TLS library commonly used by client software. Even better, use an actual client (e.g. an actual XMPP client if you are running an XMPP service). Note that even if this does work, it still only detects compromise after the fact, and by that time your users may already have been harmed by the compromise.

  • Consider other attacks. If you make your service too hard to MitM, adversaries are more likely to engage in other attack vectors, like compromising the machines themselves and stealing private keys or sensitive information. A nation-state attacker could compel a VM provider to perform a dump of your system's memory at the hypervisor level, which will be entirely undetectable to you and compromise all data held in that VM. A dedicated server could be compromised by physical access by an attacker. Don't focus irrationally on the mitigation of one attack vector at the cost of others. A good understanding of the MO of your likely adversaries is helpful here.

  • Assume the worst. Ultimately, in most cases, the best interest of your users likely requires you to advocate that they adopt end-to-end encryption and assume that you have been compromised despite your best efforts. Note that “end-to-end” encryption delivered as a web application is not a meaningful protection.

I am the CA/Browser Forum (or a member with a vote in it). What should I do?

Require all certificates to be logged to a CT log, or raise a proposal to that effect. I mean, really. Just do it already.

I am an application client software vendor. What should I do?

  • SCT enforcement. Add support for enforcing the presence of CT proofs (known as Signed Certificate Timestamps (SCTs)) in TLS certificates, and enable this support by default. Advertise this so that users know they benefit from this protection if they use your software. However, be advised that there is some volatility around enforcing CT proofs, as this comment highlights.

  • FOSS is essential. Make sure your software is open source so that it can be publicly audited.

  • Consider supply chain risks and your own vulnerability. It's unrelated to this attack, but you should also consider encouraging distribution channels you don't directly control (like Linux distributions' package repositories) to reduce the harm that may come in the event that you yourself are legally compelled to compromise your software. See my article on issues with web-based cryptography for some food for thought here.

I am an end user. What should I do?

  • Assume services are compromised. At the end of the day, even if the party running a service is genuinely well-intentioned, their service may be compromised despite their best efforts. This is just fundamental good advice. Don't trust a service if your life might depend on its not being compromised.

  • Use end-to-end encryption technologies (and check your friends' fingerprints). Use end-to-end encryption technologies. For XMPP, this means using OMEMO or OTR. Remember to always verify the fingerprint of the party you are communicating with. Don't use web-based software for end-to-end encryption (or assume it is compromised, at least).

  • Consider using a Tor hidden service if your service provider offers one. Enough said. Make sure you get the onion address from a trustworthy source.

I am a CA. What should I do?

  • ACME-CAA. If you don't already offer support for ACME-CAA (RFC 8657), please consider implementing it. Note that you do not have to be an ACME-based CA to implement it.

  • Require CT. While the CA/Browser Forum rules currently allow you to skip logging certificates to CT, for example if a customer requests it, consider leading by example and adopting a policy of always logging certificates to CT.

I am a standards author. What should I do?

Maybe we should revisit the possibility of DNSSEC transparency, at least for DS records. By all means contact me if you're interested in this.

I am worried about my hosting provider compromising my machine (not just MitM'ing it). Is “confidential computing” any good?

Not really. Every current technology purporting to offer “confidential computing” has a ton of issues (and I intend to blog in detail on this in the future). I mean, look: deploying it is probably better than doing nothing. But don't count on it to be secure: every single one of these “confidential computing” technologies (AMD, Intel, etc.) is dependent on a vendor golden key which can be used to backdoor the system. A nation-state can just compel them to turn over the key and these schemes fall apart. It used to be we laughed any security system dependent on a golden key or with a backdoor like this out of the room, yet here we are and serious people are taking “confidential computing” seriously. [1]


[1] Confidential computing would be really interesting if it did actually work, and I think there may be ways of creating a confidential computing scheme which isn't dependent on a golden key. But none of the current schemes qualify.