Against risk-based authentication (or, why I wouldn't trust Google Cloud)
If you were deciding on whether to use AWS, Azure or Google Cloud for a production application, what factors would you use to make the decision? There are probably many such factors you might think of, but I'd like to point out one that probably isn't commonly considered: the relative trustworthiness of Google's accounts system compared to that of, say, AWS.
Google's accounts system controls access to all Google services, including Google Cloud. The problem is that the design of Google's accounts system gives me serious reservations about trusting it for anything mission-critical.
The problem lies ultimately with the modern trend of “risk-based authentication”. However, a more descriptive term would be “non-deterministic login”.
The problem is precisely this: the credentials you require to access a Google account are essentially indeterminate. Supposedly, for a simple Google account without 2FA enabled, knowledge of the account email and password should be sufficient to access an account; except sometimes, it isn't. Sometimes, Google might randomly decide your login attempt is suspicious, and demand you complete some additional verification step.
This sounds potentially innocuous until you then realise that there's no guarantee you can actually complete this additional verification step. There are to my recollection numerous stories of people being locked out of accounts which they have the passwords for because Google has decided that things are suspicious and having the password is not enough.
The problem is not with requiring a high level of authentication; for example, enabling 2FA is commonly considered a best practice nowadays. If I enable 2FA, I'm committing to be able to complete that authentication challenge in the future and can make appropriate arrangements. The problem is that Google seems to reserve the right to randomly and unpredictably demand a higher level of authentication without any kind of prior opt-in. It's the unpredictability that's the problem here.
What makes this unpredictability particularly pernicious is that it creates the possibility of being able to access an account for a long period of time before suddenly having that access essentially destroyed without warning or recourse. If some additional authentication step X is required for every single login, you're going to make sure you can always complete that additional authentication step. If additional authentication step X is instead only demanded for logins which are deemed “suspicious” according to the inscrutable statistical determination of some ML model, there's a good chance you will have no idea that step X may be a necessary condition for accessing the account under some circumstances. Which means the first time you find out about this is when your infrastructure blows up, you have to log in from a different machine than you usually do, and Google decides this is “suspicious” and demands an additional authentication step. Because this is the first time you find out about it, it may or may not be an authentication step you are capable of completing. If it isn't, you're screwed as far as Google Cloud is concerned. You can't access your infrastructure or begin to fix it.
Fundamentally, the issue here comes down to the fact that an accounts system for critical infrastructure needs to fulfil two objectives:
- It must be possible for authorized users to gain access.
- It must not be possible for unauthorized users to gain access.
“Risk-based” authentication essentially tries too hard to fulfil the second objective, in a way that compromises the first.
In particular, if access to an account can be mission-critical, it's desirable for all authentication steps needed to access it to be tested regularly. This suggests that if an accounts system designed for critical infrastructure has a repertoire of authentication steps A, B, C it can variously demand, it's better for it to demand A, B and C every time than to demand A and B in most cases, and also C if the login is deemed “suspicious” on undocumented statistical grounds. The inconvenience of always having to go through step C is nothing relative to the risk that if C isn't demanded most of the time, the account holder may not understand that C is sometimes necessary until it's too late, and lose access to the account.
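To put a rough number on the risk of an untested step, here is a minimal back-of-the-envelope sketch. It assumes, purely for illustration, that each login independently triggers the extra step C with a fixed probability; real risk engines certainly don't work this way, and the function name and figures are hypothetical.

```python
def prob_step_never_demanded(p_demand: float, n_logins: int) -> float:
    """Probability that an occasional step is never requested in n logins.

    Illustrative assumption: every login independently triggers the extra
    step with the same probability p_demand.
    """
    if not 0.0 <= p_demand <= 1.0:
        raise ValueError("p_demand must be between 0 and 1")
    if n_logins < 0:
        raise ValueError("n_logins must be non-negative")
    # The step is skipped on a single login with probability (1 - p_demand),
    # so it is skipped on all n independent logins with that value to the n.
    return (1.0 - p_demand) ** n_logins


if __name__ == "__main__":
    # Hypothetical figures: step C is demanded on 1% of logins, and the
    # account holder has logged in 200 times so far.
    print(f"{prob_step_never_demanded(0.01, 200):.3f}")
```

Under those made-up figures, there is roughly a 13% chance that the account holder has never once seen step C after 200 logins. If C were demanded every time, that chance would be zero from the first login onwards.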
In short, what I want from an infrastructure accounts system is determinism. I explicitly don't want any non-deterministic, “risk-based” or ML judgement involved in deciding whether I can access an account which controls critical infrastructure, because refusing access when it should have been granted can be every bit as disastrous as granting access when it should have been denied. In other words, an accounts system for critical infrastructure:
- must grant access to the holder of the pre-arranged set of credentials (availability — possession of the credentials is a sufficient condition);
- must not grant access to anyone who doesn't hold all of those credentials (security — possession of the credentials is a necessary condition).
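The two conditions above can be sketched in a few lines. This is a toy model rather than a description of any real provider's login code: `Policy`, the credential names, the risk score and the threshold are all hypothetical, chosen only to contrast a deterministic decision with a risk-based one.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Policy:
    """Hypothetical pre-arranged set of credential kinds for an account."""
    required: FrozenSet[str]


def deterministic_login(policy: Policy, presented: Dict[str, bool]) -> bool:
    """Grant access if and only if every required credential was verified.

    `presented` maps a credential kind to whether its verification succeeded.
    Holding the credentials is both necessary and sufficient.
    """
    return all(presented.get(kind, False) for kind in policy.required)


def risk_based_login(policy: Policy, presented: Dict[str, bool],
                     risk_score: float, threshold: float = 0.5,
                     extra_step: str = "sms_code") -> bool:
    """Contrast case: an opaque score can add a step the user never agreed to.

    Holding the pre-arranged credentials is no longer sufficient.
    """
    if not deterministic_login(policy, presented):
        return False
    if risk_score >= threshold:
        # The extra step is only demanded for "suspicious" logins.
        return presented.get(extra_step, False)
    return True


if __name__ == "__main__":
    policy = Policy(required=frozenset({"password", "totp"}))
    creds = {"password": True, "totp": True}
    print(deterministic_login(policy, creds))               # same answer every time
    print(risk_based_login(policy, creds, risk_score=0.1))  # usual machine
    print(risk_based_login(policy, creds, risk_score=0.9))  # emergency login
```

With identical credentials, the deterministic function always returns the same result. The risk-based one depends on a score the account holder can neither see nor control, and it is exactly the emergency login that gets refused.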
There's no room for statistics or guesstimation of “suspiciousness” here. The desirable property is total predictability. It's as essential that I can be 100% confident of accessing an account when I need to, even in emergency circumstances that appear unusual to Google, as it is that anyone without the required credentials can't.
Document it? Could Google fix this by documenting the full set of authentication steps that may be required under the worst (most “suspicious”) circumstances? Maybe, in principle. Even then, for the reasons I give above, if some additional authentication step X is requested only sometimes, some account holders will fail to realise this until it's too late. But it would certainly be useful for the shrewd cloud administrator who reads all the documentation.
However, in practice, this doesn't work, for two reasons. Firstly, Google seems to change their attitudes to authentication over time, so just because they have a certain strategy regarding risk-based authentication today doesn't mean they will later. Secondly, if they do change it, they're probably not going to bother to tell you. Even if you do your research and determine that in the worst case, Google might demand authentication steps A, B and C to access an account in “suspicious” circumstances, this doesn't help you if they later revise their approach to authentication and start demanding step D for “suspicious” logins as well. In other words, the meta-problem here is not just the non-determinism of the login process but that Google also seems to see fit to revise the requirements for the “worst-case” login process over time without making any particular announcement. Thus the login process is both statistically and temporally non-deterministic; even if you do your research to determine what authentication steps are required for login in the worst case today (if it's even documented), you can't actually predict what Google might require as a verification step for login in the future after they rearrange their system.
So you can't really mitigate the risk of account lockout in this way. All you really know is that when accessing a Google account at some point in time in the future, you may be asked to complete some arbitrary and unforeseeable authentication step X, which you may or may not actually be able to complete, and where the nature of step X may also have been invented by Google in the future. So how can you be sure you'll be able to access a Google Cloud account hosting critical infrastructure in an emergency? In short, you can't.
Sorry, we just don't like you. Actually, as far as I can tell, it gets even worse. So far we've assumed that Google might spring some hidden extra authentication step on you at the worst possible time which you didn't know might sometimes be required. Except it seems like sometimes, Google will just refuse to let you authenticate at all. Not even “please complete this extra authentication step” (which you may or may not be able to actually complete), but an actual “nope, you're too suspicious, not letting you log in, go away please”.
Actually, “suspicious” is not even the right word. It seems like Google now determines whether you are allowed to even try to log in based on all sorts of factors, like whether it happens to like your web browser enough. If it doesn't, you're offered no recourse, and Google basically just tells you to get lost:
This isn't really OK under any circumstance, since any statistical decision can always be wrong and there needs to be some way (however arduous) to prove your authenticity in the face of a statistical determination of high risk, however high that assessment might be. But moreover, we're not talking about something like, “there have been 50 failed attempts to log in from this IP in the last hour”. When does Google show the above notice? Apparently at least one circumstance is when you're using an unapproved web browser. Users of niche web browsers on Linux have complained that the Google accounts system simply won't allow them to log in at all. That's right: it's 2023 and we're back to user agent sniffing, exemplified by no less than Google.
To put it simply, I'm not willing to trust critical infrastructure to an accounts system that under some indeterminate and inscrutable set of conditions, might simply refuse to let me log in at all.
Such a system might have good security but it severely lacks availability. For critical infrastructure, this is a disaster.
A consumer-grade accounts system. More generally there are some other strange things about the use of the Google accounts system for Google Cloud. Fundamentally, the accounts system is clearly consumer-oriented. A lot of this risk-based authentication seems based on the general premise that most users of the Google Accounts system can't really be trusted to set secure passwords and are their own worst enemies.
This attitude... might be true for consumer accounts, although it does lead to people being locked out of Google accounts they've poured decades into because, again, risk-based authentication suddenly asks them to verify using some authentication step they can't complete. Horror stories of this nature frequently arise on popular tech news sites and have even become something of a staple. The attitude is, moreover, actively hazardous for an accounts system gating access to critical infrastructure.
I think a big part of the problem here is that Google is trying to force two very different domains to use the same accounts system. In fact, the use of the consumer-focused Google accounts system for Google Cloud comes across as weird in all sorts of ways. It's simply weird that in order to use Google Cloud, I have to get an account which can also be used to upvote YouTube videos. I question the logic of demanding that these separate systems use unified authentication, when the security requirements for these different domains are so different.
Probably the most surreal demonstration of this forced-unity approach, however, is Google Cloud's usage of Google Groups.
If you're familiar with basically any access control system — whether it's that of UNIX, AWS IAM, or Windows AD — you're surely familiar with the idea of having users which can be placed into groups.
How do user groups work in Google Cloud? Well, you see... to create a group of users for access control purposes in Google Cloud, you have to create a Google Group.
Yes, that Google Groups. Yes, I'm talking about the thing that started out as a Usenet gateway and archive called “Deja News”, which got acquired by Google in 2001 and turned into Google Groups, and which is now a somewhat horrible mailing list system. If you want to create a security group in Google Cloud, you have to create it as... a Google Group.
It's like there's some bureaucrat at Google whose prime responsibility is ensuring that Google doesn't accidentally create duplicate products (but who has a weird and inexplicable blind spot for chat products), and who upon hearing that Google Cloud needed a user group system, ran their finger down a list of Google products and said “we already have a groups product. Google Groups. Use that.”
I don't want my security groups system for critical cloud infrastructure to be inextricably intertwined with a Usenet archive and mailing list system. Just... what?
Comparison to AWS. The absurdity of all this is especially clear when you compare it to, say, AWS's IAM. In AWS, a user belongs specifically to an AWS account, and as such that AWS account gets to control authentication policies. The account can decide what credentials are required for access, and, at least as far as I can tell, AWS's login process for IAM users is completely deterministic. IAM users can be placed into IAM groups, which have nothing to do with Usenet.
This highlights another problem with Google's attempt to use their consumer accounts system for cloud infrastructure: individual user accounts don't “belong” to an organisation. Google Cloud's IAM has you grant access to existing Google Accounts as identified by their email addresses, whether directly or indirectly via a (yes) Google Group. However, these accounts are completely normal consumer accounts. This is a fundamental difference. Google can't give cloud users the ability to control what authentication steps should be required for an account (for example, to disable risk-based authentication) because those accounts don't “belong” to the organisation and can be used to access other non-organisation resources; thus every account must have the same rules.
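The structural difference can be sketched as a toy data model. Everything here is hypothetical (the class, the namespace names, the step names) and is not meant to mirror the actual AWS or Google implementations; it only shows why an account that lives in an organisation-owned namespace can carry that organisation's policy, while accounts in one universal namespace all share a single policy.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Set


@dataclass
class Namespace:
    """A set of accounts plus the authentication policy applied to them."""
    name: str
    required_steps: FrozenSet[str]
    accounts: Set[str] = field(default_factory=set)

    def add_account(self, username: str) -> None:
        self.accounts.add(username)


def required_steps_for(namespace: Namespace, username: str) -> FrozenSet[str]:
    """An account's login requirements come from the namespace that owns it."""
    if username not in namespace.accounts:
        raise KeyError(f"{username!r} is not an account in {namespace.name!r}")
    return namespace.required_steps


if __name__ == "__main__":
    # Scoped namespaces: each organisation chooses its own fixed policy.
    org_a = Namespace("org-a", frozenset({"password", "hardware_key"}))
    org_b = Namespace("org-b", frozenset({"password"}))
    org_a.add_account("alice")
    org_b.add_account("bob")
    print(sorted(required_steps_for(org_a, "alice")))
    print(sorted(required_steps_for(org_b, "bob")))

    # Universal namespace: one policy, whoever the accounts "work for".
    universal = Namespace("everyone", frozenset({"password"}))
    universal.add_account("alice")
    universal.add_account("bob")
    print(required_steps_for(universal, "alice") == required_steps_for(universal, "bob"))
```

In the scoped model, the organisation that owns `org-a` can tighten or fix its policy without affecting anyone else. In the universal model there is nowhere to hang a per-organisation rule, because no organisation owns the accounts.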
Workspace workaround. As far as I can tell there is one partial mitigation of some of the issues here, which is to use Google Workspace, previously known as G Suite and, before that, Google Apps. Of course, this product is oriented around Gmail, Google Docs, Sheets, etc. and competing with products like Office 365. However, a silver lining to this is that an administrator of a Google Workspace can create accounts which belong to that organisation. Moreover, because those accounts belong to that organisation, it appears that administrators of the workspace can to some extent define authentication policy.
This isn't a complete mitigation of the issues raised here, however. Fundamentally the true problem with the Google accounts system is its fickleness, both at any point in time and temporally, as Google changes the system. Using Google Workspace isn't sufficient to allay concerns that Google might arbitrarily decide to refuse to let me log in at all, whether due to using the wrong web browser or for who knows what other reason.
Conclusions. In my view, Google shouldn't attempt to use their consumer accounts system for cloud infrastructure. The requirements are simply too different and the use of things like Google Groups to model security groups borders on absurdity. There is a common attitude in IT, especially as regards authentication systems, that everything must be unified and there must be only one. There are two fallacies here:
First is the fallacy that there is one set of requirements which are suitable for all authentication systems, which clearly isn't true (consider how the optimal level of fraud is non-zero for some systems — but clearly not for others).
The second is that unification of any two given namespaces of accounts is always desirable. This also isn't true because (amongst other reasons) if an account belongs to a namespace of restricted scope, the controller of that namespace can reasonably be given full control over those accounts and the authentication policy which is applied to them; whereas if accounts can only exist in a universal namespace, all accounts must be subject to the same rules (and then see the first fallacy).
Limiting the scope of services an account can address is also a perfectly normal security practice, of course. It's simply strange that to use Google Cloud, I have to grant permissions to accounts which can also be used to upvote YouTube videos.
Further reading. There are innumerable discussions online about people being locked out of Google accounts without recourse, often accounts which have literally been held for decades. This link lists just the instances discussed on HN alone. This article provides some further discussion.