OK Google: bypass the authentication!

During a recent assessment of a voice application we found a very intriguing vulnerability that, besides being a lot of fun to exploit, demonstrates how the complexity of modern applications, built on top of several separate components and technologies, may allow chains of bad practices to lead to surprising results.

Our target application, built for Google Assistant, let users “speak” to a device in order to log into their utility provider account, retrieve personal information (e.g. payment statuses, active subscriptions…) and carry out account operations.

TL;DR (with spoiler): a combination of the following led to the discovery of a complete authentication bypass, triggered by pronouncing the Italian words “A capo” (“new line”/“return”):

  • Incorrect DialogFlow application flow design
  • Flawed authentication and authorization mechanisms between custom code in DialogFlow and the application backend
  • Inconsistent way of treating special characters between the Google input device and the DialogFlow backend


This test was carried out following the “Double Gray Box” approach, as described by the Open Source Security Testing Methodology Manual 3.0, meaning that we simulated a threat agent with limited knowledge of the target application. In particular, we knew that the application implemented a Google Conversational Action, and roughly how it worked.

Specifically, we knew that the user needs a device with Google Assistant (Android phone, Google Home, etc.) to “have a conversation”, which is sent to and processed by a DialogFlow Conversation Fulfillment. The latter is a cloud service offered by Google to make it easy for developers to manage a voice conversation.

Put very briefly, a developer can set up a series of “states” (or intents) and, based on the voice input received from the user, define the application flow (i.e. which intent to go to next). Each intent can also process the user input using custom code added by the developer, and communicate with an external service (usually fully controlled by the developer). It can obviously also communicate back to the user’s device.

Actions on Google parses a user utterance and sends a request to Dialogflow. Dialogflow matches the intent and extracts parameters to send to its corresponding Dialogflow fulfillment. The fulfillment then sends a response back to Actions on Google, which renders the response on an Assistant surface.
Conversation fulfillment when using Dialogflow (from Google’s documentation)

In our scenario, we had access to the developer’s JavaScript code that DialogFlow used to process the user input and communicate with our client’s backend systems. However, we did not have access to the DialogFlow dashboard and hence could not observe the application flow design.

Vulnerability discovery

Analyzing the JavaScript handlers, we immediately noticed that the application did not implement adequate authorization mechanisms: it contained a hardcoded API key used to obtain an authentication token for the backend. The token was initially used in the login phase to communicate with the database and validate the user’s password. It turned out that this API key gave fully authorized access to the backend APIs. Here we noticed the first problem: although the JavaScript code is invisible to the user (it is hosted on Google’s servers and accessible only by DialogFlow), the principle of least privilege dictates that this token should be authorized only to validate users’ passwords. In fact, an unlikely vulnerability in Google, a more likely takeover of the DialogFlow account or, as it happened, a vulnerability in the application could leverage the excessive authorization of this API key to gain access to critical data. A more robust mechanism would be to issue an authorized token only after the password is found to be correct; the token would also be restricted to the logged-in user.
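To make the "more robust mechanism" concrete, here is a minimal sketch of a backend that issues a token only after the password check succeeds and binds it to a single user. Everything in it (the in-memory user store, `login`, `getBills`, the sample data) is hypothetical and only illustrates the idea; it is not the client's backend, and a real system would store salted password hashes rather than plain text.

```javascript
const crypto = require("crypto");

// Hypothetical user store; a real backend would keep salted password hashes.
const users = new Map([
  ["user-001", { password: "correct horse", bills: ["2019-10: paid"] }],
  ["user-002", { password: "battery staple", bills: ["2019-10: due"] }],
]);

// token -> user id the token was issued for
const sessions = new Map();

// Issue a token only after the password has been verified,
// and bind it to the user that just logged in.
function login(userId, password) {
  const user = users.get(userId);
  if (!user || typeof password !== "string" || password.trim() === "") {
    return null; // unknown user, or empty / whitespace-only password
  }
  if (user.password !== password) {
    return null;
  }
  const token = crypto.randomBytes(16).toString("hex");
  sessions.set(token, userId);
  return token;
}

// Every data request checks that the token belongs to the requested user.
function getBills(token, userId) {
  if (!sessions.has(token) || sessions.get(token) !== userId) {
    throw new Error("forbidden");
  }
  return users.get(userId).bills;
}
```

With this design, a token leaked from the conversational layer only exposes the one user who actually authenticated, instead of every account.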

When conducting a penetration test with (at least some) source code available, we like to combine static and dynamic analysis: so while evaluating a regular expression that was validating the login, I had my first interaction with the actual application (from an Android device). The application greeted me and asked me for the username; I replied by pronouncing the user code provided by our client for the tests; the application then asked me for the password and, after a first attempt with the correct password, I tried a second time saying “a capo”, which is the Italian equivalent of “return”, “enter” or “new line”. To my surprise, the application considered it a valid password and just let me in, ready to provide all the available functionalities and information for that particular user. We obviously tried with other accounts we were allowed to use and confirmed that the magic word gave us full access to all functionalities and information for all users. Apriti sesamo!

Digging deeper

Now we had a cool vulnerability, but had no idea of the reason behind this strange behaviour. In order to understand it, our client provided us access to the DialogFlow application. From there, my colleague Federico noticed how after saying “a capo” the application, instead of going from the intent “Insert Password” to the intent “Wrong password”, triggered what could be called an exception, and went directly to the Default Intent, which was set as the intent providing the application menu. From this observation, we managed to reconstruct the flow and explain the strange behavior.

First of all, the initial “welcome” intent asked for the user id. If the user provided a valid one, the JavaScript code stored it inside the global “context”, available to all DialogFlow intents, as shown by the (simplified and anonymized) code below.
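A minimal sketch of what such a welcome handler could look like. This is a hypothetical reconstruction, not the client's code: the `conv.context` object stands in for the DialogFlow context, and `handleWelcome` and the user codes are invented for illustration.

```javascript
// Hypothetical stand-in for the conversation context shared by all intents.
function createConversation() {
  return { context: {} };
}

// Illustrative user codes; the real application looked them up in the backend.
const KNOWN_USER_IDS = new Set(["user-001", "user-002"]);

// Welcome intent as described above: the user id is stored in the shared
// context as soon as it is recognized, before any password has been checked.
function handleWelcome(conv, spokenUserId) {
  if (!KNOWN_USER_IDS.has(spokenUserId)) {
    return "Sorry, I do not know that user code.";
  }
  conv.context.userId = spokenUserId;
  return "Please tell me your password.";
}
```

The important detail is the timing: after this step every later intent can read `conv.context.userId`, whether or not a password is ever verified.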

In the following step, the user was asked to enter their password. The DialogFlow intent was set to accept an input of type “@sys.any”. “@sys.any” is the most generic type possible, often used as a “catch all” to accept any kind of user input; according to the system entities documentation, it matches any non-empty input.

However, when a user pronounces the words “a capo” (Italian equivalent of the English “new line”, “new paragraph”), the Google Assistant application translates this into the control character \n, interpreting the phrase as if the user had pressed the Enter key, hence submitting the input. Obviously, the keyboard that can be used as an alternative to the voice input does not let a user press the Enter key until there is an actual input to be sent. So here we find what we consider a bug in Google Assistant: an incorrect interpretation of a special character and an inconsistent treatment of an empty input. More details on the disclosure of this bug and Google’s response are at the bottom of this article. Please note that even if this bug were fixed the problem would remain, although it would be harder to exploit: it would be possible to intercept HTTP requests made from the device to the DialogFlow backend, replacing any valid input with the character ‘\n’ (or any equivalent, such as ‘\r’).
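Since the input can also be tampered with in transit, whichever component finally receives the text is a sensible place to treat control characters as "no input". A minimal sketch of such a check, assuming the input arrives as a JavaScript string; `isEffectivelyEmpty` is a hypothetical helper name, not part of any Google library.

```javascript
// Returns true when the input carries no visible content.
// String.prototype.trim() strips whitespace and line terminators,
// so "\n", "\r" and "\r\n" all count as empty.
function isEffectivelyEmpty(input) {
  return typeof input !== "string" || input.trim().length === 0;
}

// A password handler could refuse such input before doing anything else.
function validatePasswordInput(input) {
  return isEffectivelyEmpty(input) ? "rejected" : "accepted";
}
```

A check like this does not replace a correct flow design, but it removes the ambiguity between "empty" and "newline" at the point where the decision is made.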

So, to recap: in the first step we have set our user id in the application’s context; in the second step we have sent an empty input (specifically, character \n) as a password. When this input is sent to the DialogFlow intent, it triggers an exception, since the intent is expecting a non-empty input, while \n is considered as a lack of content.

The character \n triggers the Default Intent when an entity of type @sys.any is expected

At this point, the incorrect DialogFlow application design comes into play: the developers had set the “Main Menu” intent as the Default Intent, probably reasoning that no exceptions could be triggered before authentication. But this assumption proved to be wrong. From here, all API functionalities are called with the user id as the parameter and the authorized API token hardcoded inside the JavaScript code, as the (simplified and anonymized) code below shows.
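A minimal sketch of the pattern just described. It is a hypothetical reconstruction rather than the client's code: the token value, the backend URL, the `Bearer` header scheme and the `buildMenuRequest` helper are all assumptions made for illustration.

```javascript
// Hypothetical values: the real token, host and paths are not part of this write-up.
const API_TOKEN = "hardcoded-token-with-full-access";
const BACKEND_URL = "https://backend.example/api";

// Builds the request a "Main Menu" action would send to the backend.
// Note what is missing: nothing here checks that a password was ever verified.
// The only per-user input is the id previously stored in the context.
function buildMenuRequest(conv, action) {
  return {
    url: `${BACKEND_URL}/${action}`,
    headers: { Authorization: `Bearer ${API_TOKEN}` },
    body: { userId: conv.context.userId },
  };
}

// Example: a conversation where only the user id step has been completed.
const conv = { context: { userId: "user-001" } };
const request = buildMenuRequest(conv, "bills");
```

Because the token is valid for every user, the backend has no way to tell this request apart from one made after a successful login.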


So… saying “a capo” (“new line”) in place of the password lets us bypass the authentication. How did we end up in this situation? Let’s try to briefly sum up the chain of issues that led us here:

  • The user enters any client id, which is stored inside the DialogFlow application context
  • The user says “a capo” (“new line”), which is interpreted as a control character by Google Assistant, which in turn sends empty content (‘\n’) to DialogFlow
  • The intent receiving the password is set to expect an entity of type @sys.any, which matches any non-empty input. Our input, ‘\n’, is considered empty content: this triggers an exception in the flow, falling into the Default Intent
  • The Default Intent is set to the Main application Menu, which provides access to all application functionalities
  • From here, the attacker can access any functionality for any user, thanks to the fact that the API token stored inside DialogFlow’s custom JavaScript code is authorized for all users, and that the attacker can set any user id during the login phase.
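The chain above can be reproduced with a toy model of the intent routing. This is a deliberately simplified simulation, not DialogFlow's actual matching logic: `matchesSysAny`, `nextIntent` and the sample password are hypothetical, and the only behavior modeled is "non-empty input matches, anything else falls through to the Default Intent".

```javascript
// The Default Intent, configured by the developers as the main menu.
const DEFAULT_INTENT = "Main Menu";

// Simplified model of @sys.any: any input with visible content matches.
function matchesSysAny(input) {
  return typeof input === "string" && input.trim().length > 0;
}

// Decide which intent handles the input while the password is expected.
function nextIntent(currentIntent, input, checkPassword) {
  if (currentIntent === "Insert Password" && matchesSysAny(input)) {
    return checkPassword(input) ? "Main Menu" : "Wrong password";
  }
  // Nothing matched: fall through to the Default Intent.
  return DEFAULT_INTENT;
}

const checkPassword = (password) => password === "correct horse";

const results = ["correct horse", "guess", "\n"].map((input) =>
  nextIntent("Insert Password", input, checkPassword)
);
// results is ["Main Menu", "Wrong password", "Main Menu"]
```

A wrong password is rejected, but the newline never reaches the password check at all and lands in the same place as a correct password.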

Disclosure and Google response

After finding the vulnerability, we immediately alerted our client, providing both an immediate mitigation (setting the user id inside the context only after the correct password was inserted) as well as detailed remediation to all issues presented in this post.
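A minimal sketch of the immediate mitigation, under the assumption that the user id spoken in the first step is kept in a separate, untrusted slot until the password has been verified. The names `onUserCode`, `onPassword`, `onMainMenu` and `pendingUserId` are hypothetical and not taken from the client's code.

```javascript
// Step 1: remember the claimed user id, but do not trust it yet.
function onUserCode(conv, userId) {
  conv.context.pendingUserId = userId;
}

// Step 2: promote the id to the trusted slot only after the password check.
function onPassword(conv, password, verifyPassword) {
  const candidate = conv.context.pendingUserId;
  if (candidate && verifyPassword(candidate, password)) {
    conv.context.userId = candidate;
    delete conv.context.pendingUserId;
    return "Main Menu";
  }
  return "Wrong password";
}

// The menu (also reachable as Default Intent) refuses to act without a trusted id.
function onMainMenu(conv) {
  return conv.context.userId
    ? `Menu for ${conv.context.userId}`
    : "Please log in first.";
}
```

With this change, falling through to the Default Intent before authentication no longer helps an attacker, because the menu finds no trusted user id in the context.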

We also alerted Google through their Vulnerability Reward Program, on November 12th 2019, of the inconsistent treatment of special characters such as ‘\n’. They finally replied, on Nov 28th, saying that they did not consider the issue severe enough to be tracked as a security bug; they assigned it the status “Won’t Fix (Infeasible)”. They added that it is the developers’ responsibility to write applications in a way that is able to deal with unexpected input. Ok.

But when I finally had some time to write this article, I wanted to attach a vulnerable demo application for everyone to test the issue. However, when creating the DialogFlow project and recreating the vulnerable “Insert password” intent, I was presented with an alert I had never noticed before:

New DialogFlow alert

Furthermore, when the words “a capo” are pronounced in place of the password, they are no longer interpreted as a control character by Google Assistant; instead, they are interpreted literally as the words “a capo”, as the image below shows:

The words “a capo” are no longer treated as a control character

So it seems that Google has addressed the problem in both components (DialogFlow and Google Assistant). I reached out to them in order to better understand these updates, but they said that “those changes are not made as a result of [my] report”.