What is User Impersonation?
Impersonation is the ability for one user to act on behalf of another user. For example, say user ‘yarn’ has the credentials to connect to a service, but user ‘foo’ does not. On its own, user ‘foo’ would never be able to access the service. However, user ‘yarn’ can access the service and impersonate (act on behalf of) user ‘foo’, which gives user ‘foo’ access. In short, impersonation enables one user to access a service on behalf of another user.
Who are the users?
The impersonation feature defines how users can act on behalf of other users, so it is important to know who the users are. Hadoop applications (e.g., Spark, Hive, Presto) use the Hadoop client to interact with HDFS (or another HDFS-compatible file system, e.g., Alluxio). When Alluxio is used with Hadoop applications, the application contains two clients: the Hadoop client and the Alluxio client. When an application uses an Alluxio URI (e.g., alluxio://host:port/my/path/), it interacts directly with the Hadoop client first, and internally the Hadoop client interacts with the Alluxio client. Finally, it is the Alluxio client that communicates directly with the Alluxio masters and workers.
A user, or identity, can be specified for both the Hadoop client and the Alluxio client. Since the user for the Hadoop client and the user for the Alluxio client can be specified independently, the Hadoop client user may be different from the Alluxio client user. The Hadoop client user can even be in a different namespace from the Alluxio client user. For example, it is possible that within a single application, the Hadoop client user is specified as ‘foo’, but the Alluxio client user is specified as ‘yarn’.
What is Alluxio Client-side Hadoop Impersonation?
Alluxio client-side Hadoop impersonation aims to resolve the mismatch that arises when the Hadoop client user is different from the Alluxio client user. Since the two users can differ within the same application, Alluxio’s client-side Hadoop impersonation examines the Hadoop client user and then attempts to impersonate that user.
For example, say a Hadoop application is running so that the Hadoop client user is specified as ‘foo’, but the Alluxio client user is specified as ‘yarn’. Without client-side Hadoop impersonation, the Alluxio client will connect to Alluxio servers (masters and workers) as the user ‘yarn’ and not ‘foo’. This means any data interactions will be attributed to user ‘yarn’.
However, with client-side impersonation, the Alluxio client will determine that the Hadoop client user is ‘foo’, and then connect to Alluxio servers as user ‘yarn’ impersonating user ‘foo’. Now, all data interactions will be attributed to user ‘foo’. With this impersonation enabled, the Alluxio client operates on behalf of the Hadoop client user, resulting in seamless and transparent interactions with Alluxio.
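The behavior described above can be sketched as a small model. This is only an illustration of the decision the text describes, not Alluxio's actual client code; the names Connection, connect, and impersonation_enabled are hypothetical and do not correspond to any Alluxio API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Connection:
    """Hypothetical model of how the Alluxio client presents itself to the servers."""
    login_user: str                   # user the client connects as
    impersonated_user: Optional[str]  # user it asks to act on behalf of, if any

    @property
    def effective_user(self) -> str:
        # Data interactions are attributed to the impersonated user when there is one.
        return self.impersonated_user or self.login_user


def connect(hadoop_user: str, alluxio_user: str, impersonation_enabled: bool) -> Connection:
    """Impersonate the Hadoop client user only when enabled and the users differ."""
    if impersonation_enabled and hadoop_user != alluxio_user:
        return Connection(login_user=alluxio_user, impersonated_user=hadoop_user)
    return Connection(login_user=alluxio_user, impersonated_user=None)


# Running example: Hadoop client user 'foo', Alluxio client user 'yarn'.
without_impersonation = connect("foo", "yarn", impersonation_enabled=False)
with_impersonation = connect("foo", "yarn", impersonation_enabled=True)

print(without_impersonation.effective_user)  # yarn
print(with_impersonation.effective_user)     # foo
```

In both cases the client connects as ‘yarn’; only the user that data interactions are attributed to changes.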
Why do I see these errors?
With Alluxio, you may encounter errors like “User yarn is not configured for any impersonation. impersonationUser: foo”. These are Alluxio server errors denying access. You can find them in logs/master.log for masters or logs/worker.log for workers, and they are sometimes propagated to the applications as well.
If you see this error, it means the Alluxio servers are not configured to allow client-side Hadoop impersonation. The error “User yarn is not configured for any impersonation. impersonationUser: foo” means that the application running as user ‘yarn’ is connecting to the Alluxio service and trying to impersonate user ‘foo’, but the Alluxio servers are not configured to allow user ‘yarn’ to do so.
Why is impersonation being used?
A natural question is why impersonation is being used at all in this scenario. The Alluxio client attempts impersonation whenever it detects that the Hadoop client user is different from the Alluxio client user; in that case, it tries to impersonate the Hadoop client user.
In the running example, the Hadoop client user is ‘foo’, so if the application were interacting directly with HDFS, it would read and write files as user ‘foo’. However, with Alluxio in the picture, the Alluxio client user is ‘yarn’ (different from user ‘foo’). Without impersonation, the same application that previously interacted with HDFS as user ‘foo’ would interact with Alluxio as user ‘yarn’. To avoid this change of identity, the Alluxio client tries to impersonate the Hadoop client user. In this example, the Alluxio user ‘yarn’ will impersonate the user ‘foo’.
How do I fix this error?
The simplest way to fix this error is to configure impersonation correctly on the Alluxio servers. This involves adding the following parameter to the Alluxio servers’ configuration files and restarting the Alluxio service:
alluxio.master.security.impersonation.<USER>.users=*
<USER> is a placeholder that must be replaced by the actual username that requires impersonation abilities. For the running example, it would be:
alluxio.master.security.impersonation.yarn.users=*
This configuration means that user ‘yarn’ is able to impersonate any other user. Therefore, the next time user ‘yarn’ wants to impersonate user ‘foo’, the Alluxio servers will allow it and the applications can continue seamlessly.
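To make the effect of this setting concrete, here is a minimal sketch of the kind of check the servers perform. It is a simplified model, not Alluxio's implementation: parse_properties and check_impersonation are hypothetical helpers, the comma-separated list form is an assumption not covered above, and the second error message is illustrative only.

```python
from typing import Dict

PREFIX = "alluxio.master.security.impersonation."
SUFFIX = ".users"


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_impersonation(props: Dict[str, str], connection_user: str,
                        impersonation_user: str) -> None:
    """Raise PermissionError unless connection_user may impersonate impersonation_user."""
    allowed = props.get(PREFIX + connection_user + SUFFIX)
    if allowed is None:
        # No entry at all for the connecting user: the error quoted in this article.
        raise PermissionError(
            f"User {connection_user} is not configured for any impersonation. "
            f"impersonationUser: {impersonation_user}"
        )
    # Assumption: the value is either '*' or a comma-separated list of user names.
    names = {name.strip() for name in allowed.split(",")}
    if "*" in names or impersonation_user in names:
        return
    # Illustrative message only; the real server wording may differ.
    raise PermissionError(
        f"User {connection_user} may not impersonate {impersonation_user}"
    )


# Before the fix: no impersonation entry for 'yarn'.
try:
    check_impersonation(parse_properties(""), "yarn", "foo")
except PermissionError as err:
    print(err)

# After the fix: 'yarn' may impersonate any user.
fixed = parse_properties("alluxio.master.security.impersonation.yarn.users=*")
check_impersonation(fixed, "yarn", "foo")
print("impersonation allowed")
```

The wildcard is the broadest possible grant, which is why it removes the error for every user that ‘yarn’ might act on behalf of.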
An alternative way to avoid the error is to disable client-side impersonation altogether. This is done with a client configuration parameter (set on the client, not on the servers):
alluxio.security.login.impersonation.username=_NONE_
This disables the client-side impersonation feature, so the Alluxio client will not attempt to impersonate the Hadoop client user. In the running example, this means that when the application interacts with Alluxio, all reads and writes will be performed as user ‘yarn’, not user ‘foo’.
See the impersonation documentation for further details and options for configuration.