This article describes how Baidu built a secure, modular and extensible distributed file system service for Pingo, a big data analytics solution for enterprises, on top of the open-source project Alluxio. You will learn how to use Alluxio to implement a unified distributed file system service, and how to add extensions on top of Alluxio, including customized authentication schemes and UDFs (user-defined functions) managed as Alluxio files.
1. Goal and Challenges
Pingo is a product from Baidu that provides an offline big data analytics solution. It uses Apache Spark as the computation engine and covers resource scheduling, data and metadata management, and workflow management. Pingo is used not only in Baidu’s internal infrastructure, but also to serve users of Baidu’s public cloud and private cloud deployments.
Because of these target use cases, Pingo by design needs to support efficient and unified data access regardless of whether the data is local or remote, structured or unstructured, stored on on-prem storage devices or in a cloud storage service. In addition, Pingo targets enterprises with large numbers of files to process, and their security requirements for authentication and authorization further complicate the problem.
Pingo solves these problems by using the open-source project Alluxio to build a file system service, as shown in Figure 1. In particular, Pingo introduced Alluxio to abstract away the differences between the various storage solutions. Pingo also enhanced Alluxio to provide unified authentication management without exposing the original storage system’s authentication information.
2. Implementation and Customization
In Pingo, the file system management service (PFS) is implemented based on Alluxio. With the mounting capabilities provided by Alluxio, PFS can easily integrate with a variety of file systems or object stores such as HDFS, S3, Baidu Object Storage (BOS) or local file systems in Linux (see more details in section 2.1).
After mounting specific distributed file systems to PFS through the unified file system API, users only need to work with the built-in account and permission management system in Pingo, which hides the differences between the underlying distributed file systems. Pingo also added a customized ACL permission mechanism to Alluxio to help manage data across large teams within the enterprise (see sections 2.2 and 2.3).
In addition, we integrated table-level authorization with file-system-level permission checking in our product (section 2.4) and implemented a file-based UDF management mechanism on top of PFS (section 2.5).
2.1 Supporting New File System Types
Pingo supports mounting BOS, an object storage service provided by Baidu Public Cloud. BOS provides an interface similar to AWS S3, but there are still issues in using the S3 protocol to mount BOS to Alluxio. To access files in BOS, we therefore implemented a BOSUnderFileSystem class that extends Alluxio’s existing ObjectUnderFileSystem abstract class. In addition, we implemented another under storage, SshUnderFileSystem, to access files via SFTP (SSH File Transfer Protocol); it is available in the Alluxio alluxio-extensions code base on GitHub.
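The mounting idea behind PFS can be illustrated as a longest-prefix lookup from a PFS path to the under storage that backs it. The following is a minimal Python sketch for illustration only, not Alluxio's implementation; `MountTable`, `resolve` and all example URIs (including the `bos://` and `sftp://` schemes) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MountTable:
    """Maps PFS mount points to under-storage URIs (illustrative only)."""
    mounts: dict = field(default_factory=dict)

    def mount(self, pfs_path: str, ufs_uri: str) -> None:
        # Normalize so that "/data/" and "/data" name the same mount point.
        self.mounts[pfs_path.rstrip("/") or "/"] = ufs_uri.rstrip("/")

    def resolve(self, pfs_path: str) -> str:
        """Translate a PFS path into the under-storage URI that backs it."""
        best = None
        for point in self.mounts:
            prefix = point if point.endswith("/") else point + "/"
            if pfs_path == point or pfs_path.startswith(prefix):
                # Prefer the deepest (longest) matching mount point.
                if best is None or len(point) > len(best):
                    best = point
        if best is None:
            raise KeyError(f"no mount point covers {pfs_path}")
        suffix = pfs_path[len(best):].lstrip("/")
        base = self.mounts[best]
        return f"{base}/{suffix}" if suffix else base


# Hypothetical mounts: one namespace, three different storage back ends.
table = MountTable()
table.mount("/", "hdfs://namenode:9000/pfs")
table.mount("/cloud", "bos://example-bucket/root")
table.mount("/remote", "sftp://example-host/home/data")

print(table.resolve("/cloud/logs/a.txt"))  # bos://example-bucket/root/logs/a.txt
print(table.resolve("/tmp/x"))             # hdfs://namenode:9000/pfs/tmp/x
```

Callers only ever see PFS paths; which storage system serves a given path is decided by the mount table.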
2.2 Customized Authentication
As mentioned, Pingo has a built-in UserService for user authentication. We extended the existing authentication mechanism in Alluxio by adding a new authentication service type based on Pingo’s UserService. Specifically, we added an enumeration value PINGO to alluxio.security.authentication.AuthType. On the Alluxio Master side, we added a PingoAuthenticationProvider that forwards the username and authentication information sent by the client to Pingo’s UserService for authentication.
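The forwarding pattern described here can be sketched in a few lines. This is a simplified Python illustration, not the actual Java code in Pingo or Alluxio: the `AuthType` enum below is a stand-in that lists only a few values, and `FakeUserService`, `verify`, `provider_for` and `AuthenticationError` are hypothetical names introduced for the example.

```python
from enum import Enum


class AuthType(Enum):
    """Stand-in for the authentication-type enumeration (other values omitted)."""
    SIMPLE = "SIMPLE"
    CUSTOM = "CUSTOM"
    PINGO = "PINGO"  # the value added for Pingo


class AuthenticationError(Exception):
    """Raised when the user service rejects the credentials."""


class FakeUserService:
    """Hypothetical stand-in for Pingo's UserService."""

    def __init__(self, credentials):
        self._credentials = dict(credentials)

    def verify(self, username, secret):
        expected = self._credentials.get(username)
        return expected is not None and expected == secret


class PingoAuthenticationProvider:
    """Forwards the client's credentials to the user service."""

    def __init__(self, user_service):
        self._user_service = user_service

    def authenticate(self, username, secret):
        # The provider holds no credentials itself; it only delegates.
        if not self._user_service.verify(username, secret):
            raise AuthenticationError(f"authentication failed for {username}")


def provider_for(auth_type, user_service):
    """Pick the provider that matches the configured authentication type."""
    if auth_type is AuthType.PINGO:
        return PingoAuthenticationProvider(user_service)
    raise ValueError(f"{auth_type.name} is not covered by this sketch")


provider = provider_for(AuthType.PINGO, FakeUserService({"alice": "s3cret"}))
provider.authenticate("alice", "s3cret")  # returns without raising
```

Keeping the provider as a thin delegate means the master never needs to store or interpret user credentials on its own.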
2.3 Fine-grained Authorization
Because of requirements specific to big data workloads, we implemented a new mechanism to manage ACL permissions in PFS. We found that neither the conventional Unix privilege model (the default privilege model in Linux) nor the POSIX ACL privilege model meets our requirements. Take the following requirement as an example: we want to grant users “ua” and “ub” read access to all sub-paths of a directory /a/b/data recursively, so that no matter how many sub-paths are added under /a/b/data, “ua” and “ub” automatically get read access.
Moreover, if the admin later decides to revoke the read access of “ub” to this directory, this should only require changing the permissions on the directory /a/b/data itself. Users with experience on big data platforms know that the number of files in a folder can easily reach tens of thousands or more, and the existing authorization models cannot handle this efficiently at that scale.
PFS supports both the Unix privilege model and PFS’s own ACL privilege model. For READ and WRITE access, PFS first follows any ACL record defined on the path if one exists; otherwise, it falls back to the Unix permission model. For MANAGE access (e.g. the Linux chmod command requires management permission), either the ACL model or the Unix model can authorize the access.
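The decision order between the two models can be written down as a small function. This is a sketch of one reading of the rules above, not PFS code: `authorize` is a hypothetical name, `acl_decision` is assumed to be `None` when no ACL record applies to the path, and the MANAGE branch assumes that a grant from either model is sufficient.

```python
from typing import Optional

READ, WRITE, MANAGE = "READ", "WRITE", "MANAGE"


def authorize(action: str, acl_decision: Optional[bool], unix_allows: bool) -> bool:
    """Combine the ACL result and the Unix-mode result for one access.

    acl_decision: True/False if an ACL record applies, None if there is none.
    unix_allows:  result of the ordinary Unix permission check.
    """
    if action in (READ, WRITE):
        # An ACL record, when present, decides; Unix bits are only a fallback.
        return unix_allows if acl_decision is None else acl_decision
    if action == MANAGE:
        # Assumption: either model may grant management access.
        return bool(acl_decision) or unix_allows
    raise ValueError(f"unknown action: {action}")


print(authorize(READ, False, True))    # False: the ACL record overrides Unix bits
print(authorize(READ, None, True))     # True: no ACL record, Unix model decides
print(authorize(MANAGE, False, True))  # True: Unix model grants MANAGE
```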
The ACL defines three types of permissions: READ, WRITE, and MANAGE. The execute permission of the Unix permission model is merged into the READ permission. The ACL authorization record for a file or directory looks as follows:
Inherit: true/false
USER: uname_1 READ/WRITE/MANAGE
...
USER: uname_n READ/WRITE/MANAGE
GROUP: gname_1 READ/WRITE/MANAGE
...
GROUP: gname_n READ/WRITE/MANAGE
Each USER: uname_n READ/WRITE/MANAGE rule in the authorization record defines whether a user can read, write or manage the file or directory. The example mentioned earlier in this section is solved by the inherit attribute. Unlike the Unix privilege model, where the permission check only occurs at the last level of the path being accessed, when inherit is true PFS authorizes the access as long as a matching record exists in the authorization record of any level of the path, walking from the last level up to either the root or the first node whose inherit is false.
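The upward walk controlled by the inherit attribute can be sketched as follows. This is an illustrative Python model rather than PFS code: `AclRecord` and `acl_allows` are hypothetical names, and the sketch assumes that a path without any record simply lets the walk continue to its parent.

```python
import posixpath
from dataclasses import dataclass, field


@dataclass
class AclRecord:
    inherit: bool = True
    users: dict = field(default_factory=dict)   # user name -> set of permissions
    groups: dict = field(default_factory=dict)  # group name -> set of permissions

    def grants(self, user, groups, permission):
        if permission in self.users.get(user, set()):
            return True
        return any(permission in self.groups.get(g, set()) for g in groups)


def acl_allows(acls, path, user, groups, permission):
    """Walk from `path` towards the root; stop after a node with inherit=False."""
    current = posixpath.normpath(path)
    while True:
        record = acls.get(current)
        if record is not None:
            if record.grants(user, groups, permission):
                return True
            if not record.inherit:
                return False  # do not look at ancestors of this node
        parent = posixpath.dirname(current)
        if parent == current:  # reached the root
            return False
        current = parent


acls = {
    "/a/b/data": AclRecord(users={"ua": {"READ"}, "ub": {"READ"}}),
    "/a/b/data/private": AclRecord(inherit=False, users={"ua": {"READ"}}),
}

# One record on /a/b/data covers every path below it.
print(acl_allows(acls, "/a/b/data/2019/part-0", "ub", [], "READ"))  # True
# inherit=False on /a/b/data/private blocks the grant from /a/b/data.
print(acl_allows(acls, "/a/b/data/private/x", "ub", [], "READ"))    # False

# Revoking access for "ub" touches a single record, not every file.
del acls["/a/b/data"].users["ub"]
print(acl_allows(acls, "/a/b/data/2019/part-0", "ub", [], "READ"))  # False
```

The cost of a check grows with the depth of the path, not with the number of files in the directory, which is what makes a single directory-level grant or revocation cheap.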
2.4 Integrated Table and File Permissions
One interesting difference between an analytics platform like Pingo and traditional databases like MySQL is that in Pingo both tables and files are accessible, whereas in MySQL one can only access tables by running queries through the client or JDBC, and accessing the files that actually store the data is not meaningful. In a big data system, tables can be queried through SQL engines like Spark, but the underlying files can also be read directly through MapReduce or the DataFrame API.
This creates a challenge for access control: if a user is authorized to access a table T1, the administrator may want the user to access T1’s data only through the provided SQL interface, without permission to access T1’s corresponding partition files. And if the administrator later revokes the user’s access to T1, the user should no longer have access to the data corresponding to T1, whether through SQL or through the file system.
In Pingo, we connected the account authentication systems of PFS and Pingo to implement a permission proxy mechanism, as shown in Figure 2. When table T1 is created, information about T1’s creator is saved in the TMeta Server. When a query runs, the user first passes the access check for T1. Once that check passes, the query engine obtains the PFS path, creator information and authentication information corresponding to T1, and then authenticates to PFS as T1’s creator. This way, as long as the user has access to the table, the data of the table can be read.
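The permission proxy flow can be modeled with a toy example. The sketch below is an illustration under simplifying assumptions, not Pingo's implementation: `TableMeta`, `FileSystem`, `QueryEngine` and `PermissionDenied` are hypothetical names, the file system only lets a file's owner read it, and table grants are kept in a plain dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableMeta:
    path: str     # PFS path that stores the table's data
    creator: str  # recorded when the table is created


class PermissionDenied(Exception):
    pass


class FileSystem:
    """Toy PFS in which each file is readable only by its owner."""

    def __init__(self):
        self._files = {}  # path -> (owner, data)

    def write(self, path, owner, data):
        self._files[path] = (owner, data)

    def read(self, path, user):
        owner, data = self._files[path]
        if user != owner:
            raise PermissionDenied(f"{user} cannot read {path}")
        return data


class QueryEngine:
    def __init__(self, fs, tables, grants):
        self._fs = fs
        self._tables = tables  # table name -> TableMeta
        self._grants = grants  # user name -> set of table names

    def query(self, user, table):
        # Step 1: table-level check against the querying user.
        if table not in self._grants.get(user, set()):
            raise PermissionDenied(f"{user} cannot query {table}")
        # Step 2: read the files on behalf of the table's creator.
        meta = self._tables[table]
        return self._fs.read(meta.path, meta.creator)


fs = FileSystem()
fs.write("/warehouse/t1/part-0", "alice", "row1,row2")
grants = {"bob": {"T1"}}
engine = QueryEngine(fs, {"T1": TableMeta("/warehouse/t1/part-0", "alice")}, grants)

print(engine.query("bob", "T1"))  # row1,row2
```

Because the querying user never holds a file-level grant, revoking the table grant is enough to cut off access through both SQL and the file system.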
2.5 File-based UDF Management
UDFs are widely used in many SQL query engines. Users are commonly required to first upload a jar file and then register a temporary function in the SQL engine. This approach not only requires extra steps; it also makes the jar files hard to manage and version-control, which often slows down iteration.
We implemented a file-based UDF management solution in PFS, as shown in Figure 3. The creator of a UDF only needs to upload the corresponding jar file to the specified directory inside PFS. The Alluxio worker automatically extracts the UDF information from the jar file and reports it to the Master, and different UDF versions are tracked automatically based on file names. Users do not have to register a UDF when executing SQL. Instead, they only need to specify the name or version of the function.
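Version tracking based on file names could look roughly like the sketch below. It is only an illustration: the article does not specify the naming scheme, so the `<function>-<version>.jar` convention, the `UdfRegistry` class and its `report` and `resolve` methods are all assumptions, and the sketch skips the step of extracting UDF information from the jar contents.

```python
import re
from collections import defaultdict

# Assumed naming convention: <function>-<dotted numeric version>.jar
_JAR_NAME = re.compile(
    r"^(?P<name>[A-Za-z_][A-Za-z0-9_]*)-(?P<version>\d+(?:\.\d+)*)\.jar$"
)


def _version_key(version):
    # Compare versions numerically, so that 1.10 sorts after 1.2.
    return tuple(int(part) for part in version.split("."))


class UdfRegistry:
    """Tracks UDF jars by function name and version (illustrative only)."""

    def __init__(self):
        self._jars = defaultdict(dict)  # function name -> {version: jar path}

    def report(self, jar_path):
        """Record an uploaded jar; return False if the name does not match."""
        file_name = jar_path.rsplit("/", 1)[-1]
        match = _JAR_NAME.match(file_name)
        if match is None:
            return False
        self._jars[match["name"]][match["version"]] = jar_path
        return True

    def resolve(self, name, version=None):
        """Return the jar for a function; default to the highest version."""
        versions = self._jars.get(name)
        if not versions:
            raise KeyError(f"unknown UDF: {name}")
        if version is None:
            version = max(versions, key=_version_key)
        return versions[version]


registry = UdfRegistry()
for path in ("/udf/to_upper-1.0.jar", "/udf/to_upper-1.2.jar", "/udf/to_upper-1.10.jar"):
    registry.report(path)

print(registry.resolve("to_upper"))         # /udf/to_upper-1.10.jar
print(registry.resolve("to_upper", "1.0"))  # /udf/to_upper-1.0.jar
```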
Our approach is similar to how Linux manages “.so” files (dynamic link libraries) and has a few benefits: UDFs become very easy to use, their privileges can be managed as part of the file system ACL permissions in PFS, and small teams can iterate fast and share source code.
Summary
In this article, we introduced how Pingo built its file system service on the open-source project Alluxio and implemented add-on features such as customized authentication, fine-grained authorization and file-based UDF management. Our products are still iterating rapidly, and we welcome your suggestions and comments.