Alluxio 2.0.0 Release

We are super excited to announce the release of Alluxio 2.0.0. This is Alluxio’s biggest release to date and it includes many new features and system improvements. Alluxio 2 is designed to focus on the following areas:

  • System robustness and scalability – Users are deploying Alluxio in larger and more critical environments. Alluxio 2 focuses on order-of-magnitude improvements in namespace and cluster scalability, along with greater robustness and flexibility in cases of failure or user error.
  • Machine learning and other write-once read-many POSIX workloads – The POSIX interface is becoming increasingly popular among Alluxio users. Alluxio 2 focuses on improving the interface and ease of use with popular use cases like machine learning.

Downloads can be found here. Thanks to everyone who has contributed to this release!

Alluxio 2 is not backward compatible with Alluxio 1; please see the documentation on how to upgrade from Alluxio 1 to Alluxio 2.

These are some of the highlights of the release:

Data Driven Applications


Native Support for AI & ML Workloads with Alluxio POSIX API

A POSIX compatible API has always been a highly requested feature, and the popularity of the API is second only to the Hadoop API. In Alluxio 2.0.0, we enable a POSIX compatible API through Alluxio FUSE. Alluxio can be mounted as a local file system volume and accessed through traditional POSIX compatible clients. This is especially useful for applications that were not originally built for big data but now require access to data stored in a data lake, for example, machine learning using TensorFlow. See the docs for details.
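Because the mount behaves like a local directory, ordinary file I/O is all a client needs. The following is a minimal sketch, assuming Alluxio FUSE is already mounted at a hypothetical path such as `/mnt/alluxio-fuse`. The environment variable `ALLUXIO_FUSE_MOUNT` and the helper names are illustrative and not part of Alluxio.

```python
import os
from pathlib import Path
from typing import List

# Hypothetical mount point: wherever Alluxio FUSE has been mounted locally.
ALLUXIO_MOUNT = Path(os.environ.get("ALLUXIO_FUSE_MOUNT", "/mnt/alluxio-fuse"))


def list_files(root: Path, suffix: str = ".csv") -> List[Path]:
    """Return files under root with the given suffix, using only standard file system calls."""
    return sorted(p for p in root.rglob(f"*{suffix}") if p.is_file())


def read_head(path: Path, num_bytes: int = 64) -> bytes:
    """Read the first num_bytes of a file with a plain open()/read()."""
    with open(path, "rb") as f:
        return f.read(num_bytes)


if __name__ == "__main__":
    if ALLUXIO_MOUNT.is_dir():
        for p in list_files(ALLUXIO_MOUNT):
            print(p, read_head(p))
    else:
        print(f"{ALLUXIO_MOUNT} is not mounted; nothing to read")
```

Nothing in this code is Alluxio-specific, which is the point: a training script that already reads from a local directory only needs its input path changed.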

Enable Better Performance for Object Store with Fast Commits

Compute frameworks that use rename to commit results run much slower on object stores because rename is an expensive operation there. Alluxio 2.0.0 enables users to write temporary data into Alluxio and then asynchronously persist the data to the underlying storage. See the docs for more details.
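One way to picture the benefit is a toy model in which the commit rename happens while the data is still only in Alluxio, so the under store sees a single write to the final path. The sketch below is a simplified simulation, not Alluxio code: the class `FastCommitStore`, its dictionaries, and the explicit `persist_pending` call (standing in for the asynchronous background persist) are all hypothetical.

```python
class FastCommitStore:
    """Toy model: writes and renames hit a fast cache; persistence happens later."""

    def __init__(self):
        self.cache = {}        # stands in for Alluxio storage
        self.under_store = {}  # stands in for the object store
        self.dirty = set()     # paths written but not yet persisted

    def write(self, path, data):
        # The write returns as soon as the data is in the cache.
        self.cache[path] = data
        self.dirty.add(path)

    def rename(self, src, dst):
        # Cheap when the file is not yet persisted: only cache entries move.
        self.cache[dst] = self.cache.pop(src)
        if src in self.dirty:
            self.dirty.discard(src)
            self.dirty.add(dst)
        if src in self.under_store:
            # Already persisted: the under store must be updated as well.
            self.under_store[dst] = self.under_store.pop(src)

    def persist_pending(self):
        # Stand-in for the asynchronous persist; returns the number of files written.
        count = len(self.dirty)
        for path in sorted(self.dirty):
            self.under_store[path] = self.cache[path]
        self.dirty.clear()
        return count


if __name__ == "__main__":
    store = FastCommitStore()
    store.write("/out/_temporary/part-0", b"result")
    store.rename("/out/_temporary/part-0", "/out/part-0")  # the "commit"
    store.persist_pending()
    print(sorted(store.under_store))
```

In this model the temporary path never reaches the under store, so the copy-and-delete that an object store would perform for a rename is avoided.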

Core


Support of Billions of Files with RocksDB Off Heap Metadata Management

Alluxio 2.0.0 provides the option to use RocksDB for storing file system metadata, enabling the namespace to scale to over one billion files. When running in off heap mode, an on-heap cache is used to serve more commonly accessed portions of the tree (for example, the root and top level directories), enabling most of the tree to be accessed at a similar speed as the in-memory metadata storage. Users can configure `alluxio.master.metastore=ROCKS` to use the new RocksDB based metadata store. See this blog for more details.
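The idea of an on-heap cache in front of an off-heap store can be illustrated with a small toy model. In the sketch below, a plain dictionary stands in for the RocksDB store, and a least-recently-used (LRU) policy is chosen only for illustration; the source does not state which eviction policy Alluxio uses. The class name `CachedMetastore` is hypothetical.

```python
from collections import OrderedDict


class CachedMetastore:
    """Toy model: a small in-memory LRU cache in front of a slower backing store."""

    def __init__(self, backing_store, capacity):
        self._backing = backing_store  # stands in for the off-heap (RocksDB) store
        self._cache = OrderedDict()    # stands in for the on-heap cache
        self._capacity = capacity
        self.backing_reads = 0         # counts slow-path lookups

    def get(self, path):
        if path in self._cache:
            self._cache.move_to_end(path)  # mark as most recently used
            return self._cache[path]
        self.backing_reads += 1
        value = self._backing[path]        # slow path: read from the backing store
        self._cache[path] = value
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return value


if __name__ == "__main__":
    store = {"/": "root", "/data": "dir", "/data/f1": "file", "/data/f2": "file"}
    meta = CachedMetastore(store, capacity=2)
    for p in ["/", "/data", "/", "/data/f1", "/", "/data/f2"]:
        meta.get(p)
    print(meta.backing_reads, "backing reads for 6 lookups")
```

Frequently touched entries such as the root stay in the cache, so only lookups for rarely used paths fall through to the backing store.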

Unified Data & Control Plane with gRPC

In Alluxio 2.0.0, we unify the RPC framework for both the control path and the data path by replacing the Thrift/Netty based RPC framework with gRPC. Previously, many high-level functions were built on top of Netty to support high-performance data streaming; these are now much simpler thanks to the built-in streaming APIs in gRPC. Code paths for authentication, flow control, and many other important features that previously had to support the legacy hybrid RPC framework are also unified on top of gRPC, which makes them much easier to maintain and extend. All Alluxio RPC services are now defined using Protobuf IDL. See this blog for more details.

Ease of HA Deployment with RAFT based Embedded Journal

Alluxio 2.0.0 introduces a new mode for journaling metadata. The embedded journal is a fully contained, distributed state machine which uses the RAFT consensus algorithm. This gives users a fault tolerant and performant medium for journal storage, independent of third-party storage systems and ZooKeeper. Users running only on object stores will be able to deploy Alluxio in high availability mode without incurring a large metadata performance penalty or relying on another distributed storage system. See the docs for more details.

Data Services


Alluxio 2.0.0 includes the Alluxio distributed Data Service, a set of services separate from the Alluxio masters and workers. The Alluxio Data Service acts as a lightweight distributed compute framework for internal Alluxio operations. In this release, the following services are available:

Adaptive Replication

Users can set a min and max replication factor at the file level. The Data Service is used to enforce this by adding and removing copies in Alluxio storage. See the docs for more details.
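The enforcement rule can be sketched as a small function that compares the current number of copies with the configured bounds. This is an illustrative model only: the function name `replication_adjustment` and the use of `None` to mean "no upper bound" are assumptions, not Alluxio's API.

```python
def replication_adjustment(current, min_replicas, max_replicas=None):
    """Return how many copies to add (positive) or remove (negative).

    A result of 0 means the current count already lies within the bounds.
    max_replicas=None is a hypothetical way to express "no upper bound".
    """
    if current < 0 or min_replicas < 0:
        raise ValueError("replica counts must be non-negative")
    if max_replicas is not None and max_replicas < min_replicas:
        raise ValueError("max_replicas must be at least min_replicas")
    if current < min_replicas:
        return min_replicas - current   # under-replicated: add copies
    if max_replicas is not None and current > max_replicas:
        return max_replicas - current   # over-replicated: remove copies
    return 0


if __name__ == "__main__":
    print(replication_adjustment(1, 2, 3))  # one copy short of the minimum
    print(replication_adjustment(5, 2, 3))  # two copies over the maximum
```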

Persist and Async Persist

The logic for writing a file synchronously or asynchronously to the under store is handled through the Data Service, enabling better load balancing and error handling. See the docs for more details.

Cross Under Stores Data Move

Moving files and directories across mount points is now possible by using the Data Service to do the data transfer. See the docs for more details.

Distributed Load

The Data Service allows data to be evenly loaded across nodes. See the docs for more details.

Under Store


Support for Concurrent Multiple HDFS Versions

Alluxio 2.0.0 allows users to connect to multiple HDFS under storages with different versions, even if they are not client compatible. For example, users can connect to Hadoop 1.x and Hadoop 2.x clusters mounted to the same Alluxio namespace. See the docs for more details.

Active HDFS Under File System Sync

Alluxio 2.0.0 integrates with HDFS iNotify to update stale data and metadata in a subscription-based model instead of the on-demand polling which was previously done. This greatly reduces the period of time a stale read is possible and improves metadata performance by reducing the number of unnecessary calls to the backing store. See the docs for more details.

Web (HTTP & HTTPS) Under Store Support

Users can now bring data from web-based sources into Alluxio and aggregate it there for analytics. Alluxio can simply be pointed at any web location with files, and the data is pulled in as needed based on the query or model run. See the docs for more details.

DevOps


AWS Elastic MapReduce (EMR) Service Integration

As users move to cloud services to deploy analytical and AI workloads, services like AWS EMR are increasingly used. Alluxio can now be seamlessly bootstrapped into an AWS EMR cluster making it available as a data layer within EMR for Spark, Presto and Hive frameworks. Users now have a high-performance alternative to cache data from S3 or remote data while also reducing data copies maintained in EMR. See the docs for more details.

Better Configuration Controls with Path Level Configuration

Users can specify default configuration on a per path basis, allowing administrators to have finer grained control over the settings clients use to access data. Clients may still override the path level configuration with client-side configuration. Path level configuration greatly simplifies the configuration necessary in client applications. For example, instead of having application logic switch write types, path level configuration can be used to ensure all writes to temporary directories are only written to memory. See the docs for more details.
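The precedence described here (client-side configuration first, then path level defaults, then cluster defaults) can be sketched as a small resolver. This is a simplified model under stated assumptions: the function `resolve` is hypothetical, picking the longest matching path prefix is an assumption about how overlapping path defaults would be handled, and the property key and values are treated as plain strings chosen for illustration.

```python
def _is_under(prefix, path):
    """True if path equals prefix or lies beneath it in the directory tree."""
    return path == prefix or path.startswith(prefix.rstrip("/") + "/")


def resolve(prop, path, cluster_defaults, path_defaults, client_conf):
    """Resolve a property: client override, then path default, then cluster default."""
    if prop in client_conf:
        return client_conf[prop]
    best = None
    for prefix, props in path_defaults.items():
        if prop in props and _is_under(prefix, path):
            # Assumption: the most specific (longest) matching prefix wins.
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is not None:
        return path_defaults[best][prop]
    return cluster_defaults.get(prop)


if __name__ == "__main__":
    key = "alluxio.user.file.writetype.default"
    cluster = {key: "CACHE_THROUGH"}      # illustrative cluster-wide default
    per_path = {"/tmp": {key: "MUST_CACHE"}}  # temporary data stays in memory
    print(resolve(key, "/tmp/job1/part-0", cluster, per_path, {}))
    print(resolve(key, "/warehouse/table", cluster, per_path, {}))
```

With a rule like this in place, an application writing under `/tmp` needs no write-type logic of its own.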

Improved Kubernetes Deployment Support

Access Alluxio using the POSIX API from any application container in Kubernetes by simply mounting a hostPath volume after running the alluxio/alluxio-fuse container as a DaemonSet. See the docs for more details.

3rd Generation WebUI

The Alluxio web UI has been updated to use ReactJS and REST APIs. The overall user experience has been improved. The updated UI uses a dark theme with a better color palette. The set of cluster metrics and aggregated system metrics provided through the master has been improved. We have also added support for rendering time series on the UI. See the docs for more details.

Acknowledgments


We would like to thank the following members of the community for their contributions to Alluxio 2.0.0. The release would not have been possible without your efforts!

Aaron, Adit Madan, Aiqing He, Alex, Andrew Audibert, Ann Shan, Anzi Xu, Asalea, Bai Jiayang, Bang Xiao, BeiMaGang, Bin Fan, Bin Feng, Binyu Huang, Bo Xu, Bob Lee, CallMeSp, Calvin Jia, Carmen, Caroline, Changhao Zhou, Chen Chahoo, Chen Qian, Chen Yadong, Chen, Cheng Chang, Cheng Yu, Chengming Wang, Chengwei Tong, Chenyu Jin, Chongjie Li, Chris, Chuanqi Dong, Cjh, CliffShi, DONG, Dantelian, Darren9654, David Zhu, Destiny, Deya Wu, Dil Duan, Dipti Borkar, DoctorKey, Draculair, Edmund Seaver, Edward Martin, Fan Yang, Fan, Fangbin Sun, Fengbin He, Fengqian Zhang, Fred Wu, Ge Ruiyin, Gene Pang, GreenPines, Gryffindor, Guo Xu, Guoqiang Li, Guosheng Pan, Göktürk Gezer, Haiyang Gu, Han Wu, Hangfan Zhang, Haogang Wang, Haoyuan Li, Heller, Hitesh Wadekar, Hua Meng, Huang Haitao, Huang Weixiang, Huang Wenxuan, Huang Zixian, Huangdong Ding, Hyphon, Jason Tieu, Jiabang Liu, Jiacheng Liu, Jianhao Chen, Jiawei Chen, Jiaying Wu, JimZong, Jinchi Chen, Jingmian Wang, Jingtao Wang, Kael Chan, Kai Xun, Kaisheng Deng, Kang Fang, Kong Chang, Konnase Lee, LJ_Paul, Lei Meng, Li Guang Yao, Li Jiang, LiXin Xu, LiXinchun, LiheYoung, Linli Wu, Lintian Shi, Liu, Liulan Qin, Loren, Lu Qiu, Luoyi Zhang, Ma Bai, MaorunZhang, Meihua Dang, MelonRind, Ming Wang, Mingchao Zhang, Mingkai Lin, Mingtao Ji, Mingwei Li, Mirage Lyu, MoXuyan, Moxin Chen, Moya Zhang, Nakkul Sreenivas, Nomanous, PDSnoW, Patrick J Beam, Paul Wais, Peixuan Xia, Peng Jian, Pengfei Chen, Pisces, Qingning Lu, Qizi Hao, Qu Cui, Rafael, Randall Chen, Rao Lu, RazorX7, Rico Chiu, Robert Ridley, Ron Chen, Ruiyang Fang, Runtao Ni, Rush Sykes, SYSU-Linxp, Senrong Xu, Shawy Geng, Shiwei Feng, Shun Liu, Shuocheng Wang, Sicong Hu, Song Zhang, Su Lu, Sui An, Sun Yuhu, Taha Naqvi, TangSiyi, Tao Ying, Te Qi, ThousandOfWind, Thyrix Yang, Tian Qin, Tianlei Song, Tianye Zheng, Tingwei Zhu, Trap, Wan Ruiqin, Wang Zhangsuhui, Wei Cheng, Wei Yuang, Weifan Zhao, Wendy Kwan, Wenjie Xu, Wenjie Zhang, Wenjun Deng, Wenjun Huang, Wenzheng Yang, Wenzong 
Ru, William Zapata, Xiaohang Shi, Xiaoyuan Liu, Xin Qian, Xinyi Wang, Xu Yun, Xuan Liu, YJ-Shi, YMH, Yamei Dong, Yaxin Wang, Yi Ren, Yidi Shao, Yilei Feng, Yimeng Guo, Ying Li, Yinghao Yu, Yingxiao Du, Yitong Zhao, Youde Li, Young, Your Name, Yuhang Zhou, Yuhao He, Yujie Zhou, Yumeng Xu, Yunchuan Zheng, Yunman Shu, Yunpan Wang, Yupeng Fu, Yuting Tang, Yuzhou Wu, Zac Blanco, Zehao He, Zeng Zihan, Zeyu Ruan, Zhang Zhe, Zhao Tang, Zhao, Zhaoyang Li, Zheng, Zhenhua Su, Zhi Sun, Zhi Wang, Zhixiang Zhang, Zhonghao, Zhu Ruancheng, Zhunyi Xie, Zitai Xiao, Ziyang Li, Zyl, aaronx121, caiwen pu, candycheese, cartershi, cc818, chenwei li, cuizihan, fangyuchu, farmer, fightingZh, gao zhi, hamilton, hangfaichao, hxer7963, jiahao qi, jianjian jiang, jiarui Chen, jimmy tang, jixiaozhong, koncle, lengjiayi, lijunyou, litianqi, liuzx32, lqiulin, lule, lxcnju, lzk, newnius, njuxx, njuzmy, nyxmq, oooooverflow, perrywang, polarsun, qunqunqun, rookielxy, shenyin jie, snodawn, snowhealing, songhexiang, steve_chph, sunchutao, t x, tanlijuan, tbs459, tsdjh, usernamehcx, wangshiqi, wizcheu, wjj2, wofmanaf, wxpy, xiaocaotou, xiyue-yi, xqx1568, yawenouyang, yxWisdom, zhangyi, zhoafei, zhoudw-zdw