Alluxio 2.1.0 Release

We are extremely excited to announce the release of Alluxio 2.1.0!

This release contains a variety of improvements, including user experience enhancements, bug fixes, and major performance improvements. A list of changes can be found below.

Highlighted Changes

Alluxio Structured Data Service

This release includes a new Alluxio subsystem for managing and transforming structured data. Structured data comes in the form of databases, tables, and partitions. It is the backbone of many companies’ analytics systems.

Alluxio is entering this space to provide performance improvements beyond raw I/O. With this release, a simple command can transform data in raw formats (such as CSV) into Parquet files, a much more compact and performant file format that is better suited for running queries on systems such as Presto.

Additionally, the release artifacts now include a Presto connector which can be used to connect your Presto cluster to Alluxio.

Read more about how to get started with Alluxio’s structured data service in the documentation.

Kubernetes Helm Chart Deployment

Thanks to community contributions, and with special thanks to the Alibaba Cloud Kubernetes Team, Alluxio now includes a more robust set of Kubernetes templates as well as a Helm chart for Kubernetes deployment. Thanks to the team and Alluxio maintainers for this great improvement! Read more about how to use the Helm chart to deploy Alluxio in a Kubernetes environment.

Support for Google Dataproc

A public Google Dataproc init action is now available for users to deploy Alluxio with Google Cloud. Read more about how to deploy Alluxio with Google Dataproc in the documentation.

Reduction of Default Block Size

In this release, the default block size in Alluxio has been reduced from 512MB to 64MB. With smaller blocks, workers evict lesser-used data at a finer granularity. This benefits use cases where the block size is relatively large compared to the size of the files being stored.
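To make the granularity argument concrete, the following is a minimal back-of-the-envelope sketch, not Alluxio code. It assumes that a worker evicts whole blocks, so a single eviction removes at most one full block of a file; the helper name `block_layout` and the 1GB example file are hypothetical and chosen only for illustration.

```python
import math

MB = 1024 * 1024
GB = 1024 * MB


def block_layout(file_size: int, block_size: int) -> tuple[int, float]:
    """Return (number of blocks, largest fraction of the file dropped by one eviction).

    Hypothetical helper: assumes eviction happens at whole-block granularity,
    so evicting one block removes at most one full block of the file.
    """
    blocks = math.ceil(file_size / block_size)
    largest_block = min(file_size, block_size)
    return blocks, largest_block / file_size


for label, size in [("old default (512MB)", 512 * MB), ("new default (64MB)", 64 * MB)]:
    blocks, fraction = block_layout(1 * GB, size)
    print(f"{label}: {blocks} blocks, one eviction drops up to {fraction:.2%} of a 1GB file")
```

Under these assumptions, a 1GB file spans 2 blocks with the old default, so a single eviction can drop half of the file. With the new default, the same file spans 16 blocks, and each eviction drops at most 6.25% of it.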

General Improvements

  • The embedded journal quorum now utilizes a gRPC-based transport for its RPCs (4095a1c11c)
  • Process launching can now be done in the foreground (603c6fc291)
  • Added CLI options for listing last access time (d1d1adffab)
  • Remove deprecated FaultTolerantFileSystem (alluxio-ft://) (3d7d18dbe2)
  • Reduce the default block size to 64 MB from 512 MB (9d7338cb58)
  • Docker containers now handle signals properly (8238b1e2b6)
  • Hadoop 3.2 is now supported as a UFS (dcfd9cfc1f)
  • Parallelism option for distributedLoad (f5b70fd71f)
  • Support for Google Dataproc (ae33402852)
  • Support for true owner and group with FUSE (a3987c6527)
  • SCM revision option for the alluxio version command (2e67f3be2b)
  • Removal of MapR support (86e8061ee2)
  • Blacklist paths for files written with ASYNC_THROUGH (a31ee0a10a)
  • Upgrade OSS UFS client dependency to 3.6.0 (76aa706215)
  • Show progress when taking and applying backups (5d61a3deb5)

Bug Fixes

  • NPE when UFS journal shutdown fails (f7c8c2e316)
  • Remove query parameters when browsing in the UI (9a6cc464ed)
  • NPE on Embedded journal shutdown (cf14aefd6f)
  • Properly close journal when stopping (fd9b6f942c)
  • Avoid exception when audit log is used with NOSASL authentication (0e3a152f96)
  • Sync cache should consider syncing ancestors (b880cef284)
  • Properly handle interrupt in DynamicResourcePool (ad5c15d6a5)
  • Properly handle interrupts on various heartbeats (8d2a6ec179)
  • Properly handle recovering from a journal UFS error (be4e9cb1f4)
  • Remove duplicate entries in the BlockInfo tab of the worker Web UI (bd00866095)
  • Prevent data loss when writing with ASYNC_THROUGH (b69e73de1e)