Alluxio 2.6.2 Release

We are excited to announce the release of Alluxio 2.6.2! This is an edge release for Alluxio open source on top of Alluxio 2.6.1, with a variety of bug fixes, documentation updates, and improvements.

Highlights

Add Alluxio Stress Test Framework

Alluxio StressBench is a built-in tool for benchmarking the performance of an Alluxio deployment without any extra services. Alluxio 2.6.2 supports the following benchmark suites:

  • Master RPC throughput (430e4a)
    • bin/alluxio runClass alluxio.stress.cli.RegisterWorkerBench
    • bin/alluxio runClass alluxio.stress.cli.WorkerHeartbeatBench
    • bin/alluxio runClass alluxio.stress.cli.GetPinnedFileIdsBench
  • Alluxio POSIX API read throughput (634bd32)
    • bin/alluxio runClass alluxio.stress.cli.fuse.FuseIOBench
  • Job Service throughput (0cae910)
    • bin/alluxio runClass alluxio.stress.cli.StressJobServiceBench
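
As a rough sketch, the suites listed above could be driven from a small wrapper script. The `ALLUXIO_HOME` default, the `DRY_RUN` switch, and the `run_bench` helper are assumptions made for illustration and are not part of Alluxio. Benchmark-specific arguments are omitted because they are not covered here. By default the script only prints each command.

```shell
#!/bin/sh
# Hypothetical wrapper around the StressBench entry points listed above.
# ALLUXIO_HOME and DRY_RUN are assumptions for this sketch, not Alluxio settings.
ALLUXIO_HOME="${ALLUXIO_HOME:-/opt/alluxio}"
DRY_RUN="${DRY_RUN:-1}"

# Benchmark classes taken from the release notes.
BENCH_CLASSES="
alluxio.stress.cli.RegisterWorkerBench
alluxio.stress.cli.WorkerHeartbeatBench
alluxio.stress.cli.GetPinnedFileIdsBench
alluxio.stress.cli.fuse.FuseIOBench
alluxio.stress.cli.StressJobServiceBench
"

run_bench() {
  # $1: fully qualified benchmark class name
  if [ "$DRY_RUN" = "1" ]; then
    # Dry run: only print the command that would be executed.
    echo "bin/alluxio runClass $1"
  else
    # Real run: execute from the Alluxio installation directory.
    (cd "$ALLUXIO_HOME" && bin/alluxio runClass "$1")
  fi
}

for cls in $BENCH_CLASSES; do
  run_bench "$cls"
done
```

Setting `DRY_RUN=0` runs the benchmarks for real against the installation under `ALLUXIO_HOME`. Check each benchmark's own help output for the arguments it expects before doing so.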

Support Transferring Alluxio Leadership at Runtime

When deploying a High Availability cluster using the Embedded Journal, users can now manually specify the leading master. This is useful when users want to debug or perform maintenance on a server without killing a running master process. The new functionality gracefully transfers leadership of the quorum to another specified master.

The docs show how to use this new feature. (d6c6733) (d67996c1) (e79fcdd)

Docker Images for Production and Development

In 2.6.2, users can pull two separate Docker images: alluxio/alluxio:2.6.2 and alluxio/alluxio-dev:2.6.2. The alluxio/alluxio:2.6.2 image is intended for production use and is optimized for image size, while alluxio/alluxio-dev:2.6.2 installs extra tools for development use. (71f62c36) (768d45c)
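
A minimal sketch of choosing between the two images is shown below. The `image_for` helper and the `prod`/`dev` labels are hypothetical names used only for illustration; the image names and tag are the ones given above. The script prints the `docker pull` commands rather than running them, so it works without a Docker daemon.

```shell
#!/bin/sh
# Hypothetical helper: map a usage purpose to one of the two 2.6.2 images.
ALLUXIO_VERSION="2.6.2"

image_for() {
  # $1: "prod" for the size-optimized image, "dev" for the image with extra tools
  case "$1" in
    prod) echo "alluxio/alluxio:${ALLUXIO_VERSION}" ;;
    dev)  echo "alluxio/alluxio-dev:${ALLUXIO_VERSION}" ;;
    *)    echo "unknown purpose: $1" >&2; return 1 ;;
  esac
}

# Print the pull commands; copy or pipe them to a shell to execute.
echo "docker pull $(image_for prod)"
echo "docker pull $(image_for dev)"
```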

Improve Alluxio Load Command

The load command now uses the new worker API, which avoids an extra data copy to the client. (05e081d1)

Metrics Enhancements

A number of new metrics have been added to help users better understand the status of an Alluxio cluster.

  • Expose Prometheus metrics from all servers (1a6054ad)
  • Add metrics of Alluxio logging (0fba8bb)
  • Print the web metrics servlet page in human-readable format (b1db0716)
  • Support exporting Ratis metrics (a684440)
  • Add master LostFile and lost block metrics (2238a64b)
  • Add JVM pause monitor metric (ef4aaab2)
  • Add operating system metric (67e568ff4)
  • Add metadata cache metric (e2ee953)
  • Register journal sequence number metric (449d1ae9)
  • Support total block replica count metric (13ec038b)
  • Add metrics to track master RPC throughput (b2a40192)

Other improvements

  • Improve documentation surrounding worker tiered stores (c93e61e)
  • Avoid redundant query for conf address (6012721)
  • Add container host information on worker page (e5e53e08)
  • Release workerInfoList when a job completes (a0c3c6a4)
  • Support web server for fuse process (83c16f67c)
  • Update system tuning docs (735973)
  • Make FUSE unmount properly (2df83726)
  • Provide entry points for providing Java-based TLS security to gRPC channels (ea49f3b31)
  • Count the number of successful and failed jobs in distributed job commands (2c792f987)
  • Allow probes to be configured in the Helm chart (4991e84)
  • Support listing jobs with a specific status (fdf9d4f4)
  • Add doc on Presto and Iceberg (2a56d12)
  • Reduce the risk of sensitive information leak in rpc debug/error log (ea00090)
  • Add configuration of min and max election timeout (b26d200ca)
  • Support Fuse on Worker process in Kubernetes helm yaml files (d2e947243)
  • Create smaller alpine and centos development docker image (22ecb2c2)
  • Add property to skip listing broken symlinks on local UFS (b5f318e7a)
  • Update the evictor (LRU) reference when getting a page in LocalCacheManager (c9e396a3)
  • Close gRPC input stream when finished reading to speed up data loading in ML/DL workloads (4f7a8877)

Bug Fixes

  • Fix the non-working button on the logs tab page (ffbb7395)
  • Fix process-local read/write client-side logic and add unit tests (87e08e2)
  • Stop leaking state-lock when journal is closed (ef2d38f6)
  • Fix ArrayIndexOutOfBoundsException when using shared.caching.reader (f1f49e5ea)
  • Fix job server and job worker startup failures (3f5b76da)
  • Fix job completion logging (80cf7ca)
  • Fix block count metrics (edb5169)
  • Fix race condition in StressMasterBench (50a4738)
  • Fix last snapshot index in delegated backup (15c0838a)
  • Make quorum info command more expressive (8704ea1)
  • Handle some exceptional cases to prevent leaks (bd2f945e3)
  • Remove ramfs from size-checking condition (257da58)
  • Make the stopwatch thread-safe in readInternal (8e03d6d1c)
  • Fix the job server hanging when an unprivileged path is set (6a0c01d)

Acknowledgments

Many community contributors helped with Alluxio 2.6.2. This release would not have been possible without your support! In particular, we would like to thank:

Curt, Horasal, Nan Li, Nirav Chotai, Yaolong Liu, Bing Zheng, Chenliang Lu, kqhzz, l-shen, litao, Baolong Mao, qian0817

Enjoy the new release! We look forward to hearing your feedback on the community Slack channel.