Alluxio 2.9.1 Release

We are excited to announce the release of Alluxio 2.9.1! This is an edge release for Alluxio open source on top of Alluxio 2.9.0, with various bug fixes, documentation updates, and improvements.

HIGHLIGHTS

Improved load command

The load command in the Alluxio CLI (db9f07) is updated to use a new infrastructure (different from the existing job service) to asynchronously load all files under the given directory path with better performance and stability. New command line arguments are available to enhance the operation’s usability, such as limiting the UFS bandwidth and running a verify step after the load operation is complete to check that the expected files are loaded correctly.

See the CLI documentation for the full description of the updated load command.
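
The two options mentioned above, a bandwidth cap and a verify step, can be illustrated with a minimal sketch. This is not Alluxio's implementation: the function names, the in-memory dictionaries standing in for the UFS and the Alluxio cache, and the sleep-based throttle are all hypothetical and chosen only to show the idea.

```python
import time


def throttled_load(ufs_files, bandwidth_bytes_per_sec, sleep=time.sleep):
    """Copy every file from a UFS-like mapping into a cache, capping throughput.

    Hypothetical helper: ufs_files maps path -> bytes and stands in for the UFS.
    """
    cache = {}
    for path, data in sorted(ufs_files.items()):
        cache[path] = bytes(data)
        # Pause long enough that the average rate stays at or below the cap.
        sleep(len(data) / bandwidth_bytes_per_sec)
    return cache


def verify_load(ufs_files, cache):
    """Return the paths that are missing from the cache or differ in size."""
    return sorted(
        path
        for path, data in ufs_files.items()
        if path not in cache or len(cache[path]) != len(data)
    )
```

Calling verify_load after throttled_load returns an empty list when every file was loaded; any path it returns would need to be loaded again.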

Monitor helm chart

Upon deployment, this helm chart (ca8132) starts a monitoring system based on Prometheus and Grafana. It monitors the status, metrics, and other information of an Alluxio cluster on Kubernetes. Users can access the Grafana web UI through the Grafana web port.

See the README for details on deploying this monitoring system.

Unsafe flush option for embedded journal

When using the embedded journal, each journal entry must be flushed to disk on all masters before being committed. This operation can be a performance bottleneck on slow or busy disks. The newly added property alluxio.master.embedded.journal.unsafe.flush.enabled (3fe8e0) allows the system to continue without waiting for the flush to complete, at the risk of losing data if half of the master nodes fail. The documentation discusses other, safer ways to alleviate this performance bottleneck.
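
The trade-off can be pictured with a small sketch of a journal writer. It is only an illustration under stated assumptions: append_entry, read_entries, and the unsafe_flush parameter are hypothetical names, and the embedded journal does not necessarily work this way internally. In the safe mode the writer blocks on os.fsync until the operating system reports the entry is on stable storage; in the unsafe mode it returns as soon as the entry is in the OS page cache, so a crash before the cache is written back can lose it.

```python
import os
import tempfile


def append_entry(journal_path, entry, unsafe_flush=False):
    """Append one journal entry; fsync before returning unless unsafe_flush is set."""
    with open(journal_path, "ab") as journal:
        journal.write(entry + b"\n")
        journal.flush()  # move data from Python's buffer to the OS page cache
        if not unsafe_flush:
            # Block until the OS reports the data is on stable storage.
            os.fsync(journal.fileno())


def read_entries(journal_path):
    """Return all entries currently visible in the journal file."""
    with open(journal_path, "rb") as journal:
        return journal.read().splitlines()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "journal.log")
    append_entry(path, b"create /a")                     # waits for the disk
    append_entry(path, b"create /b", unsafe_flush=True)  # returns without fsync
    print(read_entries(path))
```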

Compression level option for RocksDB checkpoint

To let the system recover quickly after failures or restarts, a checkpoint of the system is taken every 2 million journal entries by default. The checkpoints of the metadata in RocksDB are compressed to reduce their size. The property alluxio.master.metastore.rocks.checkpoint.compression.level (61f5af) allows the user to set a compression level for these checkpoints (0 for no compression, 9 for maximum compression). A value of 1 is recommended, because higher levels yield only slightly smaller checkpoints at the cost of a large increase in computation.
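
To get a feel for the size versus computation trade-off, the sketch below compresses a synthetic payload at levels 0, 1, and 9 using Python's zlib module. It is an assumption that zlib's 0 to 9 scale behaves like the checkpoint compression levels; the payload and the compare_levels helper are hypothetical, and actual sizes and timings depend on the real metadata.

```python
import time
import zlib


def compare_levels(payload, levels=(0, 1, 9)):
    """Return {level: (compressed_size, seconds)} for each zlib level."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(payload, level)
        elapsed = time.perf_counter() - start
        # Sanity check: every level must round-trip to the original bytes.
        assert zlib.decompress(compressed) == payload
        results[level] = (len(compressed), elapsed)
    return results


if __name__ == "__main__":
    # Synthetic, repetitive payload standing in for serialized metadata.
    sample = b"".join(
        b"/warehouse/table/part-%05d.parquet\n" % i for i in range(20000)
    )
    for level, (size, seconds) in compare_levels(sample).items():
        print(f"level {level}: {size} bytes in {seconds:.4f}s")
```

Running the script on a payload that resembles your own metadata shows how much extra time the higher levels cost for the additional size reduction.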

IMPROVEMENTS AND FIXES SINCE 2.9.0

Notable configuration property updates

Property key | Old 2.9.0 value | New 2.9.1 value
alluxio.worker.fuse.mount.options | direct_io | attr_timeout=600,entry_timeout=600

Master

  • Fix bug in UFS journal dumper when reading a regular checkpoint (de4f1b)
  • Fix concurrent sync dedup (12ecbc)
  • Add more observability on inode tree corruption (5cb7a9)
  • Add compression level option for RocksDB checkpoint (61f5af)
  • Support logging source IP in RPC debug log (017078)
  • Bump ratis version to 2.4.1 (1e95ed)
  • Improve the PollingMasterInquireClient logic (1d6cb2)
  • Refactor simple master services out of main master process classes (1cbbf8)
  • Use RPC hostname as fallback master hostname (881849)
  • Fix null IP in audit log (5fdf51)
  • Fix stale buildVersion when downgrading workers (952721)
  • Remove file from UfsAbsentPathCache after persisting (9ff756)
  • Support TTL for synced inode (79fe43)
  • Update raft group only on config change (ff88f8)
  • Add unsafe flush option to embedded journal (3fe8e0)
  • Upgrade Apache Ratis from 2.3.0 to 2.4.0 (6b5331)
  • Delete worker metadata from master after heartbeat timeout (8183d1)
  • Support placing RocksDB inode and block stores on different disk paths (5f3188)
  • Support setting Ratis configurations through Alluxio configuration (62c319)
  • Optimize MasterWorkerInfo memory usage by introducing fastutil Set (a1e1e3)

S3 API

  • Reduce redundant calls in getObject of S3 API (db2404)
  • Eliminate race condition in CompleteMultipartUpload and support overwrite (e041e8)
  • Add Content-Range header for getObject (4b83fa)
  • Sort part files for uploading (5df5cf)
  • Add encoding-type support for S3 ListObjects and more logging (e25cdc)
  • Fix out-of-bounds error when parsing S3 Authorization header (0c221f)

CLI

  • Restore table command with deprecated status (6b8887)
  • Add a command to set DirectChildrenLoaded on dir (822834)
  • Ignore no_cache setting for “load” command (70bcee)
  • Add a removeAll pathConf command to remove all path configurations (6b8065)
  • Add a command to free a worker (32785f)

Fuse

  • Support fuse sdk seek (772915)
  • Support fuse test to test against S3 (4fd428)
  • Add new Alluxio-FUSE as local cache solution with UFS gateway (705224)
  • Add macFUSE check for macOS (edf309)

Job service

  • Add new distributed load (db9f07)
  • Add jobName into audit log for run command (64f396)
  • Fix NullPointerException in distributed cmd (8da4f5)
  • Improve job worker health report (ef9b76)
  • Fix null in distributed load cmd output (3df564)

Worker

  • Throw Error after replying with an error to the client (b13d7d)

Client

  • Fix client side config using wrong hash (e69e3e)
  • Remove caching for CapacityBaseRandomPolicy (8a66a6)

Metrics

  • Make client send version to server and audit log contain version (bd74a8)
  • Fix Master.JournalSequenceNumber metrics in RaftJournalSystem (bd30e2)
  • Clear metrics when closing JournalStateMachine (b9d2e7)
  • Configure block and inode metastore separately (a8090b)

K8s and deployment

  • Support monitor helm chart (0f4e59)
  • Build and symlink to shaded client jar within client/build (ca8132)

UFS

  • Support STS for OSS ufs through RAMRole (bbe99b)
  • Add verbose mode to fs mount (562e2c)
  • Support multiple versions of cosn lib jars in alluxio tarball (a7b33a)
  • Add hadoop dependencies into ozone ufs connector (d0d298)
  • Support getUnderFSType for ozone, cosn, cephfs-hadoop (29c70c)

Web UI

  • Support displaying revision in Web UI (54094d)
  • Add more CORS config and make CORS handle all HTTP requests (9ba799)
  • Add a UI page for masters (fdf8d3)
  • Display build version of workers in WebUI and capacity command (7d8ad9)

Stressbench

  • Fix MultiOperation Stress Master Bench (1bb53f)
  • Add multi operation master stress bench (d90eef)
  • Support specifying write type for StressWorkerBench (12c85b)

ACKNOWLEDGEMENTS

We want to thank the community for their valuable contributions to the Alluxio 2.9.1 release. In particular, we would like to thank:

Haoning Sun (Haoning-Sun), Kaijie Chen (kaijchen), Ling Bin (lingbin), Lucas (lucaspeng12138), Shuaibing Zhao (StephenRi), Vimal (vimalKeshu), XiChen (xichen01), Xinran Dong (007DXR), Yaolong Liu (codings-dan), Zihao Zhao (zhezhidashi), Bing Zheng (bzheng888), chunxiaozheng, Wei Deng (dengweisysu), Tianbao Ding (flaming-archer), humengyu (humengyu2012), jianghuazhu, Baolong Mao (maobaolong), Lei Qian (qian0817), Zhaoqun Deng (secfree), Yanbin Zhang (singer-bin), voddle, wuzhenhua (wuzhenhua01), xpbob, yiichan (YichuanSun), zhigang huang (zerorclover)

Enjoy the new release! We look forward to hearing your feedback on our community Slack channel.