The Alluxio core engineering team came up with a redesign that gives users a more efficient and transparent way to leverage data orchestration through the POSIX interface. The redesign enables much better performance for ML workloads that access data via the POSIX interface.
Alluxio’s distributed systems experts explore today’s data access challenges and open source data orchestration solutions for modernizing your data platform.
Alluxio has an excellent metrics system and supports various kinds of metrics sinks, e.g. an embedded JSON sink and the Prometheus sink. Users and developers can easily create a custom sink for Alluxio by implementing the Sink interface.
Nowadays it is not straightforward to integrate Alluxio with popular query engines like Presto on existing Hive data. Solutions proposed by the community, like Alluxio Catalog Service or Transparent URI, put unnecessary pressure on Alluxio masters when querying files that should not be cached.
RaptorX is the internal name of a project that aims to improve query latency significantly beyond what vanilla Presto is capable of. In this session, we introduce the hierarchical cache work, including the Alluxio data cache, the fragment result cache, etc. Caching is the key building block for RaptorX.