Alluxio’s distributed systems experts explore today’s data access challenges and open source data orchestration solutions for modernizing your data platform.
Slides from our latest talks
Alluxio has an excellent metrics system and supports various kinds of metrics, e.g. an embedded JSON sink and the prometheus sink. Users and developers can easily create a custom sink of Alluxio by implementing the Sink interface.
Nowadays it is not straightforward to integrate Alluxio with popular query engines like Presto on existing Hive data. Solutions proposed by the community like Alluxio Catalog Service or Transparent URI brings unnecessary pressure on Alluxio masters when querying files should not be cached.
RaptorX is an internal project name aiming to boost query latency significantly beyond what vanilla Presto is capable of. For this session, we introduce the hierarchical cache work including Alluxio data cache, fragment result cache, etc. Cache is the key building block for RaptorX.
Today’s analytics workloads demand real-time access to expansive amounts of data. This session demonstrates how Alluxio’s data orchestration platform, running on Intel Optane persistent memory, accelerates access to this data and uncovers its valuable business insights faster.
Driven by strong interests from our open-source community, the core team of Alluxio started to re-design an efficient and transparent way for users to leverage data orchestration through the POSIX interface.
RAPIDS is a set of open source libraries enabling GPU aware scheduling and memory representation for analytics and AI. Spark 3.0 uses RAPIDS for GPU computing to accelerate various jobs including SQL and DataFrame. With compute acceleration from massive parallelism on GPUs, there is a need for accelerating data access and this is what Alluxio enables for compute in any cloud. In this talk, you will learn how to use Alluxio and Spark with RAPIDS Accelerator on NVIDIA GPUs without any application changes.
Alluxio’s capabilities as a Data Orchestration framework have encouraged users to onboard more of their data-driven applications to an Alluxio powered data access layer. Driven by strong interests from our open-source community, the core team of Alluxio started to re-design an efficient and transparent way for users to leverage data orchestration through the POSIX interface.
At Aspect Analytics we intend to use Dask, a distributed computation library for Python, to deal with MSI data stored as large tensors. In this talk we explore using Alluxio and Alluxio FUSE as a data consolidation and caching layer for some of our bioinformatics workflows.