Many Spark users may not be aware of the differences in memory utilization between caching data directly in memory in the Spark JVM and storing data off-heap via an in-memory storage service like Alluxio. In this office hour, I will highlight the two approaches with a demo and open up for discussion.
Slides from our latest talks
Alluxio, an open source data orchestration technology, helps speed up Dataproc workloads by providing a distributed caching layer in the Dataproc cluster.
This talk describes a stack of open-source projects for serving high-concurrency, low-latency SQL queries using Presto with Alluxio on big data in the cloud. By deploying Alluxio as a data orchestration layer to access cloud object storage (e.g., AWS S3), this architecture greatly enhances the data locality of Presto with distributed and cross-query caching, thus avoiding repeated reads of the same data from cloud storage.
JD.com is China’s largest online retailer. It uses Alluxio to provide support for ad hoc and real-time stream computing, using Alluxio-compatible HDFS URLs and Alluxio as a pluggable optimization component.
In this talk, HY discusses the key challenges and trends impacting data engineering and explores the concept of Data Orchestration.
This session covers the challenges of querying diverse data sources at Walmart and how they are tackled using Presto and Alluxio.
In this talk, we share our lessons in building and rebuilding our monitoring systems and data platforms at Electronic Arts (EA).
Learn the best use cases for Presto from the data engineer’s perspective, and hear about recent Presto advancements, such as the Cost-Based Optimizer and Kubernetes-native deployment, as well as the project roadmap going forward.
Alluxio core maintainers and founding engineers share the latest innovations in Alluxio 2.