Improving Memory Utilization of Spark Jobs Using Alluxio
This office hour shares a demo and compares two approaches: caching data in memory directly in the Spark JVM versus storing data off-heap in an in-memory storage service such as Alluxio.
The latest Alluxio meetups, webinars, conferences and more
Running Spark with Alluxio is a popular stack, particularly for hybrid environments. In this session, Dipti will briefly introduce Alluxio, share the top 10 tips for performance tuning for real-world workloads, and demo Alluxio with Spark.
Presto is widely used for data science, business analytics, and operations. Presto’s SQL is a main driver for this, as it is ANSI-compliant, easy to ramp up on, and rich in functionality. Given the versatility and flexibility of this software, there is also a huge demand to develop interfaces for other critical data domains like real-time dashboards, stream processing, and large-scale batch computations. We will explore some interesting systems and prototypes to bring Presto to these new domains.
In this tech talk, we’ll discuss why DBS turned to Alluxio’s bursting approach to help solve on-prem compute capacity challenges.
Announcing the first Data Orchestration Summit in November 2019! This Summit brings together data engineers, cloud engineers, data scientists, and industry thought leaders who are solving data problems at the intersection of cloud, AI, and data.
This tech talk will share approaches to burst data to the cloud along with how Alluxio can enable “zero-copy” bursting of Spark workloads to cloud data services like EMR and Dataproc. Learn how DBS Bank uses Alluxio to solve for limited on-prem compute capacity.
In this online meetup, we will present the benefits of the fast analytics stack of Spark on Alluxio, and dive into China Unicom’s use case of leveraging Spark and Alluxio to process massive amounts of mobile data.
In this presentation, Ryte’s chapter lead engineer, Danny Linden, shows why and how we solve some challenging technical issues, improve the speed, and reduce the costs of our AWS EMR Hadoop and Presto backend with Alluxio.
In this webinar, Adit will present this new approach of bringing data locality to data-intensive compute workloads in Kubernetes environments, and demo how to set up and run Apache Spark and Alluxio in Kubernetes.