Products
Bursting Spark or Presto Jobs to AWS using Alluxio
June 23, 2020
The hybrid cloud model, where cloud resources run Spark or Presto jobs against data stored on-premises, is an appealing solution to reduce resource contention in on-premise environments while also saving in overall costs. One key flaw in a hybrid model is the overhead associated with transferring data between the two environments. Data and metadata locality within the compute application must be achieved in order to maintain the similar performance of analytics jobs as if the entire workload was run on-premises.
In this office hour, we demonstrate how a “zero-copy burst” solution helps to speed up Spark and Presto queries in the public cloud while eliminating the process of manually copying and synchronizing data from the on-premise data lake to cloud storage. This approach allows compute frameworks to decouple from on-premise data sources and scale efficiently by leveraging Alluxio and public cloud resources such as AWS.
We will cover:
- Typical challenges of moving data to the cloud and expanding compute capacity.
- Details about “zero-copy” hybrid cloud solution for burst computing
- A demo of running Presto analytic queries using remote on-prem HDFS data with Alluxio deployed in AWS EMR
The hybrid cloud model, where cloud resources run Spark or Presto jobs against data stored on-premises, is an appealing solution to reduce resource contention in on-premise environments while also saving in overall costs. One key flaw in a hybrid model is the overhead associated with transferring data between the two environments. Data and metadata locality within the compute application must be achieved in order to maintain the similar performance of analytics jobs as if the entire workload was run on-premises.
In this office hour, we demonstrate how a “zero-copy burst” solution helps to speed up Spark and Presto queries in the public cloud while eliminating the process of manually copying and synchronizing data from the on-premise data lake to cloud storage. This approach allows compute frameworks to decouple from on-premise data sources and scale efficiently by leveraging Alluxio and public cloud resources such as AWS.
We will cover:
- Typical challenges of moving data to the cloud and expanding compute capacity.
- Details about “zero-copy” hybrid cloud solution for burst computing
- A demo of running Presto analytic queries using remote on-prem HDFS data with Alluxio deployed in AWS EMR
Videos:
Presentation Slides:
Complete the form below to access the full overview:
.png)
Videos
AI/ML Infra Meetup | AI at scale Architecting Scalable, Deployable and Resilient Infrastructure

Pratik Mishra delivered insights on architecting scalable, deployable, and resilient AI infrastructure at scale. His discussion on fault tolerance, checkpoint optimization, and the democratization of AI compute through AMD's open ecosystem resonated strongly with the challenges teams face in production ML deployments.
September 30, 2025
AI/ML Infra Meetup | Alluxio + S3 A Tiered Architecture for Latency-Critical, Semantically-Rich Workloads

In this talk, Bin Fan, VP of Technology at Alluxio, presents on building tiered architectures that bring sub-millisecond latency to S3-based workloads. The comparison showing Alluxio's 45x performance improvement over S3 Standard and 5x over S3 Express One Zone demonstrated the critical role the performance & caching layer plays in modern AI infrastructure.
September 30, 2025
AI/ML Infra Meetup | Achieving Double-Digit Millisecond Offline Feature Stores with Alluxio

In this talk, Greg Lindstrom shared how Blackout Power Trading achieved double-digit millisecond offline feature store performance using Alluxio, a game-changer for real-time power trading where every millisecond counts. The 60x latency reduction for inference queries was particularly impressive.
September 30, 2025