Recap: Presto Summit SF 2019

July 1, 2019

Amelia Wong

What’s Presto Summit? It’s the leading Presto conference co-organized by our partner Starburst Data and the Presto Software Foundation.

Overview of the Summit

Presto is among the fastest-growing open source analytical query frameworks, with production use cases across industries including retail, telco, and tech
This was a full-house event at Twitter HQ with more than 150 attendees
Excellent keynote on the future of Presto delivered by Martin Traverso, Dain Sundstrom, and David Phillips – the co-creators of Presto and co-founders of the Presto Software Foundation (slides)

What We Learned

Presto delivers high performance and rich functionality designed for interactive SQL queries.
Cloud deployments of Presto, both all-cloud and hybrid, are growing. There is a need to simplify hybrid deployments, which currently involve copying data to the cloud (and in some cases to HDFS) and managing the Hadoop cluster.
Teams running big data workloads increasingly want simpler compute orchestration and are adopting Kubernetes. Starburst Presto announced a new Kubernetes operator to simplify deploying and scaling clusters.

Reasons to try the Presto, Alluxio, and Any Storage Stack

Learn more: Starburst Presto and Alluxio announce strategic OEM partnership | Presto with Alluxio | Download Alluxio

Additional resources:

Community office hour (virtual): Building Fast SQL Analytics with Presto, Alluxio, and S3
Got questions? Chat with Alluxio experts on Slack
