Presto Meetup Hosted @ UBER
This talk describes a stack of open-source projects for serving high-concurrency, low-latency SQL queries on big data in the cloud using Presto with Alluxio. By deploying Alluxio as a data orchestration layer in front of cloud object storage (e.g., AWS S3), this architecture greatly improves Presto's data locality through distributed, cross-query caching, and thus avoids reading the same data repeatedly from cloud storage.
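As a rough illustration of the cross-query caching idea above, here is a toy read-through cache. It is only a sketch, not Alluxio's or Presto's actual API: the `ReadThroughCache` class, the `fetch_remote` callable, and the `s3://example-bucket/...` key are all hypothetical names, and a plain dictionary stands in for the object store.

```python
class ReadThroughCache:
    """Toy stand-in for a caching layer between a query engine and remote storage."""

    def __init__(self, fetch_remote):
        # fetch_remote is a hypothetical callable: key -> bytes
        self._fetch_remote = fetch_remote
        self._cache = {}
        self.remote_reads = 0  # counts how often remote storage is actually hit

    def read(self, key):
        # Go to remote storage only on a cache miss; later reads are served locally.
        if key not in self._cache:
            self.remote_reads += 1
            self._cache[key] = self._fetch_remote(key)
        return self._cache[key]


# A dictionary simulating a cloud object store (hypothetical bucket and object).
object_store = {"s3://example-bucket/orders.parquet": b"order-bytes"}
cache = ReadThroughCache(object_store.__getitem__)

# Two separate "queries" read the same object; only the first reaches remote storage.
first = cache.read("s3://example-bucket/orders.parquet")
second = cache.read("s3://example-bucket/orders.parquet")
```

In this sketch the second read never touches `object_store`, which is the effect the architecture aims for across queries, here reduced to a single process.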
In addition, as of the latest v2.1 release, Alluxio provides structured data management: beyond caching the raw bytes of input files or objects, it can also manage and transform structured data to deliver additional performance. For example, Alluxio can convert data in raw formats (such as CSV) into a more compact and performant file format (such as Parquet), accelerating Presto queries by 10x for certain workloads while using much less CPU.
This talk will give an overview of Alluxio’s core concepts, architecture, and data flow, as well as use cases from internet companies like Walmart and JD.com that run this Presto and Alluxio stack at scale in production.
Haoyuan (H.Y.) Li is the Founder and CTO of Alluxio. He graduated with a Computer Science Ph.D. from the AMPLab at UC Berkeley, advised by Prof. Scott Shenker and Prof. Ion Stoica. At the AMPLab, he co-created and led Alluxio (formerly Tachyon), an open-source virtual distributed file system. Before UC Berkeley, he received an M.S. from Cornell University and a B.S. from Peking University, all in Computer Science.
Bin Fan is the founding engineer and VP of Open Source at Alluxio, Inc. Prior to Alluxio, he worked at Google, building its next-generation storage infrastructure. Bin received his Ph.D. in Computer Science from Carnegie Mellon University for work on the design and implementation of distributed systems.