Alluxio is an open-source distributed file system that gives data ecosystems a unified data access layer at in-memory speed. Alluxio enables compute engines such as Spark, Presto, MapReduce, and TensorFlow to transparently access different persistent storage systems (including HDFS and S3) while actively using an in-memory cache to accelerate data access. As a result, Alluxio simplifies the development and management of big data and ML workloads, at lower cost and with better performance. Alluxio has more than 900 contributors and is used by over 100 companies worldwide. Andrew will give an overview of Alluxio’s core concepts, architecture, data flow, and production use cases.
Presto + Alluxio + Object Store: Architecture and Use Case
Cloud object storage systems have different semantics and performance characteristics than HDFS. Applications like Presto cannot benefit from node-level locality or cross-job caching when reading from the cloud. Deploying Alluxio alongside Presto solves these problems, because data is retrieved and cached in Alluxio instead of being fetched repeatedly from the underlying cloud or object storage. Bin will present an architecture that combines Presto with Alluxio, with use cases from major internet companies such as JD.com and NetEase.com, and the lessons they learned operating this architecture at scale.
Presto: Fast SQL on Anything
Presto, an open-source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Proven at scale in a variety of use cases at Airbnb, Comcast, Facebook, FINRA, LinkedIn, Lyft, Netflix, Twitter, and Uber, Presto has seen unprecedented growth in popularity over the last few years, in both on-premises and cloud deployments over object stores, HDFS, NoSQL, and RDBMS data stores. We will cover the architecture of Presto, its separation of compute and storage, and its cloud-readiness. In addition, we will discuss some of the best use cases for Presto, recent advancements in the project such as the Cost-Based Optimizer and geospatial functions, and the roadmap going forward.