Cloud object storage systems have different semantics and performance characteristics than HDFS. Applications like Presto cannot benefit from node-level locality or cross-job caching when reading from the cloud. Deploying Alluxio with Presto to access cloud storage solves these problems because data is retrieved and cached in Alluxio instead of being fetched repeatedly from the underlying cloud or object storage. Bin will present the architecture that combines Presto with Alluxio, with use cases from major internet companies like JD.com and NetEase.com and the lessons they learned operating this architecture at scale.
We are excited to present Alluxio 2.0 to our community. Alluxio 2.0 aims to significantly enhance data accessibility with improved APIs, to expand the supported use cases to include active workloads, and to improve metadata management and availability to support hyperscale deployments. The Alluxio 2.0 Preview Release is the first major milestone on the path to Alluxio 2.0 and includes many new features.
TSOS meetups focus on the open source projects that Two Sigma cares most about, from projects we developed in-house and then open sourced to large external open source projects that we depend on to do our work. This time, Wenbo Zhao (Two Sigma) and Bin Fan (Alluxio) will present how Two Sigma uses Alluxio to make data-intensive compute independent of the underlying storage.
This webinar reviews: the observation and analysis of the trend toward separating storage and compute in the Big Data ecosystem; why and how to build a new data access layer between compute and storage in this data stack; the history, overview, design, and architecture of Alluxio open source; production use cases with Spark, Presto, TensorFlow, and others; and a demo of running Presto on Alluxio on S3.
Over the past two decades, the Big Data stack has been reshaped and has evolved quickly, with numerous innovations driven by the rise of many different open source projects and communities. In this meetup, speakers from Uber, Alibaba, and Alluxio will share best practices for addressing the challenges and opportunities in developing data architectures with new and emerging open source building blocks. Topics include data format (ORC) optimization, storage security (HDFS), data format (Parquet) layers, and unified data access (Alluxio) layers.