Alluxio Day 15
COMMUNITY VIRTUAL EVENT
Learn about the architecture of OceanBase and its use cases, how to use Alluxio to speed up your cloud training, how Twitter leverages an ML-based approach in Presto and BigQuery to estimate query utilization, and how you can add real-time analytics capability to your data pipeline with Apache Pinot.
Alluxio Day x APAC Modern Data Stack
Qanvast@OUE, OUE Downtown Gallery 1
6A Shenton Way, #02-9/10, Singapore 068815
Doors open at 6:00 PM SGT
6:30 PM SGT – Unified Data API for Distributed Cloud Analytics and AI
Alluxio (www.alluxio.io) is an open-source virtual distributed file system that provides a unified data access layer for hybrid and multi-cloud deployments. It enables distributed compute engines like Spark and Presto, as well as machine learning frameworks like TensorFlow, to transparently access different persistent storage systems (including HDFS, S3, and Azure) while actively leveraging in-memory caching to accelerate data access. Originally developed at UC Berkeley AMPLab as the research project “Tachyon”, Alluxio has more than 1,200 contributors and is used by over 100 companies worldwide, with the largest production deployment exceeding 1,000 nodes.
This presentation focuses on how Alluxio helps the big data analytics stack become cloud-native. Cloud object storage systems offer more cost-effective and scalable storage, but their semantics and performance characteristics differ from those of HDFS. Applications like Spark or Presto do not benefit from node-level locality or cross-job caching when reading data from cloud object storage. Deploying Alluxio in front of cloud storage addresses these problems: data is fetched once and cached in Alluxio, rather than being retrieved repeatedly from the underlying object storage.
BIN FAN, FOUNDING MEMBER & VP OF OPEN SOURCE @ ALLUXIO
Bin Fan is a founding member and VP of open source at Alluxio. He’s also a PMC maintainer and the PMC Chair of the Alluxio open source project. Prior to joining Alluxio as a founding engineer, he worked at Google building next-generation storage infrastructure. Bin received his PhD in computer science from Carnegie Mellon University, where he worked on the design and implementation of distributed systems.
7:00 PM SGT – MODERN DATA STACK IN MOTION
In this presentation, I will talk about the birth, the growth, and the prosperity of the modern data stack. I will show you why the modern data stack is more than a buzzword, and how it may evolve over the next couple of years.
YINGJUN WU, FOUNDER & CEO @ RISINGWAVE LABS
Yingjun Wu is the founder and CEO of RisingWave Labs (formerly known as Singularity Data), an early-stage startup building hardcore systems. The company develops RisingWave (https://www.risingwave.dev/), an open-source streaming database designed for the cloud. Previously, Yingjun was with AWS Redshift and IBM Research Almaden. He received his PhD from the National University of Singapore and is an alumnus of the Database Group at Carnegie Mellon University.
7:30 PM SGT – Apache Hudi: community-driven development
Apache Hudi’s open-source community is very active and healthy. This talk presents an overview of major community-driven features, followed by a deep dive into two of them, the metastore and the table management service, both driven by ByteDance, to illustrate Hudi’s platform vision.
SHIYAN XU, FOUNDING MEMBER @ ONEHOUSE, APACHE HUDI PMC
Shiyan Xu is a founding member and an engineering manager at Onehouse. He’s also a PMC member of Apache Hudi. Prior to Onehouse, he worked as a team lead at Zendesk, where he led the data lake team to productionize large-scale Hudi-based data platform services. Shiyan received his bachelor’s degree in Electrical & Electronic Engineering from Nanyang Technological University.
8:00 PM SGT – The power of data orchestration: Storage Acceleration and Servitization at Shopee
Shopee is the leading e-commerce platform in Southeast Asia. In this presentation, Luo Li from Shopee will share the Data Infra team’s recent project on Presto acceleration and storage servitization. He will detail how Shopee leverages Alluxio to accelerate Presto queries and to provide standardized methods of accessing data through Alluxio-Fuse and Alluxio-S3.
LUO LI, DIRECTOR OF DATA INFRA @ SHOPEE, ALLUXIO PMC
Luo Li is the Director of Data Infra at Shopee, where he and his team provide business teams with fundamental big data facilities and systems. Having previously worked at Baidu, Alibaba, and DiDi, he has 10+ years of experience in big data infrastructure and deep expertise in the Apache open-source big data ecosystem. Luo Li holds a master’s degree in Computer Science from Beijing Institute of Technology and is also an Alluxio PMC member.