Alluxio Day 15

COMMUNITY VIRTUAL EVENT

Learn about the architecture of OceanBase and its use cases, how to use Alluxio to speed up your cloud training, how Twitter uses an ML-based approach in Presto and BigQuery to estimate query utilization, and how you can add real-time analytics capability to your data pipeline with Apache Pinot.

Alluxio Day x APAC Modern Data Stack

Qanvast@OUE, OUE Downtown Gallery 1

6A Shenton Way, #02-9/10, Singapore 068815

Doors open at 6:00 PM SGT

6:30 PM SGT – Unified Data API for Distributed Cloud Analytics and AI

Alluxio (www.alluxio.io) is an open-source virtual distributed file system that provides a unified data access layer for hybrid and multi-cloud deployments. It enables distributed compute engines such as Spark and Presto, as well as machine learning frameworks such as TensorFlow, to transparently access different persistent storage systems (including HDFS, S3, Azure, and others) while actively leveraging an in-memory cache to accelerate data access. Originally developed at UC Berkeley AMPLab as the research project “Tachyon”, Alluxio has more than 1,200 contributors and is used by over 100 companies worldwide, with the largest production deployment exceeding 1,000 nodes.

This presentation focuses on how Alluxio helps the big data analytics stack become cloud-native. Increasingly popular cloud object storage systems provide more cost-effective and scalable storage, but they also come with different semantics and performance implications than HDFS. Applications like Spark or Presto do not benefit from node-level locality or cross-job caching when retrieving data from cloud object storage. Accessing cloud storage through Alluxio solves these problems because data is retrieved and cached in Alluxio instead of being fetched repeatedly from the underlying cloud or object storage.

BIN FAN, FOUNDING MEMBER & VP OF OPEN SOURCE @ ALLUXIO

Bin Fan is a founding member and VP of open source at Alluxio. He is also a PMC maintainer and the PMC Chair of the Alluxio open source project. Prior to joining Alluxio as a founding engineer, he worked at Google building next-generation storage infrastructure. Bin received his PhD in computer science from Carnegie Mellon University, where he worked on the design and implementation of distributed systems.


7:00 PM SGT – MODERN DATA STACK IN MOTION

In this presentation, I will talk about the birth, growth, and prosperity of the modern data stack. I will show you why the modern data stack is more than a buzzword, and how it may evolve over the next couple of years.

YINGJUN WU, FOUNDER & CEO @ RISINGWAVE LABS

Yingjun Wu is the founder and CEO of RisingWave Labs (formerly known as Singularity Data), an early-stage startup building hardcore systems. The company develops RisingWave (https://www.risingwave.dev/), an open-source streaming database designed for the cloud. Previously, Yingjun was with AWS Redshift and IBM Research Almaden. He received his PhD from the National University of Singapore and is an alumnus of the Database Group at Carnegie Mellon University.


7:30 PM SGT – Apache Hudi: community-driven development

Apache Hudi’s open-source community is very active and healthy. This talk presents an overview of major community-driven features, followed by a deep dive into two of them, the metastore and the table management service, both driven by ByteDance, to illustrate Hudi’s platform vision.

SHIYAN XU, FOUNDING MEMBER @ ONEHOUSE, APACHE HUDI PMC

Shiyan Xu is a founding member and an engineering manager at Onehouse. He’s also a PMC member of Apache Hudi. Prior to Onehouse, he worked as a team lead at Zendesk, where he led the data lake team to productionize large-scale Hudi-based data platform services. Shiyan received his bachelor’s degree in Electrical & Electronic Engineering from Nanyang Technological University.


8:00 PM SGT – The power of data orchestration: Storage Acceleration and Servitization at Shopee

Shopee is the leading e-commerce platform in Southeast Asia. In this presentation, Luo Li from Shopee will share the Data Infra team’s recent project on Presto acceleration and storage servitization. He will detail how Shopee leverages Alluxio to accelerate Presto queries and to provide standardized methods of accessing data through Alluxio-Fuse and Alluxio-S3.

LUO LI, DIRECTOR OF DATA INFRA @ SHOPEE, ALLUXIO PMC

Luo Li is the Director of Data Infra at Shopee, where he and his team provide business teams with fundamental big data facilities and systems. Having previously worked at Baidu, Alibaba, and DiDi, he has 10+ years of experience in big data infrastructure and is highly experienced in the Apache open-source big data ecosystem. Luo Li holds a master’s degree in Computer Science from Beijing Institute of Technology and is also an Alluxio PMC member.

PAST EVENTS

ALLUXIO DAY XV – SEPTEMBER 15, 2022


ALLUXIO DAY XII – APRIL 28, 2022


ALLUXIO DAY X – MARCH 3, 2022


ALLUXIO DAY IX – JANUARY 21, 2022 (presentations in Chinese)


ALLUXIO DAY VIII – DECEMBER 14, 2021


ALLUXIO DAY VI – OCTOBER 12, 2021


Apache Hudi: The Path Forward

Vinoth Chandar, Apache Hudi & Raymond Xu, Zendesk

Enabling Presto Caching at Uber with Alluxio

Curt Hu, Uber & Beinan Wang, Alluxio

ALLUXIO DAY V – AUGUST 27, 2021 (presentations in Chinese)


ALLUXIO DAY IV – JUNE 24, 2021


ALLUXIO DAY III – APRIL 27, 2021


ALLUXIO DAY II, DAY 2 – MARCH 11, 2021 (presentations in Chinese)


ALLUXIO DAY II, DAY 1 – MARCH 9, 2021 (presentations in Chinese)


ALLUXIO DAY, DAY 3 – JANUARY 24, 2021 (presentations in Chinese)


Open Source Roundtable

Bin Fan, Alluxio
Edward Huang, PingCAP
Yuandong Tian, Facebook
Jerry Shao, Tencent
Long Chen, Tencent Cloud
Monica Xie, Matrix Partners China

ALLUXIO DAY, DAY 2 – JANUARY 21, 2021 (presentations in Chinese)


Fireside Chat

Haoyuan Li, Alluxio
Jiajun Wu, Stanford University
Yangyu Tao, Tencent
Jasmine Wang, Alluxio

ALLUXIO DAY, DAY 1 – JANUARY 19, 2021 (presentations in Chinese)