alluxio day Archives

The Power of Data Orchestration: Storage Acceleration and Servitization at Shopee

September 27, 2022

Shopee is the leading e-commerce platform in Southeast Asia. In this presentation, Luo Li from Shopee will share the Data Infra team’s recent project on Presto acceleration and storage servitization. He will explain how Shopee leverages Alluxio to accelerate Presto queries and to provide standardized methods of accessing data through Alluxio-Fuse and Alluxio-S3.

Tags: alluxio day, presto, shopee

Apache Hudi: Community-Driven Development

September 27, 2022

Apache Hudi’s open-source community is very active and healthy. This talk presents an overview of major community-driven features, followed by a deep dive into two of them, the metastore and the table management service, both driven by ByteDance, to illustrate Hudi’s platform vision.

Tags: alluxio day, apache hudi, metastore, onehouse, open source

Modern Data Stack in Motion

September 27, 2022

In this presentation, I will talk about the birth, growth, and prosperity of the modern data stack. I will show you why the modern data stack is more than a buzzword and how it may evolve over the next couple of years.

Tags: alluxio day, data, risingwave

Alluxio Day x APAC Modern Data Stack

Qanvast@OUE, OUE Downtown Gallery 1 * September 22, 2022

Join us for these great talks featuring speakers from RisingWave Labs, Onehouse, Shopee, and Alluxio! Learn how Alluxio helps the big data analytics stack become cloud-native, why the modern data stack is more than a buzzword, which major features the Apache Hudi open-source community is driving, and how Shopee leverages Alluxio to accelerate Presto queries. Attendees can join in person in Singapore or online via Zoom.

Real-Time Analytics: Going Beyond Stream Processing With Apache Pinot

September 15, 2022

Streaming systems form the backbone of the modern data pipeline because their stream processing capabilities provide insights on events as they arrive. But what if we want to go further and execute analytical queries on this real-time data? That’s where Apache Pinot comes in.

OLAP databases used for analytical workloads traditionally executed queries on yesterday’s data, with query latencies in the tens of seconds. The emergence of real-time analytics has changed this: the expectation now is that we can run thousands of queries per second on fresh data, with query latencies typical of OLTP databases.

Apache Pinot is a real-time distributed OLAP datastore used to deliver scalable real-time analytics with low latency. It can ingest data from streaming sources such as Kafka as well as from batch data sources (S3, HDFS, Azure Data Lake, Google Cloud Storage), and it provides a range of indexing techniques that can be used to maximize query performance.

Come to this talk to learn how you can add real-time analytics capability to your data pipeline.

Tags: alluxio day, analytics, apache pinot, data, real time, startree

ML-Based SQL Query Resource Usage Prediction

September 15, 2022

In the Big Data era, calculating the resource usage of a SQL query is usually computationally expensive. Can we estimate the resource usage of SQL queries more efficiently, without any computation in the SQL engine kernel? In this session, Chunxu and Beinan introduce how Twitter’s data platform leverages a machine learning-based approach in Presto and BigQuery to estimate query resource utilization with 90%+ accuracy.

Tags: alluxio day, big data, machine learning, presto, sql, twitter

The Architecture Overview of OceanBase Database

September 15, 2022

OceanBase Database is an open-source, distributed Hybrid Transactional/Analytical Processing (HTAP) database management system that has set new world records in both the TPC-C and TPC-H benchmark tests. OceanBase Database has been in development since 2010 and serves all of the critical systems in Alipay. Besides Alipay, OceanBase also serves customers from a variety of sectors, including Internet, financial services, telecommunications, and retail.

Tags: alluxio day, data, oceanbase
