On-Demand Videos

High-performance data lake with Apache Hudi and Alluxio at T3Go [Chinese]

ALLUXIO DAY 2021

January 19, 2021

video

Building a high-performance platform on AWS to support real-time gaming services using Presto, Alluxio, and S3

Electronic Arts (EA) is a leading company in the gaming industry, providing over a thousand games to serve billions of users worldwide. The EA Data & AI Department builds hundreds of platforms to manage petabytes of data generated by games and users every day. These platforms cover a wide range of data analytics workloads, from real-time data ingestion to ETL pipelines. Formatted data produced by our department is widely used by executives, producers, product managers, game engineers, and designers for marketing and monetization, game design, customer engagement, player retention, and end-user experience.

Near real-time information for EA’s online services is critical for making business decisions, such as running campaigns and troubleshooting. These services include, but are not limited to, real-time data visualization, dashboarding, and conversational analytics. Highly time-sensitive applications such as BI software, dashboards, and AI tools rely heavily on these services. To support these use cases, we studied an innovative platform with Presto as the computing engine and Alluxio as a data orchestration layer between Presto and S3 storage. We evaluated this platform with real industrial examples of data visualization, dashboarding, and a conversational chatbot. Our preliminary results show that Presto with Alluxio significantly outperforms Presto querying S3 directly in all cases, with a 6x performance gain when handling a large number of small files.

Reducing large S3 API costs using Alluxio at Datasapiens

Datasapiens is an international data-analytics startup based in Prague. We help our clients to uncover the value of their data and open up new revenue streams for them. We provide an end-to-end service that manages the data pipeline and automates the process of generating data insights.

In this talk, we will describe how we solved the problem of large S3 API costs incurred by Presto at several concurrency levels by implementing Alluxio as a data orchestration layer between S3 and Presto. We will also show the results of an experiment that estimates per-query S3 API costs using the TPC-DS dataset.

This talk will focus on:

The Hadoop ecosystem at Datasapiens
Drastic increase in S3 API costs during performance tests with Presto
S3 API cost tests with TPC-DS
Implications for the cloud data lake architecture

Building a scalable analytics environment to support diverse workloads

How to teach your data scientist to leverage an analytics cluster with Presto, Spark, and Alluxio

Powering interactive analytics with Alluxio and Presto

Optimizing Latency-sensitive queries for Presto at Facebook: A Collaboration between Presto & Alluxio

For many latency-sensitive SQL workloads, Presto is often bound by the time spent retrieving distant data. In this talk, Rohit Jain from Facebook will introduce his team’s collaboration with Alluxio on adding a local on-SSD Alluxio cache inside Presto workers at Facebook to improve queries with unsatisfactory latency.

Exploring Alluxio for Daily Tasks at Robinhood

Presto: Fast SQL-on-anything across data lakes, DBMS, and NoSQL Data stores

Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Proven at scale in a variety of use cases at Comcast, GrubHub, FINRA, LinkedIn, Lyft, Netflix, Slack, and Zalando, Presto has in the last few years experienced unprecedented growth in popularity in both on-premises and cloud deployments over object stores, HDFS, NoSQL, and RDBMS data stores.

Delta Lake, a storage layer originally invented by Databricks and recently open sourced, brings ACID capabilities to big datasets held in Object Storage. While initially designed for Spark, Delta Lake now supports multiple query compute engines including Presto.

In this talk we discuss how Presto enables query-time correlations between Delta Lake, Snowflake, and Elasticsearch to drive interactive BI analytics across disparate datasets.

How Presto & Alluxio leverage our data-platform at Ryte

Presto & Alluxio on AWS: how we built an up-to-date data platform at Ryte.

High Performance Data Lake with Apache Hudi and Alluxio at T3Go

This talk introduces T3Go’s solution for building an enterprise-level data lake based on Apache Hudi and Alluxio, and shows how to use Alluxio to accelerate reading and writing data on the data lake when compute and storage are separated.

Speeding Up Spark Performance using Alluxio at China Unicom

Unicom’s traditional batch architecture consists mainly of IOE, Hive, and Greenplum systems. As the business has grown, a large number of siloed ("chimney-like"), decentralized application modules built for diverse scenarios have emerged. To solve the resulting problem of resource fragmentation, we have introduced a unified computing platform with Spark and Alluxio at its core. Alluxio plays an important role in accelerating data processing and ensuring process stability.
