On-Demand Videos

As data lakes expand from on-premises infrastructure to the cloud and new object stores see growing use, data platform teams face the challenge of providing consistent, high-throughput access to distributed data sources for analytics and AI/ML applications. In today’s hybrid cloud and multi-cloud era, the growing separation of data and compute leaves data-intensive frameworks such as Presto, Spark, Hive, and TensorFlow with slower response times and greater complexity.

Join Alluxio’s distributed systems experts as they explore today’s data access challenges and open source data orchestration solutions for modernizing your data platform.

In this tech talk, you’ll learn:

How data access and throughput challenges are hindering large-scale analytics and AI/ML applications
How a data orchestration layer can simplify distributed data access and improve performance
Real-world production use cases and example journeys for architecting a modern data platform

Alluxio for Machine Learning Workloads

ALLUXIO DAY IV 2021

June 24, 2021

Driven by strong interest from the open-source community, the Alluxio core team began redesigning how users leverage data orchestration through the POSIX interface, with the goal of making it efficient and transparent. We have introduced a new JNI-based FUSE implementation to support POSIX data access, along with many improvements to related data operations: a more efficient distributedLoad, and optimized listing of, and calculations over, directories containing massive numbers of files, which are common in model training.

Accelerating analytics workloads with Alluxio data orchestration and Intel® Optane™ persistent memory

ALLUXIO DAY IV 2021

June 24, 2021

Today’s analytics workloads demand real-time access to vast amounts of data. This session demonstrates how Alluxio’s data orchestration platform, running on Intel Optane persistent memory, accelerates access to this data so that valuable business insights surface faster.

RaptorX: Building a 10X Faster Presto with hierarchical cache

ALLUXIO DAY IV 2021

June 24, 2021

RaptorX is the internal name of a project that aims to reduce query latency significantly beyond what vanilla Presto can achieve. In this session, we introduce the hierarchical cache work, including the Alluxio data cache and the fragment result cache, among others. Caching is the key building block of RaptorX: with it, we are able to improve query performance by 10X. This new architecture can outperform performance-oriented connectors such as Raptor while continuing to work with disaggregated storage.

Improving Presto performance with Alluxio at TikTok

ALLUXIO DAY IV 2021

June 24, 2021

Integrating Alluxio with popular query engines such as Presto on top of existing Hive data is not straightforward today. Community-proposed solutions such as the Alluxio Catalog Service or Transparent URI put unnecessary pressure on the Alluxio masters when queries touch files that should not be cached. This talk covers TikTok’s approach to adopting Alluxio as the cache layer without introducing additional services.

Setting up a monitoring system for Alluxio with Prometheus and Grafana in 10 minutes

ALLUXIO DAY IV 2021

June 24, 2021

Alluxio has an excellent metrics system that supports various kinds of metrics sinks, e.g. an embedded JSON sink and a Prometheus sink. Users and developers can easily create a custom Alluxio sink by implementing the Sink interface.

Alluxio also provides a metrics page in its web UI that displays key information such as byte throughput and storage space. However, more flexible and general-purpose monitoring requires additional work.

Building a high-performance data lake analytics engine at Alibaba Cloud with Presto+Alluxio

ALLUXIO DAY III 2021

April 27, 2021

Data Lake Analytics (DLA) is a large-scale serverless data federation service on Alibaba Cloud. One of its serverless analytics engines is based on Presto. The DLA Presto engine supports a variety of data sources and is widely used in different cloud application scenarios. In this session, we will talk about the system architecture of the DLA Presto engine, along with the challenges we encountered and our solutions. In particular, we will introduce how we use the Alluxio local cache to solve performance issues on OSS data sources caused by access latency and OSS bandwidth limits. We will also discuss how the Alluxio local cache works and some improvements we have made to it.

Speed up large-scale ML/DL offline inference jobs with Alluxio

ALLUXIO DAY III 2021

April 27, 2021

Increasingly powerful compute accelerators and large training datasets have made the storage layer a potential bottleneck in deep learning training and inference.

An offline inference job usually consumes and produces tens of terabytes of data and runs for more than 10 hours.

At this scale, a job usually causes high I/O pressure, increases the job failure rate, and poses many challenges for system stability.

We adopted Alluxio as an intermediate storage tier between the compute tier and cloud storage to optimize the I/O throughput of deep learning inference jobs.

For the production workload, performance improved by 18%, and we now seldom see jobs fail because of storage issues.
