On-Demand Videos
AI/ML Infra Meetup | AI at Scale: Architecting Scalable, Deployable, and Resilient Infrastructure

Pratik Mishra delivered insights on architecting scalable, deployable, and resilient AI infrastructure at scale. His discussion on fault tolerance, checkpoint optimization, and the democratization of AI compute through AMD's open ecosystem resonated strongly with the challenges teams face in production ML deployments.
AI/ML Infra Meetup | Alluxio + S3: A Tiered Architecture for Latency-Critical, Semantically-Rich Workloads

In this talk, Bin Fan, VP of Technology at Alluxio, presents how to build tiered architectures that bring sub-millisecond latency to S3-based workloads. His comparison, showing Alluxio with a 45x performance improvement over S3 Standard and a 5x improvement over S3 Express One Zone, demonstrates the critical role the performance and caching layer plays in modern AI infrastructure.
AI/ML Infra Meetup | Achieving Double-Digit Millisecond Offline Feature Stores with Alluxio

In this talk, Greg Lindstrom shared how Blackout Power Trading achieved double-digit millisecond offline feature store performance using Alluxio, a game-changer for real-time power trading where every millisecond counts. The 60x latency reduction for inference queries was particularly impressive.
The Practice of Presto & Alluxio in an E-commerce Big Data Platform
JD.com is one of the largest e-commerce corporations. Its big data platform spans tens of thousands of nodes and tens of petabytes of offline data, processed by millions of Spark and MapReduce jobs every day. Presto is the main query engine: thousands of machines run as Presto nodes, and Presto plays an important role in in-place analysis and BI tools. Alluxio is deployed alongside it to improve Presto's performance. The combined practice of Presto and Alluxio at JD.com has benefited many engineers and analysts.
Data Orchestration for Analytics and AI in the Cloud Era
Data platforms span multiple clusters, regions and clouds to meet the business needs for agility, cost effectiveness, and efficiency. Organizations building data platforms for structured and unstructured data have standardized on separation of storage and compute to remain flexible while avoiding vendor lock-in. Data orchestration has emerged as the foundation of such a data platform for multiple use cases all the way from data ingestion to transformations to analytics and AI.
In this keynote from Haoyuan Li, founder and CEO of Alluxio, we will showcase how organizations have built data platforms based on data orchestration. The need to simplify data management and acceleration across different business personas has given rise to data orchestration as a requisite piece of the modern data platform. In addition, we will outline typical journeys for realizing a hybrid and multi-cloud strategy.
Large Scale Analytics Acceleration
Model Training Acceleration
Hybrid Multi-Cloud
Data Platform Modernization
Data Migration
Alluxio Use Cases and Future Directions
In this keynote, Calvin Jia will share some of the hottest use cases in Alluxio 2 and discuss the future directions of the project being pioneered by Alluxio and the community. Bin Fan will provide an overview of the growth of the Alluxio open-source community, highlighting community-driven collaboration with engineering teams from Microsoft and Alibaba to advance the technology.
Large Scale Analytics Acceleration
Model Training Acceleration
Hybrid Multi-Cloud
Data Migration
Data Platform Modernization
The Future of Computing is Distributed
Distributed applications are not new. The first distributed applications were developed over 50 years ago with the arrival of computer networks such as ARPANET. Since then, developers have leveraged distributed systems to scale out applications and services, including large-scale simulations, web serving, and big data processing. However, until recently, distributed applications have been the exception rather than the norm. This is changing quickly. Two major trends are fueling this transformation: the end of Moore’s Law and the exploding computational demands of new machine learning applications. These trends are opening a rapidly growing gap between application demands and single-node performance, which leaves us no choice but to distribute these applications. Unfortunately, developing distributed applications is extremely hard, as it requires world-class experts. To make distributed computing easy, we have developed Ray, a framework for building and running general-purpose distributed applications.
Model Training Acceleration
Data Platform Modernization
Introducing the Hub for Data Orchestration
We introduce Data Orchestration Hub, a management service that makes it easy to build an analytics or machine learning platform on data sources across regions to unify data lakes. Easy-to-use wizards connect compute engines, such as Presto or Spark, to data sources across data centers or from a public cloud to a private data center. In this session, you will see “The Hub” used to connect a compute cluster in the cloud with on-premises data sources using Alluxio. This new service lets you build a hybrid cloud on your own, without the expertise otherwise needed to manage or configure Alluxio.
Large Scale Analytics Acceleration
Hybrid Multi-Cloud
Data Platform Modernization
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
In this keynote, you will learn about the evolution of Rakuten's global data platform, which spans multiple regions and clouds. You will also hear about the journey over the years and the use of data orchestration for multiple use cases.
Large Scale Analytics Acceleration
Hybrid Multi-Cloud
Data Platform Modernization
Alluxio Architecture and Scaling Performance
Over the years, Alluxio has grown significantly to become the data orchestration framework for the cloud. Community developers and users have contributed a great deal of effort and innovation to make Alluxio the system it is today. Many users and companies deploy Alluxio at very large scale, and that scale brings different types of challenges.
In this talk, I will introduce the high-level architecture of the current system, and present the various components of Alluxio. Also, I will discuss some of the main challenges of large scale Alluxio deployments, and the lessons we learned from those environments. This talk will detail some of the major scalability improvements added in the past several months, and how users can benefit from the changes.
Large Scale Analytics Acceleration
What’s new in Alluxio 2.4
ALLUXIO COMMUNITY OFFICE HOUR
We are extremely excited to announce the release of Alluxio 2.4.0!
Alluxio 2.4.0 focuses on features critical to large-scale production deployments in cloud and hybrid cloud environments. Features such as highly scalable metadata journaling, aggregate cluster metrics monitoring, and automated detection of JVM pauses further improve Alluxio’s suitability for demanding workloads. DevOps tools are also key for triaging issues when they occur, and Alluxio 2.4 further improves the cluster-wide log collection framework. Finally, Alluxio continues to expand its state-of-the-art integrations with frameworks and storage systems: Alluxio 2.4 introduces and improves integrations with Kubernetes, Azure Data Lake Storage, and Apache Ozone. Alluxio 2.4 is also the first Alluxio release to support Java 11.
In this Office Hour, we will go over:
- Expanded metadata service
- Cloud native deployment
- Simplified DevOps and system monitoring
- Support for Java 11
Large Scale Analytics Acceleration
Hybrid Multi-Cloud
Hybrid Data Lake Architecture with Presto & Spark in the cloud accessing on-prem storage
ALLUXIO COMMUNITY OFFICE HOUR
In this talk, we describe an architecture for incrementally migrating analytics workloads to any public cloud (AWS, Google Cloud Platform, or Microsoft Azure) while running them directly on on-prem data, without copying that data to cloud storage.
In this Office Hour:
- We will go over an architecture for running elastic compute clusters in the cloud using on-prem HDFS.
- Have a casual online video chat with Alluxio open-source core maintainers to address any Alluxio-related questions from our community members
Large Scale Analytics Acceleration
Hybrid Multi-Cloud
StorageQuery: federated querying on object stores, powered by Alluxio and Presto
Over the last few years, organizations have worked toward separating storage and compute for benefits in cost, data duplication, and data latency. The cloud resolves most of these issues, but at the expense of needing a way to query data on remote storage. Alluxio and Presto are a powerful combination for addressing the compute problem, and they are part of the strategy Simbiose Ventures used to create StorageQuery, a platform for querying files in cloud storage with SQL.
This talk will focus on:
- How Alluxio fits into StorageQuery’s tech stack
- Advantages of using Alluxio as a cache layer and its unified file system
- Development of a new under file system for Backblaze B2, with fine-grained code documentation
- ShannonDB remote storage mode
Large Scale Analytics Acceleration
Hybrid Multi-Cloud
What’s new in Alluxio 2.3
ALLUXIO COMMUNITY OFFICE HOUR
Alluxio 2.3 was just released at the end of June 2020. Calvin and Bin will go over the new features and integrations available and share learnings from the community. Any questions about the release and on-going community feature development are welcome.
In this Office Hour, we will go over:
- Glue Under Database integration
- Under Filesystem mount wizard
- Tiered Storage Enhancements
- Concurrent Metadata Sync
- Delegated Journal Backups
Large Scale Analytics Acceleration
Hybrid Multi-Cloud