O’Reilly AI Conference Keynote: Data Orchestration for AI, Big Data, and Cloud

June 28, 2019

Haoyuan Li

Founder & CEO

Alluxio

In his keynote at the O’Reilly Artificial Intelligence Conference, Alluxio Open Source creator Haoyuan Li discusses the data revolution, the inevitable journey toward data silos, and the missing piece of the data world: the data orchestration system.

The data ecosystem has evolved substantially over the past two decades. Data-driven frameworks have proliferated: Presto, Hive, and Spark run analytics and ETL queries, while TensorFlow and PyTorch train and serve models. On the storage side, the approach to managing and storing data has moved from HDFS to cheaper, more scalable, separately operated services typified by cloud stores like AWS S3. As a result, data engineering has become increasingly complex, inefficient, and difficult, particularly in hybrid and cloud environments.

Haoyuan Li offers an overview of a data orchestration layer that provides a unified data access and caching layer for single cloud, hybrid, and multicloud deployments. It enables distributed compute engines like Presto, TensorFlow, and PyTorch to transparently access data from various storage systems (including S3, HDFS, and Azure) while actively leveraging an in-memory cache to accelerate data access.
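To make the idea concrete, here is a minimal toy sketch of a unified access layer with a read-through in-memory cache. It is not Alluxio's API or implementation: the names `DictStore`, `OrchestrationLayer`, `mount`, and `read` are hypothetical, the backing stores are plain dictionaries standing in for systems such as S3 or HDFS, and the LRU eviction policy is an assumption chosen only for illustration.

```python
from collections import OrderedDict
from typing import Dict


class DictStore:
    """Hypothetical stand-in for a remote store such as S3 or HDFS."""

    def __init__(self, objects: Dict[str, bytes]) -> None:
        self.objects = objects
        self.reads = 0  # number of trips to the "remote" store

    def read(self, key: str) -> bytes:
        self.reads += 1
        return self.objects[key]


class OrchestrationLayer:
    """Toy unified namespace with a read-through LRU cache (illustrative only)."""

    def __init__(self, cache_capacity: int = 2) -> None:
        self.mounts: Dict[str, DictStore] = {}
        self.cache: "OrderedDict[str, bytes]" = OrderedDict()
        self.cache_capacity = cache_capacity

    def mount(self, prefix: str, store: DictStore) -> None:
        # Normalize so that "/s3" and "/s3/" register the same mount point.
        self.mounts[prefix.rstrip("/") + "/"] = store

    def read(self, path: str) -> bytes:
        # Cache hit: serve from memory and mark the entry as recently used.
        if path in self.cache:
            self.cache.move_to_end(path)
            return self.cache[path]
        # Cache miss: route to the store with the longest matching prefix.
        for prefix in sorted(self.mounts, key=len, reverse=True):
            if path.startswith(prefix):
                data = self.mounts[prefix].read(path[len(prefix):])
                self.cache[path] = data
                # Evict the least recently used entry when over capacity.
                if len(self.cache) > self.cache_capacity:
                    self.cache.popitem(last=False)
                return data
        raise FileNotFoundError(path)


# Two different "storage systems" behind one namespace.
s3 = DictStore({"train.csv": b"a,b\n1,2\n"})
hdfs = DictStore({"logs/day1": b"ok"})
layer = OrchestrationLayer(cache_capacity=2)
layer.mount("/s3", s3)
layer.mount("/hdfs", hdfs)

first = layer.read("/s3/train.csv")   # fetched from the backing store
second = layer.read("/s3/train.csv")  # served from the in-memory cache
```

In this sketch the caller uses a single path scheme regardless of where the data lives, and repeated reads of the same path reach the backing store only once. A real orchestration system also has to handle concerns this toy ignores, such as distribution across nodes, consistency with the underlying store, and writes.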

Video:

Presentation slides:

Data Orchestration for AI, Big Data, and Cloud from Alluxio, Inc.

You can find details of many production use cases here. If you have any questions about the open source data orchestration system, you are welcome to join our community Slack channel!

