Alluxio’s capabilities as a Data Orchestration framework have encouraged users to onboard more of their data-driven applications to an Alluxio-powered data access layer. Driven by strong interest from our open-source community, the core team of Alluxio started to re-design an efficient and transparent way for users to leverage data orchestration through the POSIX interface. This effort has made significant progress in collaboration with engineers from Microsoft, Alibaba and Tencent. In particular, we have introduced a new JNI-based FUSE implementation to support POSIX data access, created a more efficient way to integrate Alluxio with the FUSE service, and made many improvements to related data operations, such as a more efficient distributedLoad and optimizations for listing or calculating directories with a massive number of files, which are common in model training. We will also share our engineering lessons and our roadmap for supporting Machine Learning applications in future releases.
APACHECON 2021
Videos
In this talk, Pritish Udgata from Adobe provides a comprehensive overview of implementation challenges and solutions for LLM agents.
Topics include:
- CoT vs RAG vs Agentic AI
- Anatomy of an agent
- Single Agent with MCP
- Multi Agents with A2A
- Implementation Challenges and Solutions

Watch this on-demand video to learn about the latest release of Alluxio Enterprise AI. In this webinar, discover how Alluxio AI 3.7 eliminates cloud storage latency bottlenecks with breakthrough sub-millisecond performance, delivering up to 45× faster data access than S3 Standard without changing your code. Alluxio AI 3.7 is also packed with new features designed to supercharge your AI infrastructure while keeping your data secure.
Key highlights include:
- Alluxio Ultra Low Latency Caching for Cloud Storage
- Role-Based Access Control (RBAC) for S3 Access
- 5X Faster Cache Preloading with Alluxio Distributed Cache Preloader
- FUSE Non-Disruptive Upgrade
- Other New Features for Alluxio Admins

Real-time OLAP databases are optimized for speed and often rely on tightly coupled storage-compute architectures using disks or SSDs. Decoupled architectures, which use cloud object storage, introduce an unavoidable tradeoff: cost efficiency at the expense of performance. This makes them unsuitable for databases that need to provide low-latency, real-time analytics, especially the new wave of LLM-powered dashboards, retrieval-augmented generation (RAG), and vector-embedding searches that thrive only when fresh data is milliseconds away. Can we achieve both cost efficiency and performance?
In this talk, we’ll explore the engineering challenges of extending Apache Pinot—a real-time OLAP system—onto cloud object storage while still maintaining sub-second P99 latencies.
We’ll dive into how we built an abstraction in Apache Pinot to make it agnostic to the location of data. We’ll explain how we can query data directly from the cloud (without needing to download the entire dataset, as with lazy-loading) while achieving sub-second latencies. We’ll cover the data fetch and optimization strategies we implemented, such as pipelining fetch and compute, prefetching, selective block fetches, index pinning, and more. We’ll also share our latest work on integrating with open table formats like Iceberg, and how we will continue to achieve fast analytics directly on Parquet files by applying the same techniques that we use for tiered storage.
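To make "pipelining fetch and compute" and "prefetching" more concrete, here is a minimal Python sketch of the general idea. It is not Apache Pinot's implementation: `fetch_block`, `compute`, `pipelined_scan`, and the `prefetch_depth` parameter are hypothetical names chosen for illustration, and the fetch is simulated in memory rather than reading from object storage.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

_DONE = object()  # sentinel marking the end of the block list


def fetch_block(block_id):
    # Hypothetical stand-in for a ranged read of one block from object storage.
    return list(range(block_id * 4, block_id * 4 + 4))


def compute(block):
    # Hypothetical per-block work; here, a simple sum.
    return sum(block)


def pipelined_scan(block_ids, prefetch_depth=2):
    """Process blocks in order while up to `prefetch_depth` fetches are in flight."""
    ids = iter(block_ids)
    results = []
    with ThreadPoolExecutor(max_workers=prefetch_depth) as pool:
        # Prefetch: start the first few fetches before any compute happens.
        in_flight = deque(
            pool.submit(fetch_block, b) for b in islice(ids, prefetch_depth)
        )
        while in_flight:
            block = in_flight.popleft().result()
            # Pipelining: issue the next fetch before computing on the current
            # block, so network wait overlaps with compute.
            next_id = next(ids, _DONE)
            if next_id is not _DONE:
                in_flight.append(pool.submit(fetch_block, next_id))
            results.append(compute(block))
    return results


if __name__ == "__main__":
    print(pipelined_scan([0, 1, 2]))
```

In this sketch, a larger `prefetch_depth` hides more fetch latency at the cost of holding more blocks in memory; a real system would tune that tradeoff and fetch only the blocks a query actually needs.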