As Trino users increasingly rely on cloud object storage for retrieving data, speed and cloud cost have become major challenges. The separation of compute and storage creates latency challenges when querying datasets; scanning data between storage and compute tiers becomes I/O bound. On the other hand, cloud API costs related to GET/LIST operations and cross-region data transfer add up quickly.
The newly introduced Trino file system cache by Alluxio aims to overcome the above challenges. In this session, Jianjian will dive into Trino data caching strategies, the latest test results, and discuss the multi-level caching architecture. This architecture makes Trino 10x faster for data lakes of any scale, from GB to EB.
What you will learn:
- Challenges relating to the speed and costs of running Trino in the cloud
- The new Trino file system cache feature overview, including the latest development status and test results
- A multi-level cache framework for maximized speed, including Trino file system cache and Alluxio distributed cache
- Real-world cases, including a large online payment firm and a top ridesharing company
- The future roadmap of Trino file system cache and Trino-Alluxio integration
As Trino users increasingly rely on cloud object storage for retrieving data, speed and cloud cost have become major challenges. The separation of compute and storage creates latency challenges when querying datasets; scanning data between storage and compute tiers becomes I/O bound. On the other hand, cloud API costs related to GET/LIST operations and cross-region data transfer add up quickly.
The newly introduced Trino file system cache by Alluxio aims to overcome the above challenges. In this session, Jianjian will dive into Trino data caching strategies, the latest test results, and discuss the multi-level caching architecture. This architecture makes Trino 10x faster for data lakes of any scale, from GB to EB.
What you will learn:
- Challenges relating to the speed and costs of running Trino in the cloud
- The new Trino file system cache feature overview, including the latest development status and test results
- A multi-level cache framework for maximized speed, including Trino file system cache and Alluxio distributed cache
- Real-world cases, including a large online payment firm and a top ridesharing company
- The future roadmap of Trino file system cache and Trino-Alluxio integration
Video:
Presentation slides:
Videos:
Presentation Slides:
Complete the form below to access the full overview:
.png)
Videos

Watch this on-demand video to learn about the latest release of Alluxio Enterprise AI. In this webinar, discover how Alluxio AI 3.7 eliminates cloud storage latency bottlenecks with breakthrough sub-millisecond performance, delivering up to 45× faster data access than S3 Standard without changing your code. Alluxio AI 3.7 is also packed with new features designed to supercharge your AI infrastructure while keeping your data secure.Key highlights include:
- Alluxio Ultra Low Latency Caching for Cloud Storage
- Role-Based Access Control (RBAC) for S3 Access
- 5X Faster Cache Preloading with Alluxio Distributed Cache Preloader
- FUSE Non-Disruptive Upgrade
- Other New Features for Alluxio Admins

Real-time OLAP databases are optimized for speed and often rely on tightly coupled storage-compute architectures using disks or SSDs. Decoupled architectures, which use cloud object storage, introduce an unavoidable tradeoff: cost efficiency at the expense of performance. This makes them unsuitable for databases that need to provide low-latency, real-time analytics, especially the new wave of LLM-powered dashboards, retrieval-augmented generation (RAG), and vector-embedding searches that thrive only when fresh data is milliseconds away. Can we achieve both cost efficiency and performance?
In this talk, we’ll explore the engineering challenges of extending Apache Pinot—a real-time OLAP system—onto cloud object storage while still maintaining sub-second P99 latencies.
We’ll dive into how we built an abstraction in Apache Pinot to make it agnostic to the location of data. We’ll explain how we can query data directly from the cloud (without needing to download the entire dataset, as with lazy-loading) while achieving sub-second latencies. We’ll cover the data fetch and optimization strategies we implemented, such as pipelining fetch and compute, prefetching, selective block fetches, index pinning, and more. We'll also share our latest work about integration with open table formats like iceberg, and how we will continue to achieve fast analytics directly on parquet files by implementing all the same techniques that apply to tiered storage.

The data lake is a fantastic, low-cost place to put data at rest for offline analytics, but we've built it under the terms of a terrible bargain: all that cheap storage at scale was a great thing, but we gave up schema management and transactions along the way. Apache Iceberg has emerged as king of the Open Table Formats to fix this very problem.
Built on the foundation of Parquet files, Iceberg adds a simple yet flexible metadata layer and integration with standard data catalogs to provide robust schema support and ACID transactions to the once ungoverned data lake. In this talk, we'll build Iceberg up from the basics, see how the read and write path work, and explore how it supports streaming data sources like Apache Kafka™. Then we'll see how Confluent's Tableflow brings Kafka together with open table formats like Iceberg and Delta Lake to make operational data in Kafka topics instantly visible to the data lake without the usual ETL—unifying the operational/analytical divide that has been with us for decades.