Originally published on trino.io: https://trino.io/blog/2023/07/21/trino-fest-2023-alluxio-recap.html
By 2025, there will be 100 zettabytes stored in the cloud. That's 100,000,000,000,000,000,000,000 bytes, a huge, eye-popping number. But only about 10% of that data is actually used on a regular basis. At Uber, for example, only 1% of their disk space is used for 50% of the data they access …
Data caching is essential to the modern data stack, allowing organizations to access data quickly and efficiently for analytics and AI. On June 28, 2023, we presented "Data Caching Strategies for Data Analytics and AI" at Data+AI Summit 2023, and this blog post recaps that presentation. We …
This blog post discusses the synergy between Trino and Alluxio and how to deploy Alluxio as the caching layer for Trino. You will learn: why you should choose Alluxio as a cache for Trino; how Trino and Alluxio work together; how to configure Alluxio to point to S3-compatible storage such as MinIO; how to query …
This blog post was originally published on the Presto blog: https://prestodb.io/blog/2022/01/28/avoid-data-silos-in-presto-in-meta
Authors: Rongrong Zhong (Alluxio); James Sun, Ke Wang (Meta)
Raptor is a Presto connector (presto-raptor) used to power some critical interactive query workloads at Meta (formerly Facebook). Although it is referenced in the ICDE 2019 paper "Presto: SQL on Everything," it remains somewhat mysterious to many Presto users …
This blog post was originally published on the Razorpay Engineering Blog: https://engineering.razorpay.com/how-trino-and-alluxio-power-analytics-at-razorpay-803d3386daaf
Razorpay is a large fintech company in India. It provides a payment solution that offers a fast, affordable, and secure way to accept and disburse payments online. On the engineering side, the availability and scalability of the analytics infrastructure are crucial to providing seamless experiences to …
A deep dive into two important areas of active development going forward: table metadata management and caching.
At Aspect Analytics, we intend to use Dask, a distributed computation library for Python, to handle MSI data stored as large tensors. In this talk, we explore using Alluxio and Alluxio FUSE as a data consolidation and caching layer for some of our bioinformatics workflows.
Many Spark users may not be aware of the differences in memory utilization between caching data directly in memory inside the Spark JVM and storing data off-heap in an in-memory storage service like Alluxio. In this office hour, I will demo the two approaches and then open the floor for discussion.
This talk describes a stack of open-source projects that serves highly concurrent, low-latency SQL queries on big data in the cloud using Presto with Alluxio. With Alluxio deployed as a data orchestration layer for accessing cloud object storage (e.g., AWS S3), this architecture greatly improves data locality for Presto through distributed, cross-query caching, thus avoiding repeated reads of the same data from cloud storage.