caching Archives

A Journey Towards Data Locality on Cloud for Machine Learning and AI

December 18, 2023 By Lu Qiu and Shawn Sun

In this blog, we discuss the importance of data locality for efficient machine learning on the cloud. We examine the pros and cons of existing solutions and the tradeoff between reducing costs and maximizing performance through data locality. We then highlight the new-generation Alluxio design and implementation, detailing how it brings value to model training … Continued

Why Adding NAS/NFS on Object Storage May not Solve Your Data Access Problem of AI

November 28, 2023 By Tarik Bennett, Beinan Wang and Hope Wang

In this blog, we discuss the data access challenges in AI and why commonly used NAS/NFS may not be a good option for your organization. 1. Early Architecture of AI/ML: According to Gartner, although LLMs are being hyped, most organizations are in the early stages, with some in production. In the early stages of … Continued

A Deep Dive into Caching in Presto

October 11, 2023 By Hope Wang and Beinan Wang

This article was initially posted on InfoWorld. Understand the caching mechanisms for the popular distributed SQL engine and how to use them to improve query speed and efficiency. Presto is a popular, open source, distributed SQL engine that enables organizations to run interactive analytic queries on multiple data sources at a large scale. Caching is a typical optimization … Continued

Trino Optimization With Distributed Caching on Data Lakes: Trino Fest 2023 Session Recap

July 21, 2023 By Hope Wang, Beinan Wang and Cole Bowden (Trino)

Originally published on trino.io: https://trino.io/blog/2023/07/21/trino-fest-2023-alluxio-recap.html By 2025, there will be 100 zettabytes stored in the cloud. That’s 100,000,000,000,000,000,000,000 bytes – a huge, eye-popping number. But only about 10% of that data is actually used on a regular basis. At Uber, for example, only 1% of their disk space is used for 50% of the data they access … Continued

Data Caching Strategies for Data Analytics and AI: Data+AI Summit 2023 Session Recap

July 13, 2023 By Chunxu Tang, Beinan Wang and Hope Wang

Data caching is essential to the modern data stack, allowing organizations to access data quickly and efficiently for analytics and AI. On June 28, 2023, we presented Data Caching Strategies for Data Analytics and AI at Data+AI Summit 2023. We are excited to bring you a recap of that presentation through this blog post. We … Continued

Get Started with Trino and Alluxio in 5 Minutes

January 27, 2023 By Brian Olsen (Trino Developer Advocate), Beinan Wang and Hope Wang

This blog post discusses the synergy between Trino and Alluxio, and how to deploy Alluxio as the caching layer for Trino. You will learn: why you should choose Alluxio as a cache for Trino; how Trino and Alluxio work together; how to configure Alluxio to point to S3 storage like MinIO; how to query … Continued

Avoid Data Silos in Presto in Meta: the journey from Raptor to RaptorX

August 29, 2022 By Rongrong Zhong

This blog was originally published in the Presto blog: https://prestodb.io/blog/2022/01/28/avoid-data-silos-in-presto-in-meta Authors: Rongrong Zhong (Alluxio); James Sun and Ke Wang (Meta). Raptor is a Presto connector (presto-raptor) that is used to power some critical interactive query workloads in Meta (previously Facebook). Though referred to in the ICDE 2019 paper Presto: SQL on Everything, it remains somewhat mysterious to many Presto users … Continued

How Trino and Alluxio Power Analytics at Razorpay

August 23, 2022 By Tanmay Krishna (Razorpay) and Utkarsh Saxena (Razorpay)

This blog was originally published in Razorpay Engineering Blog: https://engineering.razorpay.com/how-trino-and-alluxio-power-analytics-at-razorpay-803d3386daaf Razorpay is a large fintech company in India. Razorpay provides a payment solution that offers a fast, affordable, and secure way to accept and disburse payments online. On the engineering side, the availability and scalability of analytics infrastructure are crucial to providing seamless experiences to … Continued

Apache Hudi: The Path Forward

October 12, 2021

A deep dive into two important areas of active development going forward – table metadata management and caching.

Tags: alluxio day, apache hudi, caching, data lake, metadata management
