Setting the Stage for Alluxio Community to Soar in the Year of the Dragon: 2023 Recap and 2024 Outlook

As we step into 2024, we look back and celebrate an incredible 2023 for the Alluxio community. First and foremost, thank you to all of our contributors and the broader community! Together, we have achieved remarkable milestones. 💖 📈 Highlights by Numbers: Let’s take a look at Alluxio in 2023 by the numbers. … Continued

GPUs Are Fast, I/O is Your Bottleneck

This article was initially posted on ITOpsTimes. Unless you’ve been living off the grid, the hype around Generative AI has been impossible to ignore. A critical component fueling this AI revolution is the underlying computing power, GPUs. The lightning-fast GPUs enable speedy model training. But a hidden bottleneck can severely limit their potential – I/O. If … Continued

A Deep Dive into Caching in Presto

This article was initially posted on InfoWorld. Understand the caching mechanisms for the popular distributed SQL engine and how to use them to improve query speed and efficiency. Presto is a popular, open source, distributed SQL engine that enables organizations to run interactive analytic queries on multiple data sources at a large scale. Caching is a typical optimization … Continued

Alluxio Kubernetes Operator Tutorial: Simplifying Deploying and Managing Alluxio Clusters

This blog provides a tutorial on using the Kubernetes operator to simplify deploying and managing Alluxio clusters on Kubernetes. Introduction The Alluxio Kubernetes operator makes deploying and managing Alluxio and the datasets on Kubernetes easier. With the operator, Alluxio clusters can be deployed and managed seamlessly like any other native Kubernetes application. The operator handles … Continued

Speed Trino Queries with These Performance-Tuning Tips

Originally published at The New Stack: https://thenewstack.io/speed-trino-queries-with-these-performance-tuning-tips/ In this article, we will discuss how data engineers and data infrastructure engineers can make Trino, a widely used query engine, faster and more efficient. An open source distributed SQL query engine, Trino is widely used for data analytics on distributed data storage. Optimizing Trino to make it faster … Continued

Top Tips and Tricks for PyTorch Model Training Performance Tuning [2023]

Get the latest and greatest tips to accelerate your PyTorch model training for machine learning and deep learning. PyTorch, an open-source machine learning framework, has become the de facto choice for many organizations to develop and deploy deep learning models. Model training is the most compute-intensive phase of the machine learning pipeline. It requires continuous … Continued

Trino Optimization With Distributed Caching on Data Lakes: Trino Fest 2023 Session Recap

Originally published on trino.io: https://trino.io/blog/2023/07/21/trino-fest-2023-alluxio-recap.html By 2025, there will be 100 zettabytes stored in the cloud. That’s 100,000,000,000,000,000,000,000 bytes – a huge, eye-popping number. But only about 10% of that data is actually used on a regular basis. At Uber, for example, only 1% of their disk space is used for 50% of the data they access … Continued