Accelerating Data Loading in Large-Scale ML Training With Ray and Alluxio

In the rapidly-evolving field of artificial intelligence (AI) and machine learning (ML), the efficient handling of large datasets during training is becoming more and more pivotal. Ray has emerged as a key player, enabling large-scale dataset training through effective data streaming. By breaking down large datasets into manageable chunks and dividing training jobs into smaller … Continued

Introducing DORA: The Next-generation Alluxio Architecture

Today, we are thrilled to launch the Alluxio Enterprise AI product. One of the key innovations is the introduction of the next-generation architecture DORA – a Decentralized Object Repository Architecture. This blog talks about our development of the DORA architecture, including our motivation, design decisions, and implementation. 1. Moving from Data Analytics to the AI … Continued

Top Tips and Tricks for PyTorch Model Training Performance Tuning [2023]

Get the latest and greatest tips to accelerate your PyTorch model training for machine learning and deep learning. PyTorch, an open-source machine learning framework, has become the de facto choice for many organizations to develop and deploy deep learning models. Model training is the most compute-intensive phase of the machine learning pipeline. It requires continuous … Continued