Blog

Coupang, a Fortune 200 technology company, manages a multi-cluster GPU architecture for their AI/ML model training. This architecture introduced significant challenges, including:
- Time-consuming data preparation and data copy/movement
- Difficulty utilizing GPU resources efficiently
- High and growing storage costs
- Excessive operational overhead maintaining storage for localized data silos

To resolve these challenges, Coupang’s AI platform team implemented a distributed caching system that automatically retrieves training data from their central data lake, improves data loading performance, unifies access paths for model developers, automates data lifecycle management, and extends easily across Kubernetes environments. The new distributed caching architecture has improved model training speed, reduced storage costs, increased GPU utilization across clusters, lowered operational overhead, enabled training workload portability, and delivered 40% better I/O performance compared to parallel file systems.

Suresh Kumar Veerapathiran and Anudeep Kumar, engineering leaders at Uptycs, recently shared their experience of evolving their data platform and analytics architecture to power analytics through a generative AI interface. In their post on Medium, titled Cache Me If You Can: Building a Lightning-Fast Analytics Cache at Terabyte Scale, Veerapathiran and Kumar provide detailed insights into the challenges they faced (and how they solved them) while scaling an analytics solution that collects and reports on terabytes of telemetry data per day as part of the Uptycs Cloud-Native Application Protection Platform (CNAPP) solution.
Using Alluxio, data can be shared between pipeline stages at memory speed. By reading and writing data in Alluxio, the data can stay in memory for the next stage of the pipeline, which can greatly increase performance. Alluxio Enterprise Edition (AEE) introduces Fast Durable Writes, a feature that enables low-latency, fault-tolerant writes. In this article, we describe the Fast Durable Writes feature and explore how Alluxio can be deployed and used with a data pipeline.
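As an illustration of this pipeline pattern, here is a minimal PySpark sketch of two stages exchanging an intermediate dataset through an alluxio:// path. The master hostname, directory layout, and column names are assumptions for illustration, and the Alluxio client jar is assumed to be on the Spark classpath.

```python
# Minimal sketch: sharing data between pipeline stages through Alluxio with PySpark.
# Hostnames, paths, and column names below are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("alluxio-pipeline-demo").getOrCreate()

# Stage 1: ingest raw data and write the intermediate result into Alluxio,
# where it can stay in memory for downstream stages.
raw = spark.read.json("alluxio://alluxio-master:19998/data/raw/events")
cleaned = raw.filter(raw["status"] == "ok").select("user_id", "ts", "payload")
cleaned.write.mode("overwrite").parquet(
    "alluxio://alluxio-master:19998/data/intermediate/events_cleaned")

# Stage 2: the next job reads the same Alluxio path at memory speed
# instead of refetching the data from the under store.
cleaned_again = spark.read.parquet(
    "alluxio://alluxio-master:19998/data/intermediate/events_cleaned")
print(cleaned_again.count())

spark.stop()
```

Because the intermediate result lands in Alluxio rather than going straight to remote storage, the second stage can read it from memory instead of pulling it back over the network.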
We are excited to announce the release of Alluxio Enterprise Edition (AEE) and Community Edition (ACE) v1.7.0. This release brings enhanced caching policies, further ecosystem integrations, and significant usability improvements. One highlight is the Alluxio FUSE API, which lets users interact with Alluxio through a local filesystem mount. Alluxio FUSE is particularly useful for integrating with deep learning frameworks such as TensorFlow.
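For example, once the Alluxio namespace is exposed through a FUSE mount, training code can read data with ordinary file APIs. The sketch below assumes a mount point of /mnt/alluxio-fuse and a TFRecord dataset layout; both are illustrative.

```python
# Minimal sketch: feeding training data to TensorFlow through an Alluxio FUSE mount.
# The mount point and dataset layout are assumptions for illustration.
import tensorflow as tf

DATA_DIR = "/mnt/alluxio-fuse/datasets/imagenet/train"  # assumed FUSE mount path

# Because the FUSE mount looks like a local filesystem, standard tf.data
# file APIs work unchanged.
files = tf.data.Dataset.list_files(DATA_DIR + "/*.tfrecord")
dataset = (
    tf.data.TFRecordDataset(files, num_parallel_reads=tf.data.AUTOTUNE)
    .shuffle(buffer_size=10_000)
    .batch(128)
    .prefetch(tf.data.AUTOTUNE)
)

for batch in dataset.take(1):
    print("read one batch of", batch.shape[0], "serialized records via Alluxio FUSE")
```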
In the age of growing datasets and increased computing power, deep learning has become a popular technique for AI. Deep learning models continue to improve their performance across a variety of domains, given access to more and more data and the processing power to train larger neural networks. This rise of deep learning advances the state of the art for AI, but it also exposes challenges in how data is accessed and how storage systems are used. In this article, we further describe the storage challenges for deep learning workloads and how Alluxio can help solve them.

OLAP (online analytical processing) technology has been widely adopted by enterprises since the last century. Enterprises rely on OLAP to analyze huge volumes of data and generate reports that help business users make decisions. In the era of big data, OLAP has become more important, and more challenging, than ever before, and cloud computing makes this even more true. This article introduces how Kyligence, a cutting-edge big data intelligence company, leverages Alluxio to boost performance in the cloud.

We are excited to announce the Alluxio Enterprise Edition (AEE) 1.6.0 and Alluxio Community Edition (ACE) 1.6.0 releases. The AEE release brings a new embedded journal as well as enhancements in security and Fast Durable Writes. In addition, both the AEE and ACE releases bring support for new clients (an Amazon S3 API and a Python client), major usability improvements, and enhanced integrations with the ecosystem.
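As a quick illustration of the S3 API, the sketch below uses boto3 against an Alluxio proxy exposing an S3-compatible endpoint. The endpoint URL, port, credentials, and bucket layout are assumptions for illustration; check the proxy configuration and the release documentation for the actual values.

```python
# Minimal sketch: talking to Alluxio through its S3-compatible API with boto3.
# The endpoint URL, credentials, bucket, and key below are illustrative assumptions.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://alluxio-proxy:39999/api/v1/s3",  # assumed proxy endpoint
    aws_access_key_id="alluxio",        # placeholder credentials for illustration
    aws_secret_access_key="ignored",
)

# Write and read back a small object through the S3 interface.
s3.put_object(Bucket="datasets", Key="hello.txt", Body=b"hello from boto3")
obj = s3.get_object(Bucket="datasets", Key="hello.txt")
print(obj["Body"].read())
```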
Open source Alluxio 1.5.0 has been released with a large number of new features and improvements. Alluxio allows any application to access data from any storage system transparently and at memory speed. Interoperability with other technologies in the ecosystem is an important step for enabling this, and in the 1.5.0 release, we have improved the accessibility of Alluxio in several key ways.
We are excited to announce the Alluxio Enterprise Edition (AEE) 1.5.0 and Alluxio Community Edition (ACE) 1.5.0 releases. The AEE release brings enhancements in security, multi-tenancy, and working with multiple under stores. In addition, both the AEE and ACE releases bring major usability and performance improvements as well as enhanced integrations with the ecosystem.

Today, we’re excited to announce our partnership with Mesosphere to enable fast on-demand analytics with Alluxio via Mesosphere’s DC/OS in one click. This partnership is a natural extension of the synergy between Alluxio and DC/OS. Alluxio, the world's first system that unifies data at memory speed, allows enterprises to manage and analyze data stored across disparate storage systems, on premise and in the cloud, at memory speed. Mesosphere brings enterprises the power of cloud native technologies, with the control to run on any infrastructure, whether datacenter or cloud.

Alluxio 1.4.0 has been released with a large number of new features and improvements. This blog highlights some standout aspects of the Alluxio 1.4.0 open source release: Improved Alluxio Under Storage API, Native File System REST Interface, and Packet Streaming.
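To give a flavor of the file system REST interface, here is a minimal sketch that lists a directory over HTTP with the requests library. The proxy host, port, and the exact endpoint shape are assumptions for illustration; consult the REST API documentation for your Alluxio version.

```python
# Minimal sketch: listing an Alluxio directory over the file system REST interface.
# The proxy address and endpoint path below are assumptions; verify them against
# the REST API docs for your Alluxio version.
import requests

PROXY = "http://alluxio-proxy:39999/api/v1"  # assumed proxy address

# Assumed endpoint shape: POST /api/v1/paths/<alluxio path>/list-status
resp = requests.post(f"{PROXY}/paths//data/list-status")
resp.raise_for_status()

for entry in resp.json():
    print(entry.get("name"), entry.get("length"))
```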
Deep learning algorithms have traditionally been used in specific applications, most notably computer vision, machine translation, text mining, and fraud detection. Deep learning truly shines when the model is big and trained on large-scale datasets. Meanwhile, distributed computing platforms like Spark are designed to handle big data and have been used extensively. By making deep learning available on Spark, its applications become much broader, and businesses can take full advantage of deep learning capabilities using their existing Spark infrastructure.
Our mission at Alluxio is to unify data at memory speed. Today we’re excited to unveil our first products which enable organizations to turn data into value with unprecedented ease, flexibility, and speed. We believe our new products will substantially advance Alluxio for both the community and our enterprise customers. In this blog, I will share with you the big data challenges application developers and business line owners face today, and show how Alluxio addresses these challenges.

This is an excerpt from the Accelerating Data Analytics on Ceph Object Storage with Alluxio whitepaper. As the volume of data collected by enterprises has grown, there is a continual need to find efficient storage solutions. Owing to its simplicity, scalability, and cost-efficiency, object storage, including Ceph, has increasingly become a popular alternative to traditional file systems. In most cases, the object storage system, on premise or in the cloud, is decoupled from the compute nodes where analytics is run. There are several benefits to this separation.