How to Develop and Operate Cloud Native Data Platforms and Applications

November 12, 2019

Du Li

Architect of Data Infrastructure

Electronic Arts

Today, one can easily launch or terminate services with hundreds or thousands of compute instances in just a few seconds on cloud services such as AWS. However, operating, monitoring and maintaining those resources could also easily become a nightmare if the corresponding systems were not designed in a cloud-native way.

In this talk, we share our lessons in building and rebuilding our monitoring systems and data platforms at Electronic Arts (EA). In the first generation of the monitoring system, configurations were manually created for many individual software components and spread over all the resources. As services were started and terminated rapidly over time, it was extremely difficult to keep all configurations up to date. Consequently, on average we received over 1,000 alerts from thousands of machines on a daily basis, which stressed the operations team. We redesigned the system in late 2018 in a project called Monitoring As Code (MAC) emphasizing on version control and automation. MAC manages all the configurations using a GIT project in the same way as software code. Moreover, it establishes standards so that the configurations are automatically generated and deployed to keep everything in sync. As a result, it reduced the daily average number of alerts by two orders of magnitude.

In the first generation of the data platform, we used HDFS as a cache layer between ETL jobs and the underlying AWS storage service S3. However, HDFS is not a special-purpose cache service, so custom code is needed to make it work like a cache. We have to run a backup workflow in every ETL job to backup data to S3 and sync the metadata store of the ETL jobs running on HDFS and that of interactive analytic queries running directly on S3. Moreover, we rely on complex and fragile mechanisms for purging datasets when the clusters are under heavy load. The use of HDFS also makes it a challenge to rapidly scale up the YARN cluster during peak hours and scale it down during off-hours. We are currently redesigning the data platform, mainly by replacing HDFS with a special-purpose data orchestration service called Alluxio. In our initial evaluation, Alluxio not only provides better performance than HDFS but also significantly simplifies the architecture of our data platform and makes it easy to scale up and down and paves the way to a cloud native ETL processing stack.

‍

How to Develop and Operate Cloud Native Data Platforms and Applications from Alluxio, Inc.

‍

Videos:

Presentation Slides:

How to Develop and Operate Cloud Native Data Platforms and Applications from Alluxio, Inc.

‍

How to Develop and Operate Cloud Native Data Platforms and Applications from Alluxio, Inc.

‍

Videos:

Presentation Slides:

How to Develop and Operate Cloud Native Data Platforms and Applications from Alluxio, Inc.

Complete the form below to access the full overview:

Videos

AI/ML Infra Meetup Accelerating the Data Path to the GPU for AI and Beyond

In this talk, Sandeep Joshi, , Senior Manager at NVIDIA, shares how to accelerate the data access between GPU and storage for AI. Sandeep will dive into two options: CPU- initiated GPUDirect Storage and GPU-initiated SCADA.

August 14, 2025

AI/ML Infra Meetup | Beyond S3's Basics: Architecting for AI-Native Data Access

Bin Fan, VP of Technology at Alluxio, introduces how Alluxio, a software layer transparently sits between application and S3 (or other object stores), provides sub-ms time to first byte (TTFB) solution, with up to 45x lower latency.

August 14, 2025

AI/ML Infra Meetup | LLM Agents and Implementation Challenges

In this talk, Pritish Udgata from Adobe provides a comprehensive overview of implementation challenges and solutions for LLM agents.

Topic include:

CoT vs RAG vs Agentic AI
Anatomy of an agent
Single Agent with MCP
Multi Agents with A2A
Implementation Challenges and Solutions

August 14, 2025

Sign-up for a Live Demo or Book a Meeting with a Solutions Engineer

Request a demo

Alluxio Enterprise AI

Alluxio Enterprise Data

Videos:

Presentation Slides:

Videos:

Presentation Slides:

Complete the form below to access the full overview:

Videos

Sign-up for a Live Demo or Book a Meeting with a Solutions Engineer