Alluxio is a proud sponsor and exhibitor of Spark+AI Summit in San Francisco. If you missed the conference, don’t worry, we’ve got you covered!

What’s Spark+AI Summit? It’s the world’s largest conference focused on Apache Spark, the open source project that is Alluxio’s older cousin from the same lab (UC Berkeley’s AMPLab, now RISElab).
Overview of the Conference by the Numbers
- Spark+AI Summit, originally Spark Summit, started in 2013 with around 200 attendees. This was its 6th year, and by our observation there were over 3,000 attendees!
- Of those 3,000+ attendees, we had more than 1,500 interactions and over 500 in-depth conversations with folks already using Alluxio or interested in learning about it
- 100 lucky attendees won our drones!
What We Learned
- Adopting a cloud strategy is a top priority for most organizations at the event
- Many organizations are experiencing challenges with hybrid cloud because they cannot efficiently access data across the public cloud and their own data warehouse
- Machine learning is on the rise, but SQL queries over big data are still the bread and butter of most organizations
- Kubernetes is changing the landscape of big data analytics. In the next 3-6 months, we will see a wave of organizations move to deploying big data workloads with container orchestration systems
- Attendees love to win drones ;) Find us at the next event: Strata Data Conference in New York

Reasons to try the Apache Spark, Alluxio, and S3 Stack
- This stack is cloud-native
- Apache Spark and Alluxio are open source
- S3 is cost-effective and scalable, driving down DevOps costs without sacrificing performance
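To make the stack above concrete, here is a minimal sketch of how an application might point Spark at data cached by Alluxio instead of reading S3 directly. The helper below is hypothetical (not from the post), and it assumes the S3 bucket is mounted at the root of the Alluxio namespace and that the Alluxio master runs at `master:19998` (the default port):

```python
# Hypothetical helper: translate an s3:// URI into the corresponding
# alluxio:// URI, assuming the bucket is mounted at the Alluxio root.
def to_alluxio_uri(s3_uri: str,
                   alluxio_master: str = "alluxio://master:19998") -> str:
    prefix = "s3://"
    if not s3_uri.startswith(prefix):
        raise ValueError(f"expected an s3:// URI, got {s3_uri!r}")
    # Drop the bucket name: with the bucket mounted at "/", only the
    # object key remains in the Alluxio namespace.
    _bucket, _, key = s3_uri[len(prefix):].partition("/")
    return f"{alluxio_master}/{key}"

# With Spark configured with the Alluxio client jar, the same DataFrame
# code then reads through the Alluxio cache, e.g.:
#   df = spark.read.parquet(to_alluxio_uri("s3://my-bucket/events/2019/"))
```

Because only the URI scheme changes, existing Spark jobs can switch between reading S3 directly and reading through Alluxio without any other code changes.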
Learn more: 10X Acceleration of Spark with Alluxio Case Study, Get started with Spark and Alluxio in 5min, Download Alluxio
All of the sessions are recorded and will be viewable here.
Thanks to everyone for stopping by the Alluxio booth and the great conversations!

Additional resources:
- Community office hour (virtual): Running Apache Spark with Alluxio on Amazon EMR
- Got questions? Chat with Alluxio experts on Slack

Coupang, a Fortune 200 technology company, manages a multi-cluster GPU architecture for their AI/ML model training. This architecture introduced significant challenges, including:
- Time-consuming data preparation and data copy/movement
- Difficulty utilizing GPU resources efficiently
- High and growing storage costs
- Excessive operational overhead maintaining storage for localized data silos
To resolve these challenges, Coupang’s AI platform team implemented a distributed caching system that automatically retrieves training data from their central data lake, improves data loading performance, unifies access paths for model developers, automates data lifecycle management, and extends easily across Kubernetes environments. The new distributed caching architecture has improved model training speed, reduced storage costs, increased GPU utilization across clusters, lowered operational overhead, and enabled training workload portability, delivering 40% better I/O performance compared to parallel file systems.

Suresh Kumar Veerapathiran and Anudeep Kumar, engineering leaders at Uptycs, recently shared their experience of evolving their data platform and analytics architecture to power analytics through a generative AI interface. In their post on Medium, titled Cache Me If You Can: Building a Lightning-Fast Analytics Cache at Terabyte Scale, Veerapathiran and Kumar provide detailed insights into the challenges they faced (and how they solved them) while scaling an analytics solution that collects and reports on terabytes of telemetry data per day as part of Uptycs’ Cloud-Native Application Protection Platform (CNAPP) solutions.