Spark is a widely adopted open source framework that provides a unified interface for analytics and machine learning workloads. Alluxio, which originated at the UC Berkeley AMPLab (the same lab that created Spark), is an open source data orchestration platform that empowers compute frameworks like Spark by providing stateful caching for efficient data sharing across jobs, by improving resilience against job failures, and by bringing data together from many different sources, whether remote HDFS or cloud object stores.
Alluxio partnered with IBM to deliver a Spark-based solution for fast data analytics. With the integration of IBM Spectrum Conductor, an advanced workload and resource management platform that maximizes hardware utilization to speed results and cut infrastructure costs, Alluxio and IBM delivered a solution that powers a leading telecom company’s applications supporting 320 million subscribers. In this online meetup, we will present the benefits of the fast analytics stack of Spark on Alluxio with IBM Spectrum Conductor, and dive into a leading telecom’s use case of leveraging Spark and Alluxio to process massive amounts of mobile data.
In this online meetup, you will learn about:
- Why leading companies are moving toward a decoupled compute and storage architecture, and the associated challenges and requirements
- Why Spark and Alluxio together can solve these challenges and fulfill the requirements
- How a leading telecom leverages Spark with Alluxio for fast data processing at scale on top of object stores and HDFS (a minimal example follows this list)
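As a minimal sketch of how Spark reads data through Alluxio rather than directly from the underlying store, the PySpark snippet below uses Alluxio's alluxio:// URI scheme. The master hostname, port, and dataset paths are hypothetical placeholders, and it assumes the Alluxio client library is already on Spark's classpath.

```python
from pyspark.sql import SparkSession

# A minimal sketch: Spark reads and writes through Alluxio's namespace,
# so hot data is served from Alluxio's cache while the persistent copy
# stays in the remote object store or HDFS that Alluxio mounts.
spark = (
    SparkSession.builder
    .appName("spark-on-alluxio-demo")
    .getOrCreate()
)

# Hypothetical Alluxio master address and dataset path.
events = spark.read.parquet("alluxio://alluxio-master:19998/data/mobile_events")

# A simple aggregation; repeated jobs over the same path benefit from the cache.
daily_counts = events.groupBy("event_date").count()

# Results are written back through the same namespace.
daily_counts.write.mode("overwrite").parquet(
    "alluxio://alluxio-master:19998/output/daily_counts"
)
```

Because the same alluxio:// paths can be backed by different object stores or HDFS clusters, the job code stays unchanged when the underlying storage moves.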
Video:
Presentation slides:
Videos
In the rapidly evolving landscape of AI and machine learning, Platform and Data Infrastructure teams face critical challenges in building and managing large-scale AI platforms. Performance bottlenecks, platform scalability, and GPU scarcity all make it difficult to support large-scale model training and serving.
In this talk, we introduce how Alluxio helps Platform and Data Infrastructure teams deliver faster, more scalable platforms to ML Engineering teams developing and training AI models. Alluxio’s highly distributed cache accelerates AI workloads by eliminating data loading bottlenecks and maximizing GPU utilization. Customers report up to 4x faster training performance with high-speed access to petabytes of data spread across billions of files, regardless of persistent storage type or proximity to GPU clusters. Alluxio’s architecture lowers data infrastructure costs, increases GPU utilization, and enables workload portability for navigating GPU scarcity challenges.
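As a rough illustration of what this looks like from a training job's point of view, the sketch below reads samples through a POSIX (FUSE) mount of Alluxio, so the model code treats cached remote data as local files. The mount point /mnt/alluxio-fuse/training-data and the flat file layout are assumptions for illustration, not a prescribed setup.

```python
import os
from torch.utils.data import Dataset, DataLoader

class CachedFileDataset(Dataset):
    """Reads training samples through an Alluxio FUSE mount (hypothetical path)."""

    def __init__(self, root="/mnt/alluxio-fuse/training-data"):
        self.paths = [
            os.path.join(root, name) for name in sorted(os.listdir(root))
        ]

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # Reads go through the local mount; Alluxio decides whether the bytes
        # come from its cache or from the underlying persistent store.
        with open(self.paths[idx], "rb") as f:
            return f.read()

loader = DataLoader(CachedFileDataset(), batch_size=32, num_workers=8)
```

Any framework that can read from a local path can consume the same mount, which is part of what keeps the cache layer transparent to ML engineering teams.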
In this talk, Zhe Zhang (NVIDIA, ex-Anyscale) introduced Ray and its applications in the LLM and multi-modal AI era. He shared his perspective on ML infrastructure, noting that it presents more unstructured challenges, and recommended using Ray and Alluxio as solutions for increasingly data-intensive multi-modal AI workloads.
As large-scale machine learning becomes increasingly GPU-centric, modern high-performance hardware like NVMe storage and RDMA networks (InfiniBand or specialized NICs) are becoming more widespread. To fully leverage these resources, it’s crucial to build a balanced architecture that avoids GPU underutilization. In this talk, we will explore various strategies to address this challenge by effectively utilizing these advanced hardware components. Specifically, we will present experimental results from building a Kubernetes-native distributed caching layer, utilizing NVMe storage and high-speed RDMA networks to optimize data access for PyTorch training.
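To make the GPU-utilization point concrete, here is a hedged PyTorch sketch of the consumer side of such a setup: the data loader uses enough worker parallelism, pinned memory, and prefetching that the GPU is not left idle waiting on I/O. The batch size and tuning values are illustrative assumptions rather than measured recommendations, and the random tensors stand in for data that would come from the NVMe-backed cache layer.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset; in the setup described above, samples would be read
# from the Kubernetes-managed cache layer backed by NVMe and RDMA.
dataset = TensorDataset(
    torch.randn(2048, 3, 64, 64),
    torch.randint(0, 10, (2048,)),
)

loader = DataLoader(
    dataset,
    batch_size=256,
    num_workers=8,            # CPU-side parallelism to keep the GPU fed
    pin_memory=True,          # pinned host buffers allow async host-to-device copies
    prefetch_factor=4,        # keep several batches in flight per worker
    persistent_workers=True,  # avoid re-spawning workers every epoch
)

device = "cuda" if torch.cuda.is_available() else "cpu"
for images, labels in loader:
    # non_blocking=True overlaps the copy with compute when memory is pinned
    images = images.to(device, non_blocking=True)
    labels = labels.to(device, non_blocking=True)
    # forward/backward pass would go here
    break  # a single step is enough for the sketch
```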